Exam AWS Certified Data Analytics - Specialty topic 1 question 79 discussion

Exam question from Amazon's AWS Certified Data Analytics - Specialty

Question #: 79
Topic #: 1

A company has collected more than 100 TB of log files in the last 24 months. The files are stored as raw text in a dedicated Amazon S3 bucket. Each object has a key of the form year-month-day_log_HHmmss.txt where HHmmss represents the time the log file was initially created. A table was created in Amazon Athena that points to the S3 bucket. One-time queries are run against a subset of columns in the table several times an hour.
A data analyst must make changes to reduce the cost of running these queries. Management wants a solution with minimal maintenance overhead.
Which combination of steps should the data analyst take to meet these requirements? (Choose three.)

A. Convert the log files to Apache Avro format.
B. Add a key prefix of the form date=year-month-day/ to the S3 objects to partition the data.
C. Convert the log files to Apache Parquet format.
D. Add a key prefix of the form year-month-day/ to the S3 objects to partition the data.
E. Drop and recreate the table with the PARTITIONED BY clause. Run the ALTER TABLE ADD PARTITION statement.
F. Drop and recreate the table with the PARTITIONED BY clause. Run the MSCK REPAIR TABLE statement.

Suggested Answer: BCF

by VikG12 at May 3, 2021, 5:35 p.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments

Heer

Highly Voted 3 years, 6 months ago

Answer: B, C, F

Option B: Add a key prefix of the form date=year-month-day/ to the S3 objects to partition the data.
Explanation: An Amazon S3 bucket can support 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned prefix. Every additional prefix raises the available request rate, which is why adding a prefix is wise when there is a large set of data.
Link: https://aws.amazon.com/premiumsupport/knowledge-center/s3-object-key-naming-pattern/

Option C: Convert the log files to Apache Parquet format.
Explanation: Parquet is a columnar format, which improves query performance in Athena. Because the queries here read only a subset of columns, Athena also scans less data.

Option F: Drop and recreate the table with the PARTITIONED BY clause. Run the MSCK REPAIR TABLE statement.
Explanation: MSCK REPAIR TABLE compares the partitions in the table metadata with the partitions in S3. If new partitions are present in the S3 location that you specified when you created the table, it adds those partitions to the metadata and to the Athena table.
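A minimal Python sketch of options B and F, under stated assumptions: the bucket name `example-log-bucket`, the table name `logs`, the single `line` column, and the helper `hive_style_key` are hypothetical and not part of the question. It only illustrates how an existing key could map to a `date=` prefix and which Athena statements would follow; the Parquet conversion from option C is not shown.

```python
import re

# Hypothetical names, chosen only for illustration.
BUCKET = "example-log-bucket"
TABLE = "logs"

# Matches keys of the form year-month-day_log_HHmmss.txt from the question.
KEY_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2})_log_\d{6}\.txt$")


def hive_style_key(key: str) -> str:
    """Return the key with a Hive-style date=year-month-day/ prefix added."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"unexpected key format: {key}")
    return f"date={match.group(1)}/{key}"


# Option F: recreate the table with a partition column, then let Athena
# discover the Hive-style prefixes. `date` is backtick-quoted because it
# is a reserved word in Athena DDL.
CREATE_TABLE = f"""
CREATE EXTERNAL TABLE {TABLE} (line string)
PARTITIONED BY (`date` string)
LOCATION 's3://{BUCKET}/'
"""

REPAIR_TABLE = f"MSCK REPAIR TABLE {TABLE}"

print(hive_style_key("2021-01-26_log_101500.txt"))
# date=2021-01-26/2021-01-26_log_101500.txt
```

The statements are kept as plain strings so they can be pasted into the Athena console or submitted with whatever client is already in use.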

upvoted 30 times

lakeswimmer

3 years, 4 months ago

Agree: B, C, F. If it were a case of removing partitions, D would have been better. MSCK REPAIR TABLE only adds partitions to metadata; it does not remove them. To remove partitions from metadata after the partitions have been manually deleted in Amazon S3, run the command ALTER TABLE table-name DROP PARTITION. For more information, see ALTER TABLE DROP PARTITION.
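As a rough sketch of the cleanup step mentioned above, the Python helper below builds an ALTER TABLE DROP PARTITION statement for one day. The function name `drop_partition_statement`, the table name `logs`, and the partition column `date` are assumptions for illustration.

```python
from datetime import date


def drop_partition_statement(table: str, day: str) -> str:
    """Build the statement that removes one date partition from the metadata."""
    # Reject malformed values before they reach the SQL text.
    date.fromisoformat(day)
    return f"ALTER TABLE {table} DROP IF EXISTS PARTITION (`date` = '{day}')"


print(drop_partition_statement("logs", "2021-01-26"))
# ALTER TABLE logs DROP IF EXISTS PARTITION (`date` = '2021-01-26')
```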

upvoted 2 times

riyamalin

2 years, 7 months ago

Agree - Answer is BCF

upvoted 1 times

Dr_Kiko

Highly Voted 3 years, 5 months ago

For details on why B and not D, see https://docs.aws.amazon.com/athena/latest/ug/partitions.html

upvoted 7 times

pk349

Most Recent 1 year, 12 months ago

BCF: I passed the test

upvoted 1 times

rocky48

2 years, 9 months ago

Selected Answer: BCF

ANSWER: B,C,F

upvoted 1 times

ru4aws

2 years, 9 months ago

Selected Answer: BCF

Initially went with BCE, but E is wrong: Athena favors Hive-style "=" partitions such as year=2021/month=01/day=26/, which work with MSCK REPAIR TABLE. For non-Hive-style layouts such as data/2021/01/26/, you have to use ALTER TABLE ADD PARTITION.
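To make the maintenance difference concrete, here is a hedged Python sketch of what the non-Hive-style route (options D and E) would involve: one ALTER TABLE ADD PARTITION statement per day, each with an explicit LOCATION. The generator `add_partition_statements`, the table `logs`, and the bucket `example-log-bucket` are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterator


def add_partition_statements(
    table: str, bucket: str, start: date, days: int
) -> Iterator[str]:
    """Yield one ADD PARTITION statement per day for a year-month-day/ layout."""
    for offset in range(days):
        day = (start + timedelta(days=offset)).isoformat()
        yield (
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (`date` = '{day}') "
            f"LOCATION 's3://{bucket}/{day}/'"
        )


statements = list(
    add_partition_statements("logs", "example-log-bucket", date(2021, 1, 26), 3)
)
for statement in statements:
    print(statement)
```

With 24 months of logs this would mean roughly 730 statements up front and one more every day, which is the overhead the Hive-style prefix avoids.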

upvoted 2 times

aws2019

3 years, 5 months ago

ans is B,C,F

upvoted 1 times

iconara

3 years, 5 months ago

B, C, F is the answer. C, D, E is a valid solution, but in this case it would be more work. MSCK REPAIR TABLE scans through the directories on S3 to find partitions, but it requires Hive-style partitioning, i.e. date=Y-M-D. You should almost never rely on MSCK REPAIR TABLE: it's extremely inefficient, but the docs are full of examples using it, so an exam would be too. The real way to do this is to use a Y-M-D/ partitioning scheme and partition projection.
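A sketch of what the partition projection setup might look like, written as Python that assembles a TBLPROPERTIES clause. The property names follow Athena's partition projection settings as I understand them and should be checked against the current documentation; the bucket name, the start date, and both helper functions are assumptions made for illustration.

```python
def projection_properties(bucket: str, first_day: str) -> dict:
    """Table properties for a date partition column projected over Y-M-D/ prefixes."""
    return {
        "projection.enabled": "true",
        "projection.date.type": "date",
        "projection.date.format": "yyyy-MM-dd",
        "projection.date.range": f"{first_day},NOW",
        "projection.date.interval": "1",
        "projection.date.interval.unit": "DAYS",
        # ${date} is a literal placeholder that Athena fills in per partition.
        "storage.location.template": f"s3://{bucket}/${{date}}/",
    }


def tblproperties_clause(props: dict) -> str:
    """Render the properties as a TBLPROPERTIES clause for CREATE or ALTER TABLE."""
    pairs = ",\n  ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"TBLPROPERTIES (\n  {pairs}\n)"


props = projection_properties("example-log-bucket", "2019-01-01")
print(tblproperties_clause(props))
```

With projection enabled, Athena computes partition locations from the template at query time, so no MSCK REPAIR TABLE or ALTER TABLE ADD PARTITION run is needed when new days arrive.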

upvoted 1 times

gunjan4392

3 years, 6 months ago

Can anyone please explain why B and not D? I understand C&F.

upvoted 1 times

Bmaster

3 years, 6 months ago

If you use a prefix of the form 'date=2020-11-11/', you can use Athena to filter by the date field. https://www.mikulskibartosz.name/partitioning-s3-data-by-date/

upvoted 2 times

mickies9

3 years, 6 months ago

Can you please explain why you need "date=" as a prefix?

upvoted 1 times

Merrick

2 years, 3 months ago

It makes queries like "... date >= '2020-11-11'" possible.
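A small Python sketch of why such a filter works on a string partition column: ISO year-month-day values sort lexicographically in chronological order, so a `>=` comparison selects exactly the later partitions. The table name `logs` and the sample partition values are made up for illustration.

```python
# Hypothetical query; "date" is double-quoted as an identifier and the
# value is a quoted string literal.
QUERY = """
SELECT count(*)
FROM logs
WHERE "date" >= '2020-11-11'
"""

# Sample partition values as they would appear in date=.../ prefixes.
partitions = ["2020-11-09", "2020-11-10", "2020-11-11", "2020-11-12"]

# The same string comparison, applied locally.
selected = [value for value in partitions if value >= "2020-11-11"]
print(selected)
# ['2020-11-11', '2020-11-12']
```

Only the objects under the selected prefixes need to be read, which is where the cost reduction from partitioning comes from.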

upvoted 1 times

ay12

3 years ago

https://docs.aws.amazon.com/athena/latest/ug/partitions.html

upvoted 1 times

VikG12

3 years, 7 months ago

B, C, F

upvoted 5 times
