Exam AWS Certified Data Analytics - Specialty topic 1 question 79 discussion

A company has collected more than 100 TB of log files in the last 24 months. The files are stored as raw text in a dedicated Amazon S3 bucket. Each object has a key of the form year-month-day_log_HHmmss.txt where HHmmss represents the time the log file was initially created. A table was created in Amazon Athena that points to the S3 bucket. One-time queries are run against a subset of columns in the table several times an hour.
A data analyst must make changes to reduce the cost of running these queries. Management wants a solution with minimal maintenance overhead.
Which combination of steps should the data analyst take to meet these requirements? (Choose three.)

  • A. Convert the log files to Apache Avro format.
  • B. Add a key prefix of the form date=year-month-day/ to the S3 objects to partition the data.
  • C. Convert the log files to Apache Parquet format.
  • D. Add a key prefix of the form year-month-day/ to the S3 objects to partition the data.
  • E. Drop and recreate the table with the PARTITIONED BY clause. Run the ALTER TABLE ADD PARTITION statement.
  • F. Drop and recreate the table with the PARTITIONED BY clause. Run the MSCK REPAIR TABLE statement.
Suggested Answer: BCF

Comments

Heer
Highly Voted 3 years, 6 months ago
ANSWER: B, C, F
OPTION B: Add a key prefix of the form date=year-month-day/ to the S3 objects to partition the data. EXPLANATION: An Amazon S3 bucket supports 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned prefix, so every additional prefix adds request capacity. That makes prefixes a sensible choice for a large data set. LINK: https://aws.amazon.com/premiumsupport/knowledge-center/s3-object-key-naming-pattern/
OPTION C: Convert the log files to Apache Parquet format. EXPLANATION: Parquet is a columnar format, which improves query performance in Athena.
OPTION F: Drop and recreate the table with the PARTITIONED BY clause. Run the MSCK REPAIR TABLE statement. EXPLANATION: MSCK REPAIR TABLE compares the partitions in the table metadata with the partitions in S3. If new partitions are present in the S3 location that you specified when you created the table, it adds those partitions to the metadata and to the Athena table.
upvoted 30 times
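As a rough illustration of option B, the sketch below maps an existing object key to a Hive-style key. It assumes the keys look exactly like YYYY-MM-DD_log_HHmmss.txt and that the original file name is kept under the new prefix; the function name hive_style_key is made up for this example, and the actual copy or rename in S3 is not shown.

```python
import re

# Assumed key layout from the question: YYYY-MM-DD_log_HHmmss.txt
KEY_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2})_log_(\d{6})\.txt$")


def hive_style_key(old_key: str) -> str:
    """Return the key with a date=YYYY-MM-DD/ prefix added (option B)."""
    match = KEY_PATTERN.match(old_key)
    if match is None:
        raise ValueError(f"unexpected key format: {old_key!r}")
    # Keep the original file name and only prepend the partition prefix.
    return f"date={match.group(1)}/{old_key}"


print(hive_style_key("2021-01-26_log_134500.txt"))
# date=2021-01-26/2021-01-26_log_134500.txt
```

With keys in this shape, each date=.../ prefix corresponds to one value of the partition column declared in PARTITIONED BY.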
lakeswimmer
3 years, 4 months ago
Agree: B, C, F. If it were a case of removing partitions, D would have been better. MSCK REPAIR TABLE only adds partitions to metadata; it does not remove them. To remove partitions from metadata after the partitions have been manually deleted in Amazon S3, run the command ALTER TABLE table-name DROP PARTITION. For more information, see ALTER TABLE DROP PARTITION.
upvoted 2 times
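To make the add-versus-remove distinction concrete, here is a small sketch that builds the two statements as strings. The table name logs and the partition column date are assumptions for illustration; the column is backtick-quoted on the assumption that date needs escaping in Athena DDL. Values are interpolated directly, so this is only suitable for trusted input.

```python
def repair_sql(table: str) -> str:
    """Statement that registers Hive-style partitions found in S3."""
    return f"MSCK REPAIR TABLE {table}"


def drop_partition_sql(table: str, date_value: str) -> str:
    """Statement that removes one partition from the table metadata."""
    # Backticks around the column name: assumed necessary because `date`
    # may be treated as a reserved word in DDL.
    return (
        f"ALTER TABLE {table} DROP IF EXISTS "
        f"PARTITION (`date` = '{date_value}')"
    )


# Hypothetical table name and partition value.
print(repair_sql("logs"))
print(drop_partition_sql("logs", "2021-01-26"))
```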
riyamalin
2 years, 7 months ago
Agree - Answer is BCF
upvoted 1 times
Dr_Kiko
Highly Voted 3 years, 5 months ago
Details on why B and not D: https://docs.aws.amazon.com/athena/latest/ug/partitions.html
upvoted 7 times
pk349
Most Recent 1 year, 12 months ago
BCF: I passed the test
upvoted 1 times
rocky48
2 years, 9 months ago
Selected Answer: BCF
ANSWER: B,C,F
upvoted 1 times
ru4aws
2 years, 9 months ago
Selected Answer: BCF
Initially went with BCE, but E is wrong: Athena favors "=" Hive-style partitions (year=2021/month=01/day=26/), which work with MSCK REPAIR. For non-Hive-style partitions (data/2021/01/26/) you have to use ALTER TABLE.
upvoted 2 times
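The rule of thumb in this comment can be sketched as a check on the prefix, plus the statement that a non-Hive layout would need for each partition. This is a simplified sketch: the prefix is assumed to be relative to the table location, and the names logs, my-log-bucket, and the date column are placeholders, not taken from the question.

```python
def is_hive_style(prefix: str) -> bool:
    """True if every path segment of the prefix looks like key=value."""
    segments = [s for s in prefix.strip("/").split("/") if s]
    return bool(segments) and all("=" in s for s in segments)


def add_partition_sql(table: str, date_value: str, bucket: str) -> str:
    """Explicit registration for a non-Hive prefix such as 2021-01-26/."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (`date` = '{date_value}') "
        f"LOCATION 's3://{bucket}/{date_value}/'"
    )


print(is_hive_style("year=2021/month=01/day=26/"))  # True
print(is_hive_style("data/2021/01/26/"))            # False
print(add_partition_sql("logs", "2021-01-26", "my-log-bucket"))
```

With option D, a statement like the last one has to be run for every new day, which is the maintenance overhead the question asks to avoid.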
aws2019
3 years, 5 months ago
ans is B,C,F
upvoted 1 times
iconara
3 years, 5 months ago
B, C, F is the answer. C, D, E is a valid solution, but would in this case be more work. MSCK REPAIR TABLE scans through the directories on S3 to find partitions, but requires Hive-style partitioning, i.e. date=Y-M-D. You should almost never rely on MSCK REPAIR TABLE; it's extremely inefficient, but the docs are full of examples using it, so an exam would be too. The real way to do this is to use a Y-M-D/ partitioning scheme and partition projection.
upvoted 1 times
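For readers unfamiliar with partition projection, the sketch below assembles the table properties such a setup would typically use for a Y-M-D/ layout. The bucket name, the column name date, and the start of the range are assumptions chosen for illustration; check the property names against the Athena partition projection documentation before relying on them.

```python
def projection_properties(bucket: str, column: str = "date",
                          start: str = "2019-01-01") -> dict:
    """Table properties for a date-typed projected partition column."""
    return {
        "projection.enabled": "true",
        f"projection.{column}.type": "date",
        f"projection.{column}.format": "yyyy-MM-dd",
        # Range from an assumed start date up to the current time.
        f"projection.{column}.range": f"{start},NOW",
        f"projection.{column}.interval": "1",
        f"projection.{column}.interval.unit": "DAYS",
        # Maps each projected value to a non-Hive prefix such as 2021-01-26/.
        "storage.location.template": f"s3://{bucket}/${{{column}}}/",
    }


def tblproperties_clause(props: dict) -> str:
    """Render the properties as a TBLPROPERTIES (...) clause."""
    pairs = ", ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return f"TBLPROPERTIES ({pairs})"


props = projection_properties("my-log-bucket")
print(props["storage.location.template"])  # s3://my-log-bucket/${date}/
print(tblproperties_clause(props))
```

Because the partition values are computed from these properties at query time, neither MSCK REPAIR TABLE nor ALTER TABLE ADD PARTITION has to be run as new days arrive.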
gunjan4392
3 years, 6 months ago
Can anyone please explain why B and not D? I understand C&F.
upvoted 1 times
Bmaster
3 years, 6 months ago
If you use a prefix of the form 'date=2020-11-11/' , you can use Athena to filter by date field. https://www.mikulskibartosz.name/partitioning-s3-data-by-date/
upvoted 2 times
mickies9
3 years, 6 months ago
Can you please explain why you need "date=" as a prefix?
upvoted 1 times
Merrick
2 years, 3 months ago
It makes queries like "... date >= '2020-11-11'" possible.
upvoted 1 times
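The effect of such a filter can be sketched without Athena: given the date=.../ prefixes that exist, only those satisfying the predicate need to be read. The following is a toy model of partition pruning, not how Athena is implemented, and the function name partitions_to_scan is invented for this example.

```python
from datetime import date


def partitions_to_scan(prefixes, min_date):
    """Return the date=... prefixes that a filter date >= min_date keeps."""
    kept = []
    for prefix in prefixes:
        name, _, value = prefix.rstrip("/").partition("=")
        # Only prefixes of the form date=YYYY-MM-DD/ are considered.
        if name == "date" and date.fromisoformat(value) >= min_date:
            kept.append(prefix)
    return kept


prefixes = ["date=2020-11-10/", "date=2020-11-11/", "date=2020-11-12/"]
print(partitions_to_scan(prefixes, date(2020, 11, 11)))
# ['date=2020-11-11/', 'date=2020-11-12/']
```

Objects under the skipped prefixes are never read, which is where the cost reduction from partitioning comes from.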
ay12
3 years ago
https://docs.aws.amazon.com/athena/latest/ug/partitions.html
upvoted 1 times
VikG12
3 years, 7 months ago
B, C, F
upvoted 5 times