Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 36 discussion

A data engineer runs Amazon Athena queries on data that is in an Amazon S3 bucket. The Athena queries use AWS Glue Data Catalog as a metadata table.
The data engineer notices that the Athena query plans are experiencing a performance bottleneck. The data engineer determines that the cause of the performance bottleneck is the large number of partitions that are in the S3 bucket. The data engineer must resolve the performance bottleneck and reduce Athena query planning time.
Which solutions will meet these requirements? (Choose two.)

  • A. Create an AWS Glue partition index. Enable partition filtering.
  • B. Bucket the data based on a column that the data have in common in a WHERE clause of the user query.
  • C. Use Athena partition projection based on the S3 bucket prefix.
  • D. Transform the data that is in the S3 bucket to Apache Parquet format.
  • E. Use the Amazon EMR S3DistCP utility to combine smaller objects in the S3 bucket into larger objects.
Suggested Answer: AC

Comments

rralucard_
Highly Voted 8 months, 4 weeks ago
Selected Answer: AC
https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/

Optimizing partition processing using partition projection:

Processing partition information can be a bottleneck for Athena queries when you have a very large number of partitions and aren’t using AWS Glue partition indexing. You can use partition projection in Athena to speed up query processing of highly partitioned tables and automate partition management. Partition projection helps minimize this overhead by allowing you to query partitions by calculating partition information rather than retrieving it from a metastore. It eliminates the need to add partitions’ metadata to the AWS Glue table.
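To illustrate "calculating partition information rather than retrieving it from a metastore", here is a minimal Python sketch of the idea, not Athena's actual implementation. The bucket name, the `dt=` prefix layout, and the function name `project_partitions` are assumptions made up for this example.

```python
from datetime import date, timedelta

# Hypothetical layout (an assumption): s3://example-bucket/logs/dt=YYYY-MM-DD/
LOCATION_TEMPLATE = "s3://example-bucket/logs/dt=${dt}/"


def project_partitions(start: date, end: date,
                       template: str = LOCATION_TEMPLATE) -> list[str]:
    """Compute one S3 location per day in [start, end] from the template alone.

    No catalog lookup happens here: the locations are derived purely from
    the date range and the location template.
    """
    days = (end - start).days
    return [
        template.replace("${dt}", (start + timedelta(days=offset)).isoformat())
        for offset in range(days + 1)
    ]


if __name__ == "__main__":
    for location in project_partitions(date(2024, 1, 30), date(2024, 2, 1)):
        print(location)
```

The point of the sketch is that the work grows with the range the query filters on, not with the total number of partitions registered for the table.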
upvoted 7 times
...
Mahidbdwh
Most Recent 2 months, 2 weeks ago
Selected Answer: AC
Bucketing does not address the problem of having a large number of partitions in the metadata, which is the root cause of the query planning bottleneck. Converting to a columnar format like Apache Parquet will not directly reduce the overhead associated with managing a large number of partitions. Combining small objects will not mitigate the planning overhead that comes from a large number of partitions in the Data Catalog. Hence A and C.
upvoted 2 times
...
SMALLAM
3 months, 2 weeks ago
Selected Answer: AE
https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/
upvoted 1 times
...
pypelyncar
4 months, 2 weeks ago
Selected Answer: AC
Creating an AWS Glue partition index and enabling partition filtering can significantly improve query performance when dealing with large datasets with many partitions. The partition index allows Athena to quickly identify the relevant partitions for a query, reducing the time spent scanning unnecessary data. Partition filtering further optimizes the query by only scanning the partitions that match the filter conditions. Athena partition projection based on the S3 bucket prefix is another effective technique to improve query performance. By leveraging the bucket prefix structure, Athena can prune partitions that are not relevant to the query, reducing the amount of data that needs to be scanned and processed. This approach is particularly useful when the data is organized in a hierarchical structure within the S3 bucket.
upvoted 1 times
...
VerRi
5 months, 1 week ago
Selected Answer: AC
D is not correct because the issue is related to partitioning.
upvoted 1 times
...
HunkyBunky
5 months, 3 weeks ago
Selected Answer: AC
I guess A / C, because we are facing a performance bottleneck in query planning, so indexing should be improved.
upvoted 1 times
...
khchan123
6 months ago
A. Creating an AWS Glue partition index and enabling partition filtering can help improve query performance by allowing Athena to prune unnecessary partitions from the query plan. This can reduce the number of partitions that need to be scanned, resulting in faster query planning times.

C. Athena partition projection allows you to define a partition scheme based on the S3 bucket prefix. This can help reduce the number of partitions that need to be scanned, as Athena can use the prefix to determine which partitions are relevant to the query. This can also help improve query performance and reduce planning times.
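As a rough sketch of what option C looks like in practice, the helper below builds the `ALTER TABLE ... SET TBLPROPERTIES` statement that configures date-based partition projection. The `projection.*` and `storage.location.template` property names are the ones Athena documents; the table name, column name, date range, and bucket are placeholders assumed for illustration, and `projection_ddl` is a hypothetical helper, not part of any AWS SDK.

```python
def projection_ddl(table: str, column: str, date_range: str,
                   date_format: str, location_template: str) -> str:
    """Build an Athena DDL statement enabling date-type partition projection."""
    props = {
        "projection.enabled": "true",
        f"projection.{column}.type": "date",
        f"projection.{column}.range": date_range,
        f"projection.{column}.format": date_format,
        # Maps projected partition values onto the S3 prefix structure.
        "storage.location.template": location_template,
    }
    body = ",\n  ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES (\n  {body}\n)"


if __name__ == "__main__":
    # All argument values below are assumptions for the example.
    print(projection_ddl(
        table="logs",
        column="dt",
        date_range="2020-01-01,NOW",
        date_format="yyyy-MM-dd",
        location_template="s3://example-bucket/logs/dt=${dt}/",
    ))
```

The generated statement would be run in Athena against the partitioned table; the values must match how the objects are actually laid out under the bucket prefix.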
upvoted 2 times
...
okechi
6 months, 2 weeks ago
The right answer is BD
upvoted 1 times
...
Christina666
6 months, 2 weeks ago
Selected Answer: AD
A. Create an AWS Glue partition index. Enable partition filtering.
Targeted Optimization: Partition indexes within the Glue Data Catalog help Athena efficiently identify the relevant partitions, significantly reducing query planning time. Partition filtering further refines the search during query execution.

D. Transform the data that is in the S3 bucket to Apache Parquet format.
Efficient Columnar Format: Parquet's columnar storage and built-in metadata often allow Athena to skip over large portions of data irrelevant to the query, leading to faster query planning and execution.
upvoted 3 times
...
fceb2c1
7 months ago
Selected Answer: AC
Keyword: "Athena query planning time". See the explanation in this link: https://www.myexamcollection.com/Data-Engineer-Associate-vce-questions.htm
B and D relate to analytical query performance, not to "query planning" performance.
upvoted 4 times
...
ottarg
7 months, 2 weeks ago
Just finished the exam and I went with AD. I agree with GiorgioGss, but the reason why I picked A over C was because the table is already using the Glue Data Catalog. If we use the indexes, there's no reason to use C, as we already have the partitions indexed. No reason to pick B if we have C selected. Thus I picked D to optimize the query, e.g. if I'm only selecting a subset of the columns.
upvoted 2 times
...
GiorgioGss
7 months, 2 weeks ago
Strange question... it could be A, B, C, or D.
upvoted 1 times
...
rralucard_
8 months, 4 weeks ago
If your table stored in an AWS Glue Data Catalog has tens and hundreds of thousands and millions of partitions, you can enable partition indexes on the table. With partition indexes, only the metadata for the partition value in the query’s filter is retrieved from the catalog instead of retrieving all the partitions’ metadata. The result is faster queries for such highly partitioned tables. The following table compares query runtimes between a partitioned table with no partition indexing and with partition indexing. The table contains approximately 100,000 partitions and uncompressed text data. The orders table is partitioned by the o_custkey column.
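For reference, a partition index on the `orders` table's `o_custkey` key could be created roughly as follows. This is a sketch under assumptions: the database name `sales_db` and index name `custkey_idx` are invented for the example, and the two builder functions are hypothetical helpers. The `create_partition_index` call on the boto3 Glue client and the `partition_filtering.enabled` table property are real, but check the current AWS documentation before relying on the exact parameters.

```python
def partition_index_request(database: str, table: str,
                            keys: list[str], index_name: str) -> dict:
    """Build the keyword arguments for the Glue CreatePartitionIndex call."""
    return {
        "DatabaseName": database,
        "TableName": table,
        "PartitionIndex": {"Keys": keys, "IndexName": index_name},
    }


def enable_partition_filtering_ddl(table: str) -> str:
    """Athena DDL that lets queries on this table use the partition index."""
    return (f"ALTER TABLE {table} "
            "SET TBLPROPERTIES ('partition_filtering.enabled' = 'true')")


def create_index(request: dict) -> None:
    """Send the request to Glue; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builders above work without AWS access
    boto3.client("glue").create_partition_index(**request)


if __name__ == "__main__":
    # "sales_db" and "custkey_idx" are placeholder names (assumptions).
    request = partition_index_request("sales_db", "orders",
                                      ["o_custkey"], "custkey_idx")
    print(request)
    print(enable_partition_filtering_ddl("orders"))
```

Creating the index alone is not enough for Athena: the table property in the second helper is the "enable partition filtering" step from option A.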
upvoted 1 times
...
[Removed]
9 months, 1 week ago
Selected Answer: BD
https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/
upvoted 2 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted