Exam AWS Certified Data Engineer - Associate DEA-C01 All Questions

View all questions & answers for the AWS Certified Data Engineer - Associate DEA-C01 exam

Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 36 discussion

Exam question from Amazon's AWS Certified Data Engineer - Associate DEA-C01

Question #: 36
Topic #: 1


A data engineer runs Amazon Athena queries on data that is in an Amazon S3 bucket. The Athena queries use AWS Glue Data Catalog as a metadata table.
The data engineer notices that the Athena query plans are experiencing a performance bottleneck. The data engineer determines that the cause of the performance bottleneck is the large number of partitions that are in the S3 bucket. The data engineer must resolve the performance bottleneck and reduce Athena query planning time.
Which solutions will meet these requirements? (Choose two.)

A. Create an AWS Glue partition index. Enable partition filtering.
B. Bucket the data based on a column that the data have in common in a WHERE clause of the user query.
C. Use Athena partition projection based on the S3 bucket prefix.
D. Transform the data that is in the S3 bucket to Apache Parquet format.
E. Use the Amazon EMR S3DistCP utility to combine smaller objects in the S3 bucket into larger objects.


Suggested Answer: AC

by [deleted] at Jan. 21, 2024, 2:25 a.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments


rralucard_

Highly Voted 8 months, 4 weeks ago

Selected Answer: AC

https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/

Optimizing partition processing using partition projection

Processing partition information can be a bottleneck for Athena queries when you have a very large number of partitions and aren’t using AWS Glue partition indexing. You can use partition projection in Athena to speed up query processing of highly partitioned tables and automate partition management. Partition projection helps minimize this overhead by allowing you to query partitions by calculating partition information rather than retrieving it from a metastore. It eliminates the need to add partitions’ metadata to the AWS Glue table.
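To make "calculating partition information rather than retrieving it from a metastore" concrete, here is a minimal Python sketch of the idea. It is not how Athena is implemented; it only shows that partition locations can be derived from a template and a date range with no catalog lookup. The bucket name, the `dt` partition column, and the `project_partitions` helper are all hypothetical.

```python
from datetime import date, timedelta


def project_partitions(template: str, start: date, end: date) -> list[str]:
    """Expand a location template into one S3 prefix per day in [start, end]."""
    locations = []
    day = start
    while day <= end:
        # Substitute the placeholder with the formatted partition value.
        locations.append(template.replace("${dt}", day.strftime("%Y-%m-%d")))
        day += timedelta(days=1)
    return locations


# Hypothetical bucket and partition column, for illustration only.
TEMPLATE = "s3://example-bucket/orders/dt=${dt}/"

# A filter such as dt BETWEEN '2024-01-30' AND '2024-02-01' maps directly to
# three prefixes, computed in memory instead of fetched from the catalog.
paths = project_partitions(TEMPLATE, date(2024, 1, 30), date(2024, 2, 1))
for path in paths:
    print(path)
```

The work here scales with the number of partitions the filter selects, not with the total number of partitions registered for the table, which is why this approach targets planning time.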

upvoted 7 times

...

Mahidbdwh

Most Recent 2 months, 2 weeks ago

Selected Answer: AC

Bucketing does not address the problem of having a large number of partitions in the metadata, which is the root cause of the query planning bottleneck. Converting to a columnar format like Apache Parquet will not directly reduce the overhead associated with managing a large number of partitions. Combining small objects will not mitigate the planning overhead that comes from a large number of partitions in the data catalog. Hence A and C.

upvoted 2 times

...

SMALLAM

3 months, 2 weeks ago

Selected Answer: AE

https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/

upvoted 1 times

...

pypelyncar

4 months, 2 weeks ago

Selected Answer: AC

Creating an AWS Glue partition index and enabling partition filtering can significantly improve query performance when dealing with large datasets with many partitions. The partition index allows Athena to quickly identify the relevant partitions for a query, reducing the time spent scanning unnecessary data. Partition filtering further optimizes the query by only scanning the partitions that match the filter conditions.

Athena partition projection based on the S3 bucket prefix is another effective technique to improve query performance. By leveraging the bucket prefix structure, Athena can prune partitions that are not relevant to the query, reducing the amount of data that needs to be scanned and processed. This approach is particularly useful when the data is organized in a hierarchical structure within the S3 bucket.

upvoted 1 times

...

VerRi

5 months, 1 week ago

Selected Answer: AC

D is not correct because the issue is related to partitioning.

upvoted 1 times

...

HunkyBunky

5 months, 3 weeks ago

Selected Answer: AC

I guess A / C, because we are facing a query plan performance bottleneck, so indexing should be improved.

upvoted 1 times

...

khchan123

6 months ago

A. Creating an AWS Glue partition index and enabling partition filtering can help improve query performance by allowing Athena to prune unnecessary partitions from the query plan. This can reduce the number of partitions that need to be scanned, resulting in faster query planning times.

C. Athena partition projection allows you to define a partition scheme based on the S3 bucket prefix. This can help reduce the number of partitions that need to be scanned, as Athena can use the prefix to determine which partitions are relevant to the query. This can also help improve query performance and reduce planning times.
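For option C, the partition scheme is declared through table properties. The sketch below builds the documented `projection.*` and `storage.location.template` properties for a date-typed partition column and renders them as an `ALTER TABLE` statement. The table name `orders`, the column `dt`, the bucket, and the two helper functions are assumptions made for illustration; the range and format should be adjusted to the real prefix layout.

```python
def projection_properties(column: str, date_range: str, location_template: str) -> dict:
    """Table properties for a date-typed projected partition column."""
    return {
        "projection.enabled": "true",
        f"projection.{column}.type": "date",
        f"projection.{column}.range": date_range,
        f"projection.{column}.format": "yyyy-MM-dd",
        f"projection.{column}.interval": "1",
        f"projection.{column}.interval.unit": "DAYS",
        "storage.location.template": location_template,
    }


def alter_table_ddl(table: str, props: dict) -> str:
    """Render the properties as an ALTER TABLE ... SET TBLPROPERTIES statement."""
    pairs = ",\n  ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES (\n  {pairs}\n)"


# Hypothetical table, column, and bucket names.
props = projection_properties(
    column="dt",
    date_range="2020-01-01,NOW",
    location_template="s3://example-bucket/orders/dt=${dt}/",
)
ddl = alter_table_ddl("orders", props)
print(ddl)
```

The printed statement would then be run in Athena against the existing table. The helpers only assemble text and do not validate that the template matches the actual S3 layout.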

upvoted 2 times

...

okechi

6 months, 2 weeks ago

The right answer is BD

upvoted 1 times

...

Christina666

6 months, 2 weeks ago

Selected Answer: AD

A. Create an AWS Glue partition index. Enable partition filtering.
Targeted Optimization: Partition indexes within the Glue Data Catalog help Athena efficiently identify the relevant partitions, significantly reducing query planning time. Partition filtering further refines the search during query execution.

D. Transform the data that is in the S3 bucket to Apache Parquet format.
Efficient Columnar Format: Parquet's columnar storage and built-in metadata often allow Athena to skip over large portions of data irrelevant to the query, leading to faster query planning and execution.

upvoted 3 times

...

fceb2c1

7 months ago

Selected Answer: AC

Keyword: Athena query planning time. See the explanation in the link: https://www.myexamcollection.com/Data-Engineer-Associate-vce-questions.htm

B & D are related to analytical query performance, not "query planning" performance.

upvoted 4 times

...

ottarg

7 months, 2 weeks ago

Just finished the exam and I went with AD. I agree with GiorgioGss, but the reason I picked A over C was because the table is already using the Glue catalog. If we use the indexes, there's no reason to use C, as we already have the partitions indexed. No reason to pick B if we have C selected. Thus I picked D along with A to optimize the query, e.g. if I'm only selecting a subset of the columns.

upvoted 2 times

...

GiorgioGss

7 months, 2 weeks ago

Strange question... it could be ABCD.

upvoted 1 times

...

rralucard_

8 months, 4 weeks ago

If your table stored in an AWS Glue Data Catalog has tens and hundreds of thousands and millions of partitions, you can enable partition indexes on the table. With partition indexes, only the metadata for the partition value in the query’s filter is retrieved from the catalog instead of retrieving all the partitions’ metadata. The result is faster queries for such highly partitioned tables. The following table compares query runtimes between a partitioned table with no partition indexing and with partition indexing. The table contains approximately 100,000 partitions and uncompressed text data. The orders table is partitioned by the o_custkey column.
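As a rough sketch of option A for the `orders` table partitioned by `o_custkey` mentioned above: the Glue `CreatePartitionIndex` API takes a database name, a table name, and a `PartitionIndex` with `Keys` and `IndexName`, and Athena uses the index once the table property `partition_filtering.enabled` is set to `true`. The database name `example_db`, the index name, and the helper functions are hypothetical; the `boto3` call is wrapped in a function that is not invoked here because it needs AWS credentials.

```python
def partition_index_request(database: str, table: str, index_name: str, keys: list) -> dict:
    """Keyword arguments for the Glue CreatePartitionIndex API call."""
    return {
        "DatabaseName": database,
        "TableName": table,
        "PartitionIndex": {"Keys": list(keys), "IndexName": index_name},
    }


def enable_partition_filtering_ddl(table: str) -> str:
    """Athena statement that turns on partition filtering for the table."""
    return (
        f"ALTER TABLE {table} "
        "SET TBLPROPERTIES ('partition_filtering.enabled' = 'true')"
    )


def create_index(request: dict) -> None:
    """Send the request with boto3 (requires AWS credentials; not called here)."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    boto3.client("glue").create_partition_index(**request)


# Hypothetical database and index names; the table and key come from the quote.
request = partition_index_request("example_db", "orders", "idx_o_custkey", ["o_custkey"])
filtering_ddl = enable_partition_filtering_ddl("orders")
print(request)
print(filtering_ddl)
```

With both in place, a query that filters on `o_custkey` should only need the metadata for matching partitions, as the quoted passage describes.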

upvoted 1 times

...

[Removed]

9 months, 1 week ago

Selected Answer: BD

https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/

upvoted 2 times

...
