Exam AWS Certified Machine Learning Engineer - Associate MLA-C01 topic 1 question 19 discussion

Exam question from Amazon's AWS Certified Machine Learning Engineer - Associate MLA-C01

Question #: 19
Topic #: 1

An ML engineer needs to process thousands of existing CSV objects and new CSV objects that are uploaded. The CSV objects are stored in a central Amazon S3 bucket and have the same number of columns. One of the columns is a transaction date. The ML engineer must query the data based on the transaction date.
Which solution will meet these requirements with the LEAST operational overhead?

A. Use an Amazon Athena CREATE TABLE AS SELECT (CTAS) statement to create a table based on the transaction date from data in the central S3 bucket. Query the objects from the table.
B. Create a new S3 bucket for processed data. Set up S3 replication from the central S3 bucket to the new S3 bucket. Use S3 Object Lambda to query the objects based on transaction date.
C. Create a new S3 bucket for processed data. Use AWS Glue for Apache Spark to create a job to query the CSV objects based on transaction date. Configure the job to store the results in the new S3 bucket. Query the objects from the new S3 bucket.
D. Create a new S3 bucket for processed data. Use Amazon Data Firehose to transfer the data from the central S3 bucket to the new S3 bucket. Configure Firehose to run an AWS Lambda function to query the data based on transaction date.

Suggested Answer: A

by GiorgioGss at Nov. 27, 2024, 8:53 p.m.

Comments

Rama_Adim

1 month, 1 week ago

Selected Answer: A

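None of the answers is fully correct. An Athena CTAS statement cannot create a table directly from objects in S3 in a single query/step: it selects from an existing table, so a table over the CSV objects has to be defined first.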

upvoted 1 times
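
To illustrate the two-step point above, here is a minimal sketch in Python that only builds the two Athena statements as strings: first a CREATE EXTERNAL TABLE over the existing CSV objects, then a CTAS that selects from it. The function names, the database/table names, the bucket paths, the column types, and the header-row setting are all hypothetical and not from the question. Partitioning by the transaction date is one possible choice, not something the question requires.

```python
from typing import Mapping


def external_table_ddl(database: str, table: str,
                       columns: Mapping[str, str], location: str) -> str:
    """Build DDL that registers existing CSV objects as an Athena external table."""
    # columns maps column name -> Athena type (hypothetical schema supplied by the caller).
    cols = ",\n  ".join(f"{name} {col_type}" for name, col_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{location}'\n"
        # Assumption: each CSV object starts with one header row.
        "TBLPROPERTIES ('skip.header.line.count' = '1')"
    )


def ctas_by_date(database: str, source: str, target: str,
                 columns: Mapping[str, str], date_column: str,
                 output_location: str) -> str:
    """Build a CTAS statement that rewrites the source table, partitioned by date."""
    # Athena expects partition columns to come last in the SELECT list.
    ordered = [c for c in columns if c != date_column] + [date_column]
    select_list = ", ".join(ordered)
    # Note: Athena caps how many partitions one CTAS can write, so a long
    # date history may need a coarser partition key or several statements.
    return (
        f"CREATE TABLE {database}.{target}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        f"  external_location = '{output_location}',\n"
        f"  partitioned_by = ARRAY['{date_column}']\n"
        ")\n"
        f"AS SELECT {select_list}\n"
        f"FROM {database}.{source}"
    )


# Hypothetical schema and locations, for illustration only.
schema = {"transaction_date": "date", "id": "string", "amount": "double"}
ddl = external_table_ddl("sales", "raw_csv", schema, "s3://example-bucket/raw/")
ctas = ctas_by_date("sales", "raw_csv", "transactions_by_date", schema,
                    "transaction_date", "s3://example-bucket/curated/")
print(ddl)
print(ctas)
```

One thing to keep in mind with this sketch: the external table reads whatever objects are under its location at query time, while the CTAS output is a snapshot taken when the statement runs.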

ninomfr64

6 months ago

Selected Answer: A

A. Yes, Athena is the right service to query data in S3.
B. No, this might also work, but it is quite cumbersome.
C. No, Spark SQL can be used to query the files, but it is more work than Athena, and creating a new S3 bucket is not needed.
D. No, Data Firehose cannot consume from S3 directly.

upvoted 1 times

feelgoodfactor

6 months, 4 weeks ago

Selected Answer: A

Using Amazon Athena with a CREATE TABLE AS SELECT (CTAS) statement is the simplest and most efficient way to query the CSV objects based on the transaction date, while requiring minimal operational effort.

upvoted 1 times

motk123

7 months ago

Selected Answer: A

Athena allows direct querying of data stored in Amazon S3 using SQL, without requiring data movement or transformation.
CTAS (CREATE TABLE AS SELECT) creates a new table from the results of a SELECT, for example a dataset filtered or organized by transaction date, and stores the results in S3.

Why not the other options?
B. S3 Object Lambda is designed for on-the-fly data transformation, not for querying data efficiently. Adding replication increases complexity without addressing the querying requirement directly.
C. Glue is suited to complex ETL workflows, but it introduces significant operational overhead for a task that Athena can handle more easily.
D. Firehose is designed for streaming data, not for processing large existing datasets.

upvoted 2 times
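
As a rough sketch of what "query by transaction date" could look like once a table exists in Athena, the Python below builds a date-range SELECT and the keyword arguments for boto3's `start_query_execution`. The table name, database, and results location are hypothetical, and it assumes the transaction date column is typed as `date` in the table. Nothing is sent to AWS here; the request is only assembled.

```python
from datetime import date


def date_range_query(table: str, date_column: str, start: date, end: date) -> str:
    """Build a SELECT that filters rows to the half-open range [start, end)."""
    if start > end:
        raise ValueError("start must not be after end")
    return (
        f"SELECT * FROM {table} "
        f"WHERE {date_column} >= DATE '{start.isoformat()}' "
        f"AND {date_column} < DATE '{end.isoformat()}'"
    )


def athena_request(query: str, database: str, output_location: str) -> dict:
    """Assemble keyword arguments for the Athena StartQueryExecution call."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical names, for illustration only.
query = date_range_query("sales.raw_csv", "transaction_date",
                         date(2024, 11, 1), date(2024, 12, 1))
request = athena_request(query, "sales", "s3://example-bucket/athena-results/")
print(request["QueryString"])

# With AWS credentials configured, the request could be submitted like this:
#   import boto3
#   boto3.client("athena").start_query_execution(**request)
```

The half-open range avoids double-counting rows on the boundary date when consecutive ranges are queried.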

GiorgioGss

7 months, 2 weeks ago

Selected Answer: A

Basic usage of CTAS.

upvoted 2 times