Exam AWS Certified Machine Learning Engineer - Associate MLA-C01 topic 1 question 87 discussion

Exam question from Amazon's AWS Certified Machine Learning Engineer - Associate MLA-C01

Question #: 87
Topic #: 1


A company has an Amazon S3 bucket that contains 1 TB of files from different sources. The S3 bucket contains the following file types in the same S3 folder: CSV, JSON, XLSX, and Apache Parquet.

An ML engineer must implement a solution that uses AWS Glue DataBrew to process the data. The ML engineer also must store the final output in Amazon S3 so that AWS Glue can consume the output in the future.

Which solution will meet these requirements?

A. Use DataBrew to process the existing S3 folder. Store the output in Apache Parquet format.
B. Use DataBrew to process the existing S3 folder. Store the output in AWS Glue Parquet format.
C. Separate the data into a different folder for each file type. Use DataBrew to process each folder individually. Store the output in Apache Parquet format.
D. Separate the data into a different folder for each file type. Use DataBrew to process each folder individually. Store the output in AWS Glue Parquet format.


Suggested Answer: C 🗳️
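
The second half of option C (one DataBrew dataset per folder, Apache Parquet output) could be sketched as below. This only builds request dictionaries and calls no AWS service. The field names and the `Format` values are my understanding of the DataBrew `CreateDataset` and `CreateRecipeJob` request shapes and should be checked against the API reference; the bucket name, prefixes, and dataset naming scheme are hypothetical.

```python
# Folder name -> DataBrew dataset Format value (assumed; verify before use).
DATASET_FORMATS = {
    "csv": "CSV",
    "json": "JSON",
    "xlsx": "EXCEL",
    "parquet": "PARQUET",
}


def dataset_request(bucket, folder, prefix="by-format/"):
    """Build keyword arguments for one per-format DataBrew dataset."""
    return {
        "Name": f"mixed-source-{folder}",  # hypothetical naming scheme
        "Format": DATASET_FORMATS[folder],
        "Input": {
            "S3InputDefinition": {"Bucket": bucket, "Key": f"{prefix}{folder}/"}
        },
    }


def parquet_output(bucket, folder, prefix="processed/"):
    """Build one job output entry that writes Apache Parquet back to S3."""
    return {
        "Format": "PARQUET",
        "Location": {"Bucket": bucket, "Key": f"{prefix}{folder}/"},
    }


# "example-bucket" is a placeholder, not a value from the question.
dataset_requests = [dataset_request("example-bucket", f) for f in DATASET_FORMATS]
job_outputs = [parquet_output("example-bucket", f) for f in DATASET_FORMATS]
```

Each dictionary in `dataset_requests` would be passed to a dataset-creation call, and each entry in `job_outputs` would go into the `Outputs` list of the matching recipe job.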

by ryuhei at March 1, 2025, 11:26 a.m.


Comments


aws_Tamilan

2 weeks, 5 days ago

Selected Answer: A

🔑 Keyword: Process mixed file types with AWS Glue DataBrew & store for AWS Glue

✅ Correct Answer: A. Use DataBrew to process the existing S3 folder. Store the output in Apache Parquet format.

Why? AWS Glue performs best with Parquet because it is optimized for analytical queries. No need to split data into separate folders: DataBrew can handle mixed file types.

Why Others Are Wrong?

❌ B. "AWS Glue Parquet format" is not a valid term. Apache Parquet is the correct format.

❌ C & D. Separating files into different folders is unnecessary: DataBrew can process multiple formats in a single folder.

upvoted 1 times


michele_scar

3 weeks, 6 days ago

Selected Answer: A

C implies that you have to reorganize all the files (1 TB is a lot), which means a lot of work. For me it is A: lower performance, but without the initial overhead of reorganizing the data.

upvoted 1 times


eesa

4 weeks, 1 day ago

Selected Answer: C

✅ Explanation:

Problem Summary: The data in S3 is in mixed file formats (CSV, JSON, XLSX, and Parquet), all in one folder. You need to use AWS Glue DataBrew to process the data. The processed data must be stored in S3 for AWS Glue to consume later.

Key Considerations:

DataBrew Input Requirements: DataBrew datasets must be in a consistent format (CSV, JSON, XLSX, or Parquet). DataBrew cannot process mixed formats in a single dataset. You must split the data by format.

DataBrew Output Format: Apache Parquet is preferred for:

- Efficient storage
- Better performance with AWS Glue and other analytics tools
- Columnar storage benefits in querying and transformations

"AWS Glue Parquet format" does not exist; this is a distractor in the answer options.

upvoted 1 times
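
The "split the data by format" step described above could be planned with a small helper like the one below. It is only a sketch: it computes old-key to new-key pairs from file extensions and does not touch S3. The `raw/` and `by-format/` prefixes, the folder names, and the example keys are all hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# File extension -> destination subfolder (folder names are hypothetical).
FORMAT_FOLDERS = {
    ".csv": "csv",
    ".json": "json",
    ".xlsx": "xlsx",
    ".parquet": "parquet",
}


def plan_moves(keys, source_prefix="raw/", dest_prefix="by-format/"):
    """Map each S3 object key to a per-format destination key.

    Returns {folder: [(old_key, new_key), ...]}. Keys with an unrecognized
    extension are grouped under "unknown" and keep their original key.
    """
    plan = defaultdict(list)
    for key in keys:
        suffix = PurePosixPath(key).suffix.lower()
        folder = FORMAT_FOLDERS.get(suffix)
        if folder is None:
            plan["unknown"].append((key, key))
            continue
        # Strip the source prefix so the file name lands directly in its folder.
        relative = key[len(source_prefix):] if key.startswith(source_prefix) else key
        plan[folder].append((key, f"{dest_prefix}{folder}/{relative}"))
    return dict(plan)


example_keys = [
    "raw/sales.csv",
    "raw/events.json",
    "raw/budget.XLSX",
    "raw/part-0001.parquet",
    "raw/readme.txt",
]
moves = plan_moves(example_keys)
```

The resulting plan could then drive whatever copy mechanism is preferred; the copy itself is left out here because it depends on the account setup.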


chris_spencer

1 month, 1 week ago

Selected Answer: A

Should be A. C is incorrect because it involves separating the data by file type, which is unnecessary since DataBrew can process various file types within the same folder.

upvoted 2 times


ryuhei

1 month, 2 weeks ago

Selected Answer: C

AWS Glue DataBrew can process various file formats (CSV, JSON, XLSX, Parquet)

Since DataBrew can handle datasets with multiple file formats, there is no need to separate files into different folders by type.

Apache Parquet is an optimal format for AWS Glue

Parquet is a columnar format, which is well-suited for AWS Glue and is efficient for later analysis and ML model training.

"AWS Glue Parquet format" does not exist

Options B and D mention "AWS Glue Parquet format," which is incorrect. Parquet is a standard data format and is not exclusive to AWS Glue.

✅ Conclusion: Option A is the best solution because it allows DataBrew to process all files in the existing S3 folder and store the output in Apache Parquet format, which is efficient and compatible with AWS Glue. 🚀

upvoted 2 times

