Exam AWS Certified Machine Learning Engineer - Associate MLA-C01 topic 1 question 87 discussion

A company has an Amazon S3 bucket that contains 1 TB of files from different sources. The S3 bucket contains the following file types in the same S3 folder: CSV, JSON, XLSX, and Apache Parquet.

An ML engineer must implement a solution that uses AWS Glue DataBrew to process the data. The ML engineer also must store the final output in Amazon S3 so that AWS Glue can consume the output in the future.

Which solution will meet these requirements?

  • A. Use DataBrew to process the existing S3 folder. Store the output in Apache Parquet format.
  • B. Use DataBrew to process the existing S3 folder. Store the output in AWS Glue Parquet format.
  • C. Separate the data into a different folder for each file type. Use DataBrew to process each folder individually. Store the output in Apache Parquet format.
  • D. Separate the data into a different folder for each file type. Use DataBrew to process each folder individually. Store the output in AWS Glue Parquet format.
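
Options C and D both begin by separating the data into one folder per file type. A minimal sketch of how that step could be planned is shown here, assuming the object keys have already been listed. The `by-format` prefix, the folder names, and the `plan_moves` helper are illustrative assumptions, not part of the question; the sketch only computes destination keys and does not call S3.

```python
from pathlib import PurePosixPath

# File extension -> per-format folder name (hypothetical names for illustration).
FORMAT_BY_EXTENSION = {
    ".csv": "csv",
    ".json": "json",
    ".xlsx": "xlsx",
    ".parquet": "parquet",
}


def plan_moves(keys, target_prefix="by-format"):
    """Map each source key to a destination key under a per-format prefix."""
    moves = {}
    for key in keys:
        path = PurePosixPath(key)
        folder = FORMAT_BY_EXTENSION.get(path.suffix.lower())
        if folder is None:
            continue  # leave unrecognized files where they are
        moves[key] = f"{target_prefix}/{folder}/{path.name}"
    return moves


if __name__ == "__main__":
    sample = ["raw/sales.csv", "raw/events.JSON", "raw/budget.xlsx", "raw/logs.parquet"]
    for source, destination in plan_moves(sample).items():
        print(source, "->", destination)
```

The resulting mapping could then drive whatever copy mechanism is preferred; the choice of mechanism is outside the scope of this sketch.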
Suggested Answer: C

Comments

aws_Tamilan
2 weeks, 5 days ago
Selected Answer: A
🔑 Keyword: Process mixed file types with AWS Glue DataBrew and store for AWS Glue
✅ Correct Answer: A. Use DataBrew to process the existing S3 folder. Store the output in Apache Parquet format.
Why?
  • AWS Glue performs best with Parquet because it is optimized for analytical queries.
  • No need to split data into separate folders; DataBrew can handle mixed file types.
Why Others Are Wrong?
  • ❌ B. "AWS Glue Parquet format" is not a valid term. Apache Parquet is the correct format.
  • ❌ C & D. Separating files into different folders is unnecessary; DataBrew can process multiple formats in a single folder.
upvoted 1 time
...
michele_scar
3 weeks, 6 days ago
Selected Answer: A
C implies that you have to reorganize all of the files (1 TB is a lot), which means a lot of work. For me it is A: lower performance, but without the initial overhead of reorganizing.
upvoted 1 time
...
eesa
4 weeks, 1 day ago
Selected Answer: C
✅ Explanation:
Problem Summary:
  • The data in S3 is in mixed file formats (CSV, JSON, XLSX, and Parquet), all in one folder.
  • You need to use AWS Glue DataBrew to process the data.
  • The processed data must be stored in S3 for AWS Glue to consume later.
Key Considerations:
DataBrew Input Requirements:
  • DataBrew datasets must be in a consistent format (CSV, JSON, XLSX, or Parquet).
  • DataBrew cannot process mixed formats in a single dataset.
  • You must split the data by format.
DataBrew Output Format: Apache Parquet is preferred for:
  • Efficient storage
  • Better performance with AWS Glue and other analytics tools
  • Columnar storage benefits in querying and transformations
"AWS Glue Parquet format" does not exist; this is a distractor in the answer options.
upvoted 1 time
...
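
The per-format approach described in the comment above could be sketched as one DataBrew dataset definition per folder plus a Parquet job output. This is only a sketch under stated assumptions: the dictionary shapes follow the boto3 `databrew` client's `create_dataset` and `create_recipe_job` parameters as I understand them and should be checked against the current API reference. The bucket name, prefixes, dataset names, and helper functions are hypothetical, and nothing here calls AWS.

```python
# Folder name -> DataBrew dataset format value.
# Assumption: XLSX input is declared as "EXCEL" in the DataBrew API.
DATABREW_FORMATS = {
    "csv": "CSV",
    "json": "JSON",
    "xlsx": "EXCEL",
    "parquet": "PARQUET",
}


def dataset_requests(bucket, base_prefix="by-format"):
    """Build one create_dataset-style request per single-format folder."""
    return [
        {
            "Name": f"mixed-source-{folder}",  # hypothetical dataset name
            "Format": fmt,
            "Input": {
                "S3InputDefinition": {
                    "Bucket": bucket,
                    "Key": f"{base_prefix}/{folder}/",
                }
            },
        }
        for folder, fmt in DATABREW_FORMATS.items()
    ]


def parquet_output(bucket, prefix):
    """Build a job output entry that writes Apache Parquet to S3."""
    return {"Format": "PARQUET", "Location": {"Bucket": bucket, "Key": prefix}}


if __name__ == "__main__":
    for request in dataset_requests("example-bucket"):
        print(request["Name"], request["Format"], request["Input"])
    print(parquet_output("example-bucket", "processed/"))
```

Each request dictionary could be passed as keyword arguments to the corresponding client call once the names and parameters have been verified.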
chris_spencer
1 month, 1 week ago
Selected Answer: A
Should be A. C is incorrect because it involves separating the data by file type, which is unnecessary since DataBrew can process various file types within the same folder.
upvoted 2 times
...
ryuhei
1 month, 2 weeks ago
Selected Answer: C
AWS Glue DataBrew can process various file formats (CSV, JSON, XLSX, Parquet)
Since DataBrew can handle datasets with multiple file formats, there is no need to separate files into different folders by type.

Apache Parquet is an optimal format for AWS Glue
Parquet is a columnar format, which is well-suited for AWS Glue and is efficient for later analysis and ML model training.

"AWS Glue Parquet format" does not exist
Options B and D mention "AWS Glue Parquet format," which is incorrect. Parquet is a standard data format and is not exclusive to AWS Glue.

✅ Conclusion: Option A is the best solution because it allows DataBrew to process all files in the existing S3 folder and store the output in Apache Parquet format, which is efficient and compatible with AWS Glue. 🚀
upvoted 2 times
...