Exam AWS Certified Machine Learning Engineer - Associate MLA-C01 topic 1 question 20 discussion

Exam question from Amazon's AWS Certified Machine Learning Engineer - Associate MLA-C01

Question #: 20
Topic #: 1

[All AWS Certified Machine Learning Engineer - Associate MLA-C01 Questions]

A company has a large, unstructured dataset. The dataset includes many duplicate records across several key attributes.
Which solution on AWS will detect duplicates in the dataset with the LEAST code development?

A. Use Amazon Mechanical Turk jobs to detect duplicates.
B. Use Amazon QuickSight ML Insights to build a custom deduplication model.
C. Use Amazon SageMaker Data Wrangler to pre-process and detect duplicates.
D. Use the AWS Glue FindMatches transform to detect duplicates.

Show Suggested Answer

Suggested Answer: D 🗳️

by GiorgioGss at Nov. 27, 2024, 8:55 p.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments

Submit Cancel

Saransundar

Highly Voted 4 months, 3 weeks ago

Selected Answer: D

AWS Glue FindMatches is specifically designed to identify duplicate or matching records in datasets without requiring labeled training data. It uses machine learning to find fuzzy matches and allows customization to fine-tune the matching process, making it ideal for this scenario.

upvoted 7 times

...

GiorgioGss

Highly Voted 4 months, 4 weeks ago

Selected Answer: D

https://aws.amazon.com/about-aws/whats-new/2021/11/aws-glue-findmatches-new-data-existing-dataset/ "allows you to identify duplicate or matching records in your dataset"

upvoted 6 times

...

conrad2023

Most Recent 2 months, 2 weeks ago

Selected Answer: C

I would argue you can should use Data Wrangler if you want to have the least code development. See https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-data-insights.html#data-wrangler-data-insights-samples "You can remove duplicate samples from the dataset using the Drop duplicates transform under Manage rows."

upvoted 2 times

...

feelgoodfactor

4 months, 1 week ago

Selected Answer: D

The AWS Glue FindMatches transform is the most appropriate solution because it is specifically designed to detect duplicates, requires minimal development effort, and scales efficiently for large datasets.

upvoted 3 times

...

nakidal495

4 months, 3 weeks ago

Selected Answer: A

I'm not sure but I think this is the correct answer.

upvoted 2 times

...

Exam AWS Certified Machine Learning Engineer - Associate MLA-C01 All Questions

View all questions & answers for the AWS Certified Machine Learning Engineer - Associate MLA-C01 exam

Exam AWS Certified Machine Learning Engineer - Associate MLA-C01 topic 1 question 20 discussion

Comments

Saransundar

GiorgioGss

conrad2023

feelgoodfactor

nakidal495

SY0-701