A company has a large, unstructured dataset. The dataset includes many duplicate records across several key attributes. Which solution on AWS will detect duplicates in the dataset with the LEAST code development?
A.
Use Amazon Mechanical Turk jobs to detect duplicates.
B.
Use Amazon QuickSight ML Insights to build a custom deduplication model.
C.
Use Amazon SageMaker Data Wrangler to pre-process and detect duplicates.
D.
Use the AWS Glue FindMatches transform to detect duplicates.
AWS Glue FindMatches is specifically designed to identify duplicate or matching records in datasets without requiring labeled training data. It uses machine learning to find fuzzy matches and allows customization to fine-tune the matching process, making it ideal for this scenario.
The AWS Glue FindMatches transform is the most appropriate solution because it is specifically designed to detect duplicates, requires minimal development effort, and scales efficiently for large datasets.
https://aws.amazon.com/about-aws/whats-new/2021/11/aws-glue-findmatches-new-data-existing-dataset/
"allows you to identify duplicate or matching records in your dataset"
upvoted 4 times
...
Log in to ExamTopics
Sign in:
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.
Upvoting a comment with a selected answer will also increase the vote count towards that answer by one.
So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.
Saransundar
Highly Voted 1 month agofeelgoodfactor
Most Recent 3 weeks, 2 days agonakidal495
1 month, 1 week agoGiorgioGss
1 month, 1 week ago