Exam AWS Certified Machine Learning Engineer - Associate MLA-C01 All Questions

View all questions & answers for the AWS Certified Machine Learning Engineer - Associate MLA-C01 exam

Exam AWS Certified Machine Learning Engineer - Associate MLA-C01 topic 1 question 10 discussion

Exam question from Amazon's AWS Certified Machine Learning Engineer - Associate MLA-C01

Question #: 10
Topic #: 1

Case study -
An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.
The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.
Which AWS service or feature can aggregate the data from the various data sources?

A. Amazon EMR Spark jobs
B. Amazon Kinesis Data Streams
C. Amazon DynamoDB
D. AWS Lake Formation

Suggested Answer: D

by a4002bd at Nov. 27, 2024, 1:48 a.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments

tigrex73

Highly Voted 4 months, 4 weeks ago

Selected Answer: A

Amazon EMR with Spark is an excellent choice for aggregating, processing, and transforming large datasets from multiple sources (e.g., Amazon S3 and an on-premises MySQL database). Spark jobs can handle both structured and unstructured data. While Lake Formation is great for managing data lakes, it doesn’t provide the ETL and data processing capabilities required to aggregate and transform datasets from multiple sources.

upvoted 11 times

...

a4002bd

Highly Voted 5 months ago

Selected Answer: D

Is it D, AWS Lake Formation? EMR Spark jobs are more manual.

upvoted 7 times

...

Sadrik

Most Recent 1 month ago

Selected Answer: A

EMR with Spark can aggregate large datasets from multiple sources, including S3 and on-premises MySQL.

upvoted 1 times

...

chris_spencer

1 month, 2 weeks ago

Selected Answer: A

A. Amazon EMR Spark jobs

upvoted 1 times

...

djeong95

1 month, 3 weeks ago

Selected Answer: D

The answer is D (Lake Formation). This is because EMR Spark does not natively support an on-premises database as a data source. You would need DataSync or something else for that. On the other hand, Lake Formation covers all of these use cases, as documented in the links below. https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html#lake-formation-features https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-get-data-in.html

upvoted 2 times

...

gorloff

2 months, 2 weeks ago

Selected Answer: D

Lake Formation isn't only a storage service; it is actually an umbrella over most AWS Glue offerings. And because it also provides fully serverless Spark, it seems to be a better option than EMR. This point is quite tricky, as often when someone refers to LF they mean the governance part only.

upvoted 2 times

...

Udyan

2 months, 2 weeks ago

Selected Answer: D

The correct answer is D. AWS Lake Formation.
Explanation: AWS Lake Formation is designed for aggregating, organizing, and securing large datasets from multiple sources (e.g., S3, on-premises databases). It simplifies the creation of a centralized data lake, enabling seamless integration and analysis of diverse data formats. This is particularly useful for tasks like fraud detection, where data comes from different sources.
Amazon EMR Spark jobs (Option A) are better suited to large-scale data processing and analytics. While they can process and transform data, they require more operational effort to configure and manage than AWS Lake Formation.
Why AWS Lake Formation?
- Aggregates and organizes data from S3 and MySQL easily.
- Offers integrated data cataloging for better feature engineering.
- Reduces operational overhead compared to setting up EMR.

upvoted 2 times

...

shabak

2 months, 3 weeks ago

Selected Answer: D

Lake Formation is the correct answer

upvoted 3 times

...

dbcert87

3 months ago

Selected Answer: D

AWS Lake Formation is a service designed to aggregate, catalog, and manage data from multiple data sources, including on-premises databases and Amazon S3, making it an ideal choice for this scenario. While Amazon EMR with Apache Spark is powerful for processing and analyzing large datasets, it focuses more on data processing than on data aggregation and cataloging. It doesn't inherently manage interdependencies or schema enforcement.

upvoted 5 times

...

dbcert87

3 months ago

Selected Answer: A

Amazon EMR is the correct answer for aggregation.

upvoted 1 times

...

fnuuu

3 months ago

Selected Answer: D

A data lake is used for data discovery.

upvoted 3 times

...

xukun

3 months, 1 week ago

Selected Answer: D

Once you specify where your existing databases are and provide your access credentials, Lake Formation reads the data and its metadata (schema) to understand the contents of the data source. It then imports the data to your new data lake and records the metadata in a central catalog. With Lake Formation, you can import data from MySQL, PostgreSQL, SQL Server, MariaDB, and Oracle databases running in Amazon RDS or hosted in Amazon EC2. Both bulk and incremental data loading are supported. https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html

upvoted 3 times

...

Makendran

3 months, 2 weeks ago

Selected Answer: A

While AWS Lake Formation could potentially be used in conjunction with other services for data lake management, Amazon EMR with Spark jobs is the most direct and powerful solution for aggregating and processing data from the various sources mentioned in this scenario. It provides the necessary tools to handle the data integration, address the class imbalance, and perform the complex feature engineering that may be required for the fraud detection model.

upvoted 2 times

...

CloudHandsOn

3 months, 3 weeks ago

Selected Answer: D

My first choice was Lake Formation

upvoted 3 times

...

ninomfr64

3 months, 3 weeks ago

Selected Answer: D

Yet another poorly worded AWS certification question. Here is my reasoning: the question is about "aggregate the data from S3 and on-premises MySQL", and I take "aggregate" to mean "put in the same place". Therefore:
A. No. An EMR Spark job can connect to S3 and MySQL (Spark can connect to a MySQL database), but it is a better tool for processing data and then storing it in S3.
B. No. KDS is for delivering streaming data sources to specific destinations (S3, OpenSearch ...).
C. No. DynamoDB is a NoSQL database that is not a great fit here.
D. Yes. Lake Formation "combine different types of structured and unstructured data into a centralized repository" https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html and "with Lake Formation, you can import your data using workflows", and as it is based on AWS Glue it supports both S3 and MySQL.
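To make the "put in the same place" reading of "aggregate" concrete, here is a minimal local sketch. It is not Lake Formation or EMR code: an in-memory SQLite database stands in for the on-premises MySQL tables, a CSV string stands in for the transaction logs in S3, and every name in it (`S3_TRANSACTIONS_CSV`, `load_transactions`, `load_customers`, `aggregate`, the `customers` table and its columns) is hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for transaction logs stored as a CSV object in Amazon S3.
S3_TRANSACTIONS_CSV = """customer_id,amount
1,25.00
2,990.50
1,12.75
"""


def load_transactions(csv_text):
    """Parse CSV text into a list of dicts with typed fields."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"customer_id": int(row["customer_id"]), "amount": float(row["amount"])}
        for row in reader
    ]


def load_customers(conn):
    """Read the (hypothetical) customers table into a dict keyed by customer id."""
    rows = conn.execute("SELECT id, country FROM customers").fetchall()
    return {customer_id: country for customer_id, country in rows}


def aggregate(transactions, customers):
    """Attach each transaction's customer attributes, producing one combined dataset."""
    return [
        {**tx, "country": customers.get(tx["customer_id"])}
        for tx in transactions
    ]


# In-memory SQLite stands in for the on-premises MySQL database (an assumption
# made only so the sketch runs without any external service).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "DE"), (2, "US")])

combined = aggregate(load_transactions(S3_TRANSACTIONS_CSV), load_customers(conn))
for record in combined:
    print(record)
```

Whichever option the exam intends, the managed service would be doing this same kind of combining at scale, with cataloging and access control on top.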

upvoted 5 times

...

AsankaIshara

4 months ago

Selected Answer: D

The question is: which AWS service or feature can aggregate the data from the various data sources? So, Lake Formation.

upvoted 3 times

...

breathingcloud

4 months ago

Selected Answer: A

I think it is A; it is more aligned with the machine learning model.

upvoted 1 times

...
