Exam AWS Certified Machine Learning Engineer - Associate MLA-C01 topic 1 question 10 discussion

Case study -
An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.
The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.
Which AWS service or feature can aggregate the data from the various data sources?

  • A. Amazon EMR Spark jobs
  • B. Amazon Kinesis Data Streams
  • C. Amazon DynamoDB
  • D. AWS Lake Formation
Suggested Answer: D

Comments

tigrex73
Highly Voted 1 month, 2 weeks ago
Selected Answer: A
Amazon EMR with Spark is an excellent choice for aggregating, processing, and transforming large datasets from multiple sources (e.g., Amazon S3 and an on-premises MySQL database). Spark jobs can handle both structured and unstructured data. While Lake Formation is great for managing data lakes, it doesn’t provide the ETL and data processing capabilities required to aggregate and transform datasets from multiple sources.
upvoted 8 times
...
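To make concrete what "aggregating" two sources means in the comment above, here is a minimal sketch in plain Python. It is not EMR, Spark, or Lake Formation code: an in-memory CSV string stands in for a transaction log in Amazon S3, and an in-memory SQLite database stands in for the on-premises MySQL database. The table name `accounts`, the column names, and all values are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical transaction log, standing in for a CSV object stored in Amazon S3.
TRANSACTIONS_CSV = """transaction_id,customer_id,amount
t1,c1,120.50
t2,c2,15.00
t3,c1,990.00
"""


def load_transactions(text):
    """Parse the CSV text into a list of dicts with a numeric amount."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["amount"] = float(row["amount"])
        rows.append(row)
    return rows


def load_accounts(conn):
    """Read the relational table into a dict keyed by customer_id."""
    cursor = conn.execute("SELECT customer_id, account_age_days FROM accounts")
    return {customer_id: age for customer_id, age in cursor}


def aggregate(transactions, accounts):
    """Left-join each transaction with its account row (None if no match)."""
    return [
        {**tx, "account_age_days": accounts.get(tx["customer_id"])}
        for tx in transactions
    ]


# In-memory SQLite stands in for the on-premises MySQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (customer_id TEXT PRIMARY KEY, account_age_days INTEGER)"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("c1", 400), ("c2", 12)])

combined = aggregate(load_transactions(TRANSACTIONS_CSV), load_accounts(conn))
```

Each entry in `combined` is one transaction enriched with the matching account field. In a real Spark job the same idea would be a join between a DataFrame read from S3 and one read over JDBC; with Lake Formation, both sources would first be imported into the data lake and cataloged.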
xukun
Most Recent 1 day, 7 hours ago
Selected Answer: D
Once you specify where your existing databases are and provide your access credentials, Lake Formation reads the data and its metadata (schema) to understand the contents of the data source. It then imports the data to your new data lake and records the metadata in a central catalog. With Lake Formation, you can import data from MySQL, PostgreSQL, SQL Server, MariaDB, and Oracle databases running in Amazon RDS or hosted in Amazon EC2. Both bulk and incremental data loading are supported. https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html
upvoted 1 times
...
Makendran
1 week ago
Selected Answer: A
While AWS Lake Formation could potentially be used in conjunction with other services for data lake management, Amazon EMR with Spark jobs is the most direct and powerful solution for aggregating and processing data from the various sources mentioned in this scenario. It provides the necessary tools to handle the data integration, address the class imbalance, and perform the complex feature engineering that may be required for the fraud detection model.
upvoted 1 times
...
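The class imbalance mentioned in the case study is usually handled at training time, whichever service prepares the data. One common approach is to weight each class inversely to its frequency. The sketch below is plain Python rather than any AWS API; the helper name `balanced_class_weights` and the 95/5 label split are illustrative assumptions.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes (e.g. fraud) receive larger weights, so their errors
    count more in a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 95 legitimate transactions (0) and 5 fraudulent ones (1).
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
```

With this made-up split, the fraud class gets a weight of 10.0 and the majority class a weight of roughly 0.53. Many training libraries accept per-class or per-sample weights of this kind.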
CloudHandsOn
1 week, 6 days ago
Selected Answer: D
My first choice was Lake Formation
upvoted 1 times
...
ninomfr64
2 weeks, 1 day ago
Selected Answer: D
Yet another poorly worded AWS certification question. Here is my reasoning: the question is about "aggregate the data from S3 and on-premises MySQL", and I take "aggregate" to mean "put in the same place". Therefore:
A. No. An EMR Spark job can connect to both S3 and MySQL (Spark can connect to a MySQL database), but it is a better tool for processing data and then storing it in S3.
B. No. KDS is for delivering streaming data sources to specific destinations (S3, OpenSearch ...).
C. No. DynamoDB is a NoSQL database that is not a great fit here.
D. Yes. Lake Formation can "combine different types of structured and unstructured data into a centralized repository" (https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html) and "with Lake Formation, you can import your data using workflows". As it is based on AWS Glue, it supports both S3 and MySQL.
upvoted 2 times
...
AsankaIshara
2 weeks, 4 days ago
Selected Answer: D
The question is which AWS service or feature can aggregate the data from the various data sources, so Lake Formation.
upvoted 1 times
...
breathingcloud
3 weeks ago
Selected Answer: A
I think it is A; it is more aligned with the machine learning model.
upvoted 1 times
...
AbhayD
3 weeks, 5 days ago
Selected Answer: A
While Lake Formation can catalog data from various sources, it doesn't provide the data processing capabilities needed for this scenario. EMR is more appropriate in this case.
upvoted 1 times
...
TonyKean888
1 month ago
Selected Answer: D
Data Aggregation: Lake Formation is designed to create a data lake, a centralized repository that stores and manages data from various sources, including S3, relational databases (like MySQL), and other data sources.
Data Transformation: It can transform and clean data, making it suitable for analysis and machine learning. This includes handling class imbalance and feature interdependencies.
Data Access: It provides a unified interface to access data, simplifying the process of integrating data from different sources into the ML model.
While other options like Amazon EMR Spark jobs and Amazon Kinesis Data Streams could be used for data processing and streaming, they are not the most efficient and straightforward solutions for this specific use case. Amazon DynamoDB is a NoSQL database, not designed for batch data processing and aggregation.
Therefore, AWS Lake Formation is the best choice to aggregate and prepare the data for the ML model.
Ref: https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html
upvoted 2 times
...
LR2023
1 month, 1 week ago
Selected Answer: D
Lake Formation would be a better choice than EMR, as EMR involves more complexity to set up. For data aggregation and ETL processes, especially those involving multiple data sources and ensuring data quality and security, AWS Lake Formation or AWS Glue are more specialized and suitable options.
upvoted 2 times
...
GiorgioGss
1 month, 2 weeks ago
Selected Answer: A
I would go with EMR Spark jobs just because I think Lake Formation is not designed for feature engineering. Spark is.
upvoted 1 times
...
a4002bd
1 month, 2 weeks ago
Selected Answer: D
Is it D? AWS Lake Formation? EMR Spark jobs are more manual.
upvoted 4 times
...