Exam AWS Certified Machine Learning - Specialty topic 1 question 22 discussion

A retail chain has been ingesting purchasing records from its network of 20,000 stores to Amazon S3 using Amazon Kinesis Data Firehose. To support training an improved machine learning model, training records will require new but simple transformations, and some attributes will be combined. The model needs to be retrained daily.
Given the large number of stores and the legacy data ingestion, which change will require the LEAST amount of development effort?

  • A. Require that the stores switch to capturing their data locally on AWS Storage Gateway for loading into Amazon S3, then use AWS Glue to do the transformation.
  • B. Deploy an Amazon EMR cluster running Apache Spark with the transformation logic, and have the cluster run each day on the accumulating records in Amazon S3, outputting new/transformed records to Amazon S3.
  • C. Spin up a fleet of Amazon EC2 instances with the transformation logic, have them transform the data records accumulating on Amazon S3, and output the transformed records to Amazon S3.
  • D. Insert an Amazon Kinesis Data Analytics stream downstream of the Kinesis Data Firehose stream that transforms raw record attributes into simple transformed values using SQL.
Suggested Answer: D

Comments

cybe001
Highly Voted 2 years, 7 months ago
D is correct. The question says "simple transformations, and some attributes will be combined" and asks for the least development effort. Kinesis Data Analytics can get data from Firehose, transform it, and write to S3: https://docs.aws.amazon.com/kinesisanalytics/latest/java/examples-s3.html
upvoted 48 times
mawsman
2 years, 6 months ago
Best explanation here, kudos.
upvoted 4 times
...
kakalotka
2 years, 5 months ago
I can't find any information that indicates Kinesis Data Analytics can take data from Firehose.
upvoted 2 times
...
...
Huy
Highly Voted 2 years, 5 months ago
The best way to transform data is before it arrives in S3, so D should be the best answer. But D is not complete: it should have another Firehose to deliver the results to S3.
upvoted 9 times
...
JonSno
Most Recent 2 months, 1 week ago
Selected Answer: D
D. Insert an Amazon Kinesis Data Analytics stream downstream of the Kinesis Data Firehose stream that transforms raw record attributes into simple transformed values using SQL.
Explanation: Since the data is already flowing through Amazon Kinesis Data Firehose, the least development effort solution is to use Amazon Kinesis Data Analytics, which supports SQL-based transformations on streaming data without requiring new infrastructure.
Why is this the best choice?
  • No major architectural changes: data continues flowing from stores into Kinesis Data Firehose and then to Amazon S3.
  • Simple SQL transformations: since the changes are simple (e.g., attribute combinations), SQL is sufficient.
  • Low operational overhead: no need to manage clusters or instances.
  • Real-time processing: transformed records immediately enter Amazon S3 for training.
upvoted 1 times
...
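To make "simple SQL transformations" concrete, here is a minimal sketch of the kind of attribute-combining query being discussed. It runs against an in-memory SQLite table purely for illustration; it is not Kinesis Data Analytics streaming SQL, and the table and column names (`purchases`, `store_id`, `sku`, `unit_price`, `quantity`) are assumptions, not taken from the question.

```python
import sqlite3

# In-memory stand-in for the incoming purchase records (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "store_id INTEGER, sku TEXT, unit_price REAL, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [(17, "A-100", 2.50, 4), (17, "B-200", 10.00, 1)],
)

# Two simple transformations: combine two attributes into one key,
# and derive a new numeric attribute from two existing ones.
TRANSFORM_SQL = """
SELECT store_id || ':' || sku AS store_sku,
       unit_price * quantity  AS line_total
FROM purchases
ORDER BY sku
"""

rows = conn.execute(TRANSFORM_SQL).fetchall()
for store_sku, line_total in rows:
    print(store_sku, line_total)
```

The point is only that combining attributes is a few lines of SQL; the streaming dialect, windowing, and output destination would differ in a real Kinesis Data Analytics application.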
CKS1210
10 months, 1 week ago
The answer is D. Amazon Kinesis Data Analytics provides a serverless option for real-time data processing using SQL queries. In this case, by inserting a Kinesis Data Analytics stream downstream of the Kinesis Data Firehose stream, the retail chain can easily perform the required simple transformations on the ingested purchasing records.
upvoted 1 times
...
Valcilio
1 year, 1 month ago
Selected Answer: D
The best answer would be to use a Lambda, but D can do it very well too in the absence of a Lambda option.
upvoted 2 times
...
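For reference, the Lambda alternative mentioned above could look roughly like the sketch below. It follows the Firehose data-transformation contract (each incoming record carries a `recordId` and base64-encoded `data`; each returned record carries `recordId`, `result`, and `data`). The record fields (`unit_price`, `quantity`, `total`) and the helper `combine_attributes` are hypothetical and only stand in for the "simple transformations" in the question.

```python
import base64
import json


def combine_attributes(record: dict) -> dict:
    """Hypothetical transformation: derive one combined attribute from two raw ones."""
    out = dict(record)
    out["total"] = round(record["unit_price"] * record["quantity"], 2)
    return out


def lambda_handler(event, context):
    """Decode, transform, and re-encode each record handed over by Firehose."""
    results = []
    for rec in event["records"]:
        try:
            payload = json.loads(base64.b64decode(rec["data"]))
            # Newline-delimit the output so the objects landing in S3 are line-oriented JSON.
            transformed = json.dumps(combine_attributes(payload)) + "\n"
            results.append({
                "recordId": rec["recordId"],
                "result": "Ok",
                "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
            })
        except (ValueError, KeyError, TypeError):
            # Return the record unchanged and mark it as failed instead of crashing the batch.
            results.append({
                "recordId": rec["recordId"],
                "result": "ProcessingFailed",
                "data": rec["data"],
            })
    return {"records": results}
```

Because the function is attached to the existing delivery stream, the stores and the ingestion path stay untouched, which is the same "least development effort" argument made for D.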
cloud_trail
2 years, 6 months ago
I go with D. A tough question, though. And C are definitely out. The key to the question is that it does not say that the transformed data needs to be stored again in S3. It just needs to be sent to the model for training after being transformed. So a Kinesis Data Analytics stream is appropriate to do the transformation.
upvoted 1 times
...
harmanbirstudy
2 years, 6 months ago
Legacy data -- Firehose -- Kinesis Analytics -- S3. This happens in near real time before the data ends up in S3. Legacy data -- Firehose -- S3 is already happening (mentioned in the first line of the question), so adding Kinesis Data Analytics to do simple transformation joins using SQL on the incoming data is the LEAST amount of work needed. Kinesis Data Analytics can write to S3; here is the AWS link with a working example, even though the Udemy tutorial said it cannot write directly to S3 :) https://docs.aws.amazon.com/kinesisanalytics/latest/java/examples-s3.html
upvoted 6 times
...
gamaX
2 years, 6 months ago
It seems that the LEAST development effort is: https://aws.amazon.com/fr/blogs/big-data/preprocessing-data-in-amazon-kinesis-analytics-with-aws-lambda/ and the GREATEST development effort is: https://aws.amazon.com/fr/blogs/big-data/optimizing-downstream-data-processing-with-amazon-kinesis-data-firehose-and-amazon-emr-running-apache-spark/
upvoted 1 times
...
HaiHN
2 years, 6 months ago
It's D https://aws.amazon.com/blogs/big-data/preprocessing-data-in-amazon-kinesis-analytics-with-aws-lambda/ "In some scenarios, you may need to enhance your streaming data with additional information, before you perform your SQL analysis. Kinesis Analytics gives you the ability to use data from Amazon S3 in your Kinesis Analytics application, using the Reference Data feature. However, you cannot use other data sources from within your SQL query."
upvoted 1 times
h_sahu
2 years, 6 months ago
I believe Kinesis should be used only for live data streams, and this is not the case here. So in my view D shouldn't be the answer. I think A should be the answer, as AWS Storage Gateway is used along with on-premises applications to move data to S3. Then Glue can be used to transform the data.
upvoted 1 times
cloud_trail
2 years, 5 months ago
With option A, you would be changing the legacy data ingestion, a huge development effort. Remember, you're talking about 20,000 stores.
upvoted 2 times
...
...
...
hans1234
2 years, 6 months ago
It is D.
upvoted 1 times
...
dikers
2 years, 6 months ago
I think the answer is D, because it requires the LEAST amount of development effort.
upvoted 1 times
...
roytruong
2 years, 6 months ago
It's D; Kinesis Data Analytics can easily connect with Firehose.
upvoted 2 times
...
dreemswang
2 years, 6 months ago
Why not A? It seems good to me.
upvoted 2 times
ExamTaker123456789
2 years, 6 months ago
"require stores to capture data locally using S3 gateway" - for 20k stores this creates a HUUUGE operational overhead and development effort, definitely wrong
upvoted 3 times
...
...
PRC
2 years, 7 months ago
D is correct... the rest all need some kind of manual intervention, and they are not simple. Firehose allows transformation as well as moving data into S3.
upvoted 6 times
...
devsean
2 years, 7 months ago
I think the answer is B. D would be correct if they didn't want to transform the legacy data from before the switch, but it seems like they do. Choosing D would mean that you'd have to use an EC2 instance or something else to transform the legacy data along with adding the Kinesis data analytics functionality. Also, there is no real-time requirement so daily transformation is fine.
upvoted 3 times
hailiang
2 years, 6 months ago
It's D, because with KDA you can transform the data with SQL, while with EMR you need to write code. Considering the requirement of "least development effort", it's D.
upvoted 3 times
...
...
devsean
2 years, 7 months ago
I think the answer is B. D would be correct if they didn't want to transform the legacy data from before the switch, but it seems like they do. Choosing D would mean that you'd have to use an EC2 instance or something else to transform the legacy data along with adding the Kinesis data analytics functionality. Also, there is no real-time requirement so daily transformation is fine.
upvoted 7 times
ADVIT
10 months ago
"LEAST amount of development effort": EMR is too complicated to be the LEAST.
upvoted 1 times
...
ZSun
1 year ago
If the question were "least cost" then B, but the question is "least development effort", so you want to keep the original architecture. I agree that for daily ETL instead of real-time, and a large dataset, B is the better option.
upvoted 1 times
...
HaiHN
2 years, 6 months ago
You can use Lambda instead of EC2. So D should be OK. https://aws.amazon.com/blogs/big-data/preprocessing-data-in-amazon-kinesis-analytics-with-aws-lambda/
upvoted 1 times
...
...
am7
2 years, 7 months ago
can be B
upvoted 1 times
...