Exam AWS Certified Machine Learning - Specialty topic 1 question 3 discussion

A Mobile Network Operator is building an analytics platform to analyze and optimize a company's operations using Amazon Athena and Amazon S3.
The source systems send data in .CSV format in real time. The Data Engineering team wants to transform the data to the Apache Parquet format before storing it on Amazon S3.
Which solution takes the LEAST effort to implement?

  • A. Ingest .CSV data using Apache Kafka Streams on Amazon EC2 instances and use Kafka Connect S3 to serialize data as Parquet
  • B. Ingest .CSV data from Amazon Kinesis Data Streams and use AWS Glue to convert data into Parquet.
  • C. Ingest .CSV data using Apache Spark Structured Streaming in an Amazon EMR cluster and use Apache Spark to convert data into Parquet.
  • D. Ingest .CSV data from Amazon Kinesis Data Streams and use Amazon Kinesis Data Firehose to convert data into Parquet.
Suggested Answer: D

Comments

DonaldCMLIN
Highly Voted 3 years, 7 months ago
Answer is B
upvoted 31 times
Antriksh
3 years, 6 months ago
You cannot use AWS Glue for streaming data. Clearly B is incorrect.
upvoted 3 times
scuzzy2010
3 years, 6 months ago
Even if the exam's answer is based on the solution from before AWS gave AWS Glue the ability to process streaming data, this answer is still correct, as Kinesis would output the data to S3 and Glue would pick it up from there and convert it to Parquet. The question does not say the data must be converted to Parquet in real time; it only says the CSV data is received as a stream in real time.
upvoted 2 times
GeeBeeEl
3 years, 6 months ago
Actually, the question says "The source systems send data in CSV format in real time. The Data Engineering team wants to transform the data to the Apache Parquet format before storing it on Amazon S3", which is the same as saying the data must be converted in real time.
upvoted 5 times
zzeng
3 years, 6 months ago
AWS Glue can do it now (2020 May) https://aws.amazon.com/jp/blogs/news/new-serverless-streaming-etl-with-aws-glue/
upvoted 6 times
hamimelon
2 years, 4 months ago
This link is in Japanese
upvoted 3 times
OmarSaadEldien
3 years, 6 months ago
Proof of B: https://aws.amazon.com/blogs/aws/new-serverless-streaming-etl-with-aws-glue/
upvoted 7 times
vetal
Highly Voted 3 years, 7 months ago
D is wrong, as Kinesis Data Firehose can convert from JSON to Parquet, but here we have CSV. B is correct, and here is another proof link: https://medium.com/searce/convert-csv-json-files-to-apache-parquet-using-aws-glue-a760d177b45f
upvoted 24 times
zzeng
3 years, 6 months ago
You are right. From https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html: "If you want to convert an input format other than JSON, such as comma-separated values (CSV) or structured text, you can use AWS Lambda to transform it to JSON first."
upvoted 8 times
samy666
2 years, 11 months ago
But there is no Lambda in D
upvoted 2 times
AdolinKholin
2 years, 7 months ago
But there's a D in Lambda
upvoted 3 times
LalBSingh
Most Recent 2 months, 1 week ago
Selected Answer: D
Kinesis Data Firehose supports real-time streaming ingestion and can automatically convert CSV to Parquet before storing it in S3.
upvoted 3 times
JonSno
2 months, 1 week ago
Selected Answer: D
Amazon Kinesis Data Streams + Amazon Kinesis Data Firehose. Effort: lowest. Why? Amazon Kinesis Data Firehose natively supports real-time CSV ingestion and automatic conversion to Parquet. It is fully managed, serverless, and integrates directly with Amazon S3. It requires zero infrastructure management compared to the other solutions.
upvoted 1 times
JonSno
2 months, 1 week ago
I take this back. The answer should be B. On researching further, it is JSON to Parquet or ORC that Firehose supports. So the answer is B: not optimal, but close to suitable. Amazon Kinesis Data Streams + AWS Glue: AWS Glue can batch-process CSV and convert it to Parquet for S3. However, Glue is batch-oriented, not real-time.
upvoted 1 times
liquen14
2 months, 1 week ago
Selected Answer: B
Although I'd go with Glue and option B, I'm pretty sure that this is one of those "15 unscored questions that do not affect your score. AWS collects information about performance on these unscored questions to evaluate these questions for future use as scored questions." Just for fun I asked Perplexity, ChatGPT, Gemini, DeepSeek and Claude: all gave D as their first response. When I pointed out that "according to this https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html Kinesis can't convert CSV directly to Parquet. It needs a Lambda", each model responded in a different way (some of them contradictory). My reasoning is that D (Kinesis + Firehose) is incorrect because Firehose does not support direct CSV-to-Parquet conversion and needs a Lambda that is not mentioned in the option. But discussing questions like this one is nothing but a big waste of time ;-P
upvoted 1 times
AbimbolaOlaniran
4 months, 1 week ago
Selected Answer: D
D Kinesis Data Firehose is designed specifically for streaming data delivery to destinations like S3. It has built-in support for data format conversion, including CSV to Parquet. This eliminates the need for managing separate transformation services like Glue or Spark. The setup is significantly simpler: you configure a Firehose delivery stream, specify the data format conversion, and point it to your S3 bucket. Therefore, option D requires the least implementation effort because it leverages a fully managed service (Kinesis Data Firehose) with built-in functionality for data format conversion.
upvoted 1 times
venksters
4 months, 1 week ago
Selected Answer: B
Amazon Kinesis Data Firehose can only convert from JSON to Apache Parquet or Apache ORC before storing the data in Amazon S3.
upvoted 1 times
TinTinAWS
7 months ago
Answer B. Yes, Amazon Kinesis Data Firehose can convert CSV to Apache Parquet, but you need to use a Lambda function to transform the CSV to JSON first. The question asks for the least effort to build, so B is the right answer.
upvoted 1 times
Keya
7 months ago
Selected Answer: B
Use Amazon Kinesis Data Streams to ingest customer data and configure a Kinesis Data Firehose delivery stream as a consumer to convert the data into Apache Parquet is incorrect. Although this could be a valid solution, it entails more development effort as Kinesis Data Firehose does not support converting CSV files directly into Apache Parquet, unlike JSON.
upvoted 2 times
geoan13
7 months ago
Selected Answer: B
Amazon Kinesis Data Firehose can convert the format of your input data from JSON to Apache Parquet or Apache ORC before storing the data in Amazon S3. Parquet and ORC are columnar data formats that save space and enable faster queries compared to row-oriented formats like JSON. If you want to convert an input format other than JSON, such as comma-separated values (CSV) or structured text, you can use AWS Lambda to transform it to JSON first. https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html
upvoted 1 times
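For readers weighing how much extra work the Lambda step in the quoted documentation adds to option D, here is a minimal sketch of what such a transformation function might look like. It assumes the standard Firehose transformation contract (base64-encoded records in, `recordId` / `result` / `data` out) and one CSV row per record. The column names in `COLUMNS` are hypothetical, since the question does not say what the CSV contains.

```python
import base64
import csv
import io
import json

# Hypothetical column names: the question does not describe the CSV layout.
COLUMNS = ["subscriber_id", "cell_id", "bytes_used"]


def lambda_handler(event, context):
    """Turn each CSV record in a Firehose batch into one JSON object."""
    output = []
    for record in event["records"]:
        try:
            # Firehose delivers each record payload base64-encoded.
            line = base64.b64decode(record["data"]).decode("utf-8")
            row = next(csv.reader(io.StringIO(line)))
            if len(row) != len(COLUMNS):
                raise ValueError("unexpected column count")
            payload = json.dumps(dict(zip(COLUMNS, row))) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(payload.encode("utf-8")).decode("utf-8"),
            })
        except (ValueError, StopIteration, UnicodeDecodeError):
            # Hand the original payload back and mark the record as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

All values stay strings in this sketch; any type casting would depend on the schema the Parquet conversion is configured with.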
rav009
11 months, 3 weeks ago
Selected Answer: D
Between B and D chose D. Because Firehose can't handle csv directly.
upvoted 1 times
rav009
11 months, 3 weeks ago
Between B and D chose B. Because Firehose can't handle csv directly.
upvoted 1 times
s_k_aws
1 year, 1 month ago
Answer is B. https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html "If you want to convert an input format other than JSON, such as comma-separated values (CSV) or structured text, you can use AWS Lambda to transform it to JSON first."
upvoted 1 times
chewasa
1 year, 1 month ago
Selected Answer: B
You need Glue to convert to Parquet.
upvoted 1 times
0c47783
1 year, 1 month ago
D for sure, Firehose can convert csv to parquet
upvoted 3 times
vkbajoria
1 year, 2 months ago
Answer is unfortunately B. Firehose cannot convert comma-separated CSV to Parquet directly.
upvoted 1 times
kyuhuck
1 year, 2 months ago
Selected Answer: D
B is not good, but given the context of "finding the solution that requires the least effort to implement," option D is the most suitable choice. Ingesting data from Amazon Kinesis Data Streams and using Amazon Kinesis Data Firehose to convert the data to Parquet format is a serverless approach. It allows for automatic data transformation and storage in Amazon S3 without the need for additional development or management of data conversion logic. Therefore, under the given conditions, option D is considered the solution that requires the "least effort" to implement.
upvoted 3 times
shammous
8 months, 3 weeks ago
Kinesis Data Firehose doesn't convert anything, it rather calls a lambda function to do so which is the overhead we want to avoid. B is the correct answer.
upvoted 1 times
kyuhuck
1 year, 2 months ago
Selected Answer: D
Amazon Kinesis Data Streams is a service that can capture, store, and process streaming data in real time. Amazon Kinesis Data Firehose is a service that can deliver streaming data to various destinations, such as Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service. Amazon Kinesis Data Firehose can also transform the data before delivering it, such as converting the data format, compressing the data, or encrypting the data. One of the supported data formats that Amazon Kinesis Data Firehose can convert to is Apache Parquet, which is a columnar storage format that can improve the performance and cost-efficiency of analytics queries. By using Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose, the Mobile Network Operator can ingest the .CSV data from the source systems and use Amazon Kinesis Data Firehose to convert the data into Parquet before storing it on Amazon S3
upvoted 2 times
Jonfernz
1 year, 2 months ago
Firehose cannot natively do the conversion. It requires a Lambda function for that purpose.
upvoted 3 times
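To make the point about the extra Lambda concrete, the sketch below builds the kind of S3 destination configuration that option D would need in practice: a Lambda processor (CSV to JSON) followed by Firehose's own JSON-to-Parquet conversion. The key names follow the `ExtendedS3DestinationConfiguration` structure accepted by boto3's `firehose.create_delivery_stream` as far as I know it. Every ARN, database, and table name is a placeholder assumption, and the function only builds a dictionary without calling AWS.

```python
def build_s3_destination(role_arn, bucket_arn, lambda_arn, glue_database, glue_table, region):
    """Build a destination config: Lambda turns CSV into JSON, Firehose turns JSON into Parquet."""
    return {
        "RoleARN": role_arn,
        "BucketARN": bucket_arn,
        # Step 1: the Lambda transformation that option D does not mention.
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [{
                "Type": "Lambda",
                "Parameters": [
                    {"ParameterName": "LambdaArn", "ParameterValue": lambda_arn},
                ],
            }],
        },
        # Step 2: built-in record format conversion, which expects JSON input
        # and reads the target schema from an AWS Glue Data Catalog table.
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            "SchemaConfiguration": {
                "RoleARN": role_arn,
                "DatabaseName": glue_database,
                "TableName": glue_table,
                "Region": region,
                "VersionId": "LATEST",
            },
            "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
            "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
        },
    }


# Placeholder values for illustration only.
config = build_s3_destination(
    role_arn="arn:aws:iam::123456789012:role/example-firehose-role",
    bucket_arn="arn:aws:s3:::example-analytics-bucket",
    lambda_arn="arn:aws:lambda:us-east-1:123456789012:function:example-csv-to-json",
    glue_database="example_db",
    glue_table="example_table",
    region="us-east-1",
)
```

The resulting dictionary would be passed as `ExtendedS3DestinationConfiguration` when creating the delivery stream. Whether that counts as more or less effort than a Glue job is exactly what the thread disagrees on.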