
Exam AWS Certified Solutions Architect - Associate SAA-C03 topic 1 question 214 discussion

A company’s reporting system delivers hundreds of .csv files to an Amazon S3 bucket each day. The company must convert these files to Apache Parquet format and must store the files in a transformed data bucket.

Which solution will meet these requirements with the LEAST development effort?

  • A. Create an Amazon EMR cluster with Apache Spark installed. Write a Spark application to transform the data. Use EMR File System (EMRFS) to write files to the transformed data bucket.
  • B. Create an AWS Glue crawler to discover the data. Create an AWS Glue extract, transform, and load (ETL) job to transform the data. Specify the transformed data bucket in the output step.
  • C. Use AWS Batch to create a job definition with Bash syntax to transform the data and output the data to the transformed data bucket. Use the job definition to submit a job. Specify an array job as the job type.
  • D. Create an AWS Lambda function to transform the data and output the data to the transformed data bucket. Configure an event notification for the S3 bucket. Specify the Lambda function as the destination for the event notification.
Suggested Answer: B
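
For context, the Glue approach in option B needs only a few lines of PySpark once the crawler has catalogued the CSV files. A minimal sketch, assuming a hypothetical catalog database reporting_db, table csv_reports, and destination bucket transformed-data-bucket:

import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the CSV data that the crawler registered in the Glue Data Catalog
source = glue_context.create_dynamic_frame.from_catalog(
    database="reporting_db",    # hypothetical database name
    table_name="csv_reports",   # hypothetical table name
)

# Write the same records back out as Parquet to the transformed data bucket
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://transformed-data-bucket/reports/"},
    format="parquet",
)
job.commit()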

Comments

Babba
Highly Voted 1 year, 6 months ago
Selected Answer: B
It looks like AWS Glue allows fully managed CSV to Parquet conversion jobs: https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet.html
upvoted 20 times
awsgeek75
7 months, 1 week ago
A textbook use case: https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet.html#three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet-epics B is the correct answer.
upvoted 2 times
cookieMr
Highly Voted 1 year, 1 month ago
Selected Answer: B
AWS Glue is a fully managed ETL service that simplifies preparing and transforming data for analytics, and it requires minimal development effort compared to the other options.
Option A requires more development effort because it involves writing a Spark application to transform the data, and it introduces additional infrastructure management with the EMR cluster.
Option C requires writing and managing custom Bash scripts for data transformation, which means more manual effort without the built-in transformation capabilities of AWS Glue.
Option D requires developing and managing a custom Lambda function for data transformation. While Lambda can handle the transformation, it takes more effort than AWS Glue, which is purpose-built for ETL operations.
Therefore, option B provides the least development effort by leveraging AWS Glue's capabilities for data discovery, transformation, and output to the transformed data bucket.
upvoted 9 times
iamroyalty_k
Most Recent 1 week, 2 days ago
Selected Answer: B
AWS Glue offers a serverless, automated, and cost-effective solution with minimal development and operational effort, making it the best choice for this use case. Why not the other options?
A. Amazon EMR cluster with Apache Spark: While EMR and Spark can handle this task, they require more setup, maintenance, and development effort than AWS Glue, and managing the cluster introduces operational overhead.
C. AWS Batch with a Bash job definition: This would require writing custom Bash scripts for the transformation and managing jobs, which introduces more complexity and development effort than AWS Glue.
D. AWS Lambda with S3 event notifications: Lambda is suitable for lightweight, real-time processing. However, converting hundreds of .csv files into Parquet format could exceed Lambda's execution time and resource limits, leading to scalability challenges.
upvoted 1 times
lofzee
2 months, 1 week ago
Selected Answer: B
AWS Glue and Parquet go hand in hand.
upvoted 2 times
zinabu
4 months ago
I will go with answer B because you can use AWS Glue to write ETL jobs in a Python shell environment, and you can also create both batch and streaming ETL jobs using Python (PySpark) or Scala in a managed Apache Spark environment. Apache Parquet is built to support efficient compression and encoding schemes, and it can speed up your analytics workloads because it stores data in a columnar fashion. Converting data to Parquet can save you storage space, cost, and time in the long run.
upvoted 3 times
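
To illustrate the Python-shell route zinabu describes, the AWS SDK for pandas (awswrangler) reduces the whole conversion to two calls. A minimal sketch with placeholder bucket paths, suitable for a Glue Python shell job:

import awswrangler as wr

# Read every CSV under the source prefix into one pandas DataFrame
df = wr.s3.read_csv(path="s3://reporting-source-bucket/csv/")  # placeholder path

# Write the rows back out as Parquet files in the transformed data bucket
wr.s3.to_parquet(df=df, path="s3://transformed-data-bucket/parquet/", dataset=True)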
Rido4good
6 months, 3 weeks ago
D. I think people are forgetting that the question asks for low overhead.
upvoted 1 times
awsgeek75
6 months, 3 weeks ago
Pray tell, how is a Lambda less overhead than B or even A?
upvoted 3 times
nileeka97
10 months, 2 weeks ago
Selected Answer: B
Parquet format ========> AWS Glue
upvoted 4 times
Guru4Cloud
11 months ago
Selected Answer: B
B. Create an AWS Glue crawler to discover the data. Create an AWS Glue extract, transform, and load (ETL) job to transform the data. Specify the transformed data bucket in the output step.
upvoted 3 times
markw92
1 year, 1 month ago
Least development effort means Lambda. Glue also works, but with more overhead and cost. A simple Lambda like this https://github.com/ayshaysha/aws-csv-to-parquet-converter/blob/main/csv-parquet-converter.py can be used to convert files as soon as they land in the S3 bucket.
upvoted 4 times
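
For comparison, the Lambda route markw92 links to looks roughly like the following. A minimal sketch, assuming pandas and pyarrow are packaged in a Lambda layer; the destination bucket name is hypothetical:

import urllib.parse
import boto3
import pandas as pd  # pandas + pyarrow must ship in a Lambda layer

s3 = boto3.client("s3")
DEST_BUCKET = "transformed-data-bucket"  # hypothetical bucket name

def handler(event, context):
    # Invoked by the S3 event notification for each new .csv object
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        df = pd.read_csv(s3.get_object(Bucket=bucket, Key=key)["Body"])
        # Convert in /tmp, then upload to the transformed data bucket
        df.to_parquet("/tmp/out.parquet", engine="pyarrow")
        out_key = key.rsplit(".", 1)[0] + ".parquet"
        s3.upload_file("/tmp/out.parquet", DEST_BUCKET, out_key)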
achevez85
1 year, 5 months ago
Selected Answer: B
https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet.html
upvoted 3 times
Training4aBetterLife
1 year, 6 months ago
Selected Answer: B
S3 provides a single control to automatically encrypt all new objects in a bucket with SSE-S3 or SSE-KMS. Unfortunately, these controls only affect new objects. If your bucket already contains millions of unencrypted objects, then turning on automatic encryption does not make your bucket secure as the unencrypted objects remain. For S3 buckets with a large number of objects (millions to billions), use Amazon S3 Inventory to get a list of the unencrypted objects, and Amazon S3 Batch Operations to encrypt the large number of old, unencrypted files.
upvoted 2 times
Training4aBetterLife
1 year, 6 months ago
Versioning: When you overwrite an S3 object, it results in a new object version in the bucket. However, this will not remove the old unencrypted versions of the object. If you do not delete the old versions of your newly encrypted objects, you will be charged for storing both versions. S3 Lifecycle: If you want to remove these unencrypted versions, use S3 Lifecycle to expire previous versions of objects. When you add a Lifecycle configuration to a bucket, the configuration rules apply to both existing objects and objects added later. C is missing this step, which I believe is what makes B the better choice: B includes encrypting the old unencrypted objects via Batch Operations, whereas versioning does not address the old unencrypted objects.
upvoted 1 times
Training4aBetterLife
1 year, 6 months ago
Please delete this. I was meaning to place this response on a different question.
upvoted 2 times
Training4aBetterLife
1 year, 6 months ago
Please delete this. I was meaning to place this response on a different question.
upvoted 1 times
Rudraman
1 year, 6 months ago
ETL = Glue
upvoted 4 times
Aninina
1 year, 6 months ago
Selected Answer: B
B is the correct answer
upvoted 2 times
techhb
1 year, 6 months ago
Selected Answer: B
An AWS Glue crawler discovers the data, and a Glue ETL job transforms it.
upvoted 2 times
kbaruu
1 year, 6 months ago
Selected Answer: B
The correct answer is B
upvoted 2 times
Mamiololo
1 year, 6 months ago
B is the answer
upvoted 3 times
swolfgang
1 year, 6 months ago
Selected Answer: B
It should be B.
upvoted 2 times
Community vote distribution: A (35%), C (25%), B (20%), Other