
Exam AWS Certified Solutions Architect - Associate SAA-C03 topic 1 question 214 discussion

A company’s reporting system delivers hundreds of .csv files to an Amazon S3 bucket each day. The company must convert these files to Apache Parquet format and must store the files in a transformed data bucket.

Which solution will meet these requirements with the LEAST development effort?

  • A. Create an Amazon EMR cluster with Apache Spark installed. Write a Spark application to transform the data. Use EMR File System (EMRFS) to write files to the transformed data bucket.
  • B. Create an AWS Glue crawler to discover the data. Create an AWS Glue extract, transform, and load (ETL) job to transform the data. Specify the transformed data bucket in the output step.
  • C. Use AWS Batch to create a job definition with Bash syntax to transform the data and output the data to the transformed data bucket. Use the job definition to submit a job. Specify an array job as the job type.
  • D. Create an AWS Lambda function to transform the data and output the data to the transformed data bucket. Configure an event notification for the S3 bucket. Specify the Lambda function as the destination for the event notification.
Suggested Answer: B
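
For context, the Glue approach in option B needs only a few lines of PySpark once the crawler has catalogued the CSV files. A minimal sketch, assuming a hypothetical catalog database reporting_db, table csv_reports, and destination bucket transformed-data-bucket:

import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the CSV data that the crawler registered in the Glue Data Catalog
source = glue_context.create_dynamic_frame.from_catalog(
    database="reporting_db",    # hypothetical database name
    table_name="csv_reports",   # hypothetical table name
)

# Write the same records back out as Parquet to the transformed data bucket
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://transformed-data-bucket/reports/"},
    format="parquet",
)
job.commit()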

Comments

Babba
Highly Voted 1 year, 6 months ago
Selected Answer: B
It looks like AWS Glue allows fully managed CSV to Parquet conversion jobs: https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet.html
upvoted 20 times
awsgeek75
7 months, 1 week ago
A textbook use case: https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet.html#three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet-epics B is the correct answer.
upvoted 2 times
cookieMr
Highly Voted 1 year, 1 month ago
Selected Answer: B
AWS Glue is a fully managed ETL service that simplifies preparing and transforming data for analytics, and it requires minimal development effort compared to the other options.
Option A requires more development effort because it involves writing a Spark application to transform the data, and it introduces additional infrastructure management with the EMR cluster.
Option C requires writing and managing custom Bash scripts for data transformation, which means more manual effort without the built-in transformation capabilities of AWS Glue.
Option D requires developing and managing a custom Lambda function for data transformation. While Lambda can handle the transformation, it takes more effort than AWS Glue, which is purpose-built for ETL operations.
Therefore, option B provides the least development effort by leveraging AWS Glue's capabilities for data discovery, transformation, and output to the transformed data bucket.
upvoted 9 times
iamroyalty_k
Most Recent 1 week, 2 days ago
Selected Answer: B
AWS Glue offers a serverless, automated, and cost-effective solution with minimal development and operational effort, making it the best choice for this use case. Why not the other options?
A. Amazon EMR cluster with Apache Spark: While EMR and Spark can handle this task, they require more setup, maintenance, and development effort than AWS Glue, and managing the cluster introduces operational overhead.
C. AWS Batch with a Bash job definition: This would require writing custom Bash scripts for the transformation and managing jobs, which introduces more complexity and development effort than AWS Glue.
D. AWS Lambda with S3 event notifications: Lambda is suitable for lightweight, real-time processing. However, converting hundreds of .csv files into Parquet format could exceed Lambda's execution time and resource limits, leading to scalability challenges.
upvoted 1 times
lofzee
2 months, 1 week ago
Selected Answer: B
AWS Glue and Parquet go hand in hand.
upvoted 2 times
zinabu
4 months ago
I will go with answer B because you can use AWS Glue to write ETL jobs in a Python shell environment, and you can also create both batch and streaming ETL jobs using Python (PySpark) or Scala in a managed Apache Spark environment. Apache Parquet is built to support efficient compression and encoding schemes, and it can speed up your analytics workloads because it stores data in a columnar fashion. Converting data to Parquet can save you storage space, cost, and time in the long run.
upvoted 3 times
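
To illustrate the Python-shell route zinabu describes, the AWS SDK for pandas (awswrangler) reduces the whole conversion to two calls. A minimal sketch with placeholder bucket paths, suitable for a Glue Python shell job:

import awswrangler as wr

# Read every CSV under the source prefix into one pandas DataFrame
df = wr.s3.read_csv(path="s3://reporting-source-bucket/csv/")  # placeholder path

# Write the rows back out as Parquet files in the transformed data bucket
wr.s3.to_parquet(df=df, path="s3://transformed-data-bucket/parquet/", dataset=True)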
Rido4good
6 months, 3 weeks ago
D. I think people are forgetting that the question asks for low overhead.
upvoted 1 times
awsgeek75
6 months, 3 weeks ago
Pray tell, how is a Lambda less overhead than B or even A?
upvoted 3 times
nileeka97
10 months, 2 weeks ago
Selected Answer: B
Parquet format ========> AWS Glue
upvoted 4 times
Guru4Cloud
11 months ago
Selected Answer: B
B. Create an AWS Glue crawler to discover the data. Create an AWS Glue extract, transform, and load (ETL) job to transform the data. Specify the transformed data bucket in the output step.
upvoted 3 times
markw92
1 year, 1 month ago
Least development effort means Lambda. Glue also works, but with more overhead and cost. A simple Lambda like this https://github.com/ayshaysha/aws-csv-to-parquet-converter/blob/main/csv-parquet-converter.py can be used to convert files as soon as they land in the S3 bucket.
upvoted 4 times
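
For comparison, the Lambda route markw92 links to looks roughly like the following. A minimal sketch, assuming pandas and pyarrow are packaged in a Lambda layer; the destination bucket name is hypothetical:

import urllib.parse
import boto3
import pandas as pd  # pandas + pyarrow must ship in a Lambda layer

s3 = boto3.client("s3")
DEST_BUCKET = "transformed-data-bucket"  # hypothetical bucket name

def handler(event, context):
    # Invoked by the S3 event notification for each new .csv object
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        df = pd.read_csv(s3.get_object(Bucket=bucket, Key=key)["Body"])
        # Convert in /tmp, then upload to the transformed data bucket
        df.to_parquet("/tmp/out.parquet", engine="pyarrow")
        out_key = key.rsplit(".", 1)[0] + ".parquet"
        s3.upload_file("/tmp/out.parquet", DEST_BUCKET, out_key)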
achevez85
1 year, 5 months ago
Selected Answer: B
https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet.html
upvoted 3 times
Training4aBetterLife
1 year, 6 months ago
Selected Answer: B
S3 provides a single control to automatically encrypt all new objects in a bucket with SSE-S3 or SSE-KMS. Unfortunately, these controls only affect new objects. If your bucket already contains millions of unencrypted objects, then turning on automatic encryption does not make your bucket secure as the unencrypted objects remain. For S3 buckets with a large number of objects (millions to billions), use Amazon S3 Inventory to get a list of the unencrypted objects, and Amazon S3 Batch Operations to encrypt the large number of old, unencrypted files.
upvoted 2 times
Training4aBetterLife
1 year, 6 months ago
Versioning: When you overwrite an S3 object, it results in a new object version in the bucket. However, this will not remove the old unencrypted versions of the object. If you do not delete the old versions of your newly encrypted objects, you will be charged for storing both versions. S3 Lifecycle: If you want to remove these unencrypted versions, use S3 Lifecycle to expire previous versions of objects. When you add a Lifecycle configuration to a bucket, the configuration rules apply to both existing objects and objects added later. C is missing this step, which I believe is what makes B the better choice: B includes encrypting the old unencrypted objects via Batch Operations, whereas versioning does not address the old unencrypted objects.
upvoted 1 times
Training4aBetterLife
1 year, 6 months ago
Please delete this. I was meaning to place this response on a different question.
upvoted 2 times
Training4aBetterLife
1 year, 6 months ago
Please delete this. I was meaning to place this response on a different question.
upvoted 1 times
Rudraman
1 year, 6 months ago
ETL = Glue
upvoted 4 times
Aninina
1 year, 6 months ago
Selected Answer: B
B is the correct answer
upvoted 2 times
techhb
1 year, 6 months ago
Selected Answer: B
An AWS Glue crawler discovers the data, and a Glue ETL job transforms it.
upvoted 2 times
kbaruu
1 year, 6 months ago
Selected Answer: B
The correct answer is B
upvoted 2 times
Mamiololo
1 year, 6 months ago
B is the answer
upvoted 3 times
swolfgang
1 year, 6 months ago
Selected Answer: B
It should be B.
upvoted 2 times
Community vote distribution: A (35%), C (25%), B (20%), Other