
Exam AWS Certified Solutions Architect - Associate SAA-C03 topic 1 question 214 discussion

A company’s reporting system delivers hundreds of .csv files to an Amazon S3 bucket each day. The company must convert these files to Apache Parquet format and must store the files in a transformed data bucket.

Which solution will meet these requirements with the LEAST development effort?

  • A. Create an Amazon EMR cluster with Apache Spark installed. Write a Spark application to transform the data. Use EMR File System (EMRFS) to write files to the transformed data bucket.
  • B. Create an AWS Glue crawler to discover the data. Create an AWS Glue extract, transform, and load (ETL) job to transform the data. Specify the transformed data bucket in the output step.
  • C. Use AWS Batch to create a job definition with Bash syntax to transform the data and output the data to the transformed data bucket. Use the job definition to submit a job. Specify an array job as the job type.
  • D. Create an AWS Lambda function to transform the data and output the data to the transformed data bucket. Configure an event notification for the S3 bucket. Specify the Lambda function as the destination for the event notification.
Suggested Answer: B

Comments

Babba
Highly Voted 1 year, 9 months ago
Selected Answer: B
It looks like AWS Glue allows fully managed CSV to Parquet conversion jobs: https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet.html
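As a rough illustration of what option B involves operationally, here is a minimal boto3 sketch that runs a pre-created crawler and then starts the Glue job; the crawler name, job name, and bucket path are hypothetical placeholders, not names from the question:

```python
import boto3

glue = boto3.client("glue")

# Run the crawler so the daily CSV files are registered in the
# Glue Data Catalog. "csv-reports-crawler" is a hypothetical,
# pre-created crawler.
glue.start_crawler(Name="csv-reports-crawler")

# Start the pre-created ETL job that writes Parquet to the
# transformed data bucket. Job name and argument are placeholders.
response = glue.start_job_run(
    JobName="csv-to-parquet-job",
    Arguments={"--output_path": "s3://transformed-data-bucket/"},
)
print("Started job run:", response["JobRunId"])
```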
upvoted 18 times
awsgeek75
9 months, 2 weeks ago
A textbook use case: https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet.html#three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet-epics B is the correct answer.
upvoted 2 times
...
...
cookieMr
Highly Voted 1 year, 3 months ago
Selected Answer: B
AWS Glue is a fully managed ETL service that simplifies the process of preparing and transforming data for analytics, so it requires minimal development effort compared to the other options.
Option A requires more development effort: it involves writing a Spark application to transform the data, and it introduces additional infrastructure management with the EMR cluster.
Option C requires writing and managing custom Bash scripts for the data transformation. It takes more manual effort and does not provide the built-in transformation capabilities of AWS Glue.
Option D requires developing and managing a custom Lambda function for the transformation. While Lambda can handle it, it requires more effort than AWS Glue, which is specifically designed for ETL operations.
Therefore, option B requires the least development effort by leveraging AWS Glue's capabilities for data discovery, transformation, and output to the transformed data bucket.
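To make the comparison concrete, here is a minimal sketch of what the Glue ETL script itself can look like, using the standard Glue PySpark job boilerplate; the catalog database, table, and bucket names are assumptions for illustration:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job setup.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the CSV data that the crawler registered in the Data Catalog.
# "reporting_db" / "daily_csv" are hypothetical catalog names.
source = glue_context.create_dynamic_frame.from_catalog(
    database="reporting_db", table_name="daily_csv"
)

# Write the same records out as Parquet to the transformed data bucket.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://transformed-data-bucket/"},
    format="parquet",
)
job.commit()
```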
upvoted 8 times
...
lofzee
Most Recent 4 months, 2 weeks ago
Selected Answer: B
AWS Glue and Parquet go hand in hand.
upvoted 1 times
...
zinabu
6 months, 2 weeks ago
I will go with answer B because you can use AWS Glue to write ETL jobs in a Python shell environment. You can also create both batch and streaming ETL jobs by using Python (PySpark) or Scala in a managed Apache Spark environment. Apache Parquet is built to support efficient compression and encoding schemes, and it can speed up your analytics workloads because it stores data in a columnar fashion. Converting data to Parquet can save you storage space, cost, and time in the long run.
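For the Python shell flavor mentioned here, a minimal sketch using the AWS SDK for pandas (awswrangler), which a Glue Python shell job can bundle; the bucket names and prefixes are hypothetical placeholders:

```python
import awswrangler as wr

# Read every CSV under the source prefix into one pandas DataFrame.
df = wr.s3.read_csv(path="s3://reporting-source-bucket/daily/")

# Write it back out as Parquet under the transformed data bucket.
wr.s3.to_parquet(
    df=df,
    path="s3://transformed-data-bucket/daily/",
    dataset=True,  # treat the path as a dataset prefix, not a single file
)
```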
upvoted 1 times
...
Rido4good
9 months ago
D. I think people are forgetting that the question says "low overhead".
upvoted 1 times
awsgeek75
9 months ago
Pray tell, how is a Lambda less overhead than B or even A?
upvoted 2 times
...
...
nileeka97
1 year ago
Selected Answer: B
Parquet format ========> AWS Glue
upvoted 3 times
...
Guru4Cloud
1 year, 1 month ago
Selected Answer: B
B. Create an AWS Glue crawler to discover the data. Create an AWS Glue extract, transform, and load (ETL) job to transform the data. Specify the transformed data bucket in the output step.
upvoted 2 times
...
markw92
1 year, 4 months ago
Least development effort means Lambda. Glue also works but has more overhead and cost. A simple Lambda like this one, https://github.com/ayshaysha/aws-csv-to-parquet-converter/blob/main/csv-parquet-converter.py, can be used to convert files as soon as they land in the S3 bucket.
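For reference, a minimal sketch of that Lambda pattern (not the linked script itself): an S3 event notification invokes the handler, which converts one CSV object to Parquet. The destination bucket name is hypothetical, and pandas/pyarrow would need to be packaged as a Lambda layer:

```python
import urllib.parse

import boto3
import pandas as pd  # pandas + pyarrow supplied via a Lambda layer

s3 = boto3.client("s3")
DEST_BUCKET = "transformed-data-bucket"  # hypothetical destination

def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Read the incoming CSV straight from S3.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"]
        df = pd.read_csv(body)

        # Write Parquet to Lambda's /tmp scratch space, then upload
        # it to the transformed data bucket.
        out_key = key.rsplit(".", 1)[0] + ".parquet"
        tmp_path = "/tmp/" + out_key.replace("/", "_")
        df.to_parquet(tmp_path, engine="pyarrow")
        s3.upload_file(tmp_path, DEST_BUCKET, out_key)
```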
upvoted 3 times
...
achevez85
1 year, 7 months ago
Selected Answer: B
https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet.html
upvoted 2 times
...
Training4aBetterLife
1 year, 8 months ago
Selected Answer: B
S3 provides a single control to automatically encrypt all new objects in a bucket with SSE-S3 or SSE-KMS. Unfortunately, these controls only affect new objects. If your bucket already contains millions of unencrypted objects, then turning on automatic encryption does not make your bucket secure as the unencrypted objects remain. For S3 buckets with a large number of objects (millions to billions), use Amazon S3 Inventory to get a list of the unencrypted objects, and Amazon S3 Batch Operations to encrypt the large number of old, unencrypted files.
upvoted 2 times
Training4aBetterLife
1 year, 8 months ago
Please delete this. I was meaning to place this response on a different question.
upvoted 1 times
...
Training4aBetterLife
1 year, 8 months ago
Versioning: When you overwrite an S3 object, it results in a new object version in the bucket. However, this will not remove the old unencrypted versions of the object. If you do not delete the old versions of your newly encrypted objects, you will be charged for the storage of both versions.
S3 Lifecycle: If you want to remove these unencrypted versions, use S3 Lifecycle to expire previous versions of objects. When you add a Lifecycle configuration to a bucket, the configuration rules apply to both existing objects and objects added later.
C is missing this step, which I believe is what makes B the better choice. B includes the functionality of encrypting the old unencrypted objects via Batch Operations, whereas Versioning does not address the old unencrypted objects.
upvoted 1 times
Training4aBetterLife
1 year, 8 months ago
Please delete this. I was meaning to place this response on a different question.
upvoted 2 times
...
...
...
Rudraman
1 year, 9 months ago
ETL = Glue
upvoted 3 times
...
Aninina
1 year, 9 months ago
Selected Answer: B
B is the correct answer
upvoted 1 times
...
techhb
1 year, 9 months ago
Selected Answer: B
AWS Glue Crawler is for ETL
upvoted 1 times
...
kbaruu
1 year, 9 months ago
Selected Answer: B
The correct answer is B
upvoted 1 times
...
Mamiololo
1 year, 9 months ago
B is the answer
upvoted 2 times
...
swolfgang
1 year, 9 months ago
Selected Answer: B
It should be B.
upvoted 1 times
...
marcioicebr
1 year, 9 months ago
Selected Answer: B
According to the documentation, the correct answer is B. https://docs.aws.amazon.com/pt_br/prescriptive-guidance/latest/patterns/three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet.html
upvoted 1 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other