
Exam AWS Certified Solutions Architect - Associate SAA-C03 topic 1 question 258 discussion

A company has an application that places hundreds of .csv files into an Amazon S3 bucket every hour. The files are 1 GB in size. Each time a file is uploaded, the company needs to convert the file to Apache Parquet format and place the output file into an S3 bucket.

Which solution will meet these requirements with the LEAST operational overhead?

  • A. Create an AWS Lambda function to download the .csv files, convert the files to Parquet format, and place the output files in an S3 bucket. Invoke the Lambda function for each S3 PUT event.
  • B. Create an Apache Spark job to read the .csv files, convert the files to Parquet format, and place the output files in an S3 bucket. Create an AWS Lambda function for each S3 PUT event to invoke the Spark job.
  • C. Create an AWS Glue table and an AWS Glue crawler for the S3 bucket where the application places the .csv files. Schedule an AWS Lambda function to periodically use Amazon Athena to query the AWS Glue table, convert the query results into Parquet format, and place the output files into an S3 bucket.
  • D. Create an AWS Glue extract, transform, and load (ETL) job to convert the .csv files to Parquet format and place the output files into an S3 bucket. Create an AWS Lambda function for each S3 PUT event to invoke the ETL job.
Suggested Answer: D 🗳️

Comments

Parsons
Highly Voted 1 year, 9 months ago
Selected Answer: D
No, D should be correct. "LEAST operational overhead" => you should use a fully managed service like Glue instead of the manual approach in answer A.
upvoted 15 times
awsgeek75
9 months, 2 weeks ago
I also think it's D, but remember that D requires writing ETL logic in AWS Glue (nothing in the question says how complex it will be). A Lambda for CSV could also be simple (imagine Node.js with its huge ecosystem of libraries, or Python's parsing), so the two could be operationally on par with each other. Logically D makes more sense, but in practice AWS Glue rarely works out of the box for ETL and becomes a maintenance overhead in itself.
upvoted 1 times
aws4myself
Highly Voted 1 year, 8 months ago
Here A is the correct answer. The reason is the least operational overhead. A ==> S3 - Lambda - S3; D ==> S3 - Lambda - Glue - S3. Also, Glue cannot convert on the fly automatically; you need to write some code there. If you write the same code in Lambda, it will do the same conversion and push the file to S3. Lambda supports from 128 MB up to 10 GB of memory, so it can handle it easily. And we need to consider cost as well; Glue costs more. I hope many in this forum realize these differences.
upvoted 5 times
LuckyAro
1 year, 8 months ago
We also need to stick with the question; cost was not a consideration in it.
upvoted 1 times
nder
1 year, 7 months ago
Cost is not a factor. AWS Glue is a fully managed service; therefore, it has the least operational overhead.
upvoted 4 times
TariqKipkemei
Most Recent 1 year ago
Selected Answer: D
AWS Glue can run your extract, transform, and load (ETL) jobs as new data arrives. For example, you can configure AWS Glue to initiate your ETL jobs as soon as new data becomes available in Amazon Simple Storage Service (S3). Clearly you don't need a Lambda function to initiate the ETL job: https://aws.amazon.com/glue/#:~:text=to%20initiate%20your-,ETL,-jobs%20to%20run Option A requires writing code to perform the file conversion. In the exam, option D would be the best answer.
upvoted 3 times
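Whichever trigger is used (a Lambda function or Glue's own event-driven workflows), the conversion itself would live in a Glue ETL script. A minimal sketch, assuming the job receives the source and target S3 paths as job arguments (the argument names are placeholders):

```python
# Hypothetical AWS Glue (PySpark) job: read .csv files from S3, write Parquet.
# Source and target paths are passed in as job arguments; names are placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path", "target_path"])

glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the .csv data as a DynamicFrame.
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": [args["source_path"]]},
    format="csv",
    format_options={"withHeader": True},
)

# Write the same data back out in Parquet format.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": args["target_path"]},
    format="parquet",
)

job.commit()
```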
Guru4Cloud
1 year, 1 month ago
Selected Answer: D
This solution meets the requirements with the least operational overhead because AWS Glue is a fully managed ETL service that makes it easy to move data between data stores. AWS Glue can read .csv files from an S3 bucket and write the data into Parquet format in another S3 bucket. The AWS Lambda function can be triggered by an S3 PUT event when a new .csv file is uploaded, and it can start the AWS Glue ETL job to convert the file to Parquet format. This solution does not require managing any servers or clusters, which reduces operational overhead.
upvoted 4 times
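A minimal sketch of the triggering Lambda described above, assuming a pre-created Glue ETL job such as the script sketched earlier; the job name, environment variables, and argument names are placeholders:

```python
# Hypothetical option D glue code: start an existing AWS Glue ETL job
# for each S3 PUT event. All names below are illustrative only.
import os
import urllib.parse

import boto3

glue = boto3.client("glue")
GLUE_JOB_NAME = os.environ.get("GLUE_JOB_NAME", "csv-to-parquet")     # placeholder
OUTPUT_BUCKET = os.environ.get("OUTPUT_BUCKET", "my-parquet-output")  # placeholder


def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Hand the uploaded object's location to the Glue job as job arguments.
        glue.start_job_run(
            JobName=GLUE_JOB_NAME,
            Arguments={
                "--source_path": f"s3://{bucket}/{key}",
                "--target_path": f"s3://{OUTPUT_BUCKET}/{key.rsplit('.', 1)[0]}/",
            },
        )
```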
cookieMr
1 year, 3 months ago
D is correct
upvoted 1 times
cookieMr
1 year, 3 months ago
A. Introduces significant operational overhead: this approach requires managing the Lambda function, handling concurrency, and ensuring proper error handling for large file sizes, which can be challenging.
B. Adds unnecessary complexity and operational overhead: managing the Spark job, handling scalability, and coordinating the Lambda invocations for each file upload can be cumbersome.
C. Introduces additional complexity and may not be the most efficient solution: it involves managing Glue resources, scheduling the Lambda function, and querying data even when no new files are uploaded.
Option D leverages AWS Glue's ETL capabilities, allowing you to define and execute a data transformation job at scale. By invoking the ETL job with a Lambda function for each S3 PUT event, you ensure that files are efficiently converted to Parquet format without manual intervention. This approach minimizes operational overhead and provides a streamlined, scalable solution.
upvoted 3 times
F629
1 year, 3 months ago
Selected Answer: A
Both A and D can work, but A is simpler. It's closer to the "least operational effort".
upvoted 1 times
pentium75
9 months, 3 weeks ago
Creating, maintaining and supporting custom code that does the same as a ready-made serverless service is NEVER "least operational effort".
upvoted 1 times
pentium75
9 months, 3 weeks ago
Oh, and A can't handle 1 GB files.
upvoted 1 times
jaswantn
8 months, 1 week ago
Lambda now supports up to 10 GB of memory.
upvoted 1 times
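For context, a minimal sketch of what option A's Lambda could look like, assuming pandas and pyarrow are packaged with the function (e.g. as a layer or container image) and that memory and ephemeral storage are raised toward their 10 GB limits for 1 GB inputs; bucket and variable names are placeholders:

```python
# Hypothetical option A: convert an uploaded .csv to Parquet inside Lambda.
# Assumes pandas + pyarrow are bundled with the function and that memory and
# /tmp storage are sized for ~1 GB files; names below are illustrative only.
import os
import urllib.parse

import boto3
import pandas as pd  # needs pyarrow available as the Parquet engine

s3 = boto3.client("s3")
OUTPUT_BUCKET = os.environ.get("OUTPUT_BUCKET", "my-parquet-output-bucket")  # placeholder


def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Download the .csv to ephemeral storage.
        local_csv = f"/tmp/{os.path.basename(key)}"
        s3.download_file(bucket, key, local_csv)

        # Convert to Parquet and upload the result.
        df = pd.read_csv(local_csv)
        local_parquet = local_csv.rsplit(".", 1)[0] + ".parquet"
        df.to_parquet(local_parquet, index=False)

        out_key = key.rsplit(".", 1)[0] + ".parquet"
        s3.upload_file(local_parquet, OUTPUT_BUCKET, out_key)
```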
shanwford
1 year, 6 months ago
Selected Answer: D
The maximum size for a Lambda event payload is 256 KB, so (A) wouldn't work with 1 GB files. Glue is recommended by AWS for the Parquet transformation.
upvoted 2 times
jennyka76
1 year, 8 months ago
ANS - D. Read this article: https://aws.amazon.com/blogs/database/how-to-extract-transform-and-load-data-for-analytic-processing-using-aws-glue-part-2/
upvoted 2 times
JayBee65
1 year, 8 months ago
A is unlikely to work, as Lambda may struggle with 1 GB files: "< 64 MB, beyond which lambda is likely to hit memory caps", see https://stackoverflow.com/questions/41504095/creating-a-parquet-file-on-aws-lambda-function
upvoted 2 times
jainparag1
1 year, 8 months ago
Should be D, as Glue is a fully managed service and provides an ETL job for converting .csv files to Parquet off the shelf.
upvoted 1 times
Joxtat
1 year, 9 months ago
Selected Answer: D
https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet.html
upvoted 1 times
techhb
1 year, 9 months ago
AWS Glue is the right solution here.
upvoted 1 times
mp165
1 year, 9 months ago
Selected Answer: D
I am thinking D. A says Lambda will download the .csv... but to where? That seems manual to me.
upvoted 1 times
mhmt4438
1 year, 9 months ago
Selected Answer: A
I think A
upvoted 1 times
bamishr
1 year, 9 months ago
Selected Answer: A
https://www.examtopics.com/discussions/amazon/view/83201-exam-aws-certified-solutions-architect-associate-saa-c02/
upvoted 1 times
Community vote distribution: A (35%), C (25%), B (20%), Other