
Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 65 discussion

A data engineer must build an extract, transform, and load (ETL) pipeline to process and load data from 10 source systems into 10 tables that are in an Amazon Redshift database. All the source systems generate .csv, JSON, or Apache Parquet files every 15 minutes. The source systems all deliver files into one Amazon S3 bucket. The file sizes range from 10 MB to 20 GB. The ETL pipeline must function correctly despite changes to the data schema.
Which data pipeline solutions will meet these requirements? (Choose two.)

  • A. Use an Amazon EventBridge rule to run an AWS Glue job every 15 minutes. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.
  • B. Use an Amazon EventBridge rule to invoke an AWS Glue workflow job every 15 minutes. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.
  • C. Configure an AWS Lambda function to invoke an AWS Glue crawler when a file is loaded into the S3 bucket. Configure an AWS Glue job to process and load the data into the Amazon Redshift tables. Create a second Lambda function to run the AWS Glue job. Create an Amazon EventBridge rule to invoke the second Lambda function when the AWS Glue crawler finishes running successfully.
  • D. Configure an AWS Lambda function to invoke an AWS Glue workflow when a file is loaded into the S3 bucket. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.
  • E. Configure an AWS Lambda function to invoke an AWS Glue job when a file is loaded into the S3 bucket. Configure the AWS Glue job to read the files from the S3 bucket into an Apache Spark DataFrame. Configure the AWS Glue job to also put smaller partitions of the DataFrame into an Amazon Kinesis Data Firehose delivery stream. Configure the delivery stream to load data into the Amazon Redshift tables.
Suggested Answer: BD
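
Both suggested options hinge on the same AWS Glue workflow: an on-demand trigger starts a crawler, and a conditional trigger starts the load job once the crawler finishes successfully, which is what keeps the pipeline working when the data schema changes. A minimal boto3 sketch of that wiring follows; all resource names are hypothetical placeholders, not anything given in the question.

```python
"""Sketch of the Glue workflow shared by options B and D.
Workflow, crawler, and job names are hypothetical placeholders."""
import boto3

glue = boto3.client("glue")

# Workflow that groups the crawler and the Redshift load job.
glue.create_workflow(Name="etl-ingest-workflow")

# On-demand trigger: starting the workflow runs the crawler first,
# so schema changes in the S3 files are recorded in the Data Catalog.
glue.create_trigger(
    Name="start-crawler",
    WorkflowName="etl-ingest-workflow",
    Type="ON_DEMAND",
    Actions=[{"CrawlerName": "source-bucket-crawler"}],
)

# Conditional trigger: run the load job only after the crawler succeeds.
glue.create_trigger(
    Name="run-load-job",
    WorkflowName="etl-ingest-workflow",
    Type="CONDITIONAL",
    StartOnCreation=True,
    Predicate={
        "Conditions": [
            {
                "LogicalOperator": "EQUALS",
                "CrawlerName": "source-bucket-crawler",
                "CrawlState": "SUCCEEDED",
            }
        ]
    },
    Actions=[{"JobName": "redshift-load-job"}],
)
```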

Comments

rralucard_
Highly Voted 8 months, 3 weeks ago
Selected Answer: BD
Option B (an Amazon EventBridge rule invoking the AWS Glue workflow every 15 minutes): streamlined, automatically scheduled, and able to handle schema changes because the crawler runs before the job. Option D (an AWS Lambda function invoking the AWS Glue workflow when a file is loaded): responsive to file arrival and equally adaptable to schema changes, though slightly more complex than option B.
upvoted 10 times
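
For option D, the Lambda function only needs to start the workflow when the S3 notification arrives. A hypothetical handler is sketched below (the workflow name is a placeholder); with option B, the same start_workflow_run call would instead be made on a 15-minute EventBridge schedule.

```python
"""Hypothetical Lambda handler for option D: start the Glue workflow
whenever a file lands in the source S3 bucket."""
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # One workflow run per S3 object-created notification record.
    records = event.get("Records", [])
    for record in records:
        key = record["s3"]["object"]["key"]
        run = glue.start_workflow_run(Name="etl-ingest-workflow")
        print(f"Started workflow run {run['RunId']} for S3 key {key}")
    return {"runs_started": len(records)}
```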
Felix_G
7 months, 3 weeks ago
D is incorrect! Options C, D and E have issues like unnecessary complexity, latency due to triggers, or limitations in handling large file sizes. So B and A are the best and most robust options that meet all the requirements.
upvoted 1 times
Luke97
6 months, 3 weeks ago
A is NOT correct. The question says "The ETL pipeline must function correctly despite changes to the data schema", so running a Glue crawler is necessary to handle schema changes.
upvoted 5 times
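
To illustrate why the crawler matters: a Glue job that reads through the Data Catalog with DynamicFrames works against whatever schema the crawler last recorded, so new or changed columns can flow through to Redshift without code changes. A minimal PySpark sketch, assuming hypothetical catalog database, table, Glue connection, and S3 staging names:

```python
"""Sketch of a schema-tolerant Glue job that loads a crawled table into Redshift.
Database, table, connection, and S3 paths are hypothetical placeholders."""
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the crawled table; the DynamicFrame reflects the catalog schema
# maintained by the crawler, including newly added columns.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="etl_source_db",
    table_name="source_system_1",
)

# Write to Redshift through a Glue connection, staging data in S3.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=dyf,
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "public.source_system_1", "database": "dev"},
    redshift_tmp_dir="s3://my-temp-bucket/redshift-staging/",
)

job.commit()
```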
HagarTheHorrible
Most Recent 4 months ago
Selected Answer: BD
change of schema is the key
upvoted 1 times
valuedate
5 months ago
Selected Answer: BD
EventBridge rule or event trigger
upvoted 1 times
Ousseyni
6 months, 1 week ago
Selected Answer: AE
ChatGPT said A and E
upvoted 1 times
valuedate
5 months ago
ChatGPT? ahahaha. A is NOT correct, and E is too complex.
upvoted 2 times
tgv
4 months, 3 weeks ago
You should double check your information.
upvoted 2 times
Christina666
6 months, 1 week ago
Selected Answer: BD
EventBridge rule or event trigger
upvoted 1 times
arvehisa
6 months, 2 weeks ago
I don't think this pipeline should be triggered by an S3 file upload. However, it seems A cannot handle the data schema change. If an S3 trigger is acceptable, then C and E are unnecessarily complex, so I would go with B & D (despite the S3 trigger).
upvoted 1 times
lucas_rfsb
6 months, 3 weeks ago
Selected Answer: BD
I will go with BD
upvoted 3 times
Felix_G
7 months, 3 weeks ago
Selected Answer: AB
The two data pipeline solutions that will meet the requirements are: A. Use an Amazon EventBridge rule to run an AWS Glue job every 15 minutes. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables. B. Use an Amazon EventBridge rule to invoke an AWS Glue workflow job every 15 minutes. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables. These solutions leverage AWS Glue to process and load the data from different file formats in the S3 bucket into the Amazon Redshift tables, while also handling changes to the data schema.
upvoted 2 times
chris_spencer
6 months, 1 week ago
A is incorrect; it doesn't take care of updating the Data Catalog.
upvoted 1 times
evntdrvn76
8 months, 3 weeks ago
The correct answers are A. Use an Amazon EventBridge rule to run an AWS Glue job every 15 minutes. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables and B. Use an Amazon EventBridge rule to invoke an AWS Glue workflow job every 15 minutes. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables. These solutions automate the ETL pipeline with minimal operational overhead.
upvoted 1 times
Community vote distribution: A (35%), C (25%), B (20%), Other