
Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 65 discussion

A data engineer must build an extract, transform, and load (ETL) pipeline to process and load data from 10 source systems into 10 tables that are in an Amazon Redshift database. All the source systems generate .csv, JSON, or Apache Parquet files every 15 minutes. The source systems all deliver files into one Amazon S3 bucket. The file sizes range from 10 MB to 20 GB. The ETL pipeline must function correctly despite changes to the data schema.
Which data pipeline solutions will meet these requirements? (Choose two.)

  • A. Use an Amazon EventBridge rule to run an AWS Glue job every 15 minutes. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.
  • B. Use an Amazon EventBridge rule to invoke an AWS Glue workflow job every 15 minutes. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.
  • C. Configure an AWS Lambda function to invoke an AWS Glue crawler when a file is loaded into the S3 bucket. Configure an AWS Glue job to process and load the data into the Amazon Redshift tables. Create a second Lambda function to run the AWS Glue job. Create an Amazon EventBridge rule to invoke the second Lambda function when the AWS Glue crawler finishes running successfully.
  • D. Configure an AWS Lambda function to invoke an AWS Glue workflow when a file is loaded into the S3 bucket. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.
  • E. Configure an AWS Lambda function to invoke an AWS Glue job when a file is loaded into the S3 bucket. Configure the AWS Glue job to read the files from the S3 bucket into an Apache Spark DataFrame. Configure the AWS Glue job to also put smaller partitions of the DataFrame into an Amazon Kinesis Data Firehose delivery stream. Configure the delivery stream to load data into the Amazon Redshift tables.
Suggested Answer: BD
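
Both suggested options hinge on the same AWS Glue workflow: an on-demand trigger starts a crawler, and a conditional trigger starts the load job once the crawler finishes successfully, which is what keeps the pipeline working when the data schema changes. A minimal boto3 sketch of that wiring follows; all resource names are hypothetical placeholders, not anything given in the question.

```python
"""Sketch of the Glue workflow shared by options B and D.
Workflow, crawler, and job names are hypothetical placeholders."""
import boto3

glue = boto3.client("glue")

# Workflow that groups the crawler and the Redshift load job.
glue.create_workflow(Name="etl-ingest-workflow")

# On-demand trigger: starting the workflow runs the crawler first,
# so schema changes in the S3 files are recorded in the Data Catalog.
glue.create_trigger(
    Name="start-crawler",
    WorkflowName="etl-ingest-workflow",
    Type="ON_DEMAND",
    Actions=[{"CrawlerName": "source-bucket-crawler"}],
)

# Conditional trigger: run the load job only after the crawler succeeds.
glue.create_trigger(
    Name="run-load-job",
    WorkflowName="etl-ingest-workflow",
    Type="CONDITIONAL",
    StartOnCreation=True,
    Predicate={
        "Conditions": [
            {
                "LogicalOperator": "EQUALS",
                "CrawlerName": "source-bucket-crawler",
                "CrawlState": "SUCCEEDED",
            }
        ]
    },
    Actions=[{"JobName": "redshift-load-job"}],
)
```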

Comments

rralucard_
Highly Voted 8 months, 3 weeks ago
Selected Answer: BD
Option B (an Amazon EventBridge rule invoking the AWS Glue workflow every 15 minutes): streamlined, automatically scheduled, and able to handle schema changes because the crawler runs before the job. Option D (an AWS Lambda function invoking the AWS Glue workflow when a file is loaded): responsive to file arrival and equally adaptable to schema changes, though slightly more complex than option B.
upvoted 10 times
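
For option D, the Lambda function only needs to start the workflow when the S3 notification arrives. A hypothetical handler is sketched below (the workflow name is a placeholder); with option B, the same start_workflow_run call would instead be made on a 15-minute EventBridge schedule.

```python
"""Hypothetical Lambda handler for option D: start the Glue workflow
whenever a file lands in the source S3 bucket."""
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # One workflow run per S3 object-created notification record.
    records = event.get("Records", [])
    for record in records:
        key = record["s3"]["object"]["key"]
        run = glue.start_workflow_run(Name="etl-ingest-workflow")
        print(f"Started workflow run {run['RunId']} for S3 key {key}")
    return {"runs_started": len(records)}
```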
Felix_G
7 months, 3 weeks ago
D is incorrect! Options C, D and E have issues like unnecessary complexity, latency due to triggers, or limitations in handling large file sizes. So B and A are the best and most robust options that meet all the requirements.
upvoted 1 times
Luke97
6 months, 3 weeks ago
A is NOT correct. The question says "The ETL pipeline must function correctly despite changes to the data schema", so running a Glue crawler is necessary to handle schema changes.
upvoted 5 times
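
To illustrate why the crawler matters: a Glue job that reads through the Data Catalog with DynamicFrames works against whatever schema the crawler last recorded, so new or changed columns can flow through to Redshift without code changes. A minimal PySpark sketch, assuming hypothetical catalog database, table, Glue connection, and S3 staging names:

```python
"""Sketch of a schema-tolerant Glue job that loads a crawled table into Redshift.
Database, table, connection, and S3 paths are hypothetical placeholders."""
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the crawled table; the DynamicFrame reflects the catalog schema
# maintained by the crawler, including newly added columns.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="etl_source_db",
    table_name="source_system_1",
)

# Write to Redshift through a Glue connection, staging data in S3.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=dyf,
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "public.source_system_1", "database": "dev"},
    redshift_tmp_dir="s3://my-temp-bucket/redshift-staging/",
)

job.commit()
```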
HagarTheHorrible
Most Recent 4 months ago
Selected Answer: BD
change of schema is the key
upvoted 1 times
valuedate
5 months ago
Selected Answer: BD
EventBridge rule or event trigger
upvoted 1 times
Ousseyni
6 months, 1 week ago
Selected Answer: AE
ChatGPT said A and E
upvoted 1 times
valuedate
5 months ago
ChatGPT? ahahaha. A is NOT correct, and E is too complex.
upvoted 2 times
tgv
4 months, 3 weeks ago
You should double check your information.
upvoted 2 times
Christina666
6 months, 1 week ago
Selected Answer: BD
EventBridge rule or event trigger
upvoted 1 times
arvehisa
6 months, 2 weeks ago
I don't think this pipeline should be triggered by an S3 file upload. However, it seems A cannot handle the data schema change. If an S3 trigger is acceptable, then C and E are unnecessarily complex, so I would go with B & D (despite the S3 trigger).
upvoted 1 times
lucas_rfsb
6 months, 3 weeks ago
Selected Answer: BD
I will go with BD
upvoted 3 times
Felix_G
7 months, 3 weeks ago
Selected Answer: AB
The two data pipeline solutions that will meet the requirements are: A. Use an Amazon EventBridge rule to run an AWS Glue job every 15 minutes. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables. B. Use an Amazon EventBridge rule to invoke an AWS Glue workflow job every 15 minutes. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables. These solutions leverage AWS Glue to process and load the data from different file formats in the S3 bucket into the Amazon Redshift tables, while also handling changes to the data schema.
upvoted 2 times
chris_spencer
6 months, 1 week ago
A is incorrect; it doesn't take care of updating the Data Catalog.
upvoted 1 times
evntdrvn76
8 months, 3 weeks ago
The correct answers are A. Use an Amazon EventBridge rule to run an AWS Glue job every 15 minutes. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables and B. Use an Amazon EventBridge rule to invoke an AWS Glue workflow job every 15 minutes. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables. These solutions automate the ETL pipeline with minimal operational overhead.
upvoted 1 times
Community vote distribution: A (35%), C (25%), B (20%), Other