Exam AWS Certified Data Analytics - Specialty All Questions

Exam AWS Certified Data Analytics - Specialty topic 1 question 40 discussion

A company has developed several AWS Glue jobs to validate and transform its data from Amazon S3 and load it into Amazon RDS for MySQL in batches once every day. The ETL jobs read the S3 data using a DynamicFrame. Currently, the ETL developers are experiencing challenges in processing only the incremental data on every run, as the AWS Glue job processes all the S3 input data on each run.
Which approach would allow the developers to solve the issue with minimal coding effort?

  • A. Have the ETL jobs read the data from Amazon S3 using a DataFrame.
  • B. Enable job bookmarks on the AWS Glue jobs.
  • C. Create custom logic on the ETL jobs to track the processed S3 objects.
  • D. Have the ETL jobs delete the processed objects or data from Amazon S3 after each run.
Suggested Answer: B
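Editor's note: job bookmarks can be enabled without touching the ETL script at all. A minimal sketch using the AWS CLI is below; the job name `daily-etl` is a placeholder, and the `--job-bookmark-option` argument is the setting that switches bookmarks on for a run (the same option can be set in the job's default arguments or in the console).

```shell
# Start a run of an existing Glue job with job bookmarks enabled.
# "daily-etl" is a placeholder job name; other valid values for the
# option are "job-bookmark-disable" and "job-bookmark-pause".
aws glue start-job-run \
  --job-name daily-etl \
  --arguments '{"--job-bookmark-option":"job-bookmark-enable"}'
```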

Comments

paul0099
Highly Voted 3 years, 7 months ago
It is B
upvoted 21 times
...
Shraddha
Highly Voted 3 years, 6 months ago
Ans B This is a textbook question. https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
upvoted 8 times
...
pk349
Most Recent 1 year, 11 months ago
B: I passed the test
upvoted 2 times
...
AwsNewPeople
2 years, 1 month ago
Selected Answer: B
The correct approach to solve the issue with minimal coding effort would be to enable job bookmarks on the AWS Glue jobs. Enabling job bookmarks on the AWS Glue jobs would allow the ETL job to keep track of the last processed record in the data source. This way, on the next run, the job will only process the new or updated data that was added to the source since the last successful run, thus processing only the incremental data. Using DataFrame instead of DynamicFrame or custom logic to track processed S3 objects could require significant coding effort and may not be the most efficient approach. Deleting processed objects or data from Amazon S3 after each run may not be ideal since it may result in loss of valuable historical data. Therefore, enabling job bookmarks is the most appropriate approach to solve the issue with minimal coding effort.
upvoted 5 times
...
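Editor's note: the "minimal coding effort" point above can be made concrete. A hedged sketch using boto3 follows; the job name is a placeholder, and `update_job` requires the full `JobUpdate`, so the current definition is copied and only the bookmark argument is merged in.

```python
# Sketch: enable job bookmarks on an existing AWS Glue job via boto3.
# The job name passed in is assumed to exist; no ETL script changes
# are needed beyond having stable transformation_ctx values.

def bookmark_args():
    # The special job argument Glue recognizes for bookmarks; other
    # values are "job-bookmark-disable" and "job-bookmark-pause".
    return {"--job-bookmark-option": "job-bookmark-enable"}

def enable_bookmarks(job_name):
    import boto3  # imported lazily so the sketch reads without boto3 installed
    glue = boto3.client("glue")
    job = glue.get_job(JobName=job_name)["Job"]
    update = {
        "Role": job["Role"],
        "Command": job["Command"],
        # Merge, so existing default arguments are preserved.
        "DefaultArguments": {**job.get("DefaultArguments", {}), **bookmark_args()},
    }
    glue.update_job(JobName=job_name, JobUpdate=update)
```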
cloudlearnerhere
2 years, 5 months ago
Selected Answer: B
Correct answer is B as AWS Glue can export the data incrementally using job bookmarks with minimal coding required. AWS Glue tracks data that has already been processed during a previous run of an ETL job by persisting state information from the job run. This persisted state information is called a job bookmark. Job bookmarks help AWS Glue maintain state information and prevent the reprocessing of old data. With job bookmarks, you can process new data when rerunning on a scheduled interval. A job bookmark is composed of the states for various elements of jobs, such as sources, transformations, and targets. For example, your ETL job might read new partitions in an Amazon S3 file. AWS Glue tracks which partitions the job has processed successfully to prevent duplicate processing and duplicate data in the job's target data store. Job bookmarks are implemented for JDBC data sources, the Relationalize transform, and some Amazon Simple Storage Service (Amazon S3) sources.
upvoted 2 times
...
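Editor's note: the per-source state tracking described above keys off the `transformation_ctx` in the ETL script, and the bookmark only advances when `job.commit()` runs. A hedged sketch of a bookmark-aware Glue script follows; the database and table names are placeholders, and `awsglue` is only importable inside the Glue runtime.

```python
# Sketch of a Glue ETL script that cooperates with job bookmarks.
# Database/table names "sales_db"/"raw_events" are placeholders.
import sys

def bookmark_ctx(step_name):
    # Bookmarks key their per-source state on transformation_ctx, so each
    # source/sink needs a stable, unique context string across runs.
    return f"ctx_{step_name}"

def main():
    # awsglue/pyspark are only available inside the Glue runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_ctx = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_ctx)
    job.init(args["JOB_NAME"], args)  # restores prior bookmark state

    # Reading with a transformation_ctx lets Glue skip S3 data
    # already processed in earlier runs.
    frame = glue_ctx.create_dynamic_frame.from_catalog(
        database="sales_db",
        table_name="raw_events",
        transformation_ctx=bookmark_ctx("read_raw_events"),
    )
    # ... validate / transform / write to Amazon RDS for MySQL ...
    job.commit()  # persists the new bookmark state; omit it and nothing advances

if __name__ == "__main__":
    main()
```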
Arka_01
2 years, 7 months ago
Selected Answer: B
For incremental data, Job bookmark is the built-in feature for Glue.
upvoted 1 times
...
Arka_01
2 years, 7 months ago
For incremental data, Job bookmark is the built-in option to choose for Glue.
upvoted 1 times
...
rocky48
2 years, 9 months ago
Selected Answer: B
B is correct
upvoted 1 times
...
Bik000
2 years, 11 months ago
Selected Answer: B
Answer is B
upvoted 2 times
...
Mobeen_Mehdi
3 years, 5 months ago
It's strongly B, as a job bookmark only takes in new data; it stops reprocessing already-processed data.
upvoted 4 times
...
rosnl
3 years, 6 months ago
The answer is B; the hint is in the wording "only the incremental data".
upvoted 1 times
...
Billhardy
3 years, 6 months ago
Ans B
upvoted 2 times
...
brfc
3 years, 6 months ago
Although B is the obvious answer, the part of the question that says "minimal coding effort" suggests it might be D.
upvoted 1 times
gopi_data_guy
2 years, 3 months ago
There is no code change effort; you just need to enable the job bookmark. Removing processed data from S3 is the worst option, as you are simply losing the data from your data lake.
upvoted 1 times
...
...
lostsoul07
3 years, 6 months ago
B is the right answer
upvoted 1 times
...
BillyC
3 years, 6 months ago
My answer is B
upvoted 1 times
...
syu31svc
3 years, 6 months ago
Job bookmarks help AWS Glue maintain state information and prevent the reprocessing of old data, so the answer is B, 100%.
upvoted 2 times
...
Paitan
3 years, 6 months ago
Job Bookmarks should do the trick. So option B.
upvoted 1 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other