Exam AWS Certified Data Analytics - Specialty All Questions

Exam AWS Certified Data Analytics - Specialty topic 1 question 40 discussion

A company has developed several AWS Glue jobs to validate and transform its data from Amazon S3 and load it into Amazon RDS for MySQL in batches once every day. The ETL jobs read the S3 data using a DynamicFrame. Currently, the ETL developers are experiencing challenges in processing only the incremental data on every run, as the AWS Glue job processes all the S3 input data on each run.
Which approach would allow the developers to solve the issue with minimal coding effort?

  • A. Have the ETL jobs read the data from Amazon S3 using a DataFrame.
  • B. Enable job bookmarks on the AWS Glue jobs.
  • C. Create custom logic on the ETL jobs to track the processed S3 objects.
  • D. Have the ETL jobs delete the processed objects or data from Amazon S3 after each run.
Suggested Answer: B
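Editor's note: job bookmarks can be enabled without touching the ETL script at all. A minimal sketch using the AWS CLI is below; the job name `daily-etl` is a placeholder, and the `--job-bookmark-option` argument is the setting that switches bookmarks on for a run (the same option can be set in the job's default arguments or in the console).

```shell
# Start a run of an existing Glue job with job bookmarks enabled.
# "daily-etl" is a placeholder job name; other valid values for the
# option are "job-bookmark-disable" and "job-bookmark-pause".
aws glue start-job-run \
  --job-name daily-etl \
  --arguments '{"--job-bookmark-option":"job-bookmark-enable"}'
```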

Comments

paul0099
Highly Voted 3 years, 7 months ago
It is B
upvoted 21 times
...
Shraddha
Highly Voted 3 years, 6 months ago
Ans B This is a textbook question. https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
upvoted 8 times
...
pk349
Most Recent 1 year, 11 months ago
B: I passed the test
upvoted 2 times
...
AwsNewPeople
2 years, 1 month ago
Selected Answer: B
The correct approach to solve the issue with minimal coding effort would be to enable job bookmarks on the AWS Glue jobs. Enabling job bookmarks on the AWS Glue jobs would allow the ETL job to keep track of the last processed record in the data source. This way, on the next run, the job will only process the new or updated data that was added to the source since the last successful run, thus processing only the incremental data. Using DataFrame instead of DynamicFrame or custom logic to track processed S3 objects could require significant coding effort and may not be the most efficient approach. Deleting processed objects or data from Amazon S3 after each run may not be ideal since it may result in loss of valuable historical data. Therefore, enabling job bookmarks is the most appropriate approach to solve the issue with minimal coding effort.
upvoted 5 times
...
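Editor's note: the "minimal coding effort" point above can be made concrete. A hedged sketch using boto3 follows; the job name is a placeholder, and `update_job` requires the full `JobUpdate`, so the current definition is copied and only the bookmark argument is merged in.

```python
# Sketch: enable job bookmarks on an existing AWS Glue job via boto3.
# The job name passed in is assumed to exist; no ETL script changes
# are needed beyond having stable transformation_ctx values.

def bookmark_args():
    # The special job argument Glue recognizes for bookmarks; other
    # values are "job-bookmark-disable" and "job-bookmark-pause".
    return {"--job-bookmark-option": "job-bookmark-enable"}

def enable_bookmarks(job_name):
    import boto3  # imported lazily so the sketch reads without boto3 installed
    glue = boto3.client("glue")
    job = glue.get_job(JobName=job_name)["Job"]
    update = {
        "Role": job["Role"],
        "Command": job["Command"],
        # Merge, so existing default arguments are preserved.
        "DefaultArguments": {**job.get("DefaultArguments", {}), **bookmark_args()},
    }
    glue.update_job(JobName=job_name, JobUpdate=update)
```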
cloudlearnerhere
2 years, 5 months ago
Selected Answer: B
Correct answer is B as AWS Glue can export the data incrementally using job bookmarks with minimal coding required. AWS Glue tracks data that has already been processed during a previous run of an ETL job by persisting state information from the job run. This persisted state information is called a job bookmark. Job bookmarks help AWS Glue maintain state information and prevent the reprocessing of old data. With job bookmarks, you can process new data when rerunning on a scheduled interval. A job bookmark is composed of the states for various elements of jobs, such as sources, transformations, and targets. For example, your ETL job might read new partitions in an Amazon S3 file. AWS Glue tracks which partitions the job has processed successfully to prevent duplicate processing and duplicate data in the job's target data store. Job bookmarks are implemented for JDBC data sources, the Relationalize transform, and some Amazon Simple Storage Service (Amazon S3) sources.
upvoted 2 times
...
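Editor's note: the per-source state tracking described above keys off the `transformation_ctx` in the ETL script, and the bookmark only advances when `job.commit()` runs. A hedged sketch of a bookmark-aware Glue script follows; the database and table names are placeholders, and `awsglue` is only importable inside the Glue runtime.

```python
# Sketch of a Glue ETL script that cooperates with job bookmarks.
# Database/table names "sales_db"/"raw_events" are placeholders.
import sys

def bookmark_ctx(step_name):
    # Bookmarks key their per-source state on transformation_ctx, so each
    # source/sink needs a stable, unique context string across runs.
    return f"ctx_{step_name}"

def main():
    # awsglue/pyspark are only available inside the Glue runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_ctx = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_ctx)
    job.init(args["JOB_NAME"], args)  # restores prior bookmark state

    # Reading with a transformation_ctx lets Glue skip S3 data
    # already processed in earlier runs.
    frame = glue_ctx.create_dynamic_frame.from_catalog(
        database="sales_db",
        table_name="raw_events",
        transformation_ctx=bookmark_ctx("read_raw_events"),
    )
    # ... validate / transform / write to Amazon RDS for MySQL ...
    job.commit()  # persists the new bookmark state; omit it and nothing advances

if __name__ == "__main__":
    main()
```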
Arka_01
2 years, 7 months ago
Selected Answer: B
For incremental data, Job bookmark is the built-in feature for Glue.
upvoted 1 times
...
Arka_01
2 years, 7 months ago
For incremental data, Job bookmark is the built-in option to choose for Glue.
upvoted 1 times
...
rocky48
2 years, 9 months ago
Selected Answer: B
B is correct
upvoted 1 times
...
Bik000
2 years, 11 months ago
Selected Answer: B
Answer is B
upvoted 2 times
...
Mobeen_Mehdi
3 years, 5 months ago
It's strongly B, as a job bookmark only takes in new data; it stops reprocessing already-processed data.
upvoted 4 times
...
rosnl
3 years, 6 months ago
The answer is B; the hint is in the wording "only the incremental data".
upvoted 1 times
...
Billhardy
3 years, 6 months ago
Ans B
upvoted 2 times
...
brfc
3 years, 6 months ago
Although B is the obvious answer, the part of the question that says "minimal coding effort" suggests it might be D.
upvoted 1 times
gopi_data_guy
2 years, 3 months ago
There is no code change effort; you just need to enable the job bookmark. Removing processed data from S3 is the worst option, as you are simply losing the data from your data lake.
upvoted 1 times
...
...
lostsoul07
3 years, 6 months ago
B is the right answer
upvoted 1 times
...
BillyC
3 years, 6 months ago
My answer is B
upvoted 1 times
...
syu31svc
3 years, 6 months ago
Job bookmarks help AWS Glue maintain state information and prevent the reprocessing of old data, so the answer is B, 100%.
upvoted 2 times
...
Paitan
3 years, 6 months ago
Job Bookmarks should do the trick. So option B.
upvoted 1 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other