Exam AWS Certified Data Engineer - Associate DEA-C01 All Questions

Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 108 discussion

Exam question from Amazon's AWS Certified Data Engineer - Associate DEA-C01

Question #: 108
Topic #: 1

A data engineer needs to debug an AWS Glue job that reads from Amazon S3 and writes to Amazon Redshift. The data engineer enabled the bookmark feature for the AWS Glue job and set the maximum concurrency for the job to 1.

The AWS Glue job is successfully writing the output to Amazon Redshift. However, the Amazon S3 files that were loaded during previous runs of the AWS Glue job are being reprocessed by subsequent runs.

What is the likely reason the AWS Glue job is reprocessing the files?

A. The AWS Glue job does not have the s3:GetObjectAcl permission that is required for bookmarks to work correctly.
B. The maximum concurrency for the AWS Glue job is set to 1.
C. The data engineer incorrectly specified an older version of AWS Glue for the Glue job.
D. The AWS Glue job does not have a required commit statement.

Suggested Answer: D

by Bmaster at June 29, 2024, 8:15 a.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments

lool

Highly Voted 9 months, 3 weeks ago

Selected Answer: D

https://docs.aws.amazon.com/glue/latest/dg/glue-troubleshooting-errors.html#error-job-bookmarks-reprocess-data

upvoted 8 times

bonds

Most Recent 1 week, 1 day ago

Selected Answer: D

Bookmarks need an explicit commit when the job completes. If the commit does not go through, the bookmark state is not saved, so all the previously processed objects are read again on the next run.

upvoted 1 times

AgboolaKun

5 months, 3 weeks ago

Selected Answer: D

A "commit" statement within your AWS Glue job script is absolutely required to update the job bookmark and properly track processed data, preventing the reprocessing of old data when running the job again; essentially, if you don't include the commit statement, the job will not remember where it left off and may process data multiple times. For more information about job.commit(), please reference this documentation - https://docs.aws.amazon.com/glue/latest/dg/glue-troubleshooting-errors.html#error-job-bookmarks-reprocess-data

upvoted 2 times

rsmf

6 months, 1 week ago

Selected Answer: D

B is the right answer.

upvoted 2 times

mohamedTR

6 months, 1 week ago

Selected Answer: A

Commit statements are relevant to transactional operations in databases like Redshift but are not related to S3 bookmarks or Glue’s tracking mechanism for processed files.

upvoted 2 times

proserv

6 months, 3 weeks ago

Selected Answer: D

Ensure that your job run script ends with the following commit: job.commit(). When you include this object, AWS Glue records the timestamp and path of the job run. If you run the job again with the same path, AWS Glue processes only the new files. If you don't include this object and job bookmarks are enabled, the job reprocesses the already processed files along with the new files and creates redundancy in the job's target data store. https://docs.aws.amazon.com/glue/latest/dg/glue-troubleshooting-errors.html#error-job-bookmarks-reprocess-data

upvoted 2 times
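
To make the commit behavior described in the quoted documentation concrete, here is a minimal pure-Python simulation. It is not AWS Glue code: BookmarkStore and run_job are hypothetical names, and real Glue bookmarks track more state (such as the timestamp and path of the run, as the quote says) than the simple set of processed keys used here. It models only the point at issue: what a run marks as processed is persisted only if that run commits.

```python
from dataclasses import dataclass, field


@dataclass
class BookmarkStore:
    """Hypothetical stand-in for the bookmark state persisted between job runs."""

    processed: set[str] = field(default_factory=set)


def run_job(store: BookmarkStore, s3_keys: list[str], commit: bool) -> list[str]:
    """Simulate one job run with bookmarks enabled; return the keys it processed."""
    # Only keys not recorded by a previously committed run are treated as new.
    new_keys = [key for key in s3_keys if key not in store.processed]

    # ... transform new_keys and write them to the target (omitted) ...

    # The bookmark advances only if the run commits (the job.commit() step).
    # Without a commit, the next run sees the same keys as unprocessed again.
    if commit:
        store.processed.update(new_keys)
    return new_keys


if __name__ == "__main__":
    store = BookmarkStore()
    print(run_job(store, ["a.csv", "b.csv"], commit=False))
    # No commit happened, so a.csv and b.csv are reprocessed along with c.csv.
    print(run_job(store, ["a.csv", "b.csv", "c.csv"], commit=False))
    print(run_job(store, ["a.csv", "b.csv", "c.csv"], commit=True))
    # After a committed run, only the genuinely new file is picked up.
    print(run_job(store, ["a.csv", "b.csv", "c.csv", "d.csv"], commit=True))
```

The commit=False runs reproduce the symptom in the question: output is still written each time, but previously loaded files keep coming back. In an actual Glue script, the commit step corresponds to the job.commit() call the script is supposed to end with.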

azure_bimonster

7 months, 1 week ago

Selected Answer: A

I would go with option A.

upvoted 1 times

EJGisME

7 months, 3 weeks ago

Selected Answer: A

A. The AWS Glue job does not have the s3:GetObjectAcl permission that is required for bookmarks to work correctly.

upvoted 1 times

mzansikiller

8 months, 1 week ago

Selected Answer: A

Answer A: this is a job bookmark permissions issue.

upvoted 1 times

antun3ra

8 months, 3 weeks ago

Selected Answer: A

For AWS Glue bookmarks to function correctly, the job needs the necessary permissions to read and write bookmark data, including the s3:GetObjectAcl permission. If these permissions are not correctly set, the job may not be able to track which files have already been processed, leading to reprocessing of previously processed files.

upvoted 4 times

andrologin

9 months, 2 weeks ago

Selected Answer: D

An AWS Glue job requires the commit statement to save the bookmark state of the last successful run.

upvoted 2 times

HunkyBunky

10 months ago

Selected Answer: D

For me, D looks correct.

upvoted 3 times

Alagong

10 months ago

Selected Answer: A

The commit statement (Option D) is not required for AWS Glue jobs. AWS Glue commits any open transactions to the database when all the script statements finish running.

upvoted 3 times

andrologin

9 months, 2 weeks ago

It is the commit statement that ensures AWS Glue saves the state of the last successful run.

upvoted 1 times

HunkyBunky

9 months, 4 weeks ago

I haven't found any information indicating that s3:GetObjectAcl is necessary for Glue bookmarks, so I'm pretty sure that A is wrong.

upvoted 1 times

Bmaster

10 months ago

D is good https://docs.aws.amazon.com/glue/latest/dg/glue-troubleshooting-errors.html#error-job-bookmarks-reprocess-data

upvoted 4 times
