Exam AWS Certified Solutions Architect - Associate SAA-C03 All Questions

Exam AWS Certified Solutions Architect - Associate SAA-C03 topic 1 question 103 discussion

A company has an AWS Glue extract, transform, and load (ETL) job that runs every day at the same time. The job processes XML data that is in an Amazon S3 bucket. New data is added to the S3 bucket every day. A solutions architect notices that AWS Glue is processing all the data during each run.
What should the solutions architect do to prevent AWS Glue from reprocessing old data?

  • A. Edit the job to use job bookmarks.
  • B. Edit the job to delete data after the data is processed.
  • C. Edit the job by setting the NumberOfWorkers field to 1.
  • D. Use a FindMatches machine learning (ML) transform.
Suggested Answer: A

Comments

123jhl0
Highly Voted 1 year, 12 months ago
Selected Answer: A
This is the purpose of bookmarks: "AWS Glue tracks data that has already been processed during a previous run of an ETL job by persisting state information from the job run. This persisted state information is called a job bookmark. Job bookmarks help AWS Glue maintain state information and prevent the reprocessing of old data." https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
upvoted 48 times
...
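To make the quoted definition concrete, here is a minimal sketch of the idea behind a bookmark: persist a high-water mark between runs and skip anything at or below it. This is a toy model, not Glue's implementation. The `BookmarkState` class, the `select_new_objects` function, and the object keys and timestamps are all hypothetical, and the assumption that a single last-modified timestamp is enough is a simplification of what Glue tracks for S3 sources.

```python
from dataclasses import dataclass


@dataclass
class BookmarkState:
    """Hypothetical stand-in for the state persisted between job runs."""
    last_processed: float = 0.0  # highest modification time seen so far


def select_new_objects(objects, bookmark):
    """Return keys modified after the bookmark, then advance the bookmark.

    `objects` is a list of (key, modification_time) pairs, standing in for
    an S3 listing.
    """
    new = [(key, mtime) for key, mtime in objects
           if mtime > bookmark.last_processed]
    if new:
        bookmark.last_processed = max(mtime for _, mtime in new)
    return [key for key, _ in new]


bookmark = BookmarkState()

# Day 1: one file exists, so the first run processes it.
listing = [("data/day1.xml", 100.0)]
first_run = select_new_objects(listing, bookmark)

# Day 2: a new file arrives; only that file is selected.
listing.append(("data/day2.xml", 200.0))
second_run = select_new_objects(listing, bookmark)

# Running again with no new data selects nothing.
third_run = select_new_objects(listing, bookmark)
```

Without the persisted state, every run would start from `last_processed = 0.0` and select the whole listing, which is the behavior described in the question.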
cookieMr
Highly Voted 1 year, 3 months ago
Selected Answer: A
A. Job bookmarks let Glue track which data a job has already processed. With bookmarks enabled, each run resumes from where the previous run left off.
B. Deleting the data removes it from S3 permanently, so it is unavailable for later job runs or analysis. That is undesirable if the data must be retained.
C. Setting NumberOfWorkers to 1 only changes the job's parallelism. It gives Glue no way to track or skip data that was already processed.
D. The FindMatches transform identifies duplicate or matching records in a dataset. It can be part of a data processing pipeline, but it does nothing to prevent reprocessing of old data.
upvoted 9 times
...
awsgeek75
Most Recent 9 months ago
Selected Answer: A
B: Glue can delete a dataset, but this option is too vague to evaluate.
C: Won't help with repeated ETL. This property only affects parallelism.
D: Too vague.
upvoted 1 times
...
Ruffyit
11 months, 3 weeks ago
https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
upvoted 2 times
...
Guru4Cloud
1 year, 2 months ago
Selected Answer: A
The best solution is to edit the AWS Glue job to use job bookmarks. Job bookmarks allow AWS Glue ETL jobs to track which data has already been processed during previous runs, which prevents reprocessing of old data. Deleting the data after processing would make it unavailable for any future use. Setting NumberOfWorkers to 1 only reduces parallelism and does not prevent reprocessing of old data. The FindMatches ML transform is for record matching, not for preventing reprocessing. So the solutions architect should enable job bookmarks in the AWS Glue job configuration. The ETL job will then keep track of processed data and transform only the data added since the last run.
upvoted 1 times
...
bedwal2020
1 year, 5 months ago
Selected Answer: A
Use a job bookmark so that the Glue job does not process files it has already processed.
upvoted 1 times
...
Heric
1 year, 6 months ago
Selected Answer: A
Job bookmarks are used in AWS Glue ETL jobs to keep track of the data that has already been processed in a previous job run. With bookmarks enabled, AWS Glue will read the bookmark information from the previous job run and will only process the new data that has been added to the data source since the last job run. This saves time and reduces costs by eliminating the need to reprocess old data. Therefore, a solutions architect should edit the AWS Glue ETL job to use job bookmarks so that it will only process new data added to the S3 bucket since the last job run.
upvoted 2 times
...
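For a job script to take part in the bookmark mechanism described above, the script itself has to cooperate: it calls `job.init` at the start and `job.commit` at the end, and it gives each source and sink a `transformation_ctx`. The following is a sketch of such a script for the XML-in-S3 scenario, not a definitive implementation. The bucket paths, the `rowTag` value, the Parquet output format, and the `main` wrapper are assumptions made for illustration. The imports sit inside the function because the `awsglue` modules are only available in the Glue runtime.

```python
# Hypothetical locations and XML row element; replace with real values.
SOURCE_PATH = "s3://example-bucket/incoming/"
TARGET_PATH = "s3://example-bucket/processed/"
ROW_TAG = "record"


def main(argv):
    # These modules exist only inside the AWS Glue runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())

    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)  # loads the bookmark state for this job

    # transformation_ctx is the key under which bookmark state for this
    # source is stored; without it the source is not bookmarked.
    source = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [SOURCE_PATH]},
        format="xml",
        format_options={"rowTag": ROW_TAG},
        transformation_ctx="xml_source",
    )

    glue_context.write_dynamic_frame.from_options(
        frame=source,
        connection_type="s3",
        connection_options={"path": TARGET_PATH},
        format="parquet",
        transformation_ctx="parquet_sink",
    )

    job.commit()  # persists the updated bookmark state
```

In a Glue job, the script would end with a call to `main(sys.argv)`. The bookmark only advances when `job.commit()` runs, so a run that fails before that point is retried over the same input.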
linux_admin
1 year, 6 months ago
Selected Answer: A
Job bookmarks enable AWS Glue to track the data that has been processed in a previous run of the job. With job bookmarks enabled, AWS Glue will only process new data that has been added to the S3 bucket since the previous run of the job, rather than reprocessing all data every time the job runs.
upvoted 2 times
...
gustavtd
1 year, 9 months ago
Deleting files in S3 freely is not good, so B is not correct.
upvoted 1 times
...
techhb
1 year, 9 months ago
Selected Answer: A
A is correct
upvoted 1 times
...
Buruguduystunstugudunstuy
1 year, 9 months ago
Selected Answer: A
Option A. Edit the job to use job bookmarks. Job bookmarks in AWS Glue let the ETL job track the data it has processed and skip that data in subsequent runs. This prevents AWS Glue from reprocessing old data and shortens each run, because only new data is processed. To use job bookmarks, the solutions architect can edit the job and set the "Job bookmark" option to "Enable" (equivalently, pass the job parameter --job-bookmark-option with the value job-bookmark-enable).
upvoted 3 times
...
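Outside the console, bookmarks are controlled through the `--job-bookmark-option` job parameter, whose values are `job-bookmark-enable`, `job-bookmark-disable`, and `job-bookmark-pause`. The sketch below builds the arguments dictionary for a job run. The `bookmark_arguments` helper and the job name `daily-xml-etl` are hypothetical, and the boto3 call is left in a comment so the snippet runs without AWS credentials.

```python
BOOKMARK_KEY = "--job-bookmark-option"
VALID_OPTIONS = {
    "job-bookmark-enable",
    "job-bookmark-disable",
    "job-bookmark-pause",
}


def bookmark_arguments(option="job-bookmark-enable", extra=None):
    """Return a Glue job arguments dict with the bookmark option set.

    `extra` holds any other job arguments to pass through unchanged.
    """
    if option not in VALID_OPTIONS:
        raise ValueError(f"unknown bookmark option: {option}")
    arguments = dict(extra or {})
    arguments[BOOKMARK_KEY] = option
    return arguments


run_arguments = bookmark_arguments()

# With boto3 installed and credentials configured, the run could be
# started like this (job name is hypothetical):
#
#   import boto3
#   glue = boto3.client("glue")
#   glue.start_job_run(JobName="daily-xml-etl", Arguments=run_arguments)
```

Setting the same key in the job's default arguments makes every scheduled run use bookmarks, which fits a job that runs daily.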
career360guru
1 year, 10 months ago
Selected Answer: A
Option A
upvoted 1 times
...
SilentMilli
1 year, 10 months ago
Selected Answer: A
It's obviously A. Bookmarks serve this purpose
upvoted 1 times
...
Wpcorgan
1 year, 10 months ago
A is correct
upvoted 2 times
...
LeGloupier
1 year, 12 months ago
Selected Answer: A
A https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
upvoted 3 times
...