Exam Certified Data Engineer Professional topic 1 question 11 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 11
Topic #: 1

The data engineering team has configured a job to process customer requests to be forgotten (have their data deleted). All user data that needs to be deleted is stored in Delta Lake tables using default table settings.
The team has decided to process all deletions from the previous week as a batch job at 1am each Sunday. The total duration of this job is less than one hour. Every Monday at 3am, a batch job executes a series of VACUUM commands on all Delta Lake tables throughout the organization.
The compliance officer has recently learned about Delta Lake's time travel functionality. They are concerned that this might allow continued access to deleted data.
Assuming all delete logic is correctly implemented, which statement correctly addresses this concern?

  • A. Because the VACUUM command permanently deletes all files containing deleted records, deleted records may be accessible with time travel for around 24 hours.
  • B. Because the default data retention threshold is 24 hours, data files containing deleted records will be retained until the VACUUM job is run the following day.
  • C. Because Delta Lake time travel provides full access to the entire history of a table, deleted records can always be recreated by users with full admin privileges.
  • D. Because Delta Lake's delete statements have ACID guarantees, deleted records will be permanently purged from all storage systems as soon as a delete job completes.
  • E. Because the default data retention threshold is 7 days, data files containing deleted records will be retained until the VACUUM job is run 8 days later.
Suggested Answer: E
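The timing behind the suggested answer can be checked with a little date arithmetic. The sketch below is only an illustration, not Delta Lake code: it assumes the default 7-day retention (the `delta.deletedFileRetentionDuration` table property), a weekly VACUUM schedule, and made-up calendar dates. Only the weekdays and times come from the question, and `first_purging_vacuum` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Default delta.deletedFileRetentionDuration is 7 days (168 hours).
RETENTION = timedelta(days=7)

# Hypothetical dates; the question only gives weekday and time.
delete_at = datetime(2024, 1, 7, 1, 0)     # Sunday 1am, delete job starts
first_vacuum = datetime(2024, 1, 8, 3, 0)  # Monday 3am, first VACUUM after it


def first_purging_vacuum(removed_at, vacuum_at,
                         retention=RETENTION, every=timedelta(days=7)):
    """Return the first scheduled VACUUM run at which files logically
    removed at `removed_at` are older than the retention threshold."""
    run = vacuum_at
    while run - removed_at < retention:
        run += every  # too recent, wait for the next weekly run
    return run


purge = first_purging_vacuum(delete_at, first_vacuum)

print("Age at first VACUUM:", first_vacuum - delete_at)
print("Purged by VACUUM on:", purge.strftime("%A %Y-%m-%d %H:%M"))
print("Time travel window: ", purge - delete_at)
```

Under these assumptions the files are only 26 hours old at the first Monday VACUUM, so they survive it and are removed by the run one week later, roughly 8 days after the delete.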


Highly Voted 1 year, 6 months ago
Answer is E, default retention period is 7 days https://learn.microsoft.com/en-us/azure/databricks/delta/vacuum
upvoted 17 times
Highly Voted 11 months, 2 weeks ago
Selected Answer: A
The answer has to be A. The deletion runs on Sunday at 1am and VACUUM runs the next day, Monday at 3am, so one can only time travel for about 24 hours.
upvoted 11 times
5 months ago
The default retention threshold for time travel is 7 days. The VACUUM executed on Monday at 3am only removes data files that were logically deleted more than 7 days earlier.
upvoted 7 times
3 months, 3 weeks ago
upvoted 2 times
Most Recent 2 days, 19 hours ago
Selected Answer: E
E is correct because the default VACUUM retention is 168 hours (7 days), counted from when the delete statements ran.
upvoted 1 times
1 month ago
Selected Answer: E
The Sunday delete records the removed files in the transaction log, and the retention clock starts ticking. Since the default retention is 168 hours (7 days), the following Monday's VACUUM cannot remove them: the clock has not reached 168 hours, it is only at hour 26.
upvoted 1 times
1 month, 3 weeks ago
Selected Answer: E
I also thought the data would be deleted as soon as the VACUUM command ran, but the delete only removes the data logically. The files are not physically removed until a VACUUM runs after the 7-day retention period has passed. So E is correct.
upvoted 1 times
2 months ago
Selected Answer: E
Answer E. The retention period for time travel queries in Delta Lake is controlled by a 7-day default, not 24 hours. Hence, the statement (Option A) that deleted records may be accessible for around 24 hours is incorrect in the context of Delta Lake's default retention period.
upvoted 1 times
3 months, 3 weeks ago
Selected Answer: E
Default retention period is 7 days, so the VACUUM command won't delete the files for rows deleted on Sunday at 1am, only those from the previous week.
upvoted 1 times
4 months ago
Selected Answer: E
Delta Lake's default retention threshold for old data files (which allows time travel) is 7 days. This means that even after records are deleted, the files that previously contained those records are kept for 7 days before they are eligible for permanent deletion by the VACUUM command. The VACUUM command is responsible for permanently deleting the old data files after the retention period. Since the job runs every Monday, this means that data deleted during the previous week will not be fully purged until after the retention period has passed (which would be 8 days after the deletion, considering the weekly processing).
upvoted 2 times
4 months, 1 week ago
Selected Answer: E
The delete job runs as a batch on Sunday for all requests made during the week, and VACUUM runs the next day. Since there is no mention of a change in the retention period, it is 7 days. VACUUM deletes data older than 7 days, i.e. it removes the previous week's data, not the current week's. The current week's data will be removed by next week's VACUUM job.
upvoted 1 times
5 months, 3 weeks ago
Selected Answer: E
From the documentation: "The default retention threshold for data files after running VACUUM is 7 days." It doesn't matter that VACUUM is run the following day; on a default setup the 7-day retention period still applies when VACUUM runs on Monday.
upvoted 3 times
7 months, 1 week ago
Selected Answer: A
They expect the previous week's deletion requests to be processed on Sunday from 1am to 2am. The next day (Monday) at 3am, approximately 24 hours later, the VACUUM command is run. This means the records from the previous week are only around for 24-ish hours before they are removed by the VACUUM command. They aren't waiting 8 days to run the command, therefore E is wrong.
upvoted 3 times
4 months, 1 week ago
This week's vacuum will remove data of the previous week's delete command since default retention has not changed.
upvoted 2 times
8 months ago
E. Because the default data retention threshold is 7 days, data files containing deleted records will be retained until the VACUUM job is run 8 days later.
upvoted 1 times
8 months, 3 weeks ago
Selected Answer: E
Default retention period is 7 days, so data deleted on Sunday will remain available for the next 7 days. Even if VACUUM runs on Monday, it only deletes data older than 7 days, not the data from yesterday (Sunday).
upvoted 1 times
9 months, 1 week ago
Selected Answer: E
The default retention threshold for data files after running VACUUM is 7 days.
upvoted 1 times
10 months ago
Selected Answer: E
Answer is E
upvoted 1 times
10 months ago
Selected Answer: A
Although the data is deleted (DELETE) on Sunday, it can still be recovered through time travel. Only on the following day (Monday) does that possibility go away, because VACUUM runs, so the data can be recovered within that window of approximately 24 hours.
upvoted 2 times
12 months ago
Selected Answer: E
If I run v0: create table, v1: insert 2 records, v2: insert 2 records, v3: delete 2 records, and then run the VACUUM command (with the default 7-day retention), the deleted records will still be there and you can access them using SELECT * FROM delta_table VERSION AS OF 2;
upvoted 1 times
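The v0 to v3 scenario above can be mimicked with a toy model. This is a rough sketch in plain Python, not the Delta Lake API: `ToyDeltaTable`, its methods, and the file names are all hypothetical, and it assumes a delete rewrites the affected data file while the old file stays on storage until VACUUM finds it older than the retention threshold.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaTable:
    """Toy model: every commit lists the data files that make up a version."""
    versions: list = field(default_factory=list)    # versions[i] = frozenset of live files
    removed_at: dict = field(default_factory=dict)  # file -> hour it left the live set
    storage: set = field(default_factory=set)       # files physically present

    def commit(self, live_files, hour):
        live = frozenset(live_files)
        if self.versions:
            # Files dropped by this commit are only logically removed.
            for name in self.versions[-1] - live:
                self.removed_at[name] = hour
        self.storage |= live
        self.versions.append(live)

    def vacuum(self, hour, retention_hours=168):
        # Physically delete files removed at least `retention_hours` ago.
        stale = {f for f, t in self.removed_at.items()
                 if hour - t >= retention_hours}
        self.storage -= stale

    def read(self, version):
        # Time travel works only while every file of that version exists.
        missing = self.versions[version] - self.storage
        if missing:
            raise FileNotFoundError(sorted(missing))
        return sorted(self.versions[version])


table = ToyDeltaTable()
table.commit([], hour=0)                            # v0: create table
table.commit(["a.parquet"], hour=0)                 # v1: insert 2 records
table.commit(["a.parquet", "b.parquet"], hour=0)    # v2: insert 2 records
table.commit(["a.parquet", "c.parquet"], hour=1)    # v3: delete rewrites b as c

table.vacuum(hour=27)        # VACUUM 26 hours after the delete
print(table.read(2))         # version 2 is still readable

table.vacuum(hour=195)       # VACUUM one week later
try:
    table.read(2)
except FileNotFoundError as err:
    print("version 2 no longer readable:", err)
```

In this model the first VACUUM leaves `b.parquet` in place because it was removed only 26 hours earlier, so reading version 2 still succeeds; only the run a week later makes that version unreadable.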