Welcome to ExamTopics

Exam Certified Data Engineer Professional topic 1 question 11 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 11
Topic #: 1

The data engineering team has configured a job to process customer requests to be forgotten (have their data deleted). All user data that needs to be deleted is stored in Delta Lake tables using default table settings.
The team has decided to process all deletions from the previous week as a batch job at 1am each Sunday. The total duration of this job is less than one hour. Every Monday at 3am, a batch job executes a series of VACUUM commands on all Delta Lake tables throughout the organization.
The compliance officer has recently learned about Delta Lake's time travel functionality. They are concerned that this might allow continued access to deleted data.
Assuming all delete logic is correctly implemented, which statement correctly addresses this concern?

  • A. Because the VACUUM command permanently deletes all files containing deleted records, deleted records may be accessible with time travel for around 24 hours.
  • B. Because the default data retention threshold is 24 hours, data files containing deleted records will be retained until the VACUUM job is run the following day.
  • C. Because Delta Lake time travel provides full access to the entire history of a table, deleted records can always be recreated by users with full admin privileges.
  • D. Because Delta Lake's delete statements have ACID guarantees, deleted records will be permanently purged from all storage systems as soon as a delete job completes.
  • E. Because the default data retention threshold is 7 days, data files containing deleted records will be retained until the VACUUM job is run 8 days later.
Suggested Answer: E
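A minimal Python sketch of the timeline in the question. It assumes the delete job commits at 1am on a hypothetical Sunday (2024-06-02), VACUUM runs every Monday at 3am, and the default 7-day threshold applies. It models only the retention arithmetic; first_purging_vacuum is a hypothetical helper, not a Delta Lake API.

```python
from datetime import datetime, timedelta

# Assumed default retention for logically removed data files (7 days).
RETENTION = timedelta(days=7)
# VACUUM is scheduled once a week (every Monday at 3am in the question).
VACUUM_PERIOD = timedelta(weeks=1)


def first_purging_vacuum(removed_at, first_vacuum, retention=RETENTION, period=VACUUM_PERIOD):
    """Return the first scheduled VACUUM run allowed to delete a file
    that a DELETE logically removed at `removed_at` (hypothetical helper)."""
    run = first_vacuum
    # A run can only delete the file once the file has aged past the threshold.
    while run - removed_at < retention:
        run += period
    return run


deleted_at = datetime(2024, 6, 2, 1, 0)     # hypothetical Sunday 1am delete commit
first_vacuum = datetime(2024, 6, 3, 3, 0)   # the following Monday 3am VACUUM

purged_at = first_purging_vacuum(deleted_at, first_vacuum)
print(purged_at)               # 2024-06-10 03:00:00
print(purged_at - deleted_at)  # 8 days, 2:00:00
```

The Monday run about 26 hours after the delete has to keep the files, so time travel can still reach the deleted rows until the following Monday's run, which is the reasoning behind answer E.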

Comments

asmayassineg
Highly Voted 1 year, 3 months ago
Answer is E; the default retention period is 7 days: https://learn.microsoft.com/en-us/azure/databricks/delta/vacuum
upvoted 14 times
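The 7-day default shows up in two spellings: as an interval value such as "interval 7 days" for the delta.deletedFileRetentionDuration table property, and as 168 hours in VACUUM ... RETAIN ... HOURS. The sketch below uses a hypothetical, simplified parse_interval helper that only understands "interval <n> <unit>" strings, just to show that both spellings describe the same threshold.

```python
import re
from datetime import timedelta

# Hypothetical, simplified parser: it only accepts "interval <n> <unit>" strings.
_UNITS = {"minute": "minutes", "hour": "hours", "day": "days", "week": "weeks"}
_PATTERN = re.compile(r"\s*interval\s+(\d+)\s+(minute|hour|day|week)s?\s*", re.IGNORECASE)


def parse_interval(value):
    """Convert a string such as "interval 7 days" into a timedelta."""
    match = _PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported interval: {value!r}")
    amount, unit = int(match.group(1)), match.group(2).lower()
    return timedelta(**{_UNITS[unit]: amount})


default = parse_interval("interval 7 days")
print(default)                          # 7 days, 0:00:00
print(default == timedelta(hours=168))  # True: same threshold as RETAIN 168 HOURS
```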
mardigras
Highly Voted 9 months ago
Selected Answer: A
The answer has to be A. The deletion runs on Sunday at 1am, and VACUUM runs the next day, Monday at 3am, so one can only time travel for about 24 hours.
upvoted 8 times
csrazdan
2 months, 2 weeks ago
The default retention threshold for time travel is 7 days. VACUUM, which runs on Monday at 3am, only removes files whose 7-day retention period has already expired.
upvoted 4 times
benni_ale
1 month, 1 week ago
Exactly!
upvoted 1 time
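A small check of the 24-hour argument, using the same hypothetical dates (delete at Sunday 1am, VACUUM at Monday 3am). vacuum_can_delete is a hypothetical helper, and treating an age exactly equal to the threshold as deletable is an assumption. It shows that answer A would only hold if the retention threshold had been lowered to about a day.

```python
from datetime import datetime, timedelta


def vacuum_can_delete(removed_at, vacuum_at, retention):
    """True if a file removed at `removed_at` is old enough for a VACUUM at `vacuum_at`.
    Hypothetical helper; the >= boundary is an assumption."""
    return vacuum_at - removed_at >= retention


deleted_at = datetime(2024, 6, 2, 1, 0)   # hypothetical Sunday 1am delete
monday_run = datetime(2024, 6, 3, 3, 0)   # VACUUM about 26 hours later

# Default threshold: Monday's run must keep the files, so time travel still works.
print(vacuum_can_delete(deleted_at, monday_run, timedelta(days=7)))    # False
# Only with a threshold lowered to about a day would Monday's run remove them.
print(vacuum_can_delete(deleted_at, monday_run, timedelta(hours=24)))  # True
```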
benni_ale
Most Recent 1 month, 1 week ago
Selected Answer: E
The default retention period is 7 days, so the VACUUM command won't delete the files for the rows deleted on Sunday at 1am; it deletes the files from the previous week's deletions instead.
upvoted 1 time
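To see which week's files a given Monday VACUUM actually removes, here is a sketch with two hypothetical consecutive Sunday delete batches and the assumed 7-day default; batches_purged is a hypothetical helper.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # assumed default threshold


def batches_purged(vacuum_at, batch_times, retention=RETENTION):
    """Return the delete batches whose removed files a VACUUM at `vacuum_at` may delete."""
    return [t for t in batch_times if vacuum_at - t >= retention]


# Two hypothetical consecutive Sunday 1am delete batches.
last_week = datetime(2024, 6, 2, 1, 0)
this_week = datetime(2024, 6, 9, 1, 0)

monday_run = datetime(2024, 6, 10, 3, 0)
print(batches_purged(monday_run, [last_week, this_week]))
# [datetime.datetime(2024, 6, 2, 1, 0)]
```

Only the batch from 8 days earlier qualifies; yesterday's batch has to wait for the next weekly run.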
tangerine141
1 month, 3 weeks ago
Selected Answer: E
Delta Lake's default retention threshold for old data files (which is what allows time travel) is 7 days. This means that even after records are deleted, the files that previously contained those records are kept for 7 days before VACUUM is allowed to delete them permanently. Since the VACUUM job runs every Monday, records deleted on Sunday are not purged until the first Monday run after the retention period has passed, which is 8 days after the deletion given the weekly schedule.
upvoted 2 times
akashdesarda
1 month, 3 weeks ago
Selected Answer: E
The delete job runs on Sunday as a batch job for all requests made during the current week, and VACUUM runs the next day. Since nothing mentions a change to the retention period, it is 7 days. VACUUM deletes data older than 7 days, i.e. it deletes the previous week's data, not the current week's. The current week's data will be removed by next week's VACUUM job.
upvoted 1 time
fe3b2fc
3 months, 1 week ago
Selected Answer: E
From the documentation: "The default retention threshold for data files after running VACUUM is 7 days." It doesn't matter that VACUUM is run the following day: on a default setup, the retention period is still 7 days when the VACUUM runs on Monday.
upvoted 3 times
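Because the threshold is measured from when the DELETE logically removed the files, not from when VACUUM runs, running VACUUM sooner or more often cannot shorten the window below 7 days. A sketch comparing the weekly schedule with a hypothetical daily one, using the same assumed dates; exposure is a hypothetical helper.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # assumed default threshold


def exposure(removed_at, first_run, period, retention=RETENTION):
    """How long deleted rows stay reachable by time travel under a VACUUM cadence."""
    run = first_run
    # Skip every scheduled run that comes before the files have aged past the threshold.
    while run - removed_at < retention:
        run += period
    return run - removed_at


deleted_at = datetime(2024, 6, 2, 1, 0)  # hypothetical Sunday 1am delete
first_run = datetime(2024, 6, 3, 3, 0)   # first VACUUM after it (Monday 3am)

print(exposure(deleted_at, first_run, timedelta(weeks=1)))  # 8 days, 2:00:00
print(exposure(deleted_at, first_run, timedelta(days=1)))   # 7 days, 2:00:00
```

A daily VACUUM would trim the window by a day, but never below the 7-day threshold itself.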
03355a2
5 months ago
Selected Answer: A
They expect the previous week's records to be deleted on Sunday between 1am and 2am. Then the next day (Monday) at 3am, roughly 24 hours later, the VACUUM command is run. This means the records from the previous week are only around for about 24 hours before they are removed by the VACUUM command. They aren't waiting 8 days to run the command, therefore E is wrong.
upvoted 3 times
akashdesarda
1 month, 3 weeks ago
This week's VACUUM removes the files from the previous week's delete, since the default retention has not been changed.
upvoted 1 time
imatheushenrique
5 months, 3 weeks ago
E. Because the default data retention threshold is 7 days, data files containing deleted records will be retained until the VACUUM job is run 8 days later.
upvoted 1 time
coercion
6 months, 1 week ago
Selected Answer: E
The default retention period is 7 days, so data deleted on Sunday remains available for the next 7 days (even though VACUUM runs on Monday, it only deletes files older than 7 days, not the data deleted the day before, on Sunday).
upvoted 1 time
Tayari
6 months, 4 weeks ago
Selected Answer: E
The default retention threshold for data files after running VACUUM is 7 days.
upvoted 1 time
hedbergare
7 months, 2 weeks ago
Selected Answer: E
Answer is E
upvoted 1 time
juliom6
7 months, 2 weeks ago
Selected Answer: A
Although the data is deleted (DELETE) on Sunday, it can still be recovered through time travel; only on the following day (Monday) is this possibility removed, because VACUUM is run. Consequently, the data can be recovered during that window of approximately 24 hours.
upvoted 2 times
RiktRikt007
9 months, 2 weeks ago
Selected Answer: E
If I create a table (v0), insert 2 records (v1), insert 2 more records (v2), delete 2 records (v3), and then run the VACUUM command (with the default 7-day retention), the deleted records are still there and can be accessed using SELECT * FROM delta_table VERSION AS OF 2;
upvoted 1 time
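The scenario above as a toy model, not Delta Lake's real implementation: each version lists the data files it reads, VACUUM only deletes unreferenced files older than the threshold, and time travel to a version works only while all of its files still exist. It assumes (hypothetically) that v3 deletes both rows written at v1, so file f1 is dropped whole; the file names and dates are made up.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # assumed default threshold

# Toy, hypothetical model of the table history: each version lists the files it reads.
versions = {
    0: set(),             # CREATE TABLE
    1: {"f1"},            # INSERT 2 records
    2: {"f1", "f2"},      # INSERT 2 more records
    3: {"f2"},            # DELETE the 2 records stored in f1 (assumption)
}
removed_at = {"f1": datetime(2024, 6, 2, 1, 0)}  # when v3 committed (hypothetical)
on_disk = {"f1", "f2"}


def vacuum(now, retention=RETENTION):
    """Physically delete files the latest version no longer reads, once old enough."""
    live = versions[max(versions)]
    for f in list(on_disk):
        if f not in live and now - removed_at[f] >= retention:
            on_disk.discard(f)


def can_read(version):
    """Time travel works only while every file of that version still exists."""
    return versions[version] <= on_disk


vacuum(datetime(2024, 6, 2, 2, 0))   # VACUUM right after the delete
print(can_read(2))                   # True: f1 is kept for 7 days
vacuum(datetime(2024, 6, 10, 3, 0))  # next weekly VACUUM
print(can_read(2))                   # False: f1 is gone
```

Right after the delete, VERSION AS OF 2 still works; after the next weekly VACUUM it would fail in this model because f1 has been removed.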
spaceexplorer
10 months ago
Selected Answer: E
Answer is E
upvoted 1 time
kz_data
10 months, 2 weeks ago
Selected Answer: E
Answer is E
upvoted 1 time
kz_data
10 months, 2 weeks ago
Answer is E as the default retention period is 7 days
upvoted 1 time
RafaelCFC
10 months, 2 weeks ago
Selected Answer: E
Correct according to the documentation: https://docs.databricks.com/en/sql/language-manual/delta-vacuum.html
upvoted 1 time
Community vote distribution: A (35%, most voted), C (25%), B (20%), Other