Exam Certified Data Engineer Professional topic 1 question 92 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 92
Topic #: 1

In order to prevent accidental commits to production data, a senior data engineer has instituted a policy that all development work will reference clones of Delta Lake tables. After testing both DEEP and SHALLOW CLONE, development tables are created using SHALLOW CLONE.

A few weeks after initial table creation, the cloned versions of several tables implemented as Type 1 Slowly Changing Dimension (SCD) stop working. The transaction logs for the source tables show that VACUUM was run the day before.

Which statement describes why the cloned tables are no longer working?

  • A. Because Type 1 changes overwrite existing records, Delta Lake cannot guarantee data consistency for cloned tables.
  • B. Running VACUUM automatically invalidates any shallow clones of a table; DEEP CLONE should always be used when a cloned table will be repeatedly queried.
  • C. Tables created with SHALLOW CLONE are automatically deleted after their default retention threshold of 7 days.
  • D. The metadata created by the CLONE operation is referencing data files that were purged as invalid by the VACUUM command.
  • E. The data files compacted by VACUUM are not tracked by the cloned metadata; running REFRESH on the cloned table will pull in recent changes.
Suggested Answer: D

Comments

alexvno
Highly Voted 11 months, 1 week ago
Selected Answer: D
Shallow clone: only duplicates the metadata of the table being cloned; the data files of the table itself are not copied. These clones are cheaper to create but are not self-contained and depend on the source from which they were cloned as the source of data. If the files in the source that the clone depends on are removed, for example with VACUUM, a shallow clone may become unusable. Therefore, shallow clones are typically used for short-lived use cases such as testing and experimentation.
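The difference between the two clone types can be illustrated with a small toy model. This is only a sketch in plain Python, not Delta Lake's API: the dict-based "table", the file names, and the helper functions `make_table`, `shallow_clone`, `deep_clone`, and `read` are all hypothetical and exist only to show who owns the data files.

```python
# Toy model only (not Delta Lake's API): a "table" is a dict with a log
# (list of referenced file names) and a storage dict (file name -> contents).
import copy

def make_table(files):
    """Create a table whose log references every file in its own storage."""
    return {"log": sorted(files), "storage": dict(files)}

def shallow_clone(source):
    """Copy only the metadata; the clone keeps pointing at the source's storage."""
    return {"log": list(source["log"]), "storage": source["storage"]}

def deep_clone(source):
    """Copy the metadata and the data files into independent storage."""
    return {"log": list(source["log"]), "storage": copy.deepcopy(source["storage"])}

def read(table):
    """Return contents for every file in the log; fail if a referenced file is gone."""
    missing = [f for f in table["log"] if f not in table["storage"]]
    if missing:
        raise FileNotFoundError(f"missing data files: {missing}")
    return [table["storage"][f] for f in table["log"]]

source = make_table({"part-0.parquet": "rows 0-99", "part-1.parquet": "rows 100-199"})
shallow = shallow_clone(source)
deep = deep_clone(source)

# Simulate a source data file being physically removed, as VACUUM might do
# once the source table no longer references it.
del source["storage"]["part-0.parquet"]

try:
    read(shallow)
    shallow_ok = True
except FileNotFoundError:
    shallow_ok = False

deep_rows = read(deep)  # the deep clone owns its own copies and still reads
```

In this sketch the shallow clone fails to read because its log still lists a file that no longer exists in the shared storage, while the deep clone is unaffected.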
upvoted 6 times
...
benni_ale
Most Recent 1 month, 1 week ago
I was not sure whether it was B or D, but I don't think running the VACUUM command invalidates SHALLOW CLONEs by itself. It's just that the data files referenced by the clone are no longer present. A SHALLOW CLONE can still work after VACUUM is run on the source table, so B is not completely correct.
upvoted 1 times
...
vctrhugo
9 months, 3 weeks ago
Selected Answer: D
In Delta Lake, the VACUUM command deletes data files that are no longer referenced by a Delta table and are older than the retention threshold. When a table is cloned using SHALLOW CLONE, the clone references the same data files as the original table but creates a new transaction log. If VACUUM is run on the original table, it can delete data files that are still being referenced by the cloned table’s metadata, causing the cloned table to stop working. This is because the VACUUM command doesn’t know about the cloned table’s references to the data files. Therefore, it’s important to be cautious when running VACUUM on tables that have clones.
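The sequence described here (clone, Type 1 overwrite on the source, then VACUUM) can be sketched as a simplified simulation. This is a toy model under stated assumptions, not how Delta Lake is implemented: the `ToyTable` class, its methods, the integer "days", and the file names are hypothetical. The only real value borrowed is the 7-day default retention threshold.

```python
from dataclasses import dataclass, field

RETENTION_DAYS = 7  # Delta Lake's default VACUUM retention threshold

@dataclass
class ToyTable:
    """Hypothetical stand-in for a table: shared storage plus its own log."""
    storage: dict   # physical files: name -> contents (may be shared with a clone)
    active: set     # files referenced by this table's current version
    removed_on: dict = field(default_factory=dict)  # file -> day it left the log

    def overwrite(self, old_file, new_file, contents, day):
        """Type 1 SCD style update: write a new file, stop referencing the old one."""
        self.storage[new_file] = contents
        self.active.discard(old_file)
        self.active.add(new_file)
        self.removed_on[old_file] = day

    def vacuum(self, today):
        """Delete files this table no longer references and that are older than
        the retention threshold. It never consults any clone's log."""
        for name, day in list(self.removed_on.items()):
            if today - day > RETENTION_DAYS:
                self.storage.pop(name, None)
                del self.removed_on[name]

    def readable(self):
        """A table is readable only if every file it references still exists."""
        return all(name in self.storage for name in self.active)

storage = {"v0.parquet": "customer rows"}
source = ToyTable(storage, {"v0.parquet"})
# Shallow clone on day 0: same physical storage, separate copy of the log.
clone = ToyTable(storage, set(source.active))

source.overwrite("v0.parquet", "v1.parquet", "updated customer rows", day=1)

source.vacuum(today=5)    # old file is still inside the retention window
clone_ok_inside_retention = clone.readable()

source.vacuum(today=21)   # old file is now past retention and gets deleted
clone_ok_after_retention = clone.readable()
source_ok = source.readable()
```

In this model the clone keeps working as long as the files it references survive, which is why a VACUUM on the source does not always break a shallow clone. It breaks only once a file the clone still points to has been dropped from the source's log and has aged past retention.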
upvoted 1 times
...
spaceexplorer
10 months ago
Selected Answer: D
D is correct
upvoted 1 times
...
AzureDE2522
1 year ago
Selected Answer: D
Please refer to: https://docs.databricks.com/en/delta/clone.html#what-are-the-semantics-of-delta-clone-operations
upvoted 2 times
...
60ties
1 year ago
Selected Answer: B
B is best
upvoted 1 times
...