Welcome to ExamTopics


Exam Certified Data Engineer Professional topic 1 question 17 discussion

Actual exam question from Databricks' Certified Data Engineer Professional
Question #: 17
Topic #: 1

A production workload incrementally applies updates from an external Change Data Capture feed to a Delta Lake table as an always-on Structured Stream job. When data was initially migrated for this table, OPTIMIZE was executed and most data files were resized to 1 GB. Auto Optimize and Auto Compaction were both turned on for the streaming production job. Recent review of data files shows that most data files are under 64 MB, although each partition in the table contains at least 1 GB of data and the total table size is over 10 TB.
Which of the following likely explains these smaller file sizes?

  • A. Databricks has autotuned to a smaller target file size to reduce duration of MERGE operations
  • B. Z-order indices calculated on the table are preventing file compaction
  • C. Bloom filter indices calculated on the table are preventing file compaction
  • D. Databricks has autotuned to a smaller target file size based on the overall size of data in the table
  • E. Databricks has autotuned to a smaller target file size based on the amount of data in each partition
Suggested Answer: A

Comments

cotardo2077
Highly Voted 1 year, 2 months ago
Selected Answer: A
https://docs.databricks.com/en/delta/tune-file-size.html#autotune-table 'Autotune file size based on workload'
upvoted 9 times
...
Melik3
Most Recent 3 months, 3 weeks ago
Selected Answer: A
It is important here to understand the difference between partitions and data files. Each partition holds at least 1 GB of data, which is expected and is a result of the initial OPTIMIZE. Within each partition are the individual data files: Databricks autotuned these data files to a smaller target size so that MERGE statements run efficiently. That is why A is the correct answer.
upvoted 3 times
...
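Melik3's point about MERGE efficiency can be shown with a back-of-the-envelope model (an illustration only, not Databricks internals): when updated rows are scattered across a partition, a MERGE must rewrite every touched file in full, so smaller files mean less untouched data gets rewritten.

```python
def expected_rewrite_bytes(partition_bytes: int, file_bytes: int, n_updates: int) -> float:
    """Expected bytes a MERGE rewrites when n_updates rows land uniformly
    at random across the files of one partition (simplified model)."""
    n_files = partition_bytes // file_bytes
    # Probability that a given file contains at least one updated row.
    p_touched = 1 - (1 - 1 / n_files) ** n_updates
    # Every touched file is rewritten in full.
    return n_files * p_touched * file_bytes

MB, GB = 1024**2, 1024**3
# 100 scattered row updates in a 10 GB partition:
with_1gb_files = expected_rewrite_bytes(10 * GB, 1 * GB, 100)    # ~10 GB rewritten
with_64mb_files = expected_rewrite_bytes(10 * GB, 64 * MB, 100)  # ~4.7 GB rewritten
```

Under this toy model, shrinking files from 1 GB to 64 MB cuts the data rewritten by the same MERGE by more than half, which is the intuition behind workload-based autotuning.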
imatheushenrique
5 months, 3 weeks ago
One of the purposes of an OPTIMIZE execution is the gain in MERGE operations, so: A. Databricks has autotuned to a smaller target file size to reduce duration of MERGE operations
upvoted 1 times
...
RiktRikt007
9 months, 2 weeks ago
How is A correct? While Databricks does have autotuning capabilities, it primarily considers the table size. In this case, the table is over 10 TB, which would typically lead to a target file size of 1 GB, not under 64 MB.
upvoted 2 times
...
PrashantTiwari
9 months, 2 weeks ago
The target file size is based on the current size of the Delta table. For tables smaller than 2.56 TB, the autotuned target file size is 256 MB. For tables with a size between 2.56 TB and 10 TB, the target size will grow linearly from 256 MB to 1 GB. For tables larger than 10 TB, the target file size is 1 GB. Correct answer is A
upvoted 2 times
...
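The size-based schedule PrashantTiwari describes can be written out as a quick calculation (a sketch of the documented schedule, not Databricks code; the function name is mine): at over 10 TB the size-based target is already 1 GB, so table size alone cannot explain 64 MB files, which is why the workload-based (MERGE) tuning in answer A fits.

```python
def size_based_target_bytes(table_bytes: int) -> int:
    """Autotuned target file size from table size, per the documented
    schedule: 256 MB below 2.56 TB, growing linearly to 1 GB at 10 TB."""
    MB, TB = 1024**2, 1024**4
    lo_size, hi_size = 256 * MB, 1024 * MB
    lo_tbl, hi_tbl = 2.56 * TB, 10 * TB
    if table_bytes <= lo_tbl:
        return lo_size
    if table_bytes >= hi_tbl:
        return hi_size
    frac = (table_bytes - lo_tbl) / (hi_tbl - lo_tbl)
    return int(lo_size + frac * (hi_size - lo_size))

TB = 1024**4
size_based_target_bytes(11 * TB)  # 1 GB -- matches the question's >10 TB table
size_based_target_bytes(1 * TB)   # 256 MB
```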
AziLa
10 months, 1 week ago
correct ans is A
upvoted 1 times
...
Jay_98_11
10 months, 2 weeks ago
Selected Answer: A
A is correct
upvoted 2 times
...
kz_data
10 months, 2 weeks ago
Selected Answer: A
correct answer is A
upvoted 1 times
...
BIKRAM063
1 year ago
Selected Answer: A
Auto Optimize targets file sizes below 128 MB to facilitate quick MERGEs
upvoted 1 times
...
sen411
1 year, 1 month ago
E is the right answer, because the question is why there are small files
upvoted 1 times
...
sturcu
1 year, 1 month ago
Selected Answer: A
Correct
upvoted 1 times
...
azurearch
1 year, 2 months ago
A is correct answer
upvoted 1 times
...
Eertyy
1 year, 2 months ago
E is right answer
upvoted 4 times
Eertyy
1 year, 2 months ago
Correction: option A, not option E, is the likely explanation for the smaller file sizes
upvoted 1 times
...
...
Community vote distribution: A (35%), C (25%), B (20%), Other
