Exam Certified Data Engineer Professional topic 1 question 17 discussion

Actual exam question from Databricks' Certified Data Engineer Professional
Question #: 17
Topic #: 1

A production workload incrementally applies updates from an external Change Data Capture feed to a Delta Lake table as an always-on Structured Streaming job. When data was initially migrated for this table, OPTIMIZE was executed and most data files were resized to 1 GB. Auto Optimize and Auto Compaction were both turned on for the streaming production job. A recent review of data files shows that most data files are under 64 MB, although each partition in the table contains at least 1 GB of data and the total table size is over 10 TB.
Which of the following likely explains these smaller file sizes?

  • A. Databricks has autotuned to a smaller target file size to reduce duration of MERGE operations
  • B. Z-order indices calculated on the table are preventing file compaction
  • C. Bloom filter indices calculated on the table are preventing file compaction
  • D. Databricks has autotuned to a smaller target file size based on the overall size of data in the table
  • E. Databricks has autotuned to a smaller target file size based on the amount of data in each partition
Suggested Answer: A
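
For readers who want to see where this behaviour lives: the file-size autotuning the answer refers to is driven by Delta table properties. A minimal PySpark sketch, assuming a Databricks notebook with an active `spark` session and a hypothetical table named `cdc_target`:

```python
# Sketch: inspect the current file layout and the table properties that drive
# Databricks' file-size autotuning. The table name `cdc_target` is hypothetical.

# Table-level detail: total size, number of files, and set table properties.
detail = spark.sql("DESCRIBE DETAIL cdc_target")
detail.select("sizeInBytes", "numFiles", "properties").show(truncate=False)

# Databricks sets this property automatically on tables targeted by frequent
# MERGE operations, which lowers the target file size (the behaviour in answer A).
spark.sql("""
    ALTER TABLE cdc_target
    SET TBLPROPERTIES ('delta.tuneFileSizesForRewrites' = 'true')
""")

# To opt out of autotuning and pin an explicit target file size instead:
spark.sql("""
    ALTER TABLE cdc_target
    SET TBLPROPERTIES ('delta.targetFileSize' = '256mb')
""")
```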

Comments

cotardo2077
Highly Voted 1 year, 5 months ago
Selected Answer: A
https://docs.databricks.com/en/delta/tune-file-size.html#autotune-table 'Autotune file size based on workload'
upvoted 12 times
meatpoof
1 week, 4 days ago
Your source doesn't support your answer. It doesn't mention anything about autotuning to increase the speed of merges
upvoted 1 times
...
...
Melik3
Most Recent 6 months ago
Selected Answer: A
It is important here to understand the difference between the partition size and the data file sizes. Each partition holds at least 1 GB of data, which is caused by OPTIMIZE and is expected. Within each partition are multiple data files; Databricks autotuned those data files to a smaller target size so that MERGE statements run efficiently, which is why A is the correct answer.
upvoted 3 times
...
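
Melik3's distinction between partition size and data file size can be checked directly against storage. A minimal sketch, assuming a Databricks notebook (where `dbutils` is available) and a hypothetical table path and partition column:

```python
# Sketch: confirm that a partition holding over 1 GB of data is made up of many
# small files. The path and partition column below are hypothetical examples.
partition_path = "dbfs:/mnt/prod/cdc_target/event_date=2024-01-01/"

files = [f for f in dbutils.fs.ls(partition_path) if f.name.endswith(".parquet")]
total_mb = sum(f.size for f in files) / (1024 * 1024)
small = [f for f in files if f.size < 64 * 1024 * 1024]

print(f"partition holds {total_mb:.0f} MB across {len(files)} files; "
      f"{len(small)} files are under 64 MB")
```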
imatheushenrique
8 months, 1 week ago
One of the purposes of an OPTIMIZE execution is faster MERGE operations, so: A. Databricks has autotuned to a smaller target file size to reduce duration of MERGE operations.
upvoted 1 times
...
RiktRikt007
12 months ago
How is A correct? While Databricks does have autotuning capabilities, it primarily considers the table size. In this case, the table is over 10 TB, which would typically lead to a target file size of 1 GB, not under 64 MB.
upvoted 2 times
...
PrashantTiwari
12 months ago
The target file size is based on the current size of the Delta table. For tables smaller than 2.56 TB, the autotuned target file size is 256 MB. For tables with a size between 2.56 TB and 10 TB, the target size will grow linearly from 256 MB to 1 GB. For tables larger than 10 TB, the target file size is 1 GB. Correct answer is A
upvoted 2 times
...
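
The size-based tiers PrashantTiwari quotes can be written out as a quick calculation. This is an illustrative sketch of that documented curve, not a Databricks API:

```python
def size_based_target_file_size_mb(table_size_tb: float) -> float:
    """Illustrative sketch of the size-based autotune tiers quoted above:
    256 MB below 2.56 TB, growing linearly to 1 GB at 10 TB, then capped."""
    if table_size_tb <= 2.56:
        return 256.0
    if table_size_tb >= 10.0:
        return 1024.0
    # Linear interpolation between (2.56 TB, 256 MB) and (10 TB, 1024 MB).
    fraction = (table_size_tb - 2.56) / (10.0 - 2.56)
    return 256.0 + fraction * (1024.0 - 256.0)

# The table in the question is over 10 TB, so size-based autotuning alone would
# target ~1 GB files; files under 64 MB therefore point to MERGE-driven
# autotuning (answer A) rather than size-based autotuning (answers D and E).
print(size_based_target_file_size_mb(10.5))  # 1024.0
```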
AziLa
1 year ago
correct ans is A
upvoted 1 times
...
Jay_98_11
1 year ago
Selected Answer: A
A is correct
upvoted 2 times
...
kz_data
1 year ago
Selected Answer: A
correct answer is A
upvoted 1 times
...
BIKRAM063
1 year, 3 months ago
Selected Answer: A
Auto Optimize compacts files to less than 128 MB to facilitate quick merges
upvoted 1 times
...
sen411
1 year, 3 months ago
E is the right answer, because the question is why there are small files
upvoted 1 times
...
sturcu
1 year, 3 months ago
Selected Answer: A
Correct
upvoted 1 times
...
azurearch
1 year, 4 months ago
A is the correct answer
upvoted 1 times
...
Eertyy
1 year, 5 months ago
E is the right answer
upvoted 4 times
Eertyy
1 year, 4 months ago
Option A is the correct answer, not option E, as the explanation for the smaller file sizes.
upvoted 1 times
...
...
Community vote distribution: A (35%), C (25%), B (20%), Other