
Exam Certified Data Engineer Professional topic 1 question 22 discussion

Actual exam question from Databricks' Certified Data Engineer Professional
Question #: 22
Topic #: 1
[All Certified Data Engineer Professional Questions]

Which statement describes Delta Lake Auto Compaction?

  • A. An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an OPTIMIZE job is executed toward a default of 1 GB.
  • B. Before a Jobs cluster terminates, OPTIMIZE is executed on all tables modified during the most recent job.
  • C. Optimized writes use logical partitions instead of directory partitions; because partition boundaries are only represented in metadata, fewer small files are written.
  • D. Data is queued in a messaging bus instead of committing data directly to memory; all data is committed from the messaging bus in one batch once the job is complete.
  • E. An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an OPTIMIZE job is executed toward a default of 128 MB.
Suggested Answer: E
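For reference, here is how auto compaction is typically switched on, either for the whole session or for a single table. This is a minimal PySpark sketch, assuming a Databricks runtime (the spark.databricks.* config applies there); the table name sales_events is hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Session level: enable auto compaction for all Delta writes in this session
# (Databricks config; assumes your runtime supports it).
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

# Table level: enable auto compaction for one table only.
# 'sales_events' is a hypothetical table name.
spark.sql("""
    ALTER TABLE sales_events
    SET TBLPROPERTIES ('delta.autoOptimize.autoCompact' = 'true')
""")
```

Once enabled, small files left behind by a write are coalesced toward the roughly 128 MB target that options A and E disagree about; the synchronous-versus-asynchronous wording is what the discussion below turns on.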

Comments

vish9
3 weeks ago
There appears to be a typo in the Databricks documentation.
upvoted 1 time
...
rrprofessional
3 weeks, 3 days ago
Enable auto compaction. By default it will use 128 MB as the target file size.
upvoted 1 time
...
akashdesarda
1 month, 3 weeks ago
Selected Answer: B
If you go through these docs, one thing is clear: it is not an asynchronous job, so we have to eliminate A & E. D is wrong. C is also wrong, since there is no special handling with respect to partitions. Also, the 128 MB file size is a legacy config; the latest one is dynamic. So we are left with B.
upvoted 2 times
...
pk07
1 month, 4 weeks ago
Selected Answer: E
https://docs.databricks.com/en/delta/tune-file-size.html
upvoted 1 time
...
partha1022
3 months, 1 week ago
Selected Answer: B
Auto compaction is a synchronous job.
upvoted 2 times
...
Shailly
4 months ago
Selected Answer: B
A and E are wrong because auto compaction is a synchronous operation! I vote for B. As per the documentation: "Auto compaction occurs after a write to a table has succeeded and runs synchronously on the cluster that has performed the write. Auto compaction only compacts files that haven’t been compacted previously." https://docs.delta.io/latest/optimizations-oss.html
upvoted 4 times
...
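The synchronous behavior Shailly quotes can be checked directly: the compaction pass is committed to the table's transaction log, so it shows up in the Delta history right after the triggering write. A minimal sketch, assuming auto compaction is enabled on the (hypothetical) sales_events table from the sketch above:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Inspect recent commits: with auto compaction enabled, an OPTIMIZE-style
# compaction commit typically follows the WRITE commit it cleaned up after.
history = spark.sql("DESCRIBE HISTORY sales_events")
(history
    .select("version", "operation", "operationParameters")
    .orderBy(history.version.desc())
    .show(10, truncate=False))
```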
imatheushenrique
5 months, 3 weeks ago
E. An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an OPTIMIZE job is executed toward a default of 128 MB. https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-optimize-and-auto-optimize/td-p/21189
upvoted 1 time
...
ojudz08
9 months, 2 weeks ago
Selected Answer: E
E is the answer. Enabling the setting uses 128 MB as the target file size: https://learn.microsoft.com/en-us/azure/databricks/delta/tune-file-size
upvoted 2 times
...
DAN_H
9 months, 4 weeks ago
Selected Answer: E
The default file size is 128 MB in auto compaction.
upvoted 1 time
...
kz_data
10 months, 2 weeks ago
E is correct, as the default file size in auto compaction is 128 MB, not 1 GB as with a normal OPTIMIZE statement.
upvoted 1 time
...
IWantCerts
10 months, 2 weeks ago
Selected Answer: E
128 MB is the default.
upvoted 1 time
...
Yogi05
11 months ago
The question is specifically about auto compaction, hence the answer is E, as the default size for auto compaction is 128 MB.
upvoted 1 time
...
hamzaKhribi
11 months, 4 weeks ago
Selected Answer: E
OPTIMIZE's default target file size is 1 GB; however, in this question we are dealing with auto compaction, which, when enabled, runs OPTIMIZE with a 128 MB target file size by default.
upvoted 1 time
...
aragorn_brego
1 year ago
Selected Answer: A
Delta Lake's Auto Compaction feature is designed to improve the efficiency of data storage by reducing the number of small files in a Delta table. After data is written to a Delta table, an asynchronous job can be triggered to evaluate the file sizes. If it determines that there are a significant number of small files, it will automatically run the OPTIMIZE command, which coalesces these small files into larger ones, typically aiming for files around 1 GB in size for optimal performance. E is incorrect because the statement is similar to A but with an incorrect default file size target.
upvoted 4 times
Kill9
5 months ago
Table property delta.autoOptimize.autoCompact targets 128 MB. With table property delta.tuneFileSizesForRewrites, for tables larger than 10 TB the target file size is 1 GB. https://learn.microsoft.com/en-us/azure/databricks/delta/tune-file-size
upvoted 1 time
...
...
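Kill9's two settings can be expressed concretely. A minimal sketch, assuming a Databricks runtime: spark.databricks.delta.autoCompact.maxFileSize is the config behind the 128 MB auto compaction target, and delta.tuneFileSizesForRewrites is the table property that lets the platform scale rewrite targets (up toward 1 GB for tables over 10 TB, per the linked docs). The table name is again hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Auto compaction target file size; 128 MB is the documented default,
# set here explicitly in bytes for illustration.
spark.conf.set("spark.databricks.delta.autoCompact.maxFileSize",
               str(128 * 1024 * 1024))

# Let the runtime tune rewrite file sizes from the table's total size
# (reaching a 1 GB target only for very large tables).
spark.sql("""
    ALTER TABLE sales_events
    SET TBLPROPERTIES ('delta.tuneFileSizesForRewrites' = 'true')
""")
```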
BIKRAM063
1 year ago
Selected Answer: E
E is correct. Auto compaction tries to optimize to a file size of 128 MB.
upvoted 1 time
...
sturcu
1 year, 1 month ago
Selected Answer: E
E is the best fit, although Databricks says that auto compaction runs synchronously.
upvoted 3 times
...
Eertyy
1 year, 2 months ago
The correct answer is E.
upvoted 1 time
...
Community vote distribution: A (35%), C (25%), B (20%), Other