Exam Certified Data Engineer Professional All Questions

Exam Certified Data Engineer Professional topic 1 question 22 discussion

Actual exam question from Databricks's Certified Data Engineer Professional

Question #: 22
Topic #: 1

Which statement describes Delta Lake Auto Compaction?

A. An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an OPTIMIZE job is executed toward a default of 1 GB.
B. Before a Jobs cluster terminates, OPTIMIZE is executed on all tables modified during the most recent job.
C. Optimized writes use logical partitions instead of directory partitions; because partition boundaries are only represented in metadata, fewer small files are written.
D. Data is queued in a messaging bus instead of committing data directly to memory; all data is committed from the messaging bus in one batch once the job is complete.
E. An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an OPTIMIZE job is executed toward a default of 128 MB.

Suggested Answer: E

by 8605246 at Aug. 6, 2023, 10:48 a.m.

Comments

aragorn_brego

Highly Voted 1 year, 5 months ago

Selected Answer: A

Delta Lake's Auto Compaction feature is designed to improve the efficiency of data storage by reducing the number of small files in a Delta table. After data is written to a Delta table, an asynchronous job can be triggered to evaluate the file sizes. If it determines that there are a significant number of small files, it will automatically run the OPTIMIZE command, which coalesces these small files into larger ones, typically aiming for files around 1 GB in size for optimal performance. E is incorrect because the statement is similar to A but with an incorrect default file size target.

upvoted 5 times

Kill9

10 months, 1 week ago

The table property delta.autoOptimize.autoCompact targets 128 MB. With the table property delta.tuneFileSizesForRewrites, the target file size is 1 GB for tables larger than 10 TB. https://learn.microsoft.com/en-us/azure/databricks/delta/tune-file-size

upvoted 2 times

...

RandomForest

Most Recent 3 months, 2 weeks ago

Selected Answer: E

Delta Lake Auto Compaction is a feature that automatically detects opportunities to optimize small files. When a write operation is completed, an asynchronous job assesses whether the resulting files can be compacted into larger files (the default target size is 128 MB). If compaction is needed, the system executes an OPTIMIZE job in the background to improve file size and query performance. This feature reduces the overhead of managing small files manually and improves storage and query efficiency. It aligns with Delta Lake's goal of simplifying and optimizing data lake performance.

upvoted 2 times
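
To make the description above concrete, here is a minimal sketch of what "compacting small files toward a 128 MB target" could look like. It is a toy model under stated assumptions, not Delta Lake's actual file-selection logic: the function name `plan_compaction`, the greedy grouping strategy, and the sample file sizes are all hypothetical and only illustrate the idea.

```python
MB = 1024 * 1024


def plan_compaction(file_sizes, target=128 * MB):
    """Greedily group files smaller than `target` into bins of at most `target` bytes.

    Toy model only (assumption): real auto compaction uses its own heuristics.
    """
    # Files already at or above the target are left alone.
    small = sorted(s for s in file_sizes if s < target)
    bins, current, current_size = [], [], 0
    for size in small:
        # Close the current bin when adding this file would exceed the target.
        if current and current_size + size > target:
            bins.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    # A bin holding a single file gains nothing from being rewritten.
    return [b for b in bins if len(b) > 1]


# Hypothetical table state: twenty 10 MB files plus one 200 MB file.
sizes = [10 * MB] * 20 + [200 * MB]
plan = plan_compaction(sizes)
print([sum(b) // MB for b in plan])
```

In this sketch the 200 MB file is skipped and the twenty small files collapse into two rewritten files, which is the "fewer, larger files" effect the comment describes.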

...

mwynn

3 months, 3 weeks ago

Selected Answer: E

I think it is E because they are just asking us to generally describe the feature - here's some info I gleaned from a DB Academy video:
○ Compact small files on write with auto-optimize (tries to achieve file size of 128 MB)
○ Auto-Compact launches a new job after execution of first Spark job (i.e. async), where it will try to compress files closer to 128 MB

upvoted 4 times

pallazoj

2 months, 2 weeks ago

This is true. I just heard the same statement in a Databricks Academy video: Advanced Data Engineering with Databricks, Section 5, Lesson 1 (Designing the foundation), from 4:00 into the video!

upvoted 1 times

...

Nicks_name

4 months, 2 weeks ago

Selected Answer: E

Typo in the Databricks documentation about the sync job, but the default size is explicitly mentioned as 128 MB.

upvoted 1 times

...

carah

4 months, 3 weeks ago

Selected Answer: B

Table property: delta.autoOptimize.autoCompact
B. correct: although https://docs.databricks.com/en/delta/tune-file-size.html#auto-compaction-for-delta-lake-on-databricks does not mention OPTIMIZE, it is the best option
A., E. wrong: auto compaction runs synchronously
C. wrong: it describes the table setting delta.autoOptimize.optimizeWrite
D. wrong: not related to file compaction

upvoted 3 times

arekm

3 months, 4 weeks ago

The problem I have with B is that it says "on all tables". That depends on whether we use Spark settings or table settings. However, I still believe the "asynchronous" in A and E was meant to be "synchronous" (it is a typo). If it was not, then you are right :)

upvoted 1 times
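
The "Spark settings or table settings" distinction can be sketched as two different SQL statements. The table property name comes from this thread; the session-level key `spark.databricks.delta.autoCompact.enabled` is an assumption about the matching Spark configuration, and the helper `enable_auto_compaction_sql` and the table name `sales.orders` are hypothetical. The snippet only builds the statements as strings; it does not connect to a cluster.

```python
# Table property named in this thread.
TABLE_PROPERTY = "delta.autoOptimize.autoCompact"
# Assumption: the session-level counterpart of the table property.
SESSION_CONF = "spark.databricks.delta.autoCompact.enabled"


def enable_auto_compaction_sql(table_name=None):
    """Return SQL that enables auto compaction.

    With no table name the statement is session-scoped, so it would affect
    every Delta table written in that session; with a table name it only
    affects that one table.
    """
    if table_name is None:
        return f"SET {SESSION_CONF} = true"
    return (
        f"ALTER TABLE {table_name} "
        f"SET TBLPROPERTIES ('{TABLE_PROPERTY}' = 'true')"
    )


print(enable_auto_compaction_sql())                # session scope
print(enable_auto_compaction_sql("sales.orders"))  # hypothetical table name
```

Under this reading, "all tables modified during the job" would only hold when the session-level setting is used, which is the objection raised against option B.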

...

vish9

5 months, 4 weeks ago

There appears to be a typo in the Databricks documentation.

upvoted 3 times

...

rrprofessional

6 months ago

Enable auto compaction. By default it will use 128 MB as the target file size.

upvoted 1 times

...

akashdesarda

6 months, 4 weeks ago

Selected Answer: B

If you go through the docs, one thing is clear: it is not an async job, so we have to eliminate A & C. D is wrong. It has no special job with respect to the partition. Also, the file size of 128 MB is a legacy config; the latest one is dynamic. So we are left with B.

upvoted 3 times

mouthwash

3 months, 3 weeks ago

This. Don't be fooled by the typo answers, typo is inserted for a reason. It makes the answer wrong.

upvoted 1 times

...

pk07

7 months ago

Selected Answer: E

https://docs.databricks.com/en/delta/tune-file-size.html

upvoted 2 times

...

partha1022

8 months, 2 weeks ago

Selected Answer: B

Auto compaction is a synchronous job.

upvoted 2 times

...

Shailly

9 months, 1 week ago

Selected Answer: B

A and E are wrong because auto compaction is a synchronous operation! I vote for B. As per the documentation: "Auto compaction occurs after a write to a table has succeeded and runs synchronously on the cluster that has performed the write. Auto compaction only compacts files that haven’t been compacted previously." https://docs.delta.io/latest/optimizations-oss.html

upvoted 4 times

...

imatheushenrique

11 months ago

E. An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an OPTIMIZE job is executed toward a default of 128 MB. https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-optimize-and-auto-optimize/td-p/21189

upvoted 1 times

...

ojudz08

1 year, 2 months ago

Selected Answer: E

E is the answer. Enabling the setting uses 128 MB as the target file size. https://learn.microsoft.com/en-us/azure/databricks/delta/tune-file-size

upvoted 2 times

...

DAN_H

1 year, 2 months ago

Selected Answer: E

The default file size is 128 MB in auto compaction.

upvoted 1 times

...

kz_data

1 year, 3 months ago

E is correct, as the default file size is 128 MB in auto compaction, not 1 GB as in a normal OPTIMIZE statement.

upvoted 1 times
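
A quick piece of arithmetic shows why the 128 MB versus 1 GB distinction matters. This is only a rough sketch: it assumes files are packed exactly to the target size, which real compaction does not guarantee, and the 10 GB data volume and the helper name `files_after_compaction` are hypothetical.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def files_after_compaction(total_bytes, target_bytes):
    """Lower bound on the output file count if every file is filled to the target."""
    return math.ceil(total_bytes / target_bytes)


total = 10 * GB  # hypothetical amount of data sitting in small files
print(files_after_compaction(total, 128 * MB))  # auto compaction target cited in this thread
print(files_after_compaction(total, 1 * GB))    # OPTIMIZE target cited in this thread
```

For the same data, the smaller target leaves eight times as many files, so the two answer options describe noticeably different outcomes even though their wording is otherwise identical.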

...

IWantCerts

1 year, 3 months ago

Selected Answer: E

128MB is the default.

upvoted 1 times

...

Yogi05

1 year, 4 months ago

The question is more about auto compaction, hence the answer is E, as the default size of auto compaction is 128 MB.

upvoted 1 times

...
