Exam Certified Data Engineer Professional topic 1 question 108 discussion

Actual exam question from Databricks's Certified Data Engineer Professional

Question #: 108
Topic #: 1

Which statement describes the default execution mode for Databricks Auto Loader?

A. Cloud vendor-specific queue storage and notification services are configured to track newly arriving files; the target table is materialized by directly querying all valid files in the source directory.
B. New files are identified by listing the input directory; the target table is materialized by directly querying all valid files in the source directory.
C. Webhooks trigger a Databricks job to run anytime new data arrives in a source directory; new data are automatically merged into target tables using rules inferred from the data.
D. New files are identified by listing the input directory; new files are incrementally and idempotently loaded into the target Delta Lake table.
E. Cloud vendor-specific queue storage and notification services are configured to track newly arriving files; new files are incrementally and idempotently loaded into the target Delta Lake table.

Suggested Answer: D

by spaceexplorer at Jan. 25, 2024, 5:15 p.m.

Comments

vctrhugo

Highly Voted 1 year, 4 months ago

Selected Answer: D

"Auto Loader uses directory listing mode by default. In directory listing mode, Auto Loader identifies new files by listing the input directory." https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/directory-listing-mode
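To make answer D concrete, here is a toy sketch of what "identify new files by listing the input directory, then load them incrementally and idempotently" can look like. It is not Auto Loader's actual implementation: the function `load_new_files`, the JSON checkpoint file, and the in-memory `target` list are all hypothetical stand-ins chosen for illustration.

```python
import json
import os
import tempfile


def load_new_files(source_dir, checkpoint_path, target):
    """Run one batch: list the directory and load only files not yet recorded."""
    # Files already ingested are remembered in a checkpoint
    # (a hypothetical JSON file, used here only for illustration).
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            seen = set(json.load(f))
    else:
        seen = set()

    # Directory listing: discover candidates by listing the input directory.
    listed = sorted(
        name for name in os.listdir(source_dir)
        if os.path.isfile(os.path.join(source_dir, name))
    )
    new_files = [name for name in listed if name not in seen]

    # Incremental load: append rows only from files that were not seen before.
    for name in new_files:
        with open(os.path.join(source_dir, name)) as f:
            target.extend(line.rstrip("\n") for line in f)
        seen.add(name)

    # Persist progress so a rerun does not load the same file twice.
    with open(checkpoint_path, "w") as f:
        json.dump(sorted(seen), f)
    return new_files


def demo():
    target = []
    with tempfile.TemporaryDirectory() as root:
        source_dir = os.path.join(root, "landing")
        os.mkdir(source_dir)
        checkpoint = os.path.join(root, "checkpoint.json")

        with open(os.path.join(source_dir, "a.csv"), "w") as f:
            f.write("row1\nrow2\n")
        first = load_new_files(source_dir, checkpoint, target)
        second = load_new_files(source_dir, checkpoint, target)  # rerun, nothing new

        with open(os.path.join(source_dir, "b.csv"), "w") as f:
            f.write("row3\n")
        third = load_new_files(source_dir, checkpoint, target)
    return first, second, third, target
```

Calling `demo()` runs three batches: the rerun with no new files loads nothing, and the file that arrives later is picked up on the next listing. That is the "idempotent" and "incremental" behaviour answer D describes, as opposed to re-querying every file in the directory (answers A and B).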

upvoted 7 times

KadELbied

Most Recent 1 month, 3 weeks ago

Selected Answer: D

Surely D

upvoted 1 times

arekm

6 months ago

Selected Answer: D

D - Auto Loader supports two modes:
1. directory listing mode
2. file notification mode
The first option is the default. Answer E describes the second option.
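As a rough sketch of how the two modes are selected: the `cloudFiles.useNotifications` reader option switches between them, and leaving it at `false` keeps directory listing mode. The helper name `auto_loader_options` and the path in the comment are hypothetical; only the option keys come from the Auto Loader options documentation.

```python
def auto_loader_options(file_format, use_notifications=False):
    """Build reader options for the cloudFiles source (hypothetical helper)."""
    return {
        "cloudFiles.format": file_format,
        # "false" keeps directory listing mode (the default);
        # "true" opts in to file notification mode.
        "cloudFiles.useNotifications": "true" if use_notifications else "false",
    }


# On a Databricks cluster the options would be passed along these lines
# (not executed here, because it needs a Spark session; the path is a placeholder):
#   spark.readStream.format("cloudFiles") \
#       .options(**auto_loader_options("json")) \
#       .load("/path/to/landing")
```

With no second argument the helper produces the default (directory listing) configuration, which corresponds to answer D; passing `use_notifications=True` corresponds to the mode described in answer E.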

upvoted 2 times

Rinscy

1 year, 5 months ago

Definitely D! Auto Loader is an optimized file source that overcomes all the above limitations and provides a seamless way for data teams to load the raw data at low cost and latency with minimal DevOps effort. You just need to provide a source directory path and start a streaming job. The new structured streaming source, called "cloudFiles", will automatically set up file notification services that subscribe to file events from the input directory and process new files as they arrive, with the option of also processing existing files in that directory.

upvoted 2 times

csrazdan

10 months ago

The correct answer is D. However, listing the input directory is the default way of identifying new files for Auto Loader. Cloud-native notification services can be used, but that is not the default setting for Auto Loader.

upvoted 1 times

ranith

1 year, 5 months ago

https://docs.databricks.com/en/ingestion/auto-loader/options.html#:~:text=By%20default%2C%20Auto%20Loader%20makes,as%20true%20or%20false%20respectively. Selected answer: D

upvoted 1 times
