Welcome to ExamTopics


Exam Certified Data Engineer Associate topic 1 question 34 discussion

Actual exam question from Databricks' Certified Data Engineer Associate
Question #: 34
Topic #: 1

A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also used by other processes. As a result, the files should be kept as-is and will accumulate in the directory. The data engineer needs to identify which files are new since the pipeline's previous run, and set up the pipeline to ingest only those new files with each run.
Which of the following tools can the data engineer use to solve this problem?

  • A. Unity Catalog
  • B. Delta Lake
  • C. Databricks SQL
  • D. Data Explorer
  • E. Auto Loader
Suggested Answer: E

Comments

806e7d2
2 days, 23 hours ago
Selected Answer: E
Auto Loader is a Databricks feature designed specifically to ingest new files incrementally from cloud storage directories. It handles exactly this scenario: files accumulate in a shared directory, and only the files that are new since the previous run should be ingested, without reprocessing the entire dataset. Auto Loader tracks which files have already been processed (using directory listing by default, or optionally cloud file-notification services), enabling incremental processing of files as they are added. It also supports schema inference and automatically manages ingestion state, so you don't need to manually track which files have been ingested.
upvoted 1 times
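As a rough illustration of the bookkeeping the comment above describes, here is a conceptual sketch in plain Python of how checkpoint-based incremental file ingestion works. This is NOT the Auto Loader API (in a real pipeline you would use `spark.readStream.format("cloudFiles")` and let Auto Loader manage the state in its checkpoint location); the function and file names below are hypothetical.

```python
# Conceptual sketch only: mimics the state tracking that Auto Loader
# performs internally, so each run ingests only files not seen before.
import json
import os


def ingest_new_files(source_dir: str, checkpoint_path: str) -> list:
    """Return files not seen in any previous run, then update the checkpoint."""
    # Load the set of files recorded by earlier runs, if a checkpoint exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            seen = set(json.load(f))
    else:
        seen = set()

    # List the files currently present in the shared directory.
    current = {name for name in os.listdir(source_dir)
               if os.path.isfile(os.path.join(source_dir, name))}

    # Only files absent from the checkpoint are "new" for this run.
    new_files = sorted(current - seen)

    # Persist the updated state so the next run skips these files.
    with open(checkpoint_path, "w") as f:
        json.dump(sorted(seen | current), f)

    return new_files
```

Note that the existing files are never modified or moved; the pipeline's state lives entirely in the checkpoint, which is the same design choice that lets Auto Loader work against a shared directory.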
...
80370eb
3 months, 2 weeks ago
Selected Answer: E
E. Auto Loader. Auto Loader is designed to incrementally ingest new data files as they appear in a directory, making it ideal for scenarios where files accumulate and need to be ingested without reprocessing previously ingested files. It automatically tracks which files have already been processed, ensuring that only new files are ingested with each pipeline run.
upvoted 1 times
...
benni_ale
6 months, 3 weeks ago
Selected Answer: E
E is correct
upvoted 1 times
...
SerGrey
10 months, 2 weeks ago
Selected Answer: E
E is correct
upvoted 1 times
...
Huroye
1 year ago
The data engineer needs to identify which files are new since the previous run. This seems to be an analysis effort; if that is the case (and I might be wrong), then Databricks SQL is the correct answer.
upvoted 1 times
...
DavidRou
1 year ago
Selected Answer: E
Autoloader can help if you want to ingest data incrementally.
upvoted 1 times
...
AndreFR
1 year, 3 months ago
Selected Answer: E
Auto Loader incrementally and efficiently processes new data files as they arrive in cloud storage without any additional setup. https://docs.databricks.com/en/ingestion/auto-loader/index.html
upvoted 2 times
...
surrabhi_4
1 year, 7 months ago
Selected Answer: E
option E
upvoted 3 times
...
XiltroX
1 year, 7 months ago
Selected Answer: E
E is the correct answer.
upvoted 4 times
...