Welcome to ExamTopics


Exam Certified Data Engineer Associate topic 1 question 34 discussion

Actual exam question from Databricks' Certified Data Engineer Associate
Question #: 34
Topic #: 1

A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also used by other processes. As a result, the files should be kept as-is and will accumulate in the directory. The data engineer needs to identify which files are new since the pipeline's previous run, and set up the pipeline to ingest only those new files with each run.
Which of the following tools can the data engineer use to solve this problem?

  • A. Unity Catalog
  • B. Delta Lake
  • C. Databricks SQL
  • D. Data Explorer
  • E. Auto Loader
Suggested Answer: E

Comments

806e7d2
2 days, 23 hours ago
Selected Answer: E
Auto Loader is a Databricks feature designed specifically to ingest new files incrementally from cloud storage directories. It handles exactly this scenario: files accumulate in a shared directory, and only the files that are new since the previous run should be ingested, without reprocessing the entire dataset. Auto Loader tracks which files have already been processed (using directory listing by default, or optionally cloud file-notification services), enabling incremental processing of files as they are added. It also supports schema inference and automatically manages ingestion state, so you don't need to manually track which files have been ingested.
upvoted 1 times
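As a rough illustration of the bookkeeping the comment above describes, here is a conceptual sketch in plain Python of how checkpoint-based incremental file ingestion works. This is NOT the Auto Loader API (in a real pipeline you would use `spark.readStream.format("cloudFiles")` and let Auto Loader manage the state in its checkpoint location); the function and file names below are hypothetical.

```python
# Conceptual sketch only: mimics the state tracking that Auto Loader
# performs internally, so each run ingests only files not seen before.
import json
import os


def ingest_new_files(source_dir: str, checkpoint_path: str) -> list:
    """Return files not seen in any previous run, then update the checkpoint."""
    # Load the set of files recorded by earlier runs, if a checkpoint exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            seen = set(json.load(f))
    else:
        seen = set()

    # List the files currently present in the shared directory.
    current = {name for name in os.listdir(source_dir)
               if os.path.isfile(os.path.join(source_dir, name))}

    # Only files absent from the checkpoint are "new" for this run.
    new_files = sorted(current - seen)

    # Persist the updated state so the next run skips these files.
    with open(checkpoint_path, "w") as f:
        json.dump(sorted(seen | current), f)

    return new_files
```

Note that the existing files are never modified or moved; the pipeline's state lives entirely in the checkpoint, which is the same design choice that lets Auto Loader work against a shared directory.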
...
80370eb
3 months, 2 weeks ago
Selected Answer: E
E. Auto Loader. Auto Loader is designed to incrementally ingest new data files as they appear in a directory, making it ideal for scenarios where files accumulate and need to be ingested without reprocessing previously ingested files. It automatically tracks which files have already been processed, ensuring that only new files are ingested with each pipeline run.
upvoted 1 times
...
benni_ale
6 months, 3 weeks ago
Selected Answer: E
E is correct
upvoted 1 times
...
SerGrey
10 months, 2 weeks ago
Selected Answer: E
E is correct
upvoted 1 times
...
Huroye
1 year ago
The data engineer needs to identify which files are new since the previous run. This seems to be an analysis effort; if that is the case (and I might be wrong), then Databricks SQL is the correct answer.
upvoted 1 times
...
DavidRou
1 year ago
Selected Answer: E
Autoloader can help if you want to ingest data incrementally.
upvoted 1 times
...
AndreFR
1 year, 3 months ago
Selected Answer: E
Auto Loader incrementally and efficiently processes new data files as they arrive in cloud storage without any additional setup. https://docs.databricks.com/en/ingestion/auto-loader/index.html
upvoted 2 times
...
surrabhi_4
1 year, 7 months ago
Selected Answer: E
option E
upvoted 3 times
...
XiltroX
1 year, 7 months ago
Selected Answer: E
E is the correct answer.
upvoted 4 times
...