Exam Certified Data Engineer Professional topic 1 question 77 discussion

Actual exam question from Databricks' Certified Data Engineer Professional
Question #: 77
Topic #: 1

In order to facilitate near real-time workloads, a data engineer is creating a helper function to leverage the schema detection and evolution functionality of Databricks Auto Loader. The desired function will automatically detect the schema of the source directory, incrementally process JSON files as they arrive in that directory, and automatically evolve the schema of the table when new fields are detected.

The function is displayed below with a blank:



Which response correctly fills in the blank to meet the specified requirements?

  • A.
  • B.
  • C.
  • D.
  • E.
Suggested Answer: E 🗳️
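For orientation, a helper in the spirit of the consensus answer E (a cloudFiles streaming read, a shared schema/checkpoint location, and a writeStream with mergeSchema) might look roughly like the sketch below. The function name, parameters, and the use of toTable are illustrative assumptions, not the exact text of any option.

# Hedged sketch only; names and paths are placeholders.
# `spark` is the SparkSession already available in a Databricks notebook.
def ingest_json_autoloader(source_dir, checkpoint_path, target_table):
    return (spark.readStream
            .format("cloudFiles")                                  # Auto Loader source
            .option("cloudFiles.format", "json")                   # incremental JSON ingestion
            .option("cloudFiles.schemaLocation", checkpoint_path)  # tracks the inferred schema
            .load(source_dir)
            .writeStream                                           # streaming write, not batch .write
            .option("checkpointLocation", checkpoint_path)         # fault tolerance / progress tracking
            .option("mergeSchema", "true")                         # add newly detected columns to the table
            .toTable(target_table))

Because the default micro-batch trigger keeps the query running, calling this once is enough to keep picking up new JSON files as they arrive, which is what makes it suitable for near real-time workloads.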

Comments

35fd6dd
3 months, 2 weeks ago
Selected Answer: E
write is for batch output, not Spark Structured Streaming; a streaming query needs writeStream.
upvoted 2 times
Freyr
6 months ago
Selected Answer: E
Reference: https://docs.databricks.com/en/ingestion/auto-loader/schema.html
writeStream provides the streaming write capability that near real-time workloads require. checkpointLocation is necessary for fault tolerance and progress tracking. mergeSchema enables automatic schema evolution, so newly detected columns are added to the target table.
Why option C is incorrect: it uses write instead of writeStream; write is for batch processing, which makes it inappropriate for real-time streaming.
Why option B is incorrect: although it includes checkpointLocation and mergeSchema, trigger(once=True) is unnecessary in this context and is better suited to batch-like processing.
upvoted 2 times
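To make the batch-like vs. near real-time distinction concrete, here is a hedged comparison; the paths, table names, and source directory below are made-up placeholders, and `spark` is assumed to be the notebook's SparkSession:

# Assumed Auto Loader read over a placeholder source directory.
stream_df = (spark.readStream
             .format("cloudFiles")
             .option("cloudFiles.format", "json")
             .option("cloudFiles.schemaLocation", "/tmp/schemas/events")
             .load("/tmp/raw/events"))

# Batch-like (option B in spirit): process the files available now, then stop.
(stream_df.writeStream
 .option("checkpointLocation", "/tmp/checkpoints/events_once")
 .trigger(once=True)
 .toTable("events_once"))

# Near real-time (option E in spirit): the default micro-batch trigger keeps the query
# running, picking up new JSON files as they land and evolving the table schema.
(stream_df.writeStream
 .option("checkpointLocation", "/tmp/checkpoints/events")
 .option("mergeSchema", "true")
 .toTable("events"))

In practice only one of the two writes would be started; they are shown together purely to contrast the triggers.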
vikram12apr
8 months, 3 weeks ago
Selected Answer: E
readStream and writeStream share the inferred schema through the checkpoint location, so cloudFiles.schemaLocation is typically set to the same path as checkpointLocation and the schema does not have to be specified manually. With mergeSchema set to true, any newly detected column is added to the target table. https://docs.databricks.com/en/ingestion/auto-loader/schema.html
upvoted 2 times
hal2401me
8 months, 3 weeks ago
Selected Answer: E
https://notebooks.databricks.com/demos/auto-loader/01-Auto-loader-schema-evolution-Ingestion.html
upvoted 2 times
aragorn_brego
1 year ago
Selected Answer: E
This response correctly fills in the blank to meet the specified requirements of using Databricks Auto Loader for automatic schema detection and evolution in a near real-time streaming context.
upvoted 1 times
AzureDE2522
1 year ago
Selected Answer: E
Please refer to: https://docs.databricks.com/en/ingestion/auto-loader/schema.html
upvoted 3 times
Dileepvikram
1 year ago
The question does not say to write as a stream; it says to process files incrementally, so option C looks correct to me.
upvoted 1 times
mouad_attaqi
1 year, 1 month ago
Selected Answer: E
The correct answer is E: it is a streaming write, and the default outputMode is append (so it is optional in this case).
upvoted 2 times
sturcu
1 year, 1 month ago
There is a typo in the statement. Is it schema or checkpoint? The provided answer is not correct. It has to be a writeStream, with append mode.
upvoted 1 times
Community vote distribution: A (35%), C (25%), B (20%), Other