Exam Certified Data Engineer Associate topic 1 question 31 discussion

Actual exam question from Databricks's Certified Data Engineer Associate

Question #: 31
Topic #: 1

[All Certified Data Engineer Associate Questions]

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.
The cade block used by the data engineer is below:

If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?

A. trigger("5 seconds")
B. trigger()
C. trigger(once="5 seconds")
D. trigger(processingTime="5 seconds")
E. trigger(continuous="5 seconds")

Show Suggested Answer

Suggested Answer: D 🗳️

by XiltroX at April 2, 2023, 2:14 p.m.

Comments

Submit Cancel

4be8126

Highly Voted 2 years, 3 months ago

Selected Answer: D

The correct line of code to fill in the blank to execute a micro-batch to process data every 5 seconds is: D. trigger(processingTime="5 seconds") Option A ("trigger("5 seconds")") would not work because it does not specify that the trigger should be a processing time trigger, which is necessary to trigger a micro-batch processing at regular intervals. Option B ("trigger()") would not work because it would use the default trigger, which is not a processing time trigger. Option C ("trigger(once="5 seconds")") would not work because it would only trigger the query once, not at regular intervals. Option E ("trigger(continuous="5 seconds")") would not work because it would trigger the query to run continuously, without any pauses in between, which is not what the data engineer wants.

upvoted 6 times

azurean

1 month, 2 weeks ago

In Databricks Runtime 11.3 LTS and above, the Trigger.Once setting is deprecated. Databricks recommends you use Trigger.AvailableNow for all incremental batch processing workloads.

upvoted 1 times

...