Exam Certified Data Engineer Professional topic 1 question 137 discussion

Actual exam question from Databricks's Certified Data Engineer Professional

Question #: 137
Topic #: 1


Which statement regarding stream-static joins and static Delta tables is correct?

A. The checkpoint directory will be used to track updates to the static Delta table.
B. Each microbatch of a stream-static join will use the most recent version of the static Delta table as of the job's initialization.
C. The checkpoint directory will be used to track state information for the unique keys present in the join.
D. Stream-static joins cannot use static Delta tables because of consistency issues.


Suggested Answer: B

by MDWPartners at May 29, 2024, 7:24 p.m.

Comments


arekm

3 months, 2 weeks ago

Selected Answer: B

All answers are wrong:
A - The checkpoint directory tracks changes to the Delta table? No.
B - Each microbatch uses the state of the table at the time the microbatch is executed, not at initialization.
C - Unique keys? Stream-static joins are not stateful, so only the records in the current batch are considered.
D - You can certainly have stream-static joins; see: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#support-matrix-for-joins-in-streaming-queries
I believe they made a typo in B; that seems to be the only logical explanation.

upvoted 2 times


benni_ale

5 months, 2 weeks ago

Selected Answer: A

If you look at question 18, you'll find that the correct answer there is "Each microbatch of a stream-static join will use the most recent version of the static Delta table as of each microbatch." That option is not listed here, so B cannot be correct, which leaves A as the only possible answer. What is wrong with B is that the latest version of the static Delta table is read at each micro-batch, not as of job initialisation.

upvoted 1 times


MDWPartners

10 months, 3 weeks ago

Selected Answer: B

When Databricks processes a micro-batch of data in a stream-static join, the latest valid version of data from the static Delta table joins with the records present in the current micro-batch. Because the join is stateless, you do not need to configure watermarking and can process results with low latency. The data in the static Delta table used in the join should be slowly-changing. https://docs.databricks.com/en/transform/join.html#stream-static

upvoted 3 times
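The behaviour quoted above can be illustrated with a small pure-Python model. This is not Spark or Delta code; `VersionedTable` and `process_microbatch` are hypothetical names. The sketch assumes that each write to the static table produces a new snapshot, that every micro-batch looks up the latest snapshot at the moment it runs, and that no join state is kept between batches.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Hypothetical stand-in for a Delta table: each write creates a new version."""
    versions: list = field(default_factory=list)

    def write(self, rows: dict) -> None:
        # Append a full snapshot (key -> value) as the next table version.
        self.versions.append(dict(rows))

    def latest(self) -> dict:
        # Readers see the most recently committed snapshot.
        return self.versions[-1]


def process_microbatch(batch, static_table):
    """Inner-join one micro-batch against the static table's current version.

    Nothing is carried over between calls (the join is stateless), so
    unmatched streaming rows are dropped rather than buffered for later.
    """
    snapshot = static_table.latest()  # resolved per micro-batch, not at job start
    return [(key, value, snapshot[key]) for key, value in batch if key in snapshot]


dims = VersionedTable()
dims.write({"a": "v0-a", "b": "v0-b"})
out1 = process_microbatch([("a", 1), ("c", 2)], dims)

# The static table is updated between micro-batches.
dims.write({"a": "v1-a", "b": "v1-b", "c": "v1-c"})
out2 = process_microbatch([("a", 3), ("c", 4)], dims)

print(out1)  # [('a', 1, 'v0-a')]
print(out2)  # [('a', 3, 'v1-a'), ('c', 4, 'v1-c')]
```

In the second batch, key `c` matches because the static table was updated before that batch ran, while the unmatched `c` row from the first batch is never revisited. This per-micro-batch, stateless lookup is the point arekm and benni_ale raise about the "as of the job's initialization" wording in option B.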

