Exam Certified Data Engineer Professional topic 1 question 137 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 137
Topic #: 1

Which statement regarding stream-static joins and static Delta tables is correct?

  • A. The checkpoint directory will be used to track updates to the static Delta table.
  • B. Each microbatch of a stream-static join will use the most recent version of the static Delta table as of the job's initialization.
  • C. The checkpoint directory will be used to track state information for the unique keys present in the join.
  • D. Stream-static joins cannot use static Delta tables because of consistency issues.
Suggested Answer: B


1 month ago
Selected Answer: B
All of the answers are wrong as written:
  • A. The checkpoint directory is not used to track changes to a Delta table.
  • B. Each microbatch uses the state of the table at the time that microbatch is executed, not as of the job's initialization.
  • C. Unique keys? Stream-static joins are not stateful, so only the current batch of records is considered.
  • D. Stream-static joins are fully supported; see https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#support-matrix-for-joins-in-streaming-queries
I believe there is a typo in B; that seems to be the only logical explanation.
upvoted 2 times
3 months ago
Selected Answer: A
If you look at question 18, the correct answer there is "Each microbatch of a stream-static join will use the most recent version of the static Delta table as of each microbatch." That option is not listed here, so B cannot be correct, which leaves A as the only possible answer. The problem with B is that the latest version of the static Delta table is read at each microbatch, not as of job initialization.
upvoted 1 times
8 months, 1 week ago
Selected Answer: B
When Databricks processes a micro-batch of data in a stream-static join, the latest valid version of data from the static Delta table joins with the records present in the current micro-batch. Because the join is stateless, you do not need to configure watermarking and can process results with low latency. The data in the static Delta table used in the join should be slowly-changing. https://docs.databricks.com/en/transform/join.html#stream-static
upvoted 3 times
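
To make the behavior described in the comments concrete, here is a minimal pure-Python sketch of the semantics. It is a toy model, not Spark or Delta Lake code: `VersionedTable` and `run_microbatch` are hypothetical names invented for this illustration, and the join is assumed to be an inner join on a single key. The only thing it models is that each microbatch reads the latest table version at the moment it runs and keeps no state between batches.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Toy stand-in for a static Delta table: every write adds a new version."""
    versions: list = field(default_factory=lambda: [{}])

    def write(self, rows: dict) -> None:
        # Copy the latest snapshot, apply the changes, and append a new version.
        snapshot = dict(self.versions[-1])
        snapshot.update(rows)
        self.versions.append(snapshot)

    def latest(self) -> dict:
        return self.versions[-1]


def run_microbatch(batch, static_table):
    """Inner-join one microbatch against the latest snapshot at execution time.

    Nothing is remembered between calls, which mirrors the stateless
    nature of a stream-static join.
    """
    snapshot = static_table.latest()
    return [(key, value, snapshot[key]) for key, value in batch if key in snapshot]


static = VersionedTable()
static.write({"u1": "bronze"})

# First microbatch: "u2" has no match yet, so it is dropped and not retried later.
batch_1 = run_microbatch([("u1", 10), ("u2", 5)], static)

# The static table is updated between microbatches.
static.write({"u1": "gold", "u2": "silver"})

# Second microbatch sees the new version, not the one from "job initialization".
batch_2 = run_microbatch([("u1", 7), ("u2", 3)], static)

print(batch_1)
print(batch_2)
```

In this model the unmatched `("u2", 5)` record from the first microbatch is never joined retroactively after the table changes, which is the practical consequence of the join being stateless.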
Community vote distribution: A (35%, most voted), C (25%), B (20%)