Welcome to ExamTopics

Exam Certified Data Engineer Professional topic 1 question 71 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 71
Topic #: 1

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Incremental state information should be maintained for 10 minutes for late-arriving data.

Streaming DataFrame df has the following schema:

"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT"

Code block:



Choose the response that correctly fills in the blank within the code block to complete this task.

  • A. withWatermark("event_time", "10 minutes")
  • B. awaitArrival("event_time", "10 minutes")
  • C. await("event_time + '10 minutes'")
  • D. slidingWindow("event_time", "10 minutes")
  • E. delayWrite("event_time", "10 minutes")
Suggested Answer: A

Comments

sturcu
Highly Voted 1 year, 1 month ago
Selected Answer: A
withWatermark. The sliding window is done through the window function.
upvoted 7 times
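To illustrate why option D is wrong: in Spark, both the window length and the slide are arguments of the `window` function (`window(timeColumn, windowDuration, slideDuration)`), so there is no separate `slidingWindow` method. The pure-Python sketch below is not Spark code. It only mimics, under simplifying assumptions (times as whole seconds, windows aligned to 0 with no start offset), which windows a single event falls into for a tumbling window versus a sliding one. The helper name `window_starts` is hypothetical.

```python
def window_starts(event_ts, window_len, slide):
    """Return the start of every window [start, start + window_len) that
    contains event_ts. All values are seconds; windows are aligned to 0.
    A tumbling (non-overlapping) window is the case slide == window_len."""
    if window_len <= 0 or slide <= 0:
        raise ValueError("window_len and slide must be positive")
    # Latest slide-aligned window start at or before the event.
    start = event_ts - (event_ts % slide)
    starts = []
    # Walk backwards while the window [start, start + window_len) still covers the event.
    while start > event_ts - window_len:
        starts.append(start)
        start -= slide
    return sorted(starts)


MIN = 60
ts = 12 * 60 * MIN + 7 * MIN  # 12:07, expressed as seconds after midnight

# Tumbling five-minute windows, like window("event_time", "5 minutes"):
print(window_starts(ts, 5 * MIN, 5 * MIN))   # [43500], i.e. only the 12:05 window
# Ten-minute windows sliding every five minutes,
# like window("event_time", "10 minutes", "5 minutes"):
print(window_starts(ts, 10 * MIN, 5 * MIN))  # [43200, 43500], i.e. 12:00 and 12:05
```

With a tumbling window each event lands in exactly one bucket, which is what "non-overlapping five-minute interval" in the question asks for.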
aragorn_brego
Highly Voted 1 year ago
Selected Answer: A
To handle late-arriving data in a streaming aggregation, you need to specify a watermark, which tells the streaming query how long to wait for late data. The withWatermark method is used for this purpose in Spark Structured Streaming. It defines how late data can be, measured against the latest event time the query has seen so far (not per window): the watermark is that maximum event time minus the threshold, and state for windows that end before the watermark can be dropped.
upvoted 5 times
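The watermark arithmetic can be sketched as follows. This is a simplified model, not Spark's implementation: it assumes a single event-time column, uses illustrative timestamps, and treats any record older than the watermark as discarded, whereas Spark only guarantees that records within the threshold are kept (older ones may or may not be aggregated) and advances the watermark between micro-batches rather than per record. The function names are hypothetical.

```python
from datetime import datetime, timedelta

DELAY = timedelta(minutes=10)  # the threshold passed to withWatermark in option A


def watermark(max_event_time, delay=DELAY):
    """Watermark = latest event time seen so far (across the stream) minus the delay."""
    return max_event_time - delay


def is_too_late(event_time, max_event_time, delay=DELAY):
    """In this simplified model, any record older than the watermark is discarded."""
    return event_time < watermark(max_event_time, delay)


latest = datetime(2024, 1, 1, 12, 14)  # newest event time seen so far (illustrative)
print(watermark(latest))                                 # 2024-01-01 12:04:00
print(is_too_late(datetime(2024, 1, 1, 12, 6), latest))  # False: 8 minutes behind
print(is_too_late(datetime(2024, 1, 1, 12, 2), latest))  # True: 12 minutes behind
```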
71dfab9
Most Recent 3 months, 1 week ago
Selected Answer: A
The withWatermark method is used in streaming DataFrames when processing real-time data streams. This method helps in managing stateful operations, such as aggregations, by specifying a time column to use for watermarking. Watermarking is a mechanism to handle late data (data that arrives later than expected) by defining a threshold time window beyond which late data is considered too late to be included in aggregations. The slidingWindow function mentioned in D is not a standard function in Databricks or Apache Spark.
upvoted 1 time
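To make the point about managing stateful aggregations concrete, here is a toy, single-process model of the task in the question: average temp and humidity per non-overlapping five-minute window, with state kept until the watermark (latest event time minus 10 minutes) passes the window's end, which roughly matches how append-mode output behaves. It is an illustration only. The class `TumblingAverager`, the per-record watermark update, the sample timestamps, and grouping by window alone are assumptions, not details of the exam's missing code block or of Spark's engine.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)   # non-overlapping (tumbling) five-minute windows
DELAY = timedelta(minutes=10)   # how long state is kept for late-arriving data
EPOCH = datetime(1970, 1, 1)    # windows aligned to the epoch (assumed, no offset)


class TumblingAverager:
    """Hypothetical toy model of a watermarked streaming average (not Spark)."""

    def __init__(self):
        # window start -> [sum of temp, sum of humidity, record count]
        self.state = defaultdict(lambda: [0.0, 0.0, 0])
        self.max_event_time = None
        self.dropped = 0

    @staticmethod
    def window_start(t):
        # Floor the event time to the start of its five-minute window.
        return t - (t - EPOCH) % WINDOW

    def process(self, event_time, temp, humidity):
        """Ingest one record; return finalized (window_start, avg_temp, avg_humidity) rows."""
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        wm = self.max_event_time - DELAY  # watermark after seeing this record
        if event_time < wm:
            self.dropped += 1  # behind the watermark: too late to aggregate
        else:
            acc = self.state[self.window_start(event_time)]
            acc[0] += temp
            acc[1] += humidity
            acc[2] += 1
        # Windows that end at or before the watermark can no longer change,
        # so emit their averages once and evict their state.
        closed = sorted(s for s in self.state if s + WINDOW <= wm)
        rows = []
        for s in closed:
            t_sum, h_sum, n = self.state.pop(s)
            rows.append((s, t_sum / n, h_sum / n))
        return rows


def at(minute):
    return datetime(2024, 1, 1, 12, minute)  # illustrative timestamps


agg = TumblingAverager()
emitted = []
for event in [
    (at(1), 20.0, 50.0),
    (at(3), 22.0, 52.0),
    (at(12), 30.0, 40.0),
    (at(4), 24.0, 54.0),   # late, but within 10 minutes of 12:12: kept
    (at(16), 31.0, 41.0),  # watermark moves to 12:06, closing the 12:00-12:05 window
    (at(3), 99.0, 99.0),   # behind the 12:06 watermark: dropped
]:
    emitted.extend(agg.process(*event))

print(emitted)      # one row: the 12:00 window, avg temp 22.0, avg humidity 52.0
print(agg.dropped)  # 1
```

Because closed windows are evicted, only the 12:10 and 12:15 windows remain in state at the end; without a watermark, every window would have to be kept indefinitely.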
Dileepvikram
1 year ago
Answer is A
upvoted 3 times
Community vote distribution: A (35%, most voted), C (25%), B (20%), Other