Exam Certified Data Engineer Professional topic 1 question 138 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 138
Topic #: 1

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device.

Streaming DataFrame df has the following schema:

"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT"

Code block:



Which line of code correctly fills in the blank within the code block to complete this task?

  • A. to_interval("event_time", "5 minutes").alias("time")
  • B. window("event_time", "5 minutes").alias("time")
  • C. "event_time"
  • D. lag("event_time", "10 minutes").alias("time")
Suggested Answer: B
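
To see why option B fits, it may help to look at what a five-minute tumbling window does to each event. The sketch below is plain Python, not Spark, and only imitates the bucketing that `window("event_time", "5 minutes")` performs: each timestamp is floored to a five-minute boundary aligned to the Unix epoch, so every event lands in exactly one half-open interval [start, end). The helper names and the sample readings are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1)


def window_start(event_time, size=WINDOW):
    """Floor a timestamp to the start of its fixed-size, epoch-aligned window."""
    elapsed = event_time - EPOCH
    return event_time - (elapsed % size)


def average_by_window(events):
    """Average temp and humidity per window.

    events: iterable of (event_time, temp, humidity) tuples.
    Returns {(window_start, window_end): (avg_temp, avg_humidity)}.
    """
    buckets = defaultdict(list)
    for event_time, temp, humidity in events:
        buckets[window_start(event_time)].append((temp, humidity))

    result = {}
    for start, rows in sorted(buckets.items()):
        avg_temp = sum(t for t, _ in rows) / len(rows)
        avg_humidity = sum(h for _, h in rows) / len(rows)
        result[(start, start + size_of_window())] = (avg_temp, avg_humidity)
    return result


def size_of_window():
    """Return the window length (kept as a function so it is easy to swap)."""
    return WINDOW


# Hypothetical sample: one device reporting once per minute for ten minutes.
base = datetime(2024, 1, 1, 12, 0)
events = [(base + timedelta(minutes=i), 20.0 + i, 50.0) for i in range(10)]
averages = average_by_window(events)

for (start, end), (avg_temp, avg_humidity) in averages.items():
    print(start.time(), end.time(), avg_temp, avg_humidity)
```

With one reading per minute, each window collects five readings and no reading is counted twice, which is the "non-overlapping" requirement in the question. Grouping by the raw "event_time" (option C) would instead produce one group per distinct timestamp.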

Comments

m79590530
1 month ago
Selected Answer: B
This is the standard syntax for grouping by non-overlapping (tumbling) time windows on a timestamp column in Structured Streaming. The .withWatermark() function defines how late a record may arrive; records delayed beyond that threshold are dropped/ignored.
upvoted 1 times
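
The watermark behaviour mentioned in this comment can be sketched in plain Python as well. This is a simplified, per-record imitation and not Spark's implementation: Spark advances the watermark once per micro-batch and only guarantees that data within the threshold is kept, so records later than the threshold may or may not be dropped in practice. The 10-minute delay and the function names are assumptions for illustration; the question's actual code block is not shown on the page.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
DELAY = timedelta(minutes=10)  # hypothetical lateness threshold
EPOCH = datetime(1970, 1, 1)


def window_end(event_time):
    """End of the epoch-aligned five-minute window containing event_time."""
    start = event_time - ((event_time - EPOCH) % WINDOW)
    return start + WINDOW


def split_on_time_and_late(event_times):
    """Split arrivals into accepted and dropped records.

    The watermark is (max event time seen so far) - DELAY. A record is
    treated as too late when its whole window ended at or before the
    watermark, i.e. that window would already have been finalized.
    """
    accepted, dropped = [], []
    max_seen = None
    for ts in event_times:
        watermark = max_seen - DELAY if max_seen is not None else None
        if watermark is not None and window_end(ts) <= watermark:
            dropped.append(ts)
        else:
            accepted.append(ts)
        max_seen = ts if max_seen is None else max(max_seen, ts)
    return accepted, dropped


# Hypothetical arrival order: 12:03 shows up after 12:20 has been seen.
day = datetime(2024, 1, 1)
arrivals = [
    day.replace(hour=12, minute=1),
    day.replace(hour=12, minute=2),
    day.replace(hour=12, minute=20),
    day.replace(hour=12, minute=3),   # window ended 12:05, watermark is 12:10
    day.replace(hour=12, minute=12),  # window ends 12:15, still open
]
accepted, dropped = split_on_time_and_late(arrivals)
print("accepted:", [t.strftime("%H:%M") for t in accepted])
print("dropped: ", [t.strftime("%H:%M") for t in dropped])
```

The point of the sketch is only that the watermark bounds how long a window stays open for late data, which is what lets a streaming aggregation finalize each five-minute result and discard its state.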