Exam Certified Data Engineer Professional topic 1 question 120 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 120
Topic #: 1

The business reporting team requires that data for their dashboards be updated every hour. The pipeline that extracts, transforms, and loads the data for their dashboards has a total processing time of 10 minutes.

Assuming normal operating conditions, which configuration will meet their service-level agreement requirements with the lowest cost?

  • A. Configure a job that executes every time new data lands in a given directory
  • B. Schedule a job to execute the pipeline once an hour on a new job cluster
  • C. Schedule a Structured Streaming job with a trigger interval of 60 minutes
  • D. Schedule a job to execute the pipeline once an hour on a dedicated interactive cluster
Suggested Answer: B

Comments

Sriramiyer92
1 month, 3 weeks ago
Selected Answer: B
Key phrases: "updated every hour" and "pipeline runs in 10 minutes". A simple job cluster should do the job.
upvoted 1 time
Colje
3 months, 3 weeks ago
Selected Answer: B
The correct answer is B: schedule a job to execute the pipeline once an hour on a new job cluster. Explanation: the business reporting team needs the data updated every hour, and the pipeline takes only 10 minutes to run. The lowest-cost way to meet this requirement is to schedule the job once per hour on a new job cluster. A job cluster is created specifically for a single job run and is terminated as soon as the run finishes. This is cost-efficient because compute is consumed only while the job is running; the cluster does not sit idle between runs.
upvoted 1 time
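For reference, option B can be sketched as a Databricks Jobs API 2.1 create-job payload that pairs an hourly Quartz cron schedule with a `new_cluster` per run, so the cluster exists only for the duration of each run. This is a minimal sketch, not a verified production config: the notebook path, runtime version, node type, worker count, and job name are all hypothetical placeholders.

```python
import json

# Hypothetical placeholders: substitute real values for your workspace.
NOTEBOOK_PATH = "/Repos/reporting/etl_pipeline"  # hypothetical path
SPARK_VERSION = "14.3.x-scala2.12"               # hypothetical runtime version
NODE_TYPE_ID = "i3.xlarge"                       # hypothetical node type


def build_hourly_job_payload() -> dict:
    """Build a Jobs API 2.1 create-job payload that runs the pipeline
    once an hour on a new (ephemeral) job cluster."""
    return {
        "name": "hourly-dashboard-refresh",  # hypothetical job name
        "tasks": [
            {
                "task_key": "etl",
                "notebook_task": {"notebook_path": NOTEBOOK_PATH},
                # new_cluster: a job cluster created for this run and
                # terminated when the run finishes (option B).
                "new_cluster": {
                    "spark_version": SPARK_VERSION,
                    "node_type_id": NODE_TYPE_ID,
                    "num_workers": 2,  # hypothetical size
                },
            }
        ],
        # Quartz cron fields: second 0, minute 0, every hour, every day.
        "schedule": {
            "quartz_cron_expression": "0 0 * * * ?",
            "timezone_id": "UTC",
            "pause_status": "UNPAUSED",
        },
        # Never run two refreshes at once; a 10-minute pipeline should
        # finish well before the next hourly trigger anyway.
        "max_concurrent_runs": 1,
    }


if __name__ == "__main__":
    print(json.dumps(build_hourly_job_payload(), indent=2))
```

The printed JSON could be sent as the body of a `POST /api/2.1/jobs/create` request. Replacing `new_cluster` with `existing_cluster_id` would instead run each hourly task on an already-running all-purpose cluster, which roughly corresponds to option D.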
arekm
1 month ago
I agree with B. However, C would also work from a cost perspective, provided you set the timeout to a low value. It would still consume more resources than B, which disposes of the cluster as soon as the job is done. Moreover, C does not mention a "job cluster", which is an implied best practice; it is always better to be explicit.
upvoted 1 time
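The resource comparison in the comments above can be made concrete with back-of-the-envelope arithmetic. This sketch counts only cluster uptime per day. The 5-minute start-up overhead is a hypothetical assumption, and it assumes the streaming cluster (C) and the interactive cluster (D) stay up between hourly batches. It ignores DBU pricing; job compute is typically billed at a lower rate than all-purpose compute, which would favor B further.

```python
# Rough daily cluster-uptime comparison for the options in the question.
# Assumptions (hypothetical, not from the question): a job cluster needs
# STARTUP_MIN minutes to start, and the streaming (C) and interactive (D)
# clusters never shut down between hourly batches.

RUNS_PER_DAY = 24   # one run per hour
PIPELINE_MIN = 10   # pipeline duration, from the question
STARTUP_MIN = 5     # hypothetical cluster start-up overhead


def job_cluster_minutes_per_day(startup_min: int = STARTUP_MIN) -> int:
    """Option B: a fresh job cluster per run, terminated afterwards."""
    return RUNS_PER_DAY * (PIPELINE_MIN + startup_min)


def always_on_minutes_per_day() -> int:
    """Options C and D: a cluster that stays up between hourly batches."""
    return 24 * 60


if __name__ == "__main__":
    b = job_cluster_minutes_per_day()
    on = always_on_minutes_per_day()
    print(f"B (job cluster): {b} min/day ({b / 60:.1f} h)")
    print(f"C/D (always on): {on} min/day ({on / 60:.1f} h)")
    print(f"B uses {b / on:.0%} of the always-on uptime")
```

Under these assumptions B keeps a cluster up for about 6 hours a day versus 24 hours for C or D, roughly a quarter of the uptime.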
Community vote distribution: A (35%), C (25%), B (20%), Other