Exam Certified Data Engineer Professional topic 1 question 62 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 62
Topic #: 1

The business reporting team requires that data for their dashboards be updated every hour. The total processing time for the pipeline that extracts, transforms, and loads the data for their dashboards is 10 minutes.

Assuming normal operating conditions, which configuration will meet their service-level agreement requirements with the lowest cost?

  • A. Manually trigger a job anytime the business reporting team refreshes their dashboards
  • B. Schedule a job to execute the pipeline once an hour on a new job cluster
  • C. Schedule a Structured Streaming job with a trigger interval of 60 minutes
  • D. Schedule a job to execute the pipeline once an hour on a dedicated interactive cluster
  • E. Configure a job that executes every time new data lands in a given directory
Suggested Answer: B
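
For illustration, option B could be expressed as a job definition along these lines. This is a minimal sketch of a Databricks Jobs API style payload built as a plain Python dict; the job name, notebook path, runtime version, node type, and worker count are all placeholder assumptions and do not come from the question. What matters is that the task carries a `new_cluster` block (a job cluster created per run) rather than pointing at an existing all-purpose cluster, and that the schedule fires once an hour.

```python
import json


def hourly_job_settings(notebook_path: str) -> dict:
    """Build job settings that run a notebook once an hour on a new job cluster."""
    return {
        "name": "hourly-reporting-etl",  # hypothetical job name
        "schedule": {
            # Quartz cron: second 0, minute 0, every hour, every day
            "quartz_cron_expression": "0 0 * * * ?",
            "timezone_id": "UTC",
            "pause_status": "UNPAUSED",
        },
        "tasks": [
            {
                "task_key": "etl",
                "notebook_task": {"notebook_path": notebook_path},
                # A new job cluster is created for each run and terminated afterwards.
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",  # placeholder runtime
                    "node_type_id": "i3.xlarge",          # placeholder node type
                    "num_workers": 2,                     # placeholder size
                },
            }
        ],
    }


# Hypothetical notebook path, used only to show the resulting payload.
settings = hourly_job_settings("/Repos/reporting/etl_pipeline")
print(json.dumps(settings, indent=2))
```

Option D would differ mainly in that the task references an already running cluster instead of defining `new_cluster`, so the compute keeps running between runs.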

Comments

divingbell17
Highly Voted 1 year, 1 month ago
Selected Answer: B
B is correct, I think. With option C, the cluster stays on 24/7 with trigger = 60 mins, which is more costly. If there were an option using Structured Streaming with trigger = availableNow and a job scheduled every hour, that would be even more efficient. https://www.databricks.com/blog/2017/05/22/running-streaming-jobs-day-10x-cost-savings.html
upvoted 10 times
arekm
1 month ago
Always a job cluster.
upvoted 1 times
...
...
robodog
Most Recent 5 months, 2 weeks ago
Selected Answer: C
C. The lowest cost is obtained by using a job cluster
upvoted 1 times
robodog
5 months, 2 weeks ago
I meant answer B.
upvoted 2 times
...
...
Curious76
11 months, 2 weeks ago
Selected Answer: C
Databricks recommends using Structured Streaming with trigger AvailableNow for incremental workloads that do not have low latency requirements.
upvoted 2 times
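
The AvailableNow pattern mentioned in this thread might look roughly like the sketch below. It assumes a Delta source and sink and a Spark version whose PySpark `trigger` accepts `availableNow=True`; the function name and the three path parameters are hypothetical. Note that this only corresponds to the low-cost setup when it is launched by an hourly scheduled job, which is not what option C describes (a 60-minute processing-time trigger keeps the cluster running).

```python
def run_incremental_load(spark, source_path, target_path, checkpoint_path):
    """Process all data that arrived since the last run, then stop the query.

    `spark` is an existing SparkSession; all paths are placeholders.
    """
    stream = spark.readStream.format("delta").load(source_path)

    query = (
        stream.writeStream.format("delta")
        # The checkpoint records what has already been processed between runs.
        .option("checkpointLocation", checkpoint_path)
        # Consume everything currently available, then terminate.
        .trigger(availableNow=True)
        .start(target_path)
    )

    # Block until the backlog is processed so the job (and its cluster) can end.
    query.awaitTermination()
```

Because the query terminates once the backlog is consumed, the job run ends and a job cluster can shut down until the next scheduled run.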
...
spaceexplorer
1 year ago
Selected Answer: B
B is correct
upvoted 4 times
...
alexvno
1 year, 1 month ago
Selected Answer: B
B: A job cluster is cheap, and hourly = 60 minutes
upvoted 4 times
...
aragorn_brego
1 year, 2 months ago
Selected Answer: B
Scheduling a job to execute the pipeline on an hourly basis aligns with the requirement for data to be updated every hour. Using a job cluster (which is brought up for the job and torn down upon completion) rather than a dedicated interactive cluster will usually be more cost-effective. This is because you are only paying for the compute resources when the job is running, which is 10 minutes out of every hour, rather than paying for an interactive cluster that would be up and running (and incurring costs) continuously.
upvoted 2 times
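
To put rough numbers on the "10 minutes out of every hour" argument, here is a small back-of-the-envelope calculation. It is a simplification: it ignores cluster start-up time and any difference in per-hour rates between job and interactive compute, and the function name is made up for this example.

```python
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24


def daily_billed_hours(run_minutes_per_hour: float, always_on: bool) -> float:
    """Return cluster hours billed per day under the simplified assumptions above."""
    if always_on:
        # An interactive cluster that never terminates is billed around the clock.
        return float(HOURS_PER_DAY)
    # A job cluster is billed only while the hourly run is executing.
    return HOURS_PER_DAY * run_minutes_per_hour / MINUTES_PER_HOUR


job_cluster_hours = daily_billed_hours(10, always_on=False)
interactive_hours = daily_billed_hours(10, always_on=True)

print(f"Job cluster:         {job_cluster_hours} h/day")
print(f"Interactive cluster: {interactive_hours} h/day")
print(f"Ratio:               {interactive_hours / job_cluster_hours}x")
```

Under these assumptions the job cluster is billed for 4 hours a day against 24 for an always-on cluster, a factor of 6 before any rate difference is considered.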
...
ofed
1 year, 2 months ago
It's either B or D. I think B, because we want the lowest cost.
upvoted 1 times
...