Exam Certified Data Engineer Professional topic 1 question 112 discussion

Actual exam question from Databricks' Certified Data Engineer Professional
Question #: 112
Topic #: 1

Each configuration below is identical to the extent that each cluster has 400 GB total of RAM, 160 total cores, and only one Executor per VM.

Given an extremely long-running job for which completion must be guaranteed, which cluster configuration will be able to guarantee completion of the job in light of one or more VM failures?

  • A. Total VMs: 8; 50 GB per Executor; 20 Cores per Executor
  • B. Total VMs: 16; 25 GB per Executor; 10 Cores per Executor
  • C. Total VMs: 1; 400 GB per Executor; 160 Cores per Executor
  • D. Total VMs: 4; 100 GB per Executor; 40 Cores per Executor
  • E. Total VMs: 2; 200 GB per Executor; 80 Cores per Executor
Suggested Answer: B
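
For reference, a minimal sketch of the arithmetic behind this answer, using only the VM counts and sizes stated in the options (everything else is illustrative): it computes how much cluster capacity vanishes when one VM fails and the memory available per core in each configuration.

```python
# Illustrative arithmetic only; option figures are taken from the question above.
# Each entry: (total VMs, GB per executor, cores per executor).
options = {
    "A": (8, 50, 20),
    "B": (16, 25, 10),
    "C": (1, 400, 160),
    "D": (4, 100, 40),
    "E": (2, 200, 80),
}

for label, (vms, gb, cores) in options.items():
    loss_pct = 100.0 / vms      # share of cluster capacity lost if a single VM fails
    gb_per_core = gb / cores    # memory per core (2.5 GB for every option)
    print(f"{label}: {vms:>2} VMs | {loss_pct:6.2f}% capacity lost per VM failure | {gb_per_core:.1f} GB/core")
```

Every option ends up at the same 2.5 GB per core, so the configurations differ mainly in blast radius: a single VM failure removes 100% of option C's cluster but only 6.25% of option B's, which is the usual argument for spreading the job across 16 smaller VMs.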

Comments

91d511b
5 days, 19 hours ago
Selected Answer: B
Total VMs: 16. Resources per VM: 25 GB RAM and 10 cores per executor. Impact of a VM failure: losing one VM means losing only 6.25% of the cluster's resources. Fault tolerance: excellent; the cluster can handle multiple VM failures (up to ~3 VMs) and still function effectively. Best balance: with smaller VMs, the job remains highly fault-tolerant while using resources efficiently.
upvoted 1 times
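A minimal sketch of what the numbers in this comment look like as Spark settings, assuming a plain PySpark session; spark.executor.memory and spark.executor.cores are standard Spark properties, while the app name is hypothetical (on Databricks the sizing normally comes from the chosen worker node type rather than from these properties):

```python
# Sketch only: executor sizing matching option B (25 GB and 10 cores per executor).
# On Databricks these values are usually implied by the worker node type you pick.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("long-running-job")              # hypothetical job name
    .config("spark.executor.memory", "25g")   # 25 GB per executor
    .config("spark.executor.cores", "10")     # 10 cores per executor
    .getOrCreate()
)
```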
...
shaojunni
4 months, 2 weeks ago
The 16-VM option provides more redundancy, fault tolerance, and parallelism. But if the dataset is huge, 8 VMs may be better. The question is missing some information.
upvoted 2 times
...
c00ccb7
7 months ago
Selected Answer: B
This setup ensures that the job can continue running and complete even if some VMs fail, as there are more VMs available to handle the workload.
upvoted 2 times
...
ChayV
8 months, 2 weeks ago
Selected Answer: B
If a VM goes down, performance is only degraded, so I opt for the configuration that spreads memory across more executors with an optimal number of cores per executor.
upvoted 3 times
...
hal2401me
10 months, 4 weeks ago
Selected Answer: B
In my exam today, I chose B (16 VMs) because of the "extremely long-running" requirement.
upvoted 4 times
ThoBustos
9 months, 2 weeks ago
Do you have a link to the Databricks doc?
upvoted 1 times
practicioner
5 months, 3 weeks ago
I have no link to a Databricks doc; it's just logic. The more VMs we have, the more robust our pipeline is.
upvoted 1 times
arekm
1 month ago
So long as the data partitions fit into a smaller VM. But we don't have that information. From the perspective of failures of multiple machines, the more of them the better :)
upvoted 1 times
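A rough sketch of the caveat raised in this thread, with a made-up partition size (not from the question): a task processes one partition inside one executor, so a very large or skewed partition needs comfortable headroom within that executor's memory, which is where the smaller-VM options have less slack. In practice Spark can spill to disk, so this is about headroom rather than a hard limit.

```python
# Hypothetical illustration of the partition-fit concern; the 30 GB partition is invented.
executor_memory_gb = {"A": 50, "B": 25, "C": 400, "D": 100, "E": 200}
largest_partition_gb = 30  # assumed worst-case skewed partition

for label, mem in executor_memory_gb.items():
    verdict = "has headroom for" if mem > largest_partition_gb else "would be tight for"
    print(f"Option {label}: {mem} GB executor {verdict} a {largest_partition_gb} GB partition")
```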
...
...
...
...
Community vote distribution: A (35%), C (25%), B (20%), Other