Exam Certified Data Engineer Professional topic 1 question 112 discussion

Actual exam question from Databricks' Certified Data Engineer Professional
Question #: 112
Topic #: 1

Each configuration below is identical to the extent that each cluster has 400 GB total of RAM, 160 total cores, and only one Executor per VM.

Given an extremely long-running job for which completion must be guaranteed, which cluster configuration will be able to guarantee completion of the job in light of one or more VM failures?

  • A. Total VMs: 8; 50 GB per Executor; 20 Cores per Executor
  • B. Total VMs: 16; 25 GB per Executor; 10 Cores per Executor
  • C. Total VMs: 1; 400 GB per Executor; 160 Cores per Executor
  • D. Total VMs: 4; 100 GB per Executor; 40 Cores per Executor
  • E. Total VMs: 2; 200 GB per Executor; 80 Cores per Executor
Suggested Answer: B
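
For reference, a minimal sketch of the arithmetic behind this answer, using only the VM counts and sizes stated in the options (everything else is illustrative): it computes how much cluster capacity vanishes when one VM fails and the memory available per core in each configuration.

```python
# Illustrative arithmetic only; option figures are taken from the question above.
# Each entry: (total VMs, GB per executor, cores per executor).
options = {
    "A": (8, 50, 20),
    "B": (16, 25, 10),
    "C": (1, 400, 160),
    "D": (4, 100, 40),
    "E": (2, 200, 80),
}

for label, (vms, gb, cores) in options.items():
    loss_pct = 100.0 / vms      # share of cluster capacity lost if a single VM fails
    gb_per_core = gb / cores    # memory per core (2.5 GB for every option)
    print(f"{label}: {vms:>2} VMs | {loss_pct:6.2f}% capacity lost per VM failure | {gb_per_core:.1f} GB/core")
```

Every option ends up at the same 2.5 GB per core, so the configurations differ mainly in blast radius: a single VM failure removes 100% of option C's cluster but only 6.25% of option B's, which is the usual argument for spreading the job across 16 smaller VMs.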

Comments

91d511b
5 days, 19 hours ago
Selected Answer: B
Total VMs: 16. Resources per VM: 25 GB RAM and 10 cores per executor. Impact of a VM failure: losing one VM means losing only 6.25% of the cluster's resources. Fault tolerance: excellent; the cluster can handle multiple VM failures (up to ~3 VMs) and still function effectively. Best balance: with smaller VMs, the job remains highly fault-tolerant while using resources efficiently.
upvoted 1 times
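A minimal sketch of what the numbers in this comment look like as Spark settings, assuming a plain PySpark session; spark.executor.memory and spark.executor.cores are standard Spark properties, while the app name is hypothetical (on Databricks the sizing normally comes from the chosen worker node type rather than from these properties):

```python
# Sketch only: executor sizing matching option B (25 GB and 10 cores per executor).
# On Databricks these values are usually implied by the worker node type you pick.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("long-running-job")              # hypothetical job name
    .config("spark.executor.memory", "25g")   # 25 GB per executor
    .config("spark.executor.cores", "10")     # 10 cores per executor
    .getOrCreate()
)
```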
...
shaojunni
4 months, 2 weeks ago
The 16-VM option provides more redundancy, fault tolerance, and parallelism. But if the dataset is huge, 8 VMs may be better. The question is missing some information.
upvoted 2 times
...
c00ccb7
7 months ago
Selected Answer: B
This setup ensures that the job can continue running and complete even if some VMs fail, as there are more VMs available to handle the workload.
upvoted 2 times
...
ChayV
8 months, 2 weeks ago
Selected Answer: B
If a VM goes down, performance is only degraded, so I opt for the configuration that spreads memory across more executors with an optimal number of cores per executor.
upvoted 3 times
...
hal2401me
10 months, 4 weeks ago
Selected Answer: B
In my exam today, I chose B (16 VMs) because of the "extremely long-running" requirement.
upvoted 4 times
ThoBustos
9 months, 2 weeks ago
Do you have a link to the Databricks doc?
upvoted 1 times
practicioner
5 months, 3 weeks ago
I have no link to a Databricks doc; it's just logic. The more VMs we have, the more robust our pipeline is.
upvoted 1 times
arekm
1 month ago
So long as the data partitions fit into a smaller VM. But we don't have that information. From the perspective of failures of multiple machines, the more of them the better :)
upvoted 1 times
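A rough sketch of the caveat raised in this thread, with a made-up partition size (not from the question): a task processes one partition inside one executor, so a very large or skewed partition needs comfortable headroom within that executor's memory, which is where the smaller-VM options have less slack. In practice Spark can spill to disk, so this is about headroom rather than a hard limit.

```python
# Hypothetical illustration of the partition-fit concern; the 30 GB partition is invented.
executor_memory_gb = {"A": 50, "B": 25, "C": 400, "D": 100, "E": 200}
largest_partition_gb = 30  # assumed worst-case skewed partition

for label, mem in executor_memory_gb.items():
    verdict = "has headroom for" if mem > largest_partition_gb else "would be tight for"
    print(f"Option {label}: {mem} GB executor {verdict} a {largest_partition_gb} GB partition")
```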
...
...
...
...
Community vote distribution: A (35%), C (25%), B (20%), Other