
Exam Certified Data Engineer Professional topic 1 question 50 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 50
Topic #: 1

A production cluster has 3 executor nodes and uses the same virtual machine type for the driver and executor.
When evaluating the Ganglia Metrics for this cluster, which indicator would signal a bottleneck caused by code executing on the driver?

  • A. The five-minute load average remains consistent/flat
  • B. Bytes Received never exceeds 80 million bytes per second
  • C. Total Disk Space remains constant
  • D. Network I/O never spikes
  • E. Overall cluster CPU utilization is around 25%
Suggested Answer: E

Comments

BrianNguyen95
Highly Voted 1 year, 5 months ago
Option E: In a Spark cluster, the driver node is responsible for managing the execution of the Spark application, including scheduling tasks, managing the execution plan, and interacting with the cluster manager. If the overall cluster CPU utilization is low (e.g., around 25%), it may indicate that the driver node is not utilizing the available resources effectively and might be a bottleneck.
upvoted 17 times
fe3b2fc
5 months, 2 weeks ago
A bottleneck occurs when resources are overutilized, not underutilized, so that explanation doesn't make much sense. If the driver were the issue, CPU utilization would be at 100% and you wouldn't see a spike in I/O. Conversely, if I/O spiked while CPU utilization was at 25%, the network could be the issue. D is the only logical answer in this case.
upvoted 3 times
benni_ale
3 months, 1 week ago
I like this more.
upvoted 2 times
guillesd
12 months ago
Overall CPU utilization can be misleading. The 25% utilization could be caused by the workload not requiring more than that rather than everything being executed in the driver node.
upvoted 2 times
srinivasa
1 month, 1 week ago
Selected Answer: A
Consistent/Flat Five Minute Load Average: If the load average on the driver node remains consistent and does not fluctuate, it suggests that the driver is under constant, significant load. This could be a sign that the driver is performing a lot of work, potentially leading to a bottleneck.
upvoted 1 times
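To make the load-average argument concrete, here is a minimal Python sketch of how a five-minute load average evolves. It assumes the Linux-style calculation (an exponentially damped moving average updated roughly every 5 seconds); the run-queue samples are hypothetical inputs, not real Ganglia data.

```python
import math

# Assumption: Linux-style load average, recomputed every ~5 seconds as an
# exponentially damped moving average. For the 5-minute figure the decay
# factor applied at each tick is exp(-5/300).
TICK_SECONDS = 5
WINDOW_SECONDS = 300
DECAY = math.exp(-TICK_SECONDS / WINDOW_SECONDS)


def five_minute_load(run_queue_samples, start=0.0):
    """Return the 5-minute load average after each 5-second tick.

    run_queue_samples: hypothetical count of runnable tasks seen at each tick.
    """
    load = start
    history = []
    for n in run_queue_samples:
        load = load * DECAY + n * (1 - DECAY)
        history.append(load)
    return history


# Hypothetical driver with 4 runnable tasks at every tick for 20 minutes.
busy = five_minute_load([4] * 240)
# Hypothetical idle driver over the same 20 minutes.
idle = five_minute_load([0] * 240)

print(round(busy[-1], 2))  # 3.93 (still converging toward 4)
print(idle[-1])            # 0.0
```

Once the workload is steady, both series stop changing, so a flat line on its own only says the load is steady; the level it settles at, relative to the driver's core count, is what would indicate a busy driver.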
AlejandroU
1 month, 3 weeks ago
Selected Answer: E
Answer E. Low CPU usage could indicate that the driver isn't working as efficiently as expected, which can lead to underutilization of the cluster and slower processing times.
upvoted 1 times
JB90
2 months, 1 week ago
Selected Answer: E
Only when the driver does all or most of the work will overall cluster CPU utilization be this low, since the driver's CPU is 25% of the cluster's total CPU capacity.
upvoted 1 times
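JB90's arithmetic can be checked with a short Python sketch. It assumes the cluster-wide CPU figure is a core-weighted average of per-node utilization; the 8 cores per node is a hypothetical placeholder, and because the driver and executors use the same VM type, any core count gives the same ratio.

```python
def cluster_cpu_utilization(per_node_utilization, cores_per_node):
    """Core-weighted average CPU utilization across all nodes (0.0 to 1.0)."""
    total_cores = sum(cores_per_node)
    busy_cores = sum(u * c for u, c in zip(per_node_utilization, cores_per_node))
    return busy_cores / total_cores


# Driver + 3 executors on the same VM type (hypothetically 8 cores each).
cores = [8, 8, 8, 8]
# Order: [driver, executor 1, executor 2, executor 3].
# Driver saturated, executors idle.
driver_bound = [1.0, 0.0, 0.0, 0.0]

print(cluster_cpu_utilization(driver_bound, cores))  # 0.25
```

With four identical nodes, one saturated node contributes exactly a quarter of the cluster's CPU capacity, which is where the 25% figure comes from.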
nedlo
3 months, 1 week ago
Selected Answer: E
Here the bottleneck works like data skew: one node is doing the majority of the work while the others are idle, so E is correct.
upvoted 2 times
m79590530
3 months, 2 weeks ago
Selected Answer: E
D would also mean that the driver never sends large chunks of data to the worker nodes, but since network I/O is not said to be 0, there is still a constant flow of data in and out between the driver and the worker nodes. Therefore it is not a measure of a driver bottleneck. Answer E, however, means one of the 4 cluster nodes is always working at 100%, which cannot be anything other than the driver node, as it is always working and coordinating work across the executors.
upvoted 1 times
fe3b2fc
5 months, 2 weeks ago
Selected Answer: D
Executors communicate with each other across nodes, so if the code/driver is working as intended you would see a spike in I/O while data is transferred. If the code/driver were the issue, you would see a spike in CPU usage and little network traffic between nodes. The correct answer is D.
upvoted 2 times
lophonos
8 months ago
Selected Answer: E
E is correct
upvoted 1 times
guillesd
12 months ago
Selected Answer: D
If there's no I/O between the driver and executor nodes, then the executor nodes are not working.
upvoted 1 times
Patito
1 year, 1 month ago
Selected Answer: D
D seems to be right
upvoted 2 times
rok21
1 year, 1 month ago
Selected Answer: E
E is correct
upvoted 1 times
azurelearn2020
1 year, 1 month ago
Selected Answer: E
25% indicates the cluster CPU is underutilized.
upvoted 2 times
Def21
1 year ago
Not correct. 25% could (in theory) mean driver is using 100% CPU
upvoted 1 times
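Def21's point (and guillesd's earlier one) is that the aggregate figure is ambiguous. A minimal Python sketch with two hypothetical per-node profiles, assuming identical VMs so the cluster figure is a plain average, shows both producing the same 25%:

```python
def mean_utilization(per_node):
    """Plain average of per-node CPU utilization (0.0 to 1.0).

    Assumes identical VM types, so every node weighs the same.
    """
    return sum(per_node) / len(per_node)


# Hypothetical profiles. Order: [driver, executor 1, executor 2, executor 3].
scenarios = {
    "driver_saturated": [1.0, 0.0, 0.0, 0.0],    # all work runs on the driver
    "light_workload": [0.25, 0.25, 0.25, 0.25],  # evenly spread, just little work
}

for name, nodes in scenarios.items():
    # Same cluster average, very different driver load.
    print(f"{name}: cluster avg={mean_utilization(nodes):.0%}, driver={nodes[0]:.0%}")
```

Only the per-node view, i.e. the driver's own CPU graph, separates the two cases.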
sturcu
1 year, 3 months ago
Selected Answer: E
If the overall cluster CPU utilization is around 25%, it means that only one out of the four nodes (driver + 3 executors) is using its full CPU capacity, while the other three nodes are idle or underutilized
upvoted 3 times
sturcu
1 year, 3 months ago
Selected Answer: D
If the overall cluster CPU utilization is around 25%, it means that only one out of the four nodes (driver + 3 executors) is using its full CPU capacity, while the other three nodes are idle or underutilized
upvoted 4 times
sturcu
1 year, 3 months ago
Correct Answer is E.
upvoted 1 times
Community vote distribution
A (35%)
C (25%)
B (20%)
Other