Exam Certified Data Engineer Professional topic 1 question 50 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 50
Topic #: 1

A production cluster has 3 executor nodes and uses the same virtual machine type for the driver and executor.
When evaluating the Ganglia Metrics for this cluster, which indicator would signal a bottleneck caused by code executing on the driver?

  • A. The Five Minute Load Average remains consistent/flat
  • B. Bytes Received never exceeds 80 million bytes per second
  • C. Total Disk Space remains constant
  • D. Network I/O never spikes
  • E. Overall cluster CPU utilization is around 25%
Suggested Answer: E

Comments

BrianNguyen95
Highly Voted 1 year, 8 months ago
Option E: In a Spark cluster, the driver node is responsible for managing the execution of the Spark application, including scheduling tasks, managing the execution plan, and interacting with the cluster manager. If the overall cluster CPU utilization is low (e.g., around 25%), it may indicate that the driver node is not utilizing the available resources effectively and might be a bottleneck.
upvoted 19 times
fe3b2fc
8 months, 1 week ago
A bottleneck occurs when resources are overutilized, not underutilized, so that explanation doesn't make much sense. If the driver were the issue, CPU utilization would be at 100% and you wouldn't see a spike in I/O. Conversely, if I/O spiked and CPU utilization was at 25%, then the network could be the issue. D is the only logical answer in this case.
upvoted 3 times
benni_ale
6 months ago
I like this more.
upvoted 2 times
...
...
guillesd
1 year, 2 months ago
Overall CPU utilization can be misleading. The 25% utilization could be caused by the workload not requiring more than that rather than everything being executed in the driver node.
upvoted 2 times
...
...
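The objection above (an overall 25% does not by itself prove the driver is the busy node) can be illustrated with a minimal sketch. The per-node percentages, the `busy`/`idle` thresholds, and the function names are all hypothetical and chosen only for illustration; they are not Ganglia output or a Databricks API.

```python
def overall_pct(driver_pct, executor_pcts):
    """Average CPU utilization across the driver and executors (identical VM types assumed)."""
    values = [driver_pct, *executor_pcts]
    return sum(values) / len(values)


def classify_cpu_pattern(driver_pct, executor_pcts, busy=90.0, idle=10.0):
    """Label a snapshot as driver-bound only if the driver is busy while every executor is idle.

    The busy/idle thresholds are illustrative assumptions, not recommended values.
    """
    if driver_pct >= busy and all(p <= idle for p in executor_pcts):
        return "driver-bound"
    return "not driver-bound"


# Two hypothetical snapshots that produce the same cluster-wide average.
driver_heavy = (100.0, [0.0, 0.0, 0.0])
light_everywhere = (25.0, [25.0, 25.0, 25.0])

for name, (driver, executors) in [("driver_heavy", driver_heavy),
                                  ("light_everywhere", light_everywhere)]:
    print(name, overall_pct(driver, executors), classify_cpu_pattern(driver, executors))
```

Both snapshots average 25%, so the per-node view is what separates a driver bottleneck from a light workload spread evenly across the cluster.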
JoG1221
Most Recent 1 week ago
Selected Answer: A
Option E is valid and insightful, but Option A is more targeted when you're specifically trying to detect a bottleneck on the driver.
upvoted 1 times
...
Tedet
1 month, 3 weeks ago
Selected Answer: A
When you see the "Five Minute Load Average" remain consistent or flat, it could indicate that the driver is under heavy load and is struggling to keep up with the workload. In the case of a Spark cluster, if the driver is handling too much work, it can become a bottleneck and prevent the overall job from progressing efficiently.
upvoted 2 times
...
srinivasa
4 months ago
Selected Answer: A
Consistent/Flat Five Minute Load Average: If the load average on the driver node remains consistent and does not fluctuate, it suggests that the driver is under constant, significant load. This could be a sign that the driver is performing a lot of work, potentially leading to a bottleneck.
upvoted 3 times
...
AlejandroU
4 months, 2 weeks ago
Selected Answer: E
Answer E. A low CPU usage could indicate that the driver isn't working as efficiently as expected, which can lead to underutilization of the cluster and slower processing times.
upvoted 2 times
...
JB90
5 months ago
Selected Answer: E
Only when the driver does all or most of the work will the overall cluster CPU utilization be this low, since the driver's CPU is 25% of the cluster's total CPU capacity.
upvoted 1 times
...
nedlo
6 months ago
Selected Answer: E
Bottleneck means data skew, which means one of the nodes is doing the majority of the work while the others are idle, so E is correct.
upvoted 2 times
...
m79590530
6 months, 1 week ago
Selected Answer: E
D also means that the driver never sends big chunks of data to the worker nodes, but since network I/O is not stated to be 0, there is a constant flow of data in and out between the driver node and the worker nodes. Therefore it is not a measure of a driver bottleneck. Answer E, however, means one of the 4 cluster nodes is always working at 100%, which cannot be anything other than the driver node, as it is always working and coordinating work across executors.
upvoted 1 times
...
fe3b2fc
8 months, 1 week ago
Selected Answer: D
Executors talk to each other across nodes; if the code/driver is working as intended, you would see a spike in I/O while transferring data. If the code/driver were the issue, you would see a spike in CPU usage and little network traffic between nodes. The correct answer is D.
upvoted 2 times
...
lophonos
10 months, 3 weeks ago
Selected Answer: E
E is correct
upvoted 1 times
...
guillesd
1 year, 2 months ago
Selected Answer: D
If there's no I/O between the driver and executor nodes, then the executor nodes are not working.
upvoted 1 times
...
Patito
1 year, 4 months ago
Selected Answer: D
D seems to be right
upvoted 2 times
...
rok21
1 year, 4 months ago
Selected Answer: E
E is correct
upvoted 1 times
...
azurelearn2020
1 year, 4 months ago
Selected Answer: E
25% indicates Cluster CPU under-utilized
upvoted 2 times
Def21
1 year, 3 months ago
Not correct. 25% could (in theory) mean driver is using 100% CPU
upvoted 1 times
...
...
sturcu
1 year, 6 months ago
Selected Answer: E
If the overall cluster CPU utilization is around 25%, it means that only one out of the four nodes (driver + 3 executors) is using its full CPU capacity, while the other three nodes are idle or underutilized
upvoted 3 times
...
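The arithmetic behind the 25% figure can be checked with a short sketch. It assumes, as the question states, that the driver and the 3 executors use the same VM type, so each node contributes equally to the cluster-wide average. The node names and utilization values are hypothetical.

```python
def overall_cpu_utilization(per_node_pct):
    """Cluster-wide CPU utilization as the plain average of per-node percentages.

    Equal weighting is valid only because every node is assumed to have the same VM type.
    """
    return sum(per_node_pct) / len(per_node_pct)


# Hypothetical snapshot: the driver is saturated and the executors sit idle.
nodes = {
    "driver": 100.0,
    "executor-1": 0.0,
    "executor-2": 0.0,
    "executor-3": 0.0,
}

overall = overall_cpu_utilization(list(nodes.values()))
print(f"Overall cluster CPU utilization: {overall}%")  # 25.0%
```

One fully busy node out of four identical nodes gives 100 / 4 = 25, which is why option E points at the driver.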
sturcu
1 year, 6 months ago
Selected Answer: D
If the overall cluster CPU utilization is around 25%, it means that only one out of the four nodes (driver + 3 executors) is using its full CPU capacity, while the other three nodes are idle or underutilized
upvoted 4 times
sturcu
1 year, 6 months ago
Correct Answer is E.
upvoted 1 times
...
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted