Exam Certified Data Engineer Professional topic 1 question 50 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 50
Topic #: 1

A production cluster has 3 executor nodes and uses the same virtual machine type for the driver and executor.
When evaluating the Ganglia Metrics for this cluster, which indicator would signal a bottleneck caused by code executing on the driver?

  • A. The Five Minute Load Average remains consistent/flat
  • B. Bytes Received never exceeds 80 million bytes per second
  • C. Total Disk Space remains constant
  • D. Network I/O never spikes
  • E. Overall cluster CPU utilization is around 25%
Suggested Answer: D

Comments

BrianNguyen95
Highly Voted 1 year, 3 months ago
Option E: In a Spark cluster, the driver node is responsible for managing the execution of the Spark application, including scheduling tasks, managing the execution plan, and interacting with the cluster manager. If the overall cluster CPU utilization is low (e.g., around 25%), it may indicate that the driver node is not utilizing the available resources effectively and might be a bottleneck.
upvoted 17 times
fe3b2fc
3 months ago
A bottleneck occurs when resources are overutilized, not underutilized, so that explanation doesn't make much sense. If the driver were the issue, CPU utilization would be at 100% and you wouldn't see a spike in I/O. Conversely, if I/O spiked and CPU utilization was at 25%, then the network could be the issue. D is the only logical answer in this case.
upvoted 3 times
benni_ale
3 weeks, 5 days ago
I like this more.
upvoted 1 times
...
...
guillesd
9 months, 3 weeks ago
Overall CPU utilization can be misleading. The 25% utilization could be caused by the workload not requiring more than that rather than everything being executed in the driver node.
upvoted 2 times
...
...
nedlo
Most Recent 4 weeks, 1 day ago
Selected Answer: E
A bottleneck means data skew: one of the nodes is doing the majority of the work while the others are idle, so E is correct.
upvoted 1 times
...
m79590530
1 month, 1 week ago
Selected Answer: E
D also means that the driver never sends big chunks of data to the worker nodes, but since the traffic is not said to be 0, there is still a constant flow of data in and out between the driver node and the worker nodes. Therefore it is not a measure of a driver bottleneck. However, answer E means one of the 4 cluster nodes is always working at 100%, which cannot be any node other than the driver, as it is always working and coordinating work across the executors.
upvoted 1 times
...
fe3b2fc
3 months ago
Selected Answer: D
Executors communicate with each other and across nodes. If the code/driver is working as intended, you would see a spike in I/O while data is transferred. If the code/driver were the issue, you would see a spike in CPU usage and little network traffic between nodes. The correct answer is D.
upvoted 2 times
...
lophonos
5 months, 2 weeks ago
Selected Answer: E
E is correct
upvoted 1 times
...
guillesd
9 months, 3 weeks ago
Selected Answer: D
If there's no I/O between the driver and executor nodes, then the executor nodes are not working.
upvoted 1 times
...
Patito
11 months ago
Selected Answer: D
D seems to be right
upvoted 2 times
...
rok21
11 months, 3 weeks ago
Selected Answer: E
E is correct
upvoted 1 times
...
azurelearn2020
11 months, 3 weeks ago
Selected Answer: E
25% indicates that the cluster CPU is under-utilized.
upvoted 2 times
Def21
10 months ago
Not correct. 25% could (in theory) mean the driver is using 100% CPU.
upvoted 1 times
...
...
sturcu
1 year, 1 month ago
Selected Answer: E
If the overall cluster CPU utilization is around 25%, it means that only one out of the four nodes (driver + 3 executors) is using its full CPU capacity, while the other three nodes are idle or underutilized
upvoted 3 times
...
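The arithmetic behind the 25% figure can be sketched in a few lines of Python. This is only an illustration: it assumes the overall cluster CPU figure is a plain average over four equally sized nodes (the question states the driver and the 3 executors use the same VM type), and the function name `cluster_cpu_utilization` is hypothetical, not part of Ganglia or Databricks.

```python
def cluster_cpu_utilization(per_node_utilization):
    """Average CPU utilization (0.0 to 1.0) across equally sized nodes.

    Assumption: the cluster-wide figure is a simple mean of the per-node values.
    """
    return sum(per_node_utilization) / len(per_node_utilization)


# Node order: [driver, executor 1, executor 2, executor 3]

# Driver saturated while all three executors sit idle.
driver_bound = cluster_cpu_utilization([1.0, 0.0, 0.0, 0.0])

# A light workload spread evenly over every node.
light_load = cluster_cpu_utilization([0.25, 0.25, 0.25, 0.25])

print(driver_bound)  # 0.25
print(light_load)    # 0.25
```

Both scenarios average out to 25%, which is the point raised elsewhere in the thread: the overall number is consistent with a saturated driver, but on its own it does not prove one.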
sturcu
1 year, 1 month ago
Selected Answer: D
If the overall cluster CPU utilization is around 25%, it means that only one out of the four nodes (driver + 3 executors) is using its full CPU capacity, while the other three nodes are idle or underutilized
upvoted 4 times
sturcu
1 year, 1 month ago
Correct Answer is E.
upvoted 1 times
...
...