I believe D is the correct answer, according to the documentation from Databricks [1]:
"The driver process runs your main() function, sits on a node in the cluster, and is responsible for three things: maintaining information about the Spark Application; responding to a user’s program or input; and analyzing, distributing, and scheduling work across the executors (defined momentarily)."
Additionally:
"The cluster manager controls physical machines and allocates resources to Spark Applications."
Based on the above, we could say that the cluster manager is in charge of assigning resources (CPU, memory, etc.) to the VMs used; the sketch after the reference below illustrates the split. Keep in mind that this is based on the Databricks definition; other definitions may include what was mentioned by cookiemonster42.
[1] https://www.databricks.com/glossary/what-are-spark-applications
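To make the driver's role concrete, here is a minimal PySpark sketch (assumes a working PySpark install; the app name and data are made up for illustration). Everything in main() runs on the driver; only the tasks triggered by the action run on the executors:

```python
from pyspark.sql import SparkSession

def main():
    # This function runs on the DRIVER: it holds the SparkSession and
    # maintains information about the Spark application.
    spark = SparkSession.builder.appName("driver-demo").getOrCreate()

    # Building the DataFrame only creates a plan on the driver;
    # nothing executes yet.
    df = spark.range(1_000_000).selectExpr("id % 10 AS bucket")

    # The action below makes the driver analyze the plan, split it into
    # tasks, and distribute those tasks across the EXECUTORS.
    counts = df.groupBy("bucket").count().collect()
    print(counts)

    spark.stop()

if __name__ == "__main__":
    main()
```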
Should be B.
D - the Spark driver is not directly responsible for scheduling the execution of data by the various worker nodes in cluster mode. It submits tasks to the cluster manager (e.g., YARN, Mesos, or Kubernetes), and the cluster manager handles the scheduling of tasks on the worker nodes (see the sketch below).
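For reference, the cluster manager is chosen when the application is configured or submitted. A hedged sketch follows; the YARN master and the resource values are arbitrary examples, not taken from the question:

```python
from pyspark.sql import SparkSession

# The resource requests below go to the CLUSTER MANAGER (YARN in this
# sketch), which allocates executors on worker nodes. The values are
# illustrative only.
spark = (
    SparkSession.builder
    .appName("cluster-manager-demo")
    .master("yarn")                           # which cluster manager to use
    .config("spark.executor.instances", "4")  # ask the manager for 4 executors
    .config("spark.executor.memory", "2g")    # memory per executor
    .config("spark.executor.cores", "2")      # cores per executor
    .getOrCreate()
)

# Once executors are granted, the application's tasks run on those
# worker nodes; the driver code itself does not change per manager.
print(spark.range(100).count())
spark.stop()
```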
Community vote distribution: A (35%, most voted), C (25%), B (20%), Other.
Sowwy1, 7 months, 3 weeks ago
martcerv, 1 year, 3 months ago
cookiemonster42, 1 year, 3 months ago