Exam Certified Data Engineer Professional topic 1 question 26 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 26
Topic #: 1

Each configuration below is identical in that each cluster has 400 GB of RAM and 160 cores in total, with only one Executor per VM.
Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?

  • A. Total VMs: 1
    • 400 GB per Executor
    • 160 Cores per Executor
  • B. Total VMs: 8
    • 50 GB per Executor
    • 20 Cores per Executor
  • C. Total VMs: 16
    • 25 GB per Executor
    • 10 Cores per Executor
  • D. Total VMs: 4
    • 100 GB per Executor
    • 40 Cores per Executor
  • E. Total VMs: 2
    • 200 GB per Executor
    • 80 Cores per Executor
Suggested Answer: B

Comments

robson90
Highly Voted 1 year, 4 months ago
Option A. The question is about maximum performance. A wide transformation often results in an expensive shuffle; with one executor this problem goes away. https://docs.databricks.com/en/clusters/cluster-config-best-practices.html#complex-batch-etl
upvoted 41 times
dp_learner
1 year, 2 months ago
source : https://docs.databricks.com/en/clusters/cluster-config-best-practices.html
upvoted 3 times
hassan_1
Most Recent 1 week, 5 days ago
Selected Answer: B
As the question states one Executor per VM, and the recommendation is not to use a single worker in production, the answer should be B.
upvoted 1 times
HairyTorso
1 week, 6 days ago
Selected Answer: B
"Number of workers: Choosing the right number of workers requires some trials and iterations to figure out the compute and memory needs of a Spark job. Here are some guidelines to help you start:
• Never choose a single worker for a production job, as it will be the single point of failure.
• Start with 2-4 workers for small workloads (for example, a job with no wide transformations like joins and aggregations).
• Start with 8-10 workers for medium to big workloads that involve wide transformations like joins and aggregations, then scale up if necessary."
https://www.databricks.com/discover/pages/optimize-data-workloads-guide#number-workers
upvoted 1 times
arekm
2 weeks ago
Selected Answer: A
Maximum performance: A guarantees no shuffles between nodes in the cluster, since everything runs on one VM.
upvoted 1 times
AlejandroU
1 month ago
Selected Answer: B
Answer B offers a good balance with 8 executors, providing a decent amount of memory and cores per executor, allowing for significant parallel processing. Option C increases the number of executors further but at the cost of reduced memory and cores per executor, which might not be as effective for wide transformations.
upvoted 1 times
arekm
2 weeks ago
The question is about maximum performance.
upvoted 1 times
janeZ
1 month ago
Selected Answer: C
For wide transformations, leveraging multiple executors typically results in better performance, resource utilization, and fault tolerance.
upvoted 2 times
Shakmak
1 month, 2 weeks ago
Selected Answer: B
B is a correct Answer based on https://www.databricks.com/discover/pages/optimize-data-workloads-guide#all-purpose
upvoted 1 times
AndreFR
2 months ago
Selected Answer: B
Besides the fact that A & E do not provide enough parallelism and fault tolerance, I can't explain why, but the correct answer is B. I got the same question during the exam and scored 100% on this topic with answer B. (B is also the answer provided by other sites similar to ExamTopics.) Choosing between B, C & D is tricky!
upvoted 3 times
Snakode
1 month, 2 weeks ago
Exactly. Also, how would a single node resolve the shuffle issue?
upvoted 1 times
Nicks_name
1 month ago
VM != node
upvoted 1 times
kimberlyvsmith
2 months, 1 week ago
Selected Answer: B
B. "Number of workers: Choosing the right number of workers requires some trials and iterations to figure out the compute and memory needs of a Spark job. Here are some guidelines to help you start:
• Never choose a single worker for a production job, as it will be the single point of failure.
• Start with 2-4 workers for small workloads (for example, a job with no wide transformations like joins and aggregations).
• Start with 8-10 workers for medium to big workloads that involve wide transformations like joins and aggregations, then scale up if necessary."
upvoted 3 times
benni_ale
2 months, 1 week ago
https://www.databricks.com/discover/pages/optimize-data-workloads-guide
upvoted 1 times
arik90
9 months, 3 weeks ago
Selected Answer: A
Wide transformations fall under complex ETL, which means Option A is correct; the documentation doesn't say to do otherwise in this scenario.
upvoted 1 times
PrashantTiwari
11 months, 1 week ago
A is correct
upvoted 1 times
vikrampatel5
11 months, 4 weeks ago
Selected Answer: A
Option A: https://docs.databricks.com/en/clusters/cluster-config-best-practices.html#complex-batch-etl
upvoted 3 times
RafaelCFC
1 year ago
Selected Answer: A
robson90's response explains it perfectly and has documentation to support it.
upvoted 1 times
ofed
1 year, 2 months ago
Option A
upvoted 2 times
ismoshkov
1 year, 2 months ago
Selected Answer: A
Our goal is top performance. Vertical scaling is more performant than horizontal, especially since we know we need cross-VM exchange. Option A.
upvoted 2 times
dp_learner
1 year, 2 months ago
Response A, per "Complex batch ETL": "More complex ETL jobs, such as processing that requires unions and joins across multiple tables, will probably work best when you can minimize the amount of data shuffled. Since reducing the number of workers in a cluster will help minimize shuffles, you should consider a smaller cluster like cluster A in the following diagram over a larger cluster like cluster D."
upvoted 1 times
dp_learner
1 year, 2 months ago
source: https://docs.databricks.com/en/clusters/cluster-config-best-practices.html
upvoted 1 times
Santitoxic
1 year, 3 months ago
Selected Answer: D
Considering the need for both memory and parallelism, option D seems to offer the best balance between resources and parallel processing. It provides a reasonable amount of memory and cores per Executor while maintaining a sufficient level of parallelism with 4 Executors. This configuration is likely to result in maximum performance for a job with at least one wide transformation.
upvoted 4 times
Community vote distribution: A (35%), C (25%), B (20%), Other