
Exam Certified Data Engineer Professional topic 1 question 26 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 26
Topic #: 1

Each configuration below is identical in total resources: each cluster has 400 GB of RAM and 160 cores in total, with only one executor per VM.
Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?

  • A. Total VMs: 1
       400 GB per Executor
       160 Cores per Executor
  • B. Total VMs: 8
       50 GB per Executor
       20 Cores per Executor
  • C. Total VMs: 16
       25 GB per Executor
       10 Cores per Executor
  • D. Total VMs: 4
       100 GB per Executor
       40 Cores per Executor
  • E. Total VMs: 2
       200 GB per Executor
       80 Cores per Executor
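All five options slice the same cluster differently, which is easy to overlook. A quick sanity check in plain Python (the option letters and numbers are taken directly from the choices above) confirms every option totals 400 GB and 160 cores:

```python
# Each option splits the same cluster: 400 GB RAM and 160 cores in total,
# one executor per VM. Only the number of VMs (and thus the slice size) differs.
configs = {
    "A": {"vms": 1,  "gb_per_executor": 400, "cores_per_executor": 160},
    "B": {"vms": 8,  "gb_per_executor": 50,  "cores_per_executor": 20},
    "C": {"vms": 16, "gb_per_executor": 25,  "cores_per_executor": 10},
    "D": {"vms": 4,  "gb_per_executor": 100, "cores_per_executor": 40},
    "E": {"vms": 2,  "gb_per_executor": 200, "cores_per_executor": 80},
}

for letter, c in configs.items():
    total_gb = c["vms"] * c["gb_per_executor"]
    total_cores = c["vms"] * c["cores_per_executor"]
    # Every configuration adds up to the same cluster-wide capacity.
    assert total_gb == 400 and total_cores == 160, letter
    print(f"{letter}: {c['vms']} VM(s) -> {total_gb} GB, {total_cores} cores")
```

Since total capacity is fixed, the question is really about how the shuffle from a wide transformation behaves across VM boundaries.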
Suggested Answer: A

Comments

robson90
Highly Voted 1 year, 3 months ago
Option A. The question asks about maximum performance: a wide transformation often results in an expensive shuffle, and with a single executor that problem goes away. https://docs.databricks.com/en/clusters/cluster-config-best-practices.html#complex-batch-etl
upvoted 41 times
dp_learner
1 year ago
source: https://docs.databricks.com/en/clusters/cluster-config-best-practices.html
upvoted 3 times
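robson90's shuffle argument can be made quantitative. Under the assumption (mine, for illustration) that shuffled rows are hash-partitioned uniformly across N executors, a row stays on its current executor with probability 1/N, so roughly (N - 1)/N of the shuffled data must cross the network. A toy model in plain Python, not Spark:

```python
# Toy model (uniform-hashing assumption): fraction of shuffled data expected
# to move between executors when rows are hash-partitioned across N executors.
# A row lands on its own executor with probability 1/N, so (N - 1)/N crosses
# the network. With one executor (option A) the shuffle is entirely local.
def cross_network_fraction(n_executors: int) -> float:
    return (n_executors - 1) / n_executors

for letter, n in [("A", 1), ("E", 2), ("D", 4), ("B", 8), ("C", 16)]:
    frac = cross_network_fraction(n)
    print(f"Option {letter}: ~{frac:.0%} of shuffle data crosses the network")
```

This is only the network-cost side of the trade-off; the counter-arguments below (parallelism, fault tolerance, a single point of failure) are why several commenters still prefer B, C, or D.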
AndreFR
Most Recent 1 week, 6 days ago
Selected Answer: B
Besides the fact that A and E do not provide enough parallelism and fault tolerance, I can't explain why, but the correct answer is B. I got the same question during the exam and scored 100% on Tooling with answer B. (B is the answer provided by other sites similar to ExamTopics.) Choosing between B, C and D is tricky!
upvoted 2 times
kimberlyvsmith
2 weeks, 4 days ago
Selected Answer: B
B. "Number of workers: Choosing the right number of workers requires some trials and iterations to figure out the compute and memory needs of a Spark job. Here are some guidelines to help you start:
- Never choose a single worker for a production job, as it will be the single point of failure.
- Start with 2-4 workers for small workloads (for example, a job with no wide transformations like joins and aggregations).
- Start with 8-10 workers for medium to big workloads that involve wide transformations like joins and aggregations, then scale up if necessary."
upvoted 2 times
benni_ale
2 weeks, 3 days ago
https://www.databricks.com/discover/pages/optimize-data-workloads-guide
upvoted 1 times
arik90
8 months ago
Selected Answer: A
A wide transformation falls under complex ETL, which means Option A is correct; the documentation doesn't say to do otherwise in this scenario.
upvoted 1 times
PrashantTiwari
9 months, 2 weeks ago
A is correct
upvoted 1 times
vikrampatel5
10 months, 1 week ago
Selected Answer: A
Option A: https://docs.databricks.com/en/clusters/cluster-config-best-practices.html#complex-batch-etl
upvoted 3 times
RafaelCFC
10 months, 3 weeks ago
Selected Answer: A
robson90's response explains it perfectly and has documentation to support it.
upvoted 1 times
ofed
1 year ago
Option A
upvoted 2 times
ismoshkov
1 year ago
Selected Answer: A
Our goal is top performance. Vertical scaling is more performant than horizontal here, especially since we know we need cross-VM exchange. Option A.
upvoted 2 times
dp_learner
1 year ago
Response A, per "Complex batch ETL": "More complex ETL jobs, such as processing that requires unions and joins across multiple tables, will probably work best when you can minimize the amount of data shuffled. Since reducing the number of workers in a cluster will help minimize shuffles, you should consider a smaller cluster like cluster A in the following diagram over a larger cluster like cluster D."
upvoted 1 times
dp_learner
1 year ago
source: https://docs.databricks.com/en/clusters/cluster-config-best-practices.html
upvoted 1 times
Santitoxic
1 year, 2 months ago
Selected Answer: D
Considering the need for both memory and parallelism, option D seems to offer the best balance between resources and parallel processing. It provides a reasonable amount of memory and cores per Executor while maintaining a sufficient level of parallelism with 4 Executors. This configuration is likely to result in maximum performance for a job with at least one wide transformation.
upvoted 4 times
mwyopme
1 year, 2 months ago
Sorry, Response C: 16 VMs for maximizing wide-transformation parallelism.
upvoted 2 times
mwyopme
1 year, 2 months ago
The key message is: given a job with at least one wide transformation, for performance you should maximize the number of concurrent VMs. Selecting response B. 160/10 = 16 VMs.
upvoted 1 times
taif12340
1 year, 3 months ago
Selected Answer: D
Considering the need for both memory and parallelism, option D seems to offer the best balance between resources and parallel processing. It provides a reasonable amount of memory and cores per Executor while maintaining a sufficient level of parallelism with 4 Executors. This configuration is likely to result in maximum performance for a job with at least one wide transformation.
upvoted 1 times
BrianNguyen95
1 year, 3 months ago
The correct answer is E: Option E provides a substantial amount of memory and cores per executor, allowing the job to handle wide transformations efficiently. However, performance can also be influenced by factors like the nature of your specific workload, data distribution, and overall cluster utilization. It's good practice to conduct benchmarking and performance testing with various configurations to determine the optimal setup for your specific use case.
upvoted 1 times
stuart_gta1
1 year, 3 months ago
C. More VMs help distribute the workload across the cluster, which results in better fault tolerance and increases the chances of job completion.
upvoted 2 times
asmayassineg
1 year, 3 months ago
The answer should be E: if at least one transformation is wide, one executor with 200 GB can do that job, and the rest of the tasks can be carried out on the other node.
upvoted 1 times
8605246
1 year, 3 months ago
would it be fault-tolerant?
upvoted 1 times
Community vote distribution: A (35%), C (25%), B (20%), Other