Exam Certified Associate Developer for Apache Spark All Questions


Exam Certified Associate Developer for Apache Spark topic 1 question 13 discussion

Actual exam question from Databricks's Certified Associate Developer for Apache Spark

Question #: 13
Topic #: 1


Which of the following cluster configurations is most likely to experience an out-of-memory error in response to data skew in a single partition?

Note: each configuration has roughly the same compute power using 100 GB of RAM and 200 cores.

A. Scenario #4
B. Scenario #5
C. Scenario #6
D. More information is needed to determine an answer.
E. Scenario #1


Suggested Answer: C

by Dhruv_Ajmeri at April 3, 2023, 1:30 p.m.

Comments


TmData

Highly Voted 1 year, 2 months ago

Selected Answer: C

Scenario #6 (12.5 GB worker node, 12.5 GB executor; 1 driver and 8 executors) is the most likely to hit an out-of-memory error in response to data skew in a single partition.

Explanation: Data skew is an uneven distribution of data across partitions. A partition is processed by a single task on a single executor, so one heavily skewed partition raises memory usage on that executor only; the memory held by the other executors does not help. The less memory each executor has, the more likely the skewed partition is to exceed it. Scenario #6 has the smallest worker node and executor configuration, with only 12.5 GB of RAM per executor. With 8 executors the total is still 100 GB, the same as in the other scenarios, but the lower per-executor memory increases the risk of an out-of-memory error when a single partition is skewed.
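The arithmetic behind this reasoning can be sketched in a few lines of Python. This is only a rough illustration: the 100 GB total and the 8-executor case come from the question and the comment above, while the other executor counts and the 20 GB skewed partition are hypothetical values chosen for the example. It also assumes, as a simplification, that a task could use all of its executor's memory, which real Spark memory management does not allow.

```python
TOTAL_RAM_GB = 100.0  # total cluster RAM, from the question's note


def memory_per_executor(total_ram_gb, num_executors):
    """Split the total RAM evenly across executors (simplifying assumption)."""
    return total_ram_gb / num_executors


def fits_in_executor(partition_gb, total_ram_gb, num_executors):
    """Return True if one partition fits in a single executor's share of RAM."""
    return partition_gb <= memory_per_executor(total_ram_gb, num_executors)


# Hypothetical executor counts; only the 8-executor case is taken from the comment.
executor_counts = [1, 2, 4, 8]
# Hypothetical size of the single skewed partition.
skewed_partition_gb = 20.0

for n in executor_counts:
    per_exec = memory_per_executor(TOTAL_RAM_GB, n)
    ok = fits_in_executor(skewed_partition_gb, TOTAL_RAM_GB, n)
    status = "fits" if ok else "likely OOM"
    print(f"{n} executor(s): {per_exec:.1f} GB each -> {status}")
```

Under these assumptions only the 8-executor layout (12.5 GB each) cannot hold the 20 GB partition, even though every layout has the same 100 GB in total.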

upvoted 11 times

...

azurearch

Most Recent 6 months ago

D is correct. Even though there is less executor memory in Scenario #6, Spark will still complete the process, although the shuffle might take more time.

upvoted 2 times

...

Mohitsain

1 year, 2 months ago

Selected Answer: C

This is the right answer.

upvoted 1 time

...

TmData

1 year, 2 months ago

Selected Answer: C

Option A, Scenario #4, has larger worker nodes and executors compared to Scenario #6, reducing the likelihood of encountering out-of-memory errors due to data skew.

Option B, Scenario #5, also has larger worker nodes and executors compared to Scenario #6, providing more memory per executor and reducing the risk of out-of-memory errors.

Option D states that more information is needed to determine an answer, but based on the available information, Scenario #6 is the most likely to experience out-of-memory errors due to data skew in a single partition.

Option E, Scenario #1, has larger worker nodes and executors compared to Scenario #6, reducing the likelihood of out-of-memory errors due to data skew.

upvoted 3 times

...

Indiee

1 year, 4 months ago

Data skew is when a few partitions are oversized. Because of the initial partitioning, each of these large partitions has to be processed by a single thread, which can cause an OOM error.
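To make the idea of an oversized partition concrete, here is a small pure-Python sketch. It uses plain modulo partitioning of integer keys as a stand-in for Spark's hash partitioning (an assumption made for illustration, not Spark's actual hash function), and the key distribution, with one "hot" key, is hypothetical.

```python
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count records per partition under simple modulo partitioning of integer keys."""
    counts = Counter(k % num_partitions for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew_ratio(sizes):
    """Ratio of the largest partition to the mean partition size."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean


# Hypothetical data: 900 records share the hot key 3; 100 records have keys 0..99.
keys = [3] * 900 + list(range(100))
sizes = partition_sizes(keys, 4)

print("records per partition:", sizes)
print("largest / mean:", skew_ratio(sizes))
```

All records for the hot key land in the same partition, so the single task that processes it has to handle far more data than the tasks for the other partitions.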

upvoted 3 times

...

Dhruv_Ajmeri

1 year, 5 months ago

Please explain the answer!!

upvoted 1 time

...
