Exam Certified Associate Developer for Apache Spark topic 1 question 13 discussion

Which of the following cluster configurations is most likely to experience an out-of-memory error in response to data skew in a single partition?

Note: each configuration has roughly the same compute power using 100 GB of RAM and 200 cores.

  • A. Scenario #4
  • B. Scenario #5
  • C. Scenario #6
  • D. More information is needed to determine an answer.
  • E. Scenario #1
Suggested Answer: C

Comments

TmData
Highly Voted 1 year, 2 months ago
Selected Answer: C
Scenario #6 (12.5 GB worker nodes, 12.5 GB executors, 1 driver and 8 executors) is the most likely to hit an out-of-memory error when a single partition is skewed.

Explanation: Data skew is an uneven distribution of data across partitions. A heavily skewed partition needs more memory than the others, and it is processed on a single executor, so the less memory each executor has, the more likely an out-of-memory error becomes. Scenario #6 has the smallest worker node and executor configuration, with only 12.5 GB of RAM per executor. Its 8 executors still add up to 100 GB, the same total as the other scenarios, but the smaller per-executor memory raises the risk of an out-of-memory error on the skewed partition.
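The per-executor arithmetic behind this argument can be sketched as follows. Only the 100 GB total and the Scenario #6 figures (8 executors at 12.5 GB each) come from the thread; the other executor counts and the 15 GB skewed-partition size are hypothetical values chosen for illustration. The check is deliberately naive: it ignores Spark's memory fractions, overhead, concurrent tasks, and spilling to disk.

```python
TOTAL_RAM_GB = 100.0  # total cluster RAM stated in the question's note


def memory_per_executor(num_executors: int, total_ram_gb: float = TOTAL_RAM_GB) -> float:
    """Split the cluster's RAM evenly across executors (a simplification)."""
    if num_executors <= 0:
        raise ValueError("num_executors must be positive")
    return total_ram_gb / num_executors


def partition_fits(partition_gb: float, num_executors: int) -> bool:
    """Naive check: does one partition fit in a single executor's share of RAM?"""
    return partition_gb <= memory_per_executor(num_executors)


if __name__ == "__main__":
    skewed_partition_gb = 15.0  # hypothetical size, not taken from the thread
    for n in (1, 2, 8):  # 8 matches Scenario #6; 1 and 2 are hypothetical
        share = memory_per_executor(n)
        verdict = "fits" if partition_fits(skewed_partition_gb, n) else "exceeds executor memory"
        print(f"{n} executor(s): {share:.1f} GB each, 15 GB partition {verdict}")
```

Under these assumptions the same skewed partition fits comfortably when executors are large and stops fitting once the 100 GB is divided eight ways, which is the intuition behind answer C.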
upvoted 11 times
...
azurearch
Most Recent 6 months ago
D is correct. Even though Scenario #6 has less executor memory, Spark will still complete the job; the shuffle might just take more time.
upvoted 2 times
...
Mohitsain
1 year, 2 months ago
Selected Answer: C
This is the right answer.
upvoted 1 times
...
TmData
1 year, 2 months ago
Selected Answer: C
Option A, Scenario #4, has larger worker nodes and executors than Scenario #6, reducing the likelihood of out-of-memory errors due to data skew.
Option B, Scenario #5, also has larger worker nodes and executors than Scenario #6, providing more memory per executor and reducing the risk of out-of-memory errors.
Option D states that more information is needed to determine an answer, but based on the available information, Scenario #6 is the most likely to experience out-of-memory errors due to data skew in a single partition.
Option E, Scenario #1, has larger worker nodes and executors than Scenario #6, reducing the likelihood of out-of-memory errors due to data skew.
upvoted 3 times
...
Indiee
1 year, 4 months ago
Data skew is when a few partitions are oversized. Because of the initial partitioning, each of these large partitions has to be processed by a single thread, which can cause an OOM error.
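A small simulation may make the "few oversized partitions" idea concrete. This is only a sketch: it uses Python's built-in `hash` on integer keys as a stand-in for Spark's hash partitioner, and the key distribution (one hot key with 900 records plus 100 keys with one record each) and the helper names `partition_sizes` and `skew_ratio` are hypothetical.

```python
def partition_sizes(keys, num_partitions):
    """Count records per partition when each key goes to hash(key) % num_partitions."""
    sizes = [0] * num_partitions
    for key in keys:
        sizes[hash(key) % num_partitions] += 1
    return sizes


def skew_ratio(sizes):
    """Ratio of the largest partition to the mean partition size (1.0 means no skew)."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean


if __name__ == "__main__":
    # Hypothetical data: key 0 is "hot" (900 records); keys 1..100 appear once each.
    keys = [0] * 900 + list(range(1, 101))
    sizes = partition_sizes(keys, num_partitions=4)
    print("records per partition:", sizes)
    print("largest / mean:", skew_ratio(sizes))
```

All records for the hot key land in one partition, so a single task has to handle most of the data no matter how many other partitions and cores sit idle.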
upvoted 3 times
...
Dhruv_Ajmeri
1 year, 5 months ago
Please explain the answer!!
upvoted 1 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other