Exam Certified Associate Developer for Apache Spark topic 1 question 118 discussion

Which of the following describes the difference between DataFrame.repartition(n) and DataFrame.coalesce(n)?

  • A. DataFrame.repartition(n) will split a DataFrame into n number of new partitions with data distributed evenly.
    DataFrame.coalesce(n) will more quickly combine the existing partitions of a DataFrame but might result in an uneven distribution of data across the new partitions.
  • B. While the results are similar, DataFrame.repartition(n) will be more efficient than DataFrame.coalesce(n) because it can partition a DataFrame by the column.
  • C. DataFrame.repartition(n) will split a DataFrame into any number of new partitions while minimizing shuffling.
    DataFrame.coalesce(n) will split a DataFrame into any number of new partitions utilizing a full shuffle.
  • D. While the results are similar, DataFrame.repartition(n) will be less efficient than DataFrame.coalesce(n) because it can partition a DataFrame by the column.
  • E. DataFrame.repartition(n) will combine the existing partitions of a DataFrame but may result in an uneven distribution of data across the new partitions.
    DataFrame.coalesce(n) will more slowly split a DataFrame into n number of new partitions with data distributed evenly.
Suggested Answer: A

Comments

58470e1
1 week, 3 days ago
Selected Answer: A
A
upvoted 1 times
...
SaiPavan10
7 months, 3 weeks ago
Selected Answer: A
A is the right choice
upvoted 2 times
...
siva1280
7 months, 4 weeks ago
A is correct
upvoted 2 times
...
saryu
9 months, 3 weeks ago
It's A
upvoted 1 times
...
thanab
1 year, 2 months ago
The correct answer is A. DataFrame.repartition(n) will split a DataFrame into n new partitions with data distributed evenly. DataFrame.coalesce(n) will more quickly combine the existing partitions of a DataFrame but might result in an uneven distribution of data across the new partitions. The `repartition()` method can increase or decrease the number of partitions in a DataFrame, while `coalesce()` is meant only to decrease the number of partitions efficiently. `repartition()` does a full shuffle and creates new partitions with evenly distributed data. `coalesce()`, on the other hand, avoids a full shuffle by merging existing partitions, which is why it can only reduce the partition count.
upvoted 4 times
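To make the explanation above concrete, here is a toy model in plain Python. It is only a sketch and not Spark's actual implementation: `toy_repartition` and `toy_coalesce` are hypothetical helper names, partitions are modeled as lists of row ids, the "full shuffle" is approximated by a simple round-robin, and the merge is approximated by grouping neighboring partitions.

```python
from typing import List

Partition = List[int]


def toy_repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy full shuffle: every row may move, rows are dealt round-robin into n partitions."""
    rows = [row for part in partitions for row in part]
    new_parts: List[Partition] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        new_parts[i % n].append(row)
    return new_parts


def toy_coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy narrow merge: whole existing partitions are glued together, rows are never split up."""
    if n >= len(partitions):
        # Assumption mirrored from the comment above: coalesce only reduces the count.
        return [list(part) for part in partitions]
    new_parts: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Neighboring input partitions land in the same output partition.
        new_parts[i * n // len(partitions)].extend(part)
    return new_parts


def partition_sizes(partitions: List[Partition]) -> List[int]:
    return [len(part) for part in partitions]


# A skewed input: 12 rows spread over 4 partitions of sizes 8, 1, 1, 2.
skewed: List[Partition] = [list(range(0, 8)), [8], [9], [10, 11]]

print(partition_sizes(toy_repartition(skewed, 2)))  # [6, 6]  even, but every row was moved
print(partition_sizes(toy_coalesce(skewed, 2)))     # [9, 3]  cheap merge, skew remains
print(len(toy_coalesce(skewed, 8)))                 # 4       asking for more changes nothing
print(len(toy_repartition(skewed, 8)))              # 8       repartition can also increase
```

In real Spark the same trade-off shows up as an Exchange (shuffle) step in the plan for `repartition(n)`, while `coalesce(n)` avoids it; the toy model only illustrates why one evens out the data and the other does not.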
...
cookiemonster42
1 year, 3 months ago
IMO it's A:
B - repartition is less efficient because it involves shuffling --> false
C - false for the same reason as B --> false
D - it's because of shuffling, not because of some column --> false
E - coalesce is faster --> false
upvoted 2 times
gwq1968
1 year, 3 months ago
A is correct
upvoted 1 times
...
...