Which of the following describes the difference between DataFrame.repartition(n) and DataFrame.coalesce(n)?
A.
DataFrame.repartition(n) will split a DataFrame into n number of new partitions with data distributed evenly. DataFrame.coalesce(n) will more quickly combine the existing partitions of a DataFrame but might result in an uneven distribution of data across the new partitions.
B.
While the results are similar, DataFrame.repartition(n) will be more efficient than DataFrame.coalesce(n) because it can partition a DataFrame by column.
C.
DataFrame.repartition(n) will split a DataFrame into any number of new partitions while minimizing shuffling. DataFrame.coalesce(n) will split a DataFrame into any number of new partitions utilizing a full shuffle.
D.
While the results are similar, DataFrame.repartition(n) will be less efficient than DataFrame.coalesce(n) because it can partition a DataFrame by column.
E.
DataFrame.repartition(n) will combine the existing partitions of a DataFrame but may result in an uneven distribution of data across the new partitions. DataFrame.coalesce(n) will more slowly split a DataFrame into n number of new partitions with data distributed evenly.
A
The correct answer is A. `repartition(n)` can increase or decrease the number of partitions in a DataFrame: it performs a full shuffle and creates n new partitions with the data distributed evenly. `coalesce(n)` can only decrease the number of partitions: it avoids a full shuffle by combining existing partitions, which makes it faster but might leave the data unevenly distributed across the new partitions.
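To make the even vs. uneven distinction concrete, here is a small toy model in plain Python. It is only a sketch: lists stand in for partitions, the function names `toy_repartition` and `toy_coalesce` are made up for illustration, and the way partitions are grouped in `toy_coalesce` is an assumption, not how Spark actually picks which partitions to merge.

```python
# Toy model only: plain Python lists stand in for Spark partitions.
# This is NOT Spark's implementation; the function names are hypothetical.

def toy_repartition(partitions, n):
    """Model of a full shuffle: every row may move, rows are dealt out round-robin."""
    rows = [row for part in partitions for row in part]
    new_parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        new_parts[i % n].append(row)
    return new_parts


def toy_coalesce(partitions, n):
    """Model of merging without a full shuffle: whole partitions are combined."""
    if n >= len(partitions):
        # Asking for more partitions than exist leaves the count unchanged.
        return [list(part) for part in partitions]
    new_parts = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Assumed grouping rule for illustration; rows stay with their partition.
        new_parts[i % n].extend(part)
    return new_parts


# Four skewed partitions with 8, 1, 1 and 2 rows.
skewed = [list(range(0, 8)), [8], [9], [10, 11]]

repartition_sizes = [len(p) for p in toy_repartition(skewed, 2)]
coalesce_sizes = [len(p) for p in toy_coalesce(skewed, 2)]

print(repartition_sizes)  # [6, 6]
print(coalesce_sizes)     # [9, 3]
```

In the toy model the shuffle touches every row but ends up balanced, while the merge moves only whole partitions and inherits the original skew. In PySpark itself, the resulting partition count can be checked with `df.rdd.getNumPartitions()` after calling `df.repartition(n)` or `df.coalesce(n)`.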
IMO it's A:
B - repartition is less efficient because it involves a full shuffle --> false
C - same reason as B --> false
D - repartition is less efficient because of the shuffle, not because of partitioning by a column --> false
E - coalesce is faster, not slower --> false