Exam Certified Associate Developer for Apache Spark All Questions

View all questions & answers for the Certified Associate Developer for Apache Spark exam

Exam Certified Associate Developer for Apache Spark topic 1 question 118 discussion

Actual exam question from Databricks's Certified Associate Developer for Apache Spark

Question #: 118
Topic #: 1

Which of the following describes the difference between DataFrame.repartition(n) and DataFrame.coalesce(n)?

A. DataFrame.repartition(n) will split a DataFrame into n number of new partitions with data distributed evenly.
DataFrame.coalesce(n) will more quickly combine the existing partitions of a DataFrame but might result in an uneven distribution of data across the new partitions.
B. While the results are similar, DataFrame.repartition(n) will be more efficient than DataFrame.coalesce(n) because it can partition a DataFrame by the column.
C. DataFrame.repartition(n) will split a DataFrame into any number of new partitions while minimizing shuffling.
DataFrame.coalesce(n) will split a DataFrame into any number of new partitions utilizing a full shuffle.
D. While the results are similar, DataFrame.repartition(n) will be less efficient than DataFrame.coalesce(n) because it can partition a DataFrame by the column.
E. DataFrame.repartition(n) will combine the existing partitions of a DataFrame but may result in an uneven distribution of data across the new partitions.
DataFrame.coalesce(n) will more slowly split a DataFrame into n number of new partitions with data distributed evenly.

Suggested Answer: A

by cookiemonster42 at Aug. 2, 2023, 8:57 p.m.

Comments

58470e1

8 months ago

Selected Answer: A

upvoted 1 times

SaiPavan10

1 year, 3 months ago

Selected Answer: A

A is the right choice

upvoted 2 times

siva1280

1 year, 3 months ago

A is correct

upvoted 2 times

saryu

1 year, 5 months ago

It's A

upvoted 1 times

thanab

The correct answer is A. `DataFrame.repartition(n)` splits a DataFrame into n new partitions with the data distributed evenly. `DataFrame.coalesce(n)` combines the existing partitions more quickly, but the data may end up unevenly distributed across the new partitions.

`repartition()` can either increase or decrease the number of partitions, and it does so with a full shuffle, which is why the resulting partitions are evenly sized. `coalesce()` can only decrease the number of partitions: it merges existing partitions instead of performing a full shuffle, which makes it cheaper but means the merged partitions inherit whatever skew the originals had.

upvoted 4 times
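
To make the explanation above concrete, here is a minimal pure-Python sketch of the two behaviours. It is a toy model, not Spark's implementation: the helper names `coalesce` and `repartition`, the "merge neighbouring partitions" grouping, and the sample data are all assumptions made for illustration. Real Spark picks which partitions to merge based on data locality.

```python
from itertools import chain


def coalesce(partitions, n):
    """Toy model: merge whole existing partitions into at most n groups.

    No row is redistributed individually, so skew in the inputs survives.
    """
    if n >= len(partitions):
        # This model, like coalesce, never increases the partition count.
        return [list(p) for p in partitions]
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Assumption: neighbouring partitions are grouped together.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions, n):
    """Toy model of a full shuffle: deal every row round-robin into n partitions."""
    new_parts = [[] for _ in range(n)]
    for i, row in enumerate(chain.from_iterable(partitions)):
        new_parts[i % n].append(row)
    return new_parts


# Four skewed input partitions holding 100 rows in total (hypothetical data).
parts = [
    list(range(0, 70)),
    list(range(70, 80)),
    list(range(80, 90)),
    list(range(90, 100)),
]

coalesced_sizes = [len(p) for p in coalesce(parts, 2)]
repartitioned_sizes = [len(p) for p in repartition(parts, 2)]

print(coalesced_sizes)      # [80, 20] -> uneven, but no row-level shuffle
print(repartitioned_sizes)  # [50, 50] -> even, at the cost of moving every row
```

In PySpark itself the corresponding calls are `df.repartition(n)` and `df.coalesce(n)`, and `df.rdd.getNumPartitions()` reports the resulting partition count.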

cookiemonster42

1 year, 11 months ago

IMO it's A:
B - repartition is less efficient because it involves shuffling --> false
C - same reason as for B --> false
D - it's because of shuffling, not because of some column --> false
E - coalesce is faster --> false

upvoted 2 times

gwq1968

1 year, 11 months ago

A is correct

upvoted 1 times
