Exam Certified Associate Developer for Apache Spark topic 1 question 74 discussion

Actual exam question from Databricks's Certified Associate Developer for Apache Spark

Question #: 74
Topic #: 1


Which of the following operations can be used to return a DataFrame with no duplicate rows? Please select the most complete answer.

A. DataFrame.distinct()
B. DataFrame.dropDuplicates() and DataFrame.distinct()
C. DataFrame.dropDuplicates()
D. DataFrame.drop_duplicates()
E. DataFrame.dropDuplicates(), DataFrame.distinct() and DataFrame.drop_duplicates()


Suggested Answer: E

by thanab at Sept. 6, 2023, 4:38 a.m.

Comments


Ahlo

9 months ago

Answer E. drop_duplicates() is an alias for dropDuplicates(), so it also works in PySpark.

upvoted 1 time


azure_bimonster

9 months, 3 weeks ago

Selected Answer: E

The question asks for the "most complete" answer, so E is correct: all three of these methods work in PySpark.

upvoted 1 time


Selected Answer: B

The most complete answer is B. DataFrame.dropDuplicates() and DataFrame.distinct(). Both DataFrame.distinct() and DataFrame.dropDuplicates() methods in PySpark can be used to return a new DataFrame with duplicate rows removed. The DataFrame.drop_duplicates() method is used in pandas, not in PySpark.

upvoted 1 time

juadaves

1 year, 1 month ago

It should be E; drop_duplicates() works in PySpark too.

upvoted 1 time
