Exam Certified Associate Developer for Apache Spark All Questions

View all questions & answers for the Certified Associate Developer for Apache Spark exam

Exam Certified Associate Developer for Apache Spark topic 1 question 19 discussion

Actual exam question from Databricks's Certified Associate Developer for Apache Spark

Question #: 19
Topic #: 1


The code block shown below contains an error. The code block is intended to return a DataFrame containing all columns from DataFrame storesDF except for column sqft and column customerSatisfaction. Identify the error.
Code block:
storesDF.drop(sqft, customerSatisfaction)

A. The drop() operation only works if one column name is called at a time – there should be two calls in succession like storesDF.drop("sqft").drop("customerSatisfaction").
B. The drop() operation only works if column names are wrapped inside the col() function like storesDF.drop(col(sqft), col(customerSatisfaction)).
C. There is no drop() operation for storesDF.
D. The sqft and customerSatisfaction column names should be quoted like "sqft" and "customerSatisfaction".
E. The sqft and customerSatisfaction column names should be subset from the DataFrame storesDF like storesDF."sqft" and storesDF."customerSatisfaction".


Suggested Answer: D

by 4be8126 at April 26, 2023, 8:29 a.m.

Comments


4be8126

Highly Voted 1 year, 7 months ago

Selected Answer: D

The error in the code block is that the column names sqft and customerSatisfaction are not quoted. drop() takes column names as strings, so they should be written as "sqft" and "customerSatisfaction". The correct code block is: storesDF.drop("sqft", "customerSatisfaction"). Option D correctly identifies this error.

upvoted 5 times

ZSun

1 year, 5 months ago

The correct one is B: storesDF.drop("sqft").drop("customerSatisfaction"). For D, it should be a list of column names: storesDF.drop(["sqft", "customerSatisfaction"])

upvoted 1 times

ZSun

1 year, 5 months ago

The correct one is D, but my explanation is correct

upvoted 1 times

...

azurearch

Most Recent 8 months, 3 weeks ago

Sorry, option D is correct.

upvoted 1 times

...

azurearch

8 months, 3 weeks ago

https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.drop.html Option A is correct. drop() expects only one argument; if there is more than one column, you would have to build a list such as listofcols=['col1','col2'] and call drop(*listofcols).

upvoted 1 times
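On the drop(*listofcols) form mentioned above: the linked PySpark page declares the method as drop(*cols), and in Python a starred list expands into separate positional arguments. Unpacking a list and passing the names directly should therefore be the same call. The sketch below illustrates this with a hypothetical stand-in function, remaining_columns, that only mimics the variadic shape; it is not Spark code.

```python
def remaining_columns(all_columns, *cols):
    """Hypothetical stand-in with the same variadic shape as drop(*cols)."""
    to_drop = set(cols)
    return [name for name in all_columns if name not in to_drop]


all_columns = ["storeId", "sqft", "customerSatisfaction"]  # storeId is invented
listofcols = ["sqft", "customerSatisfaction"]

# Names passed one by one.
direct = remaining_columns(all_columns, "sqft", "customerSatisfaction")

# The same names supplied by unpacking a list with *.
unpacked = remaining_columns(all_columns, *listofcols)

print(direct == unpacked)
```

Under that reading, the list-plus-star form is a convenience for when the names are already in a list, not a requirement for dropping more than one column.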

...

zozoshanky

1 year, 4 months ago

D is correct. Tested code: df.drop('id', 'firstname').show()

upvoted 1 times

...

TmData

1 year, 5 months ago

Selected Answer: D

When using the drop() operation on a Spark DataFrame, column names should be specified as strings enclosed in quotes. In the given code block, the column names sqft and customerSatisfaction are not quoted, so they are treated as variable names rather than strings, which results in an error.

upvoted 1 times
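For the Python version of the question, the kind of error can be pinned down without Spark. The sketch below assumes no variables named sqft or customerSatisfaction are defined: the statement parses without complaint, and the failure appears only when the bare names are evaluated.

```python
import ast

source = "storesDF.drop(sqft, customerSatisfaction)"

# The statement is syntactically valid Python, so parsing succeeds.
ast.parse(source)

# Evaluating the bare names fails because no such variables exist.
try:
    columns = (sqft, customerSatisfaction)
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__
    print(error_name, "-", exc)
```

In Python the unquoted names therefore surface as a NameError at run time; quoting them turns them into ordinary string arguments.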

...
