Exam Certified Associate Developer for Apache Spark topic 1 question 19 discussion

The code block shown below contains an error. The code block is intended to return a DataFrame containing all columns from DataFrame storesDF except for column sqft and column customerSatisfaction. Identify the error.
Code block:
storesDF.drop(sqft, customerSatisfaction)

  • A. The drop() operation only works if one column name is called at a time – there should be two calls in succession like storesDF.drop("sqft").drop("customerSatisfaction").
  • B. The drop() operation only works if column names are wrapped inside the col() function like storesDF.drop(col(sqft), col(customerSatisfaction)).
  • C. There is no drop() operation for storesDF.
  • D. The sqft and customerSatisfaction column names should be quoted like "sqft" and "customerSatisfaction".
  • E. The sqft and customerSatisfaction column names should be subset from the DataFrame storesDF like storesDF."sqft" and storesDF."customerSatisfaction".
Suggested Answer: D
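
To see why the unquoted names fail in Python, here is a minimal sketch that runs without Spark. `TinyFrame` is a hypothetical stand-in written only for this illustration (it is not Spark's implementation); it mimics just the part of `drop()` that accepts any number of column-name strings. The bare names `sqft` and `customerSatisfaction` are looked up as Python variables before `drop()` is ever called, so the failure is a `NameError` rather than anything Spark-specific.

```python
# Hypothetical stand-in for a DataFrame; NOT Spark's implementation.
class TinyFrame:
    def __init__(self, columns):
        self.columns = list(columns)

    def drop(self, *cols):
        # Accept any number of column-name strings and return a new frame
        # without them, similar in spirit to DataFrame.drop.
        return TinyFrame(c for c in self.columns if c not in cols)


storesDF = TinyFrame(["storeId", "sqft", "customerSatisfaction"])

try:
    # Bare names: Python tries to resolve them as variables first.
    storesDF.drop(sqft, customerSatisfaction)
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__

# Quoted names are plain strings, so the call goes through.
remaining = storesDF.drop("sqft", "customerSatisfaction").columns

print(error_name)
print(remaining)
```

With a real `storesDF`, the quoted call from option D, `storesDF.drop("sqft", "customerSatisfaction")`, has the same shape as the last call in the sketch.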

Comments

4be8126
Highly Voted 1 year, 7 months ago
Selected Answer: D
The error in the code block is that the column names sqft and customerSatisfaction should be quoted, like "sqft" and "customerSatisfaction", since they are strings. The correct code block should be: storesDF.drop("sqft", "customerSatisfaction"). Option D correctly identifies this error.
upvoted 5 times
ZSun
1 year, 5 months ago
The correct one is B: storesDF.drop("sqft").drop("customerSatisfaction"). For D, it should be a list of column names: storesDF.drop(["sqft", "customerSatisfaction"])
upvoted 1 times
ZSun
1 year, 5 months ago
The correct one is D, but my explanation is correct
upvoted 1 times
azurearch
Most Recent 8 months, 3 weeks ago
sorry, Option D is correct
upvoted 1 times
azurearch
8 months, 3 weeks ago
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.drop.html Option A is correct: drop expects only one argument. If there is more than one, you would have to use listofcols = ['col1', 'col2'] and drop(*listofcols).
upvoted 1 times
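
A note on the `drop(*listofcols)` form mentioned in the comment above: in Python, the star unpacks the list into separate positional arguments, so it is the same call as writing the names out one by one. The linked PySpark page documents `drop` with a variadic parameter (`drop(*cols)`). The sketch below uses a hypothetical `drop` function that only echoes the arguments it receives (it is not Spark code) to show what each call style actually passes.

```python
# Hypothetical helper that simply returns the positional arguments it was
# given, so the effect of the star is visible. NOT Spark's drop().
def drop(*cols):
    return cols


listofcols = ["sqft", "customerSatisfaction"]

unpacked = drop(*listofcols)                    # star spreads the list
direct = drop("sqft", "customerSatisfaction")   # names written out
as_list = drop(listofcols)                      # list passed as ONE argument

print(unpacked == direct)
print(len(as_list))
```

So `drop(*listofcols)` and `drop("sqft", "customerSatisfaction")` hand the function identical arguments, while a list passed without the star arrives as a single argument.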
zozoshanky
1 year, 4 months ago
D is correct. Tested code: df.drop('id', 'firstname').show()
upvoted 1 times
TmData
1 year, 5 months ago
Selected Answer: D
When using the drop() operation on a Spark DataFrame, the column names should be specified as strings and enclosed in quotes. In the given code block, the column names sqft and customerSatisfaction are not quoted, so they are treated as undefined variables rather than column names, which results in an error.
upvoted 1 times