

Exam Certified Associate Developer for Apache Spark topic 1 question 70 discussion

The code block shown below contains an error. The code block is intended to return a new DataFrame where column managerName from DataFrame storesDF is split at the space character into column managerFirstName and column managerLastName. Identify the error.

A sample of DataFrame storesDF is displayed below:

[sample DataFrame image not reproduced]
Code block:

storesDF.withColumn("managerFirstName", col("managerName").split(" ").getItem(0))
.withColumn("managerLastName", col("managerName").split(" ").getItem(1))

  • A. The index values of 0 and 1 are not correct – they should be 1 and 2, respectively.
  • B. The index values of 0 and 1 should be provided as second arguments to the split() operation rather than indexing the result.
  • C. The split() operation comes from the imported functions object. It accepts a string column name and split character as arguments. It is not a method of a Column object.
  • D. The split() operation comes from the imported functions object. It accepts a Column object and split character as arguments. It is not a method of a Column object.
  • E. The withColumn operation cannot be called twice in a row.
Suggested Answer: D
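For reference, a minimal sketch of the code corrected along the lines of the suggested answer (D), assuming storesDF is already defined with a managerName column:

from pyspark.sql.functions import col, split

# split() is a function from pyspark.sql.functions, not a method of a Column object;
# it takes a column and a split pattern, and getItem(n) selects the nth element of
# the resulting array column.
storesDF = (storesDF
    .withColumn("managerFirstName", split(col("managerName"), " ").getItem(0))
    .withColumn("managerLastName", split(col("managerName"), " ").getItem(1)))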

Comments

Sowwy1
7 months, 2 weeks ago
D. The split() operation comes from the imported functions object. It accepts a Column object and split character as arguments. It is not a method of a Column object.
upvoted 1 time
Ahlo
9 months ago
Answer C. pyspark.sql.functions provides a split() function to split a DataFrame string column into multiple columns. https://sparkbyexamples.com/pyspark/pyspark-split-dataframe-column-into-multiple-columns/
upvoted 1 time
newusername
1 year ago
Selected Answer: C
I think it is C:

from pyspark.sql.functions import col, split

data = [("John Smith",), ("Jane Doe",), ("Mike Johnson",)]
df = spark.createDataFrame(data, ["managerName"])
df.show()

df = df.withColumn("managerFirstName", split(col("managerName"), " ").getItem(0)) \
       .withColumn("managerLastName", split(col("managerName"), " ").getItem(1))
df.show()
upvoted 2 times
cd6a625
4 months, 2 weeks ago
In your example, you are using split(col("managerName"), ...) and not split("managerName", ...), which means the answer is D.
upvoted 1 time
zozoshanky
1 year, 3 months ago
C could be an answer too.
upvoted 1 time
cookiemonster42
1 year, 3 months ago
But you have to pass the column as an object, not a string; you have to use the col() expression. So D is the right one.
upvoted 4 times
65bd33e
6 months, 3 weeks ago
Yes, I agree with you. D is correct; we have to pass the column as an object.
upvoted 1 time
Community vote distribution: A (35%), C (25%), B (20%), Other