Exam Certified Associate Developer for Apache Spark topic 1 question 183 discussion

Actual exam question from Databricks's Certified Associate Developer for Apache Spark

Question #: 183
Topic #: 1

Which of the following code blocks returns a DataFrame where column managerName from DataFrame storesDF is split at the space character into column managerFirstName and column managerLastName?

A sample of DataFrame storesDF is displayed below:

A. (storesDF.withColumn("managerFirstName", split(col("managerName"), " ")[0])
.withColumn("managerLastName", split(col("managerName"), " ")[1]))
B. (storesDF.withColumn("managerFirstName", col("managerName").split(" ")[1])
.withColumn("managerLastName", col("managerName").split(" ")[2]))
C. (storesDF.withColumn("managerFirstName", split(col("managerName"), " ")[1])
.withColumn("managerLastName", split(col("managerName"), " ")[2]))
D. (storesDF.withColumn("managerFirstName", col("managerName").split(" ")[0])
.withColumn("managerLastName", col("managerName").split(" ")[1]))
E. (storesDF.withColumn("managerFirstName", split("managerName"), " ")[0])
.withColumn("managerLastName", split("managerName"), " ")[1]))

Suggested Answer: A

by Oks_An at Sept. 19, 2024, 2:09 p.m.

Comments

Souvik_79

5 months, 1 week ago

Selected Answer: A

Explanation:
split(col("managerName"), " "): splits the column managerName into an array of strings at the space character.
Accessing array elements: [0] extracts the first element of the resulting array (the first name), and [1] extracts the second element (the last name).
withColumn: each withColumn() call returns a new DataFrame with one added column (managerFirstName, then managerLastName).
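
To make the indexing argument concrete, here is a minimal plain-Python analogy. It is not the Spark API: get_item and split_manager_name are hypothetical helpers, "John Doe" is a made-up sample value, and returning None for an out-of-range index is an assumption meant to mirror Spark's null result when ANSI mode is off. It only illustrates why indices 0 and 1 (option A) give the expected pair while indices 1 and 2 (option C) do not.

```python
# Plain-Python analogy only: models the indexing logic, not the Spark API.
from typing import List, Optional, Tuple


def get_item(parts: List[str], index: int) -> Optional[str]:
    """Return parts[index], or None when the index is out of range.

    Assumption: None stands in for the null Spark yields for an
    out-of-range array index when ANSI mode is off.
    """
    return parts[index] if 0 <= index < len(parts) else None


def split_manager_name(
    name: str, first_idx: int, last_idx: int
) -> Tuple[Optional[str], Optional[str]]:
    """Split a name at the space character and pick two elements by index."""
    parts = name.split(" ")
    return get_item(parts, first_idx), get_item(parts, last_idx)


# Option A uses 0-based indices 0 and 1.
option_a = split_manager_name("John Doe", 0, 1)
# Option C uses indices 1 and 2, i.e. it treats the array as 1-based.
option_c = split_manager_name("John Doe", 1, 2)

print(option_a)  # ('John', 'Doe')
print(option_c)  # ('Doe', None)
```

With 1-based indices the first name is lost and the last name lands in the wrong column, which is why A is the only option that both compiles and produces the requested columns.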

upvoted 2 times

thinkbang

7 months, 2 weeks ago

Selected Answer: A

This is covered in the basic documentation.

upvoted 2 times

max_manfred

8 months, 3 weeks ago

The right answer is A, because the array returned by the split function is 0-based, not 1-based. You can try it yourself with the following code:

    from pyspark.sql.functions import col, split

    # Assumes an existing SparkSession named spark (e.g. in a notebook).
    df = spark.createDataFrame([('John Doe',)], ['Person'])
    df2 = df \
        .withColumn('first_name', split(col('Person'), ' ')[0]) \
        .withColumn('last_name', split(col('Person'), ' ')[1])
    df2.show()

upvoted 2 times

sofiess

8 months, 4 weeks ago

(storesDF.withColumn("managerFirstName", split(col("managerName"), " ")[0])
.withColumn("managerLastName", split(col("managerName"), " ")[1]))

upvoted 1 time
