Exam Certified Associate Developer for Apache Spark topic 1 question 26 discussion

Which of the following code blocks returns a new DataFrame where column productCategories only has one word per row, resulting in a DataFrame with many more rows than DataFrame storesDF?
A sample of storesDF is displayed below:

  • A. storesDF.withColumn("productCategories", explode(col("productCategories")))
  • B. storesDF.withColumn("productCategories", split(col("productCategories")))
  • C. storesDF.withColumn("productCategories", col("productCategories").explode())
  • D. storesDF.withColumn("productCategories", col("productCategories").split())
  • E. storesDF.withColumn("productCategories", explode("productCategories"))
Suggested Answer: A

Comments

jds0
4 months ago
Selected Answer: A
Both option A and E work with Spark 3.5.1. But A is better for backward compatibility. See code example below:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode

    spark = SparkSession.builder.appName("MyApp").getOrCreate()

    data = [
        (0, ["value 1", "value 2", "value 3"]),
        (1, ["value 1", "value 2", "value 3"]),
        (2, ["value 1", "value 2", "value 3"]),
    ]
    storesDF = spark.createDataFrame(data, ["storeID", "productCategories"])

    storesDF.withColumn("productCategories", explode(col("productCategories"))).show()  # A.
    storesDF.withColumn("productCategories", explode("productCategories")).show()  # E.
upvoted 1 times
...
bettermakeme
7 months, 3 weeks ago
A and E are correct
upvoted 1 times
...
arturffsi
8 months, 3 weeks ago
Selected Answer: E
Both A and E are correct according to the new version
upvoted 1 times
...
newusername
1 year, 2 months ago
Selected Answer: A
A is correct. Use the code below to test:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode  # required by the explode(col(...)) call below

    # Initializing Spark session
    spark = SparkSession.builder.appName("test").getOrCreate()

    # 1. Creating DataFrame with an array column
    data_array = [
        (1, ["electronics", "clothes", "toys"]),
        (2, ["groceries", "electronics"]),
        (3, ["books", "clothes"]),
    ]
    storesDF = spark.createDataFrame(data_array, ["ID", "productCategories"])
    storesDF.show()

    df_array = storesDF.withColumn("productCategories", explode(col("productCategories")))
    df_array.show()
upvoted 3 times
newusername
1 year, 2 months ago
But E works as well, sadly. Which one has to be chosen then?

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode  # required by the explode() calls below

    # Initializing Spark session
    spark = SparkSession.builder.appName("test").getOrCreate()

    # 1. Creating DataFrame with an array column
    data_array = [
        (1, ["electronics", "clothes", "toys"]),
        (2, ["groceries", "electronics"]),
        (3, ["books", "clothes"]),
    ]
    storesDF = spark.createDataFrame(data_array, ["ID", "productCategories"])
    storesDF.show()

    #df_array = storesDF.withColumn("productCategories", explode(col("productCategories")))
    #df_array.show()

    # check E
    df_array = storesDF.withColumn("productCategories", explode("productCategories"))
    df_array.show()
upvoted 2 times
...
newusername
1 year ago
E for 3.0
upvoted 1 times
...
...
NickWerbung
1 year, 4 months ago
Both A and E are correct.
upvoted 4 times
...
mhaskins
1 year, 6 months ago
Selected Answer: A
While the Explode function allows for a str or Column input, this requires the col() wrapper because it is used in a withColumn() call, where the 2nd parameter requires the column object. https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.withColumn.html?highlight=withcolumn#pyspark.sql.DataFrame.withColumn
upvoted 2 times
...
4be8126
1 year, 7 months ago
Selected Answer: A
Option A is correct: storesDF.withColumn("productCategories", explode(col("productCategories"))). Explanation: The explode function is used to transform a column of arrays or maps into multiple rows, one for each element in the array or map. In this case, productCategories is a column with arrays of strings. The withColumn function is used to add a new column or update an existing column. The first argument is the name of the new or existing column, and the second argument is the expression that defines the values for the column.
upvoted 1 times
...