Welcome to ExamTopics

Exam Certified Associate Developer for Apache Spark topic 1 question 25 discussion

Which of the following code blocks returns a DataFrame where column storeCategory from DataFrame storesDF is split at the underscore character into column storeValueCategory and column storeSizeCategory?
A sample of DataFrame storesDF is displayed below:

  • A. (storesDF.withColumn("storeValueCategory", split(col("storeCategory"), "_")[1])
    .withColumn("storeSizeCategory", split(col("storeCategory"), "_")[2]))
  • B. (storesDF.withColumn("storeValueCategory", col("storeCategory").split("_")[0])
    .withColumn("storeSizeCategory", col("storeCategory").split("_")[1]))
  • C. (storesDF.withColumn("storeValueCategory", split(col("storeCategory"), "_")[0])
    .withColumn("storeSizeCategory", split(col("storeCategory"), "_")[1]))
  • D. (storesDF.withColumn("storeValueCategory", split("storeCategory", "_")[0])
    .withColumn("storeSizeCategory", split("storeCategory", "_")[1]))
  • E. (storesDF.withColumn("storeValueCategory", col("storeCategory").split("_")[1])
    .withColumn("storeSizeCategory", col("storeCategory").split("_")[2]))
Suggested Answer: C

Comments

ronfun
Highly Voted 1 year, 7 months ago
Both C and D are correct. The split function accepts either a Column or a column name string as its first argument. https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.split.html?highlight=split#pyspark.sql.functions.split
upvoted 8 times
NickWerbung
1 year, 4 months ago
Both C and D are correct!
upvoted 2 times
4be8126
1 year, 7 months ago
Option D is not correct because the split function should be used with the col function to split the values in a column. In option D, the split function is used with a string literal rather than a column, which will result in an error.
upvoted 3 times
jds0
Most Recent 4 months ago
Selected Answer: C
Both C and D work in Spark 3.5.1, but C is probably the better choice for backward compatibility. See the code example below:

from pyspark.sql import SparkSession
from pyspark.sql.functions import split, col

spark = SparkSession.builder.appName("MyApp").getOrCreate()

data = [
    (0, True, 10020, "VALUE_MEDIUM"),
    (1, True, 10050, "MAINSTREAM_SMALL"),
    (2, False, 10070, "PREMIUM_LARGE"),
]
storesDF = spark.createDataFrame(data, ["storeID", "open", "openDate", "storeCategory"])

# Option C: split() receives a Column
(storesDF.withColumn("storeValueCategory", split(col("storeCategory"), "_")[0])
    .withColumn("storeSizeCategory", split(col("storeCategory"), "_")[1])).show()

# Option D: split() receives the column name as a string
(storesDF.withColumn("storeValueCategory", split("storeCategory", "_")[0])
    .withColumn("storeSizeCategory", split("storeCategory", "_")[1])).show()
upvoted 1 times
newusername
1 year, 2 months ago
Selected Answer: C
C. You can check it by running the code below:

from pyspark.sql import SparkSession
from pyspark.sql.functions import split, col

# Initialize Spark session
spark = SparkSession.builder.appName("split_test").getOrCreate()

# Create synthetic data
data = [
    {"storeCategory": "value1_size1"},
    {"storeCategory": "value2_size2"},
    {"storeCategory": "value3_size3"},
]
storesDF = spark.createDataFrame(data)
storesDF.show()

# Option C
newDF = (storesDF.withColumn("storeValueCategory", split(col("storeCategory"), "_")[0])
    .withColumn("storeSizeCategory", split(col("storeCategory"), "_")[1]))
newDF.show()
upvoted 2 times
zozoshanky
1 year, 3 months ago
C is correct.
upvoted 1 times
4be8126
1 year, 7 months ago
Selected Answer: C
Option C returns a DataFrame where column storeCategory from DataFrame storesDF is split at the underscore character into column storeValueCategory and column storeSizeCategory. The correct code is:

(storesDF.withColumn("storeValueCategory", split(col("storeCategory"), "_")[0])
    .withColumn("storeSizeCategory", split(col("storeCategory"), "_")[1]))

Explanation:
  • split(col("storeCategory"), "_") splits the values in column storeCategory at the "_" character and returns an array of strings.
  • [0] gets the first element of the resulting array, which is assigned to the new column storeValueCategory.
  • [1] gets the second element of the resulting array, which is assigned to the new column storeSizeCategory.
  • withColumn creates each new column and returns a new DataFrame.
upvoted 2 times
Community vote distribution: A (35%), C (25%), B (20%), Other