Exam Certified Associate Developer for Apache Spark All Questions

View all questions & answers for the Certified Associate Developer for Apache Spark exam

Exam Certified Associate Developer for Apache Spark topic 1 question 42 discussion

Actual exam question from Databricks's Certified Associate Developer for Apache Spark

Question #: 42
Topic #: 1


The code block shown below contains an error. The code block is intended to create a Python UDF assessPerformanceUDF() using the integer-returning Python function assessPerformance() and apply it to column customerSatisfaction in DataFrame storesDF. Identify the error.
Code block:
assessPerformanceUDF – udf(assessPerformance)
storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))

A. The assessPerformance() operation is not properly registered as a UDF.
B. The withColumn() operation is not appropriate here – UDFs should be applied by iterating over rows instead.
C. UDFs can only be applied via SQL and not through the DataFrame API.
D. The return type of the assessPerformanceUDF() is not specified in the udf() operation.
E. The assessPerformance() operation should be used on column customerSatisfaction rather than the assessPerformanceUDF() operation.


Suggested Answer: D

Community vote distribution

D (70%)

A (30%)

by 4be8126 at May 1, 2023, 3:05 p.m.

Comments


ZSun

Highly Voted 1 year, 10 months ago

The right answer is D. The signature is pyspark.sql.functions.udf(f=None, returnType=StringType()), so the default return type is string, but this question requires an integer return type. So it should be D: "The return type of the assessPerformanceUDF() is not specified in the udf() operation."

upvoted 11 times

jds0

8 months, 2 weeks ago

Good explanation for Answer being D. Thank you!

upvoted 2 times

...

jds0

Most Recent 8 months, 2 weeks ago

Selected Answer: D

D is the right answer, as otherwise the return type is the default StringType(). Test code below:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.appName("MyApp").getOrCreate()

data = [
    (0, 3, "A"),
    (1, 1, "A"),
    (2, 2, "A"),
]
storesDF = spark.createDataFrame(data, ["storeID", "customerSatisfaction", "division"])

def assessPerformance(x):
    return 1 if x > 3 else 0

print("IntegerType()")
assessPerformanceUDF = udf(assessPerformance, IntegerType())
df = storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
df.printSchema()

print("Default")
assessPerformanceUDF = udf(assessPerformance)
df = storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
df.printSchema()

upvoted 3 times

...

Raheel_te

9 months, 2 weeks ago

correct answer is D

upvoted 1 times

...

juliom6

1 year, 5 months ago

Selected Answer: D

It is necessary to specify the return type as IntegerType().

from pyspark.sql.functions import udf, col
from pyspark.sql.types import IntegerType

storesDF = spark.createDataFrame([('1', '123'), ('2', '234')], ['id', 'customerSatisfaction'])
assessPerformance = lambda x: int(x)
assessPerformanceUDF = udf(assessPerformance, IntegerType())
storesDF.withColumn('result', assessPerformanceUDF(col('customerSatisfaction'))).printSchema()

upvoted 1 times

...

Singh_Sumit

1 year, 6 months ago

1. When `f` is a Python function: `returnType` defaults to string type and can be optionally specified. The produced object must match the specified type. In this case, this API works as if `register(name, f, returnType=StringType())`.
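
The docstring quoted above describes spark.udf.register, the SQL-side counterpart of udf(). Below is a minimal sketch, assuming a local SparkSession is available, of registering the function with an explicit IntegerType() and calling it from SQL. The function body, the sample rows, the view name "stores", and the helper run_sql_demo are hypothetical and only for illustration. The Spark imports sit inside the helper so the plain Python function can be checked without a cluster.

```python
def assessPerformance(score):
    """Hypothetical scoring rule: 1 if satisfaction exceeds 3, else 0."""
    return 1 if score > 3 else 0


def run_sql_demo():
    # Imported here so the pure-Python function above stays usable without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import IntegerType

    spark = SparkSession.builder.appName("udf-register-sketch").getOrCreate()

    # Hypothetical sample data standing in for storesDF.
    storesDF = spark.createDataFrame(
        [(0, 5), (1, 2)], ["storeID", "customerSatisfaction"]
    )
    storesDF.createOrReplaceTempView("stores")

    # The third argument is the return type; omitting it falls back to StringType().
    spark.udf.register("assessPerformance", assessPerformance, IntegerType())

    result = spark.sql(
        "SELECT storeID, assessPerformance(customerSatisfaction) AS result FROM stores"
    )
    result.printSchema()  # the result column is declared with the registered type
    return result
```

Calling run_sql_demo() in an environment with PySpark installed prints the schema of the query result, which can be compared against a registration that omits the third argument.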

upvoted 2 times

...

thanab

1 year, 6 months ago

Selected Answer: D

The error in the code block is that the return type of the assessPerformanceUDF() is not specified in the udf() operation. In PySpark, when you register a Python function as a UDF, you should also specify the return type; if you omit it, Spark assumes StringType(). This is important because Spark SQL needs to know the return type to handle the UDF's output properly. Therefore, the correct answer is D.

upvoted 2 times

...

cookiemonster42

1 year, 8 months ago

Selected Answer: D

If they mean that "–" is "=", then we need a second parameter, the output type. So D is the answer.

upvoted 1 times

...

Deuterium

1 year, 9 months ago

The right answer is D. The return type has to be specified in udf(), or the UDF will return StringType by default. The code should be: function_UDF = udf(function, returnType=IntegerType())
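
As a sketch of the same idea, the return type can be passed either as a keyword argument (as in the line above) or through the decorator form of udf(). The function body and the helper name make_udfs are hypothetical; the Spark imports are kept inside the helper so the plain Python function can be tested on its own.

```python
def assessPerformance(score):
    """Hypothetical scoring rule: 1 if satisfaction exceeds 3, else 0."""
    return 1 if score > 3 else 0


def make_udfs():
    # Imported lazily so the function above can be used without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    # Keyword form: wrap an existing function and declare its return type.
    keyword_udf = udf(assessPerformance, returnType=IntegerType())

    # Decorator form: declare the return type where the function is defined.
    @udf(returnType=IntegerType())
    def decorated_udf(score):
        return assessPerformance(score)

    return keyword_udf, decorated_udf
```

Either object returned by make_udfs() can then be applied in withColumn(), for example storesDF.withColumn("result", keyword_udf(col("customerSatisfaction"))).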

upvoted 2 times

...

4be8126

1 year, 11 months ago

Selected Answer: A

The error in the code block is A. The function assessPerformance() needs to be passed as a parameter to the udf() operation in order to create a UDF from it. The correct code block should be:
assessPerformanceUDF = udf(assessPerformance)
storesDF.withColumn("result", assessPerformanceUDF(col(

upvoted 3 times

ZSun

1 year, 10 months ago

What is the difference between your code and the question itself?
assessPerformanceUDF – udf(assessPerformance)
assessPerformanceUDF = udf(assessPerformance)
Changing "–" to "="?

upvoted 3 times

...
