Exam Certified Associate Developer for Apache Spark topic 1 question 42 discussion

The code block shown below contains an error. The code block is intended to create a Python UDF assessPerformanceUDF() using the integer-returning Python function assessPerformance() and apply it to column customerSatisfaction in DataFrame storesDF. Identify the error.
Code block:
assessPerformanceUDF – udf(assessPerformance)
storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))

  • A. The assessPerformance() operation is not properly registered as a UDF.
  • B. The withColumn() operation is not appropriate here – UDFs should be applied by iterating over rows instead.
  • C. UDFs can only be applied via SQL and not through the DataFrame API.
  • D. The return type of the assessPerformanceUDF() is not specified in the udf() operation.
  • E. The assessPerformance() operation should be used on column customerSatisfaction rather than the assessPerformanceUDF() operation.
Suggested Answer: D

Comments

ZSun
Highly Voted 1 year, 5 months ago
The right answer is D. Per pyspark.sql.functions.udf(f=None, returnType=StringType()), the default return type is string, but this question requires an integer return type. So it should be D: "The return type of the assessPerformanceUDF() is not specified in the udf() operation."
upvoted 10 times
jds0
4 months ago
Good explanation of why the answer is D. Thank you!
upvoted 2 times
jds0
Most Recent 4 months ago
Selected Answer: D
D is the right answer, because otherwise the return type defaults to StringType(). Test code below:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.appName("MyApp").getOrCreate()
data = [
    (0, 3, "A"),
    (1, 1, "A"),
    (2, 2, "A"),
]
storesDF = spark.createDataFrame(data, ["storeID", "customerSatisfaction", "division"])

def assessPerformance(x):
    return 1 if x > 3 else 0

# Explicit return type: "result" is an integer column
print("IntegerType()")
assessPerformanceUDF = udf(assessPerformance, IntegerType())
df = storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
df.printSchema()

# No return type: "result" falls back to the default StringType()
print("Default")
assessPerformanceUDF = udf(assessPerformance)
df = storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
df.printSchema()
upvoted 2 times
Raheel_te
5 months ago
The correct answer is D.
upvoted 1 time
juliom6
1 year ago
Selected Answer: D
It is necessary to specify the return type as IntegerType():
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import IntegerType
spark = SparkSession.builder.getOrCreate()  # reuse or start a session
storesDF = spark.createDataFrame([('1', '123'), ('2', '234')], ['id', 'customerSatisfaction'])
# Parse the string column into an int; the UDF declares IntegerType() to match
assessPerformance = lambda x: int(x)
assessPerformanceUDF = udf(assessPerformance, IntegerType())
storesDF.withColumn('result', assessPerformanceUDF(col('customerSatisfaction'))).printSchema()
upvoted 1 time
Singh_Sumit
1 year, 1 month ago
1. When `f` is a Python function: `returnType` defaults to string type and can be optionally specified. The produced object must match the specified type. In this case, this API works as if `register(name, f, returnType=StringType())`.
upvoted 2 times
thanab
1 year, 2 months ago
Selected Answer: D
The error in the code block is that the return type of assessPerformanceUDF() is not specified in the udf() operation. In PySpark, when you create a UDF from a Python function that does not return a string, you should specify the return type, because it defaults to StringType(). This matters because Spark uses the declared type to build the schema of the result column and to convert the values the function returns. Therefore, the correct answer is D.
upvoted 2 times
cookiemonster42
1 year, 3 months ago
Selected Answer: D
If they mean that "-" is "=", then we need a second parameter, the output type. So D is the answer.
upvoted 1 time
Deuterium
1 year, 4 months ago
The right answer is D: the return type has to be specified in udf(), otherwise it defaults to StringType. The code should be:
function_UDF = udf(function, returnType=IntegerType())
upvoted 2 times
4be8126
1 year, 6 months ago
Selected Answer: A
The error in the code block is A. The function assessPerformance() needs to be passed as a parameter to the udf() operation in order to create a UDF from it. The correct code block should be:
assessPerformanceUDF = udf(assessPerformance)
storesDF.withColumn("result", assessPerformanceUDF(col(
upvoted 3 times
ZSun
1 year, 5 months ago
What is the difference between your code and the question itself?
assessPerformanceUDF – udf(assessPerformance)
assessPerformanceUDF = udf(assessPerformance)
Changing "-" to "="?
upvoted 3 times