Exam Certified Associate Developer for Apache Spark topic 1 question 42 discussion

The code block shown below contains an error. The code block is intended to create a Python UDF assessPerformanceUDF() using the integer-returning Python function assessPerformance() and apply it to column customerSatisfaction in DataFrame storesDF. Identify the error.
Code block:
assessPerformanceUDF – udf(assessPerformance)
storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))

  • A. The assessPerformance() operation is not properly registered as a UDF.
  • B. The withColumn() operation is not appropriate here – UDFs should be applied by iterating over rows instead.
  • C. UDFs can only be applied via SQL and not through the DataFrame API.
  • D. The return type of the assessPerformanceUDF() is not specified in the udf() operation.
  • E. The assessPerformance() operation should be used on column customerSatisfaction rather than the assessPerformanceUDF() operation.
Suggested Answer: A

Comments

ZSun
Highly Voted 1 year, 3 months ago
The right answer is D. The signature is pyspark.sql.functions.udf(f=None, returnType=StringType()), so the default return type is string, but this question requires an integer return type. So it should be D: "The return type of the assessPerformanceUDF() is not specified in the udf() operation."
upvoted 9 times
jds0
1 month, 2 weeks ago
Good explanation for Answer being D. Thank you!
upvoted 2 times
jds0
Most Recent 1 month, 2 weeks ago
Selected Answer: D
D is the right answer, as otherwise the return type is the default StringType(). Test code below:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.appName("MyApp").getOrCreate()

data = [
    (0, 3, "A"),
    (1, 1, "A"),
    (2, 2, "A"),
]
storesDF = spark.createDataFrame(data, ["storeID", "customerSatisfaction", "division"])

def assessPerformance(x):
    return 1 if x > 3 else 0

print("IntegerType()")
assessPerformanceUDF = udf(assessPerformance, IntegerType())
df = storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
df.printSchema()

print("Default")
assessPerformanceUDF = udf(assessPerformance)
df = storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
df.printSchema()
upvoted 1 times
Raheel_te
2 months, 1 week ago
The correct answer is D.
upvoted 1 times
juliom6
10 months, 1 week ago
Selected Answer: D
It is necessary to specify the return type as IntegerType().

# Assumes an existing SparkSession named spark.
from pyspark.sql.functions import udf, col
from pyspark.sql.types import IntegerType

storesDF = spark.createDataFrame([('1', '123'), ('2', '234')], ['id', 'customerSatisfaction'])
assessPerformance = lambda x: int(x)
assessPerformanceUDF = udf(assessPerformance, IntegerType())
storesDF.withColumn('result', assessPerformanceUDF(col('customerSatisfaction'))).printSchema()
upvoted 1 times
Singh_Sumit
11 months, 1 week ago
From the documentation: "1. When `f` is a Python function: `returnType` defaults to string type and can be optionally specified. The produced object must match the specified type. In this case, this API works as if `register(name, f, returnType=StringType())`."
upvoted 2 times
thanab
11 months, 3 weeks ago
Selected Answer: D
The error in the code block is that the return type of the assessPerformanceUDF() is not specified in the udf() operation. In PySpark, when you register a Python function as a UDF, you should also specify the return type. This is important because Spark SQL needs to know the return type to handle the UDF's output column properly. Therefore, the correct answer is D.
upvoted 2 times
cookiemonster42
1 year, 1 month ago
Selected Answer: D
If they mean that - is =, then we need a second parameter, the output type. So D is the answer.
upvoted 1 times
Deuterium
1 year, 2 months ago
The right answer is D: the return type has to be specified in udf(), or it will default to StringType. The code should be: function_UDF = udf(function, returnType=IntegerType())
upvoted 2 times
4be8126
1 year, 4 months ago
Selected Answer: A
The error in the code block is A. The function assessPerformance() needs to be passed as a parameter to the udf() operation in order to create a UDF from it. The correct code block should be: assessPerformanceUDF = udf(assessPerformance) storesDF.withColumn("result", assessPerformanceUDF(col(
upvoted 3 times
ZSun
1 year, 3 months ago
What is the difference between your code and the question itself? assessPerformanceUDF – udf(assessPerformance) versus assessPerformanceUDF = udf(assessPerformance): changing "-" to "="?
upvoted 3 times
Community vote distribution
A (35%)
C (25%)
B (20%)
Other