Exam Certified Associate Developer for Apache Spark topic 1 question 39 discussion

Which of the following code blocks applies the function assessPerformance() to each row of DataFrame storesDF?

  • A. [assessPerformance(row) for row in storesDF.take(3)]
  • B. [assessPerformance() for row in storesDF]
  • C. storesDF.collect().apply(lambda: assessPerformance)
  • D. [assessPerformance(row) for row in storesDF.collect()]
  • E. [assessPerformance(row) for row in storesDF]
Suggested Answer: D

Comments

jds0
4 months ago
Selected Answer: D
Option D is correct. See example code below:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("MyApp").getOrCreate()

    data = [
        (0, 43161, "A"),
        (1, 51200, "A"),
        (2, None, "B"),
        (3, 78367, "B"),
        (4, None, "C"),
    ]
    storesDF = spark.createDataFrame(data, ["storeID", "sqft", "division"])

    def myFunction(row):
        # Return the first field (storeID) of the Row
        return row[0]

    [myFunction(row) for row in storesDF.collect()]
upvoted 1 times
ZSun
1 year, 5 months ago
There are many ways to apply a function to a DataFrame:
1. apply, as attempted in option C, but it would have to be apply(assessPerformance)
2. a list comprehension: for row in df.collect()
3. foreach
4. map, though mainly for RDDs
upvoted 1 times
4be8126
1 year, 7 months ago
Selected Answer: D
The correct answer is D.
Explanation:
  • Option A uses take(3), which returns only the first three rows, so assessPerformance() is not applied to every row of the DataFrame.
  • Option B iterates over storesDF directly and calls assessPerformance() with no argument, so the function never receives a row.
  • Option C calls apply() on the result of collect(), but collect() returns a plain Python list, which has no apply() method. The lambda also takes no argument and never calls assessPerformance().
  • Option D correctly applies assessPerformance() to each row by using a list comprehension over the list of Row objects returned by collect().
  • Option E looks similar to D, but it iterates over storesDF itself rather than over collected rows. Iterating directly over a PySpark DataFrame does not yield its rows, so the function is never applied to each row.
upvoted 2 times