Exam Certified Associate Developer for Apache Spark topic 1 question 21 discussion

Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000 OR the value in column customerSatisfaction is greater than or equal to 30?

  • A. storesDF.filter(col("sqft") <= 25000 | col("customerSatisfaction") >= 30)
  • B. storesDF.filter(col("sqft") <= 25000 or col("customerSatisfaction") >= 30)
  • C. storesDF.filter(sqft <= 25000 or customerSatisfaction >= 30)
  • D. storesDF.filter(col(sqft) <= 25000 | col(customerSatisfaction) >= 30)
  • E. storesDF.filter((col("sqft") <= 25000) | (col("customerSatisfaction") >= 30))
Suggested Answer: E

Comments

jds0
4 months ago
Selected Answer: E
Answer: E. I tried it with the code below; all other options raised an error:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("MyApp").getOrCreate()

    data = [
        (0, 3, 20000, "A"),
        (1, 1, 50000, "A"),
        (2, 2, 70000, "A"),
        (3, 5, 10000, "B"),
        (4, 4, 100000, "B"),
    ]
    storesDF = spark.createDataFrame(data, ["storeID", "customerSatisfaction", "sqft", "division"])

    # Option A: no parentheses around the two comparisons
    try:
        storesDF.filter(col("sqft") <= 25000 | col("customerSatisfaction") >= 30).show()
    except Exception as e:
        print(e)

    # Option E: each comparison wrapped in parentheses
    try:
        storesDF.filter((col("sqft") <= 25000) | (col("customerSatisfaction") >= 30)).show()
    except Exception as e:
        print(e)
upvoted 1 times
TmData
1 year, 5 months ago
Selected Answer: E
Option E, storesDF.filter((col("sqft") <= 25000) | (col("customerSatisfaction") >= 30)), is the correct option. It uses the filter() operation with the conditions (col("sqft") <= 25000) | (col("customerSatisfaction") >= 30) to filter the rows where the value in column sqft is less than or equal to 25,000 OR the value in column customerSatisfaction is greater than or equal to 30.
upvoted 4 times
SonicBoom10C9
1 year, 6 months ago
Selected Answer: E
E has the right syntax, logic, operator and correct number of parentheses. All of the others falter in one of these respects.
upvoted 2 times
pierre_grns
1 year, 7 months ago
Selected Answer: A
Should be A. Tested it in Community Edition with 2 filters.
upvoted 1 times
pierre_grns
1 year, 7 months ago
Sorry, we do need the two pairs of parentheses. So E!
upvoted 6 times
sly75
1 year, 6 months ago
Yes I agree, it's E
upvoted 2 times
evertonllins
1 year, 3 months ago
Congrats man, not everyone comes back to say they were wrong and correct themselves. We need more people like this on this platform.
upvoted 4 times
4be8126
1 year, 7 months ago
The correct code block to return a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000 OR the value in column customerSatisfaction is greater than or equal to 30 is:

storesDF.filter((col("sqft") <= 25000) | (col("customerSatisfaction") >= 30))

Option A uses the right operator (a single pipe, |, is the OR operator for PySpark Column expressions) but omits the parentheses around each comparison. In Python, | binds more tightly than <= and >=, so the condition is read as col("sqft") <= (25000 | col("customerSatisfaction")) >= 30, which is not the intended condition.

Option B quotes the column names correctly but uses Python's or keyword, which cannot combine Column expressions.

Option C also uses or, and it refers to sqft and customerSatisfaction as bare Python names instead of going through col().

Option D uses col() and |, but passes the column names unquoted and also omits the parentheses.

Option E references both columns correctly, uses |, and wraps each comparison in parentheses so that the comparisons are evaluated before the OR. Therefore, the correct answer is E.
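The difference between options A, B and E comes down to how Python itself groups the expression, and that can be checked without a Spark session. The sketch below is only an illustration: it uses the standard-library ast module to parse the three conditions as text (nothing is evaluated, so col never has to exist), and the names OPTIONS, root_node and roots are made up for this example.

```python
import ast

# The filter conditions from options A, B and E, kept as source text so that
# nothing is evaluated and no Spark session is needed.
OPTIONS = {
    "A": 'col("sqft") <= 25000 | col("customerSatisfaction") >= 30',
    "B": 'col("sqft") <= 25000 or col("customerSatisfaction") >= 30',
    "E": '(col("sqft") <= 25000) | (col("customerSatisfaction") >= 30)',
}


def root_node(expr: str) -> ast.expr:
    """Parse a Python expression and return the root node of its syntax tree."""
    return ast.parse(expr, mode="eval").body


roots = {name: root_node(src) for name, src in OPTIONS.items()}

# Print the kind of node at the top of each expression.
for name, node in roots.items():
    print(name, type(node).__name__)

# A: the root is one chained comparison. The operand in the middle is the
#    result of `25000 | col(...)`, because `|` was grouped first.
node_a = roots["A"]
print("A operators:", [type(op).__name__ for op in node_a.ops])
print("A middle operand:", type(node_a.comparators[0]).__name__)

# B: the root is a boolean `or`, which Python evaluates by asking each side
#    for its truth value instead of calling an overloadable operator.
node_b = roots["B"]
print("B operator:", type(node_b.op).__name__)

# E: the root is a single `|` whose two operands are the two comparisons.
node_e = roots["E"]
print("E operator:", type(node_e.op).__name__)
print("E operands:", type(node_e.left).__name__, type(node_e.right).__name__)
```

In option A the top-level node is a comparison chain with | buried inside it, while in option E the top-level node is the | itself with a comparison on each side. Only the second shape matches "sqft condition OR customerSatisfaction condition".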
upvoted 2 times