Exam Certified Associate Developer for Apache Spark All Questions

Exam Certified Associate Developer for Apache Spark topic 1 question 23 discussion

Which of the following code blocks returns a new DataFrame with a new column employeesPerSqft that is the quotient of column numberOfEmployees and column sqft, both of which are from DataFrame storesDF? Note that column employeesPerSqft is not in the original DataFrame storesDF.

  • A. storesDF.withColumn("employeesPerSqft", col("numberOfEmployees") / col("sqft"))
  • B. storesDF.withColumn("employeesPerSqft", "numberOfEmployees" / "sqft")
  • C. storesDF.select("employeesPerSqft", "numberOfEmployees" / "sqft")
  • D. storesDF.select("employeesPerSqft", col("numberOfEmployees") / col("sqft"))
  • E. storesDF.withColumn(col("employeesPerSqft"), col("numberOfEmployees") / col("sqft"))
Suggested Answer: A

Comments

jds0
4 months ago
Selected Answer: A
Answer: A. None of the other options work.

Code:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("MyApp").getOrCreate()

    data = [
        (0, 3, 20, "A"),
        (1, 1, 50, "A"),
        (2, 2, 70, "A"),
    ]
    storesDF = spark.createDataFrame(data, ["storeID", "numberOfEmployees", "sqft", "division"])

    storesDF.withColumn("employeesPerSqft", col("numberOfEmployees") / col("sqft"))  # A.
upvoted 1 times
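For reference, the values option A should produce for the sample rows in the comment above can be worked out in plain Python, without Spark. This is only a sketch of the arithmetic: the helper name with_employees_per_sqft is hypothetical, and the code models each row as a dict rather than using any PySpark API.

```python
# Sample rows and column names taken from the comment above.
rows = [(0, 3, 20, "A"), (1, 1, 50, "A"), (2, 2, 70, "A")]
columns = ["storeID", "numberOfEmployees", "sqft", "division"]


def with_employees_per_sqft(row):
    """Hypothetical helper: return the row as a dict plus the quotient column."""
    record = dict(zip(columns, row))
    # True division, mirroring numberOfEmployees / sqft in option A.
    record["employeesPerSqft"] = record["numberOfEmployees"] / record["sqft"]
    return record


expected = [with_employees_per_sqft(r) for r in rows]
for record in expected:
    print(record["storeID"], record["employeesPerSqft"])
```

The original four columns are kept and one column is appended, which is the shape withColumn() returns when the column name is new.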
newusername
1 year, 2 months ago
Selected Answer: A
Test:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    # Initializing Spark session (if not already initialized)
    spark = SparkSession.builder.appName("databricks_example").getOrCreate()

    # Creating some synthetic data for storesDF
    data = [
        {"storeId": 1, "numberOfEmployees": 10, "sqft": 500},
        {"storeId": 2, "numberOfEmployees": 15, "sqft": 750},
        {"storeId": 3, "numberOfEmployees": 8, "sqft": 400},
    ]
    storesDF = spark.createDataFrame(data)

    # Option A:
    try:
        df_a = storesDF.withColumn("employeesPerSqft", col("numberOfEmployees") / col("sqft"))
        df_a.show()
        print("Option A works")
    except Exception as e:
        print("Option A doesn't work:", str(e))
upvoted 1 times
SonicBoom10C9
1 year, 6 months ago
Selected Answer: A
C and D are wrong because employeesPerSqft cannot be selected: it does not exist in storesDF. That is not valid select() syntax for creating a column anyway. B does not reference the existing columns with col(), so it divides two plain strings. E refers to employeesPerSqft as if it were an existing column, and a Column cannot be the first argument to withColumn(), which expects the new column's name as a string.
upvoted 2 times
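The point about option B can be checked without Spark at all. Python evaluates the argument "numberOfEmployees" / "sqft" before withColumn() is ever called, and dividing two str objects is not defined. A minimal sketch, where the function name try_string_division is made up for illustration:

```python
def try_string_division():
    """Attempt the expression from option B and report what Python raises."""
    try:
        "numberOfEmployees" / "sqft"  # str / str, evaluated by Python, not Spark
    except TypeError as exc:
        return type(exc).__name__, str(exc)
    return None  # not reached: str does not implement division


result = try_string_division()
print(result)
```

Because the failure happens while building the arguments, options B and C would fail the same way regardless of which DataFrame method they are passed to.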
4be8126
1 year, 7 months ago
    storesDF.select("employeesPerSqft", col("numberOfEmployees") / col("sqft"))

This code block selects the column "employeesPerSqft" and the quotient of "numberOfEmployees" and "sqft" from the DataFrame storesDF. However, since "employeesPerSqft" is not a column in the original storesDF, this code block would throw an error.

To create a new column "employeesPerSqft" in the resulting DataFrame, we need to use the withColumn() method instead of select(). Here's the corrected code block:

    storesDF.withColumn("employeesPerSqft", col("numberOfEmployees") / col("sqft"))

This code block adds a new column "employeesPerSqft" to the storesDF DataFrame. The new column is created by dividing the values in column "numberOfEmployees" by the values in column "sqft".
upvoted 1 times
4be8126
1 year, 7 months ago
The correct code block to return a new DataFrame with a new column employeesPerSqft that is the quotient of column numberOfEmployees and column sqft from DataFrame storesDF is:

    storesDF.withColumn("employeesPerSqft", col("numberOfEmployees") / col("sqft"))

Option A correctly uses the withColumn() function to create a new column employeesPerSqft by dividing column numberOfEmployees by column sqft.

Option B fails because it references the column names as plain quoted strings instead of using the col() function. This is not a syntax error: the code parses, but dividing two Python strings raises a TypeError.

Option C fails for the same reason as B, and it also uses the select() function with a column name that does not exist instead of using withColumn() to create the new column.

Option D references the source columns correctly with col(), but select("employeesPerSqft", ...) asks for a column that does not exist in storesDF, so it fails as well.

Option E fails because col() is used for the first argument of withColumn(). The first argument must be the new column's name as a string; only the second argument is a Column.

Therefore, the correct answer is A:

    storesDF.withColumn("employeesPerSqft", col("numberOfEmployees") / col("sqft"))
upvoted 1 times
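Why col("numberOfEmployees") / col("sqft") works while the quoted strings do not comes down to operator overloading: the object returned by col() defines division, and the result is another column expression rather than a number. The toy class below is only a sketch of that idea. ToyColumn and toy_col are hypothetical names and are not PySpark's Column or col(); the real implementation differs.

```python
class ToyColumn:
    """Hypothetical stand-in for a column expression (NOT pyspark.sql.Column)."""

    def __init__(self, expr):
        self.expr = expr  # textual form of the expression built so far

    def __truediv__(self, other):
        # Division builds a new expression instead of computing a value.
        if not isinstance(other, ToyColumn):
            return NotImplemented
        return ToyColumn(f"({self.expr} / {other.expr})")


def toy_col(name):
    """Hypothetical analogue of col(): wrap a column name in an expression."""
    return ToyColumn(name)


quotient = toy_col("numberOfEmployees") / toy_col("sqft")
print(quotient.expr)  # (numberOfEmployees / sqft)
```

A plain str has no such division method, which is why options B and C fail before any DataFrame method runs.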