Which of the following code blocks returns a DataFrame where column divisionDistinct is the approximate number of distinct values in column division from DataFrame storesDF?
A.
storesDF.withColumn("divisionDistinct", approx_count_distinct(col("division")))
B.
storesDF.agg(col("division").approx_count_distinct("divisionDistinct"))
C.
storesDF.agg(approx_count_distinct(col("division")).alias("divisionDistinct"))
D.
storesDF.withColumn("divisionDistinct", col("division").approx_count_distinct())
E.
storesDF.agg(col("division").approx_count_distinct().alias("divisionDistinct"))
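I think it's C. approx_count_distinct is an aggregate function in pyspark.sql.functions, not a method on Column, so it has to be called inside agg() and then named with alias().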
https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.functions.approx_count_distinct.html
upvoted 1 times
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Sowwy1
7 months, 4 weeks ago