Exam Certified Associate Developer for Apache Spark topic 1 question 35 discussion

Which of the following code blocks returns a collection of summary statistics for all columns in
DataFrame storesDF?

  • A. storesDF.summary("mean")
  • B. storesDF.describe(all = True)
  • C. storesDF.describe("all")
  • D. storesDF.summary("all")
  • E. storesDF.describe()
Suggested Answer: E
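
For readers without a Spark session at hand, here is a minimal sketch in plain Python of the five statistics that describe() reports for a numeric column: count, mean, stddev, min, and max. It is not Spark code. The helper name describe_column is hypothetical, the sqft values are the sample data posted in the comments, and the sketch assumes that nulls are skipped and that stddev is the sample standard deviation.

```python
import statistics

# Values of the sqft column from the sample data in the thread; None marks a null.
sqft = [43161, 51200, None, 78367, None]


def describe_column(values):
    """Hypothetical helper: the five statistics describe() reports, skipping nulls."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "mean": statistics.mean(present),
        # Sample standard deviation (n - 1 in the denominator).
        "stddev": statistics.stdev(present),
        "min": min(present),
        "max": max(present),
    }


stats = describe_column(sqft)
for name, value in stats.items():
    print(name, value)
```

With no arguments, describe() applies this kind of computation to every column, which is why option E answers the question as asked.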

Comments

jds0
4 months ago
Selected Answer: E
E is the right option. See the code below, run with Spark 3.5.1:

    # Summary statistics of a DataFrame
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col
    from pyspark.errors import PySparkTypeError

    spark = SparkSession.builder.appName("MyApp").getOrCreate()

    data = [
        (0, 43161),
        (1, 51200),
        (2, None),
        (3, 78367),
        (4, None),
    ]
    storesDF = spark.createDataFrame(data, ["storeID", "sqft"])

    # Option A
    try:
        storesDF.summary("mean").show()
    except Exception as e:
        print(e)

    # Option B
    try:
        storesDF.describe(all = True).show()
    except Exception as e:
        print(e)

    # Option C
    try:
        storesDF.describe("all").show()
    except Exception as e:
        print(e)

    # Option D
    try:
        storesDF.summary("all").show()
    except Exception as e:
        print(e)

    # Option E
    try:
        storesDF.describe().show()
    except Exception as e:
        print(e)
upvoted 1 times
...
dbdantas
7 months, 2 weeks ago
Selected Answer: E
E is the correct one
upvoted 1 times
...
azure_bimonster
9 months, 3 weeks ago
Selected Answer: E
E would be correct here
upvoted 1 times
...
mahmoud_salah30
11 months ago
Tested it; E is the right answer.
upvoted 2 times
...
souha_axa
1 year, 3 months ago
E is the correct answer
upvoted 1 times
...
cookiemonster42
1 year, 3 months ago
Selected Answer: E
Check the documentation, mates. describe() takes column names as arguments and summary() takes statistic names, so neither accepts "all" or all = True, and E is correct!
upvoted 1 times
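
To make the argument semantics concrete, here is a toy model in plain Python, not PySpark. The names describe_like, summary_like, and STATISTICS are hypothetical, and the table is a plain dict rather than a DataFrame. It assumes that describe() reads its positional arguments as column names and summary() reads them as statistic names. The real summary() also reports approximate quartiles by default, which this sketch leaves out, and Spark raises its own exception types rather than the ones used here.

```python
import statistics

# Toy table: column name -> values. Not a Spark DataFrame.
table = {"storeID": [0, 1, 2, 3, 4], "sqft": [43161, 51200, None, 78367, None]}

# Statistic names this toy model understands (a subset of what summary() accepts).
STATISTICS = {
    "count": len,
    "mean": statistics.mean,
    "stddev": statistics.stdev,
    "min": min,
    "max": max,
}


def _compute(table, cols, stats):
    """Apply each requested statistic to the non-null values of each column."""
    result = {}
    for col in cols:
        present = [v for v in table[col] if v is not None]
        result[col] = {s: STATISTICS[s](present) for s in stats}
    return result


def describe_like(table, *cols):
    """Positional arguments are column names; none means every column."""
    unknown = [c for c in cols if c not in table]
    if unknown:
        raise KeyError(f"no such column: {unknown}")
    return _compute(table, cols or tuple(table), tuple(STATISTICS))


def summary_like(table, *stats):
    """Positional arguments are statistic names; none means every statistic."""
    unknown = [s for s in stats if s not in STATISTICS]
    if unknown:
        raise ValueError(f"not a recognized statistic: {unknown}")
    return _compute(table, tuple(table), stats or tuple(STATISTICS))


print(describe_like(table))         # like option E: all columns, default statistics
print(summary_like(table, "mean"))  # like option A: all columns, but only the mean

# Like options C and D: "all" is neither a column name nor a statistic name.
for call in (lambda: describe_like(table, "all"), lambda: summary_like(table, "all")):
    try:
        call()
    except (KeyError, ValueError) as err:
        print("rejected:", err)
```

Under these assumptions, option A returns a single statistic rather than a collection, options C and D are rejected, and only the no-argument call covers every column with the full default set.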
...
zozoshanky
1 year, 3 months ago
E is correct, it's giving the output.
upvoted 2 times
...
zozoshanky
1 year, 4 months ago
B is correct. Running the last option gives an error: TypeError: describe() got an unexpected keyword argument 'all'
upvoted 1 times
cookiemonster42
1 year, 3 months ago
checked it, it gave me the right result, so E is the one
upvoted 3 times
...
...
4be8126
1 year, 7 months ago
Selected Answer: B
The answer is B.

Explanation: The describe() method in DataFrame returns a DataFrame with summary statistics for all numeric columns in the input DataFrame. By default, only the count, mean, standard deviation, minimum, and maximum values are returned, but additional statistics can be specified with the percentiles parameter. Setting the all parameter to True will include non-numeric columns in the output as well. Therefore, option B is the correct answer.

Option A is not correct, as the summary() method only returns summary statistics for the specified column(s) and is not a valid option for returning summary statistics for all columns in the DataFrame.

Option C is not correct, as the describe() method does not have an "all" option.

Option D is also not correct, as the summary() method only returns summary statistics for the specified column(s) and does not have an "all" option.

Option E is not incorrect, but it does not specify whether to include non-numeric columns in the output. Therefore, option B is a better answer.
upvoted 1 times
ZSun
1 year, 5 months ago
Did you really try this in PySpark, or look it up in the documentation? TypeError: describe() got an unexpected keyword argument 'all'
upvoted 6 times
8605246
1 year, 4 months ago
describe() is correct
upvoted 5 times
...
...
Deuterium
1 year, 4 months ago
Is your answer from ChatGPT?
upvoted 1 times
cookiemonster42
1 year, 3 months ago
Even ChatGPT says E is the correct one :)
upvoted 3 times
...
...
juadaves
1 year, 1 month ago
TypeError                                 Traceback (most recent call last)
<ipython-input-34-5077330dead7> in <cell line: 1>()
----> 1 storesDF.describe(all = True)

TypeError: DataFrame.describe() got an unexpected keyword argument 'all'
upvoted 1 times
...
...