Exam Certified Associate Developer for Apache Spark topic 1 question 35 discussion

Actual exam question from Databricks's Certified Associate Developer for Apache Spark

Question #: 35
Topic #: 1

Which of the following code blocks returns a collection of summary statistics for all columns in
DataFrame storesDF?

A. storesDF.summary("mean")
B. storesDF.describe(all = True)
C. storesDF.describe("all")
D. storesDF.summary("all")
E. storesDF.describe()

Suggested Answer: E

by 4be8126 at April 26, 2023, 12:48 p.m.

Comments

NirajBhise

1 month, 3 weeks ago

Selected Answer: E

Column names (or a list of names) are optional. If no columns are specified, the function works on all columns.

upvoted 1 times

jds0

11 months, 2 weeks ago

Selected Answer: E

E is the right option. See the code below, run with Spark 3.5.1:

    # Summary statistics of a DataFrame
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("MyApp").getOrCreate()

    data = [
        (0, 43161),
        (1, 51200),
        (2, None),
        (3, 78367),
        (4, None),
    ]
    storesDF = spark.createDataFrame(data, ["storeID", "sqft"])

    # A: returns only the "mean" row
    try:
        storesDF.summary("mean").show()
    except Exception as e:
        print(e)

    # B: describe() has no "all" keyword, so Python raises TypeError
    try:
        storesDF.describe(all=True).show()
    except Exception as e:
        print(e)

    # C: "all" is treated as a column name, and no such column exists
    try:
        storesDF.describe("all").show()
    except Exception as e:
        print(e)

    # D: "all" is not a recognised statistic name
    try:
        storesDF.summary("all").show()
    except Exception as e:
        print(e)

    # E: count, mean, stddev, min and max for every numeric and string column
    try:
        storesDF.describe().show()
    except Exception as e:
        print(e)

upvoted 2 times

dbdantas

1 year, 2 months ago

Selected Answer: E

E is the correct one

upvoted 1 times

azure_bimonster

1 year, 4 months ago

Selected Answer: E

E would be correct here

upvoted 1 times

mahmoud_salah30

1 year, 6 months ago

Tested: E is the right answer.

upvoted 2 times

souha_axa

1 year, 10 months ago

E is the correct answer

upvoted 1 times

cookiemonster42

1 year, 11 months ago

Selected Answer: E

Check the documentation, mates. Both methods receive names of columns as arguments, so E is correct!

upvoted 1 times

zozoshanky

1 year, 11 months ago

E is correct; it gives the output.

upvoted 2 times

zozoshanky

1 year, 11 months ago

B is correct. Running the last option gives an error: TypeError: describe() got an unexpected keyword argument 'all'

upvoted 1 times

cookiemonster42

1 year, 11 months ago

Checked it; it gave me the right result, so E is the one.

upvoted 3 times

4be8126

2 years, 2 months ago

Selected Answer: B

The answer is B. Explanation: The describe() method in DataFrame returns a DataFrame with summary statistics for all numeric columns in the input DataFrame. By default, only the count, mean, standard deviation, minimum, and maximum values are returned, but additional statistics can be specified with the percentiles parameter. Setting the all parameter to True will include non-numeric columns in the output as well. Therefore, option B is the correct answer. Option A is not correct, as the summary() method only returns summary statistics for the specified column(s) and is not a valid option for returning summary statistics for all columns in the DataFrame. Option C is not correct, as the describe() method does not have an "all" option. Option D is also not correct, as the summary() method only returns summary statistics for the specified column(s) and does not have an "all" option. Option E is not incorrect, but it does not specify whether to include non-numeric columns in the output. Therefore, option B is a better answer.

upvoted 1 times

ZSun

2 years ago

Did you really try this in PySpark, or look up the documentation? TypeError: describe() got an unexpected keyword argument 'all'

upvoted 6 times

8605246

2 years ago

describe() is correct

upvoted 5 times

Deuterium

1 year, 12 months ago

Is your answer from ChatGPT?

upvoted 1 times

cookiemonster42

1 year, 11 months ago

Even ChatGPT says E is the correct one :)

upvoted 3 times

juadaves

1 year, 8 months ago

    TypeError                                 Traceback (most recent call last)
    <ipython-input-34-5077330dead7> in <cell line: 1>()
    ----> 1 storesDF.describe(all = True)

    TypeError: DataFrame.describe() got an unexpected keyword argument 'all'

upvoted 1 times
