Exam DP-600 topic 1 question 56 discussion

Actual exam question from Microsoft's DP-600
Question #: 56
Topic #: 1

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have a Fabric tenant that contains a new semantic model in OneLake.
You use a Fabric notebook to read the data into a Spark DataFrame.
You need to evaluate the data to calculate the min, max, mean, and standard deviation values for all the string and numeric columns.
Solution: You use the following PySpark expression:
df.summary()
Does this meet the goal?

  • A. Yes
  • B. No
Suggested Answer: A
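For reference, a minimal sketch of the expression under test against a small made-up DataFrame with one string and one numeric column (the column names are illustrative, not from the exam scenario, and spark is the SparkSession a Fabric notebook provides by default):

# Hypothetical sample data for illustration only.
df = spark.createDataFrame(
    [("apple", 10), ("pear", 10), ("pear", 15)],
    schema=["fruit_name", "amount"],
)

# summary() keeps string columns in the result: count, min, and max are
# populated (min/max compare lexicographically), while mean and stddev
# come back as null for strings.
df.summary().show()

# The statistics can also be restricted to the four the question names.
df.summary("min", "max", "mean", "stddev").show()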

Comments

stilferx
Highly Voted 9 months, 1 week ago
IMHO, A. Example:

df1 = spark.createDataFrame([(1, 10), (2, 10), (2, 15)], schema = ['fruit_id', 'amount'])
df1.summary()

summary   fruit_id             amount
count     3                    3
mean      1.6666666666666667   11.666666666666666
stddev    0.5773502691896257   2.886751345948129
min       1                    10
25%       1                    10
50%       2                    10
75%       2                    15
max       2                    15
upvoted 9 times
...
b01d700
Most Recent 1 day, 22 hours ago
Selected Answer: B
The correct PySpark expression to calculate min, max, mean, and standard deviation for both numeric and STRING columns is: df.describe()
upvoted 1 times
...
slu239
1 month, 2 weeks ago
Selected Answer: B
Does not meet the goal because it has to be df.summary().show()
upvoted 1 times
...
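As a quick check of the point above (a sketch only, reusing the df1 from the earlier comment in this thread): summary() returns a DataFrame of statistics whether or not show() is called; show() only prints it.

stats = df1.summary()   # the statistics DataFrame is defined here
print(type(stats))      # a pyspark.sql DataFrame, not printed output
stats.show()            # displaying it is a separate, optional step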
2fe10ed
2 months ago
Selected Answer: A
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.summary.html#pyspark.sql.DataFrame.summary
upvoted 1 times
...
Pegooli
6 months, 2 weeks ago
Selected Answer: B
Using df.summary() in PySpark will provide summary statistics, including min, max, mean, and standard deviation for all numeric columns. However, it will not provide these statistics for string columns since summary statistics like min, max, mean, and standard deviation are not applicable to string data.
upvoted 2 times
gover07
5 months, 1 week ago
So the question doesn't make sense if you are asked to calculate things that aren't defined.
upvoted 1 times
...
...
6d1de25
7 months ago
Selected Answer: A
Correct
upvoted 1 times
...
7d97b62
7 months, 1 week ago
Selected Answer: A
In pandas, use df.describe() for summary statistics of numeric columns. In PySpark, use df.summary() for summary statistics of both numeric and string columns in a distributed computing environment.
upvoted 3 times
...
282b85d
8 months, 2 weeks ago
Selected Answer: B
While df.summary() does provide valuable information for numeric columns, it does not fully meet the goal of evaluating both string and numeric columns with the required statistical measures. Use df.summary() and df.agg() to cover numeric columns, and additional custom aggregations for string columns.
upvoted 4 times
...
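A hedged sketch of the agg() approach suggested above (the DataFrame and column names are made up for illustration; spark is the notebook's SparkSession):

from pyspark.sql import functions as F

df2 = spark.createDataFrame(
    [("apple", 10), ("pear", 10), ("pear", 15)],
    schema=["fruit_name", "amount"],
)

# Numeric column: all four requested statistics are well defined.
df2.agg(
    F.min("amount"), F.max("amount"), F.mean("amount"), F.stddev("amount")
).show()

# String column: only min and max (lexicographic) are meaningful here.
df2.agg(F.min("fruit_name"), F.max("fruit_name")).show()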
XiltroX
11 months, 3 weeks ago
df.summary() is the only option where you can get MIN, MAX and AVG
upvoted 1 times
...
SamuComqi
12 months ago
Selected Answer: A
Also df.describe() is a valid solution. Sources:
* summary --> https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.summary.html
* describe --> https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.describe.html
upvoted 4 times
...
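A short sketch contrasting the two methods cited above (assuming the df1 defined earlier in this thread):

df1.describe().show()   # count, mean, stddev, min, max
df1.summary().show()    # the same statistics plus 25%, 50%, 75% percentiles
df1.summary("min", "max", "mean", "stddev").show()   # only the four the question asks for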
Momoanwar
12 months ago
Selected Answer: A
Correct
upvoted 2 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other