Welcome to ExamTopics


Exam Certified Machine Learning Associate topic 1 question 29 discussion

Actual exam question from Databricks's Certified Machine Learning Associate
Question #: 29
Topic #: 1

A data scientist has been given an incomplete notebook from the data engineering team. The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API.
Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?

  • A. import pyspark.pandas as ps
    df = ps.DataFrame(spark_df)
  • B. import pyspark.pandas as ps
    df = ps.to_pandas(spark_df)
  • C. spark_df.to_sql()
  • D. import pandas as pd
    df = pd.DataFrame(spark_df)
  • E. spark_df.to_pandas()
Suggested Answer: A

Comments

smonov
1 week, 2 days ago
Selected Answer: E
It's E
upvoted 1 time
ricorosol
1 month, 3 weeks ago
E is the closest answer; the correct method name is toPandas() (see pyspark.sql.DataFrame.toPandas: DataFrame.toPandas() → PandasDataFrameLike).
upvoted 2 times
rajneesharora
4 months, 3 weeks ago
A is correct
upvoted 1 time
68c6a4b
5 months, 1 week ago
It's not A. It's E, spark_df.to_pandas(). Here's why: the to_pandas() method is a built-in method of the PySpark DataFrame API. It converts a Spark DataFrame to a pandas DataFrame. By calling spark_df.to_pandas(), the data scientist can convert the Spark DataFrame spark_df to a pandas DataFrame and use the familiar pandas API for further feature engineering. The resulting pandas DataFrame is stored in memory on the driver node, so this approach is suitable only when the data is small enough to fit in the driver's memory.
upvoted 3 times
rajneesharora
4 months, 3 weeks ago
E is not correct, as to_pandas would convert it into a plain pandas DataFrame, while what is given is a Spark DataFrame.
upvoted 2 times
Community vote distribution: A (35%, most voted), C (25%), B (20%), other