Exam Certified Machine Learning Associate topic 1 question 17 discussion

Actual exam question from Databricks's Certified Machine Learning Associate
Question #: 17
Topic #: 1

A data scientist has written a data cleaning notebook that utilizes the pandas library, but their colleague has suggested that they refactor their notebook to scale with big data.
Which of the following approaches can the data scientist take to spend the least amount of time refactoring their notebook to scale with big data?

  • A. They can refactor their notebook to process the data in parallel.
  • B. They can refactor their notebook to use the PySpark DataFrame API.
  • C. They can refactor their notebook to use the Scala Dataset API.
  • D. They can refactor their notebook to use Spark SQL.
  • E. They can refactor their notebook to utilize the pandas API on Spark.
Suggested Answer: E

Comments

oliver29
1 month, 4 weeks ago
Selected Answer: E
The pandas API on Spark (pyspark.pandas) needs the least refactoring. It mirrors most of the pandas API, so a data scientist who already knows pandas can often keep the existing code, change the import, and have the work run distributed on Spark.
upvoted 1 time
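
To illustrate why option E involves so little refactoring, here is a minimal sketch, not taken from the original notebook. The column names, the clean() helper, and the USE_SPARK flag are hypothetical and exist only for this example. It assumes the cleaning logic uses only calls that both pandas and pyspark.pandas provide (dropna, rename, astype), and that PySpark is installed wherever USE_SPARK is set to True.

```python
# Assumption: USE_SPARK is a hypothetical flag for this sketch.
# Leave it False to run locally with plain pandas; set it to True
# on a cluster where PySpark (and a Spark runtime) is available.
USE_SPARK = False

if USE_SPARK:
    import pyspark.pandas as pd_api  # pandas API on Spark
else:
    import pandas as pd_api  # plain pandas


def clean(df):
    """Hypothetical cleaning steps that use calls present in both APIs."""
    df = df.dropna(subset=["age"])                 # drop rows with no age
    df = df.rename(columns={"name": "full_name"})  # rename a column
    df["age"] = df["age"].astype(int)              # cast age to integer
    return df


# Made-up sample data; only the import above decides which engine runs it.
raw = pd_api.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [31.0, None, 45.0]})
cleaned = clean(raw)
print(cleaned)
```

The body of clean() is identical under either import, which is the point of the suggested answer. Not every pandas method is covered by the pandas API on Spark, so a real notebook may still need a few targeted changes.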