Exam Certified Machine Learning Professional topic 1 question 41 discussion

Actual exam question from Databricks's Certified Machine Learning Professional
Question #: 41
Topic #: 1

A machine learning engineer has developed a random forest model using scikit-learn, logged the model using MLflow as random_forest_model, and stored its run ID in the run_id Python variable. They now want to deploy that model by performing batch inference on a Spark DataFrame spark_df.
Which of the following code blocks can they use to create a function called predict that they can use to complete the task?

  • A.
  • B. It is not possible to deploy a scikit-learn model on a Spark DataFrame.
  • C.
  • D.
  • E.
Suggested Answer: E
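
For reference, here is a minimal sketch of the pattern the comments describe: the SparkSession, not the DataFrame, goes in as the first argument to mlflow.pyfunc.spark_udf, followed by the model URI. It assumes the model was logged under the artifact path random_forest_model and that a SparkSession named spark is available (as in a Databricks notebook). The helper names build_model_uri and make_predict are hypothetical and are not taken from the answer options.

```python
def build_model_uri(run_id: str, artifact_path: str = "random_forest_model") -> str:
    """Build a 'runs:/<run_id>/<artifact_path>' URI for a model logged in an MLflow run."""
    return f"runs:/{run_id}/{artifact_path}"


def make_predict(spark, run_id: str):
    """Return a Spark UDF that applies the logged model to DataFrame columns.

    `spark` must be the active SparkSession, not the DataFrame to score.
    """
    # Imported here so the URI helper above works without MLflow installed.
    import mlflow.pyfunc

    return mlflow.pyfunc.spark_udf(spark, model_uri=build_model_uri(run_id))
```

With a live session, this could be used as predict = make_predict(spark, run_id), then spark_df.withColumn("prediction", predict(*spark_df.columns)), assuming every column of spark_df is a model feature.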

Comments

Mircuz
4 months, 2 weeks ago
Selected Answer: E
You need to pass the Spark session (spark), not the DataFrame, to spark_udf.
upvoted 1 times
64934ca
4 months, 3 weeks ago
Selected Answer: E
The spark session is passed as the first argument to mlflow.pyfunc.spark_udf to provide the necessary context for creating and executing the UDF within the Spark environment. The model_uri is passed as the second argument to specify which MLflow model to load and use for predictions. This order is required by the function's design to ensure proper integration with Spark.
upvoted 1 times
spaceexplorer
9 months, 3 weeks ago
Selected Answer: E
E is correct
upvoted 3 times
JaydeepT
9 months, 4 weeks ago
Selected Answer: A
spark_df is the frame to be used for variable evaluation in runtime
upvoted 1 times
BokNinja
11 months, 1 week ago
E.
    import mlflow
    logged_model = 'runs:/e905f5759d434a131bbe1e54a2b/best-model'
    # Load model as a Spark UDF.
    loaded_model = mlflow.pyfunc.spark_udf(spark, model_uri=logged_model)
    # Predict on a Spark DataFrame.
    df.withColumn('predictions', loaded_model(*columns)).collect()
upvoted 2 times
victorcolome
10 months, 1 week ago
Must be A, not E, as the question states that the variable is called "spark_df".
upvoted 2 times
victorcolome
10 months, 1 week ago
My bad, it is E, because the spark_udf function expects the SparkSession as its first parameter, not the DataFrame!
upvoted 4 times