Exam Certified Machine Learning Professional topic 1 question 41 discussion

Actual exam question from Databricks's Certified Machine Learning Professional

Question #: 41
Topic #: 1

A machine learning engineer has developed a random forest model using scikit-learn, logged the model using MLflow as random_forest_model, and stored its run ID in the run_id Python variable. They now want to deploy that model by performing batch inference on a Spark DataFrame spark_df.
Which of the following code blocks can they use to create a function called predict that they can use to complete the task?

A.
B. It is not possible to deploy a scikit-learn model on a Spark DataFrame.
C.
D.
E.

Suggested Answer: E

by BokNinja at Dec. 19, 2023, 1:46 a.m.

Comments

ricorosol

6 months ago

Selected Answer: A

A. mlflow.pyfunc.spark_udf(spark_df...)

upvoted 1 times

Mircuz

12 months ago

Selected Answer: E

You need the Spark session, not the DataFrame.

upvoted 1 times

64934ca

1 year ago

Selected Answer: E

The SparkSession is passed as the first argument to mlflow.pyfunc.spark_udf to provide the context for creating and executing the UDF in Spark. The model_uri is passed as the second argument to specify which MLflow model to load for predictions. The function's signature requires this order.
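A minimal sketch of that argument order, assuming the model was logged under the artifact path random_forest_model (as in the question) and that a SparkSession named spark is available, as it is in a Databricks notebook. The helper names model_uri_for_run and make_predict_udf are hypothetical, introduced only for illustration:

```python
def model_uri_for_run(run_id: str, artifact_path: str = "random_forest_model") -> str:
    """Build the runs:/ URI for a model logged under artifact_path in a run."""
    return f"runs:/{run_id}/{artifact_path}"


def make_predict_udf(spark, run_id: str):
    """Wrap the logged model as a Spark UDF for batch inference."""
    # Imported here so the URI helper above can be used without MLflow installed.
    import mlflow.pyfunc

    # First argument: the SparkSession. Second argument: the model URI.
    # Passing the DataFrame (spark_df) in place of the session is the mistake in option A.
    return mlflow.pyfunc.spark_udf(spark, model_uri_for_run(run_id))


# Usage (assumption: every column of spark_df is a model feature):
# predict = make_predict_udf(spark, run_id)
# scored_df = spark_df.withColumn("prediction", predict(*spark_df.columns))
```

The UDF is applied to columns of spark_df only afterwards, which is why the DataFrame never appears in the spark_udf call itself.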

upvoted 1 times

spaceexplorer

1 year, 5 months ago

Selected Answer: E

E is correct

upvoted 3 times

JaydeepT

1 year, 5 months ago

Selected Answer: A

spark_df is the DataFrame to be used for variable evaluation at runtime

upvoted 2 times

BokNinja

1 year, 6 months ago

E.

    import mlflow
    logged_model = 'runs:/e905f5759d434a131bbe1e54a2b/best-model'

    # Load model as a Spark UDF.
    loaded_model = mlflow.pyfunc.spark_udf(spark, model_uri=logged_model)

    # Predict on a Spark DataFrame.
    df.withColumn('predictions', loaded_model(*columns)).collect()

upvoted 2 times

victorcolome

1 year, 5 months ago

Must be A, not E, as the question states that the variable is called "spark_df".

upvoted 2 times

victorcolome

1 year, 5 months ago

My bad, it is E, because the spark_udf function expects the SparkSession as its first parameter, not the DataFrame!

upvoted 4 times
