Welcome to ExamTopics


Exam Certified Data Engineer Professional topic 1 question 105 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 105
Topic #: 1

The data science team has created and logged a production model using MLflow. The model accepts a list of column names and returns a new column of type DOUBLE.

The following code correctly imports the production model, loads the customers table containing the customer_id key column into a DataFrame, and defines the feature columns needed for the model.



Which code block will output a DataFrame with the schema "customer_id LONG, predictions DOUBLE"?

  • A. df.map(lambda x:model(x[columns])).select("customer_id, predictions")
  • B. df.select("customer_id", model(*columns).alias("predictions"))
  • C. model.predict(df, columns)
  • D. df.select("customer_id", pandas_udf(model, columns).alias("predictions"))
  • E. df.apply(model, columns).select("customer_id, predictions")
Suggested Answer: B

Comments

divingbell17
Highly Voted 11 months ago
Selected Answer: B
B is correct. It's a Spark UDF, not a pandas UDF.
upvoted 6 times
aragorn_brego
Highly Voted 1 year ago
Selected Answer: B
This code block applies the Spark UDF created from the MLflow model to the DataFrame df by selecting the existing customer_id column and the new column produced by the model, which is aliased to predictions. The model(*columns) part is where the UDF is applied to the columns specified in the columns list, and alias("predictions") is used to name the output column of the model's predictions. This will result in a DataFrame with the desired schema: "customer_id LONG, predictions DOUBLE".
upvoted 5 times
60ties
Most Recent 1 year ago
I think it is B
upvoted 2 times
Community vote distribution: A (35%), C (25%), B (20%), Other