Welcome to ExamTopics
ExamTopics Logo
- Expert Verified, Online, Free.
exam questions

Exam Certified Data Engineer Professional All Questions

View all questions & answers for the Certified Data Engineer Professional exam

Exam Certified Data Engineer Professional topic 1 question 7 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 7
Topic #: 1
[All Certified Data Engineer Professional Questions]

The data science team has created and logged a production model using MLflow. The following code correctly imports and applies the production model to output the predictions as a new DataFrame named preds with the schema "customer_id LONG, predictions DOUBLE, date DATE".

The data science team would like predictions saved to a Delta Lake table with the ability to compare all predictions across time. Churn predictions will be made at most once per day.
Which code block accomplishes this task while minimizing potential compute costs?

  • A. preds.write.mode("append").saveAsTable("churn_preds")
  • B. preds.write.format("delta").save("/preds/churn_preds")
  • C.
  • D.
  • E.
Show Suggested Answer Hide Answer
Suggested Answer: A 🗳️

Comments

Chosen Answer:
This is a voting comment (?) , you can switch to a simple comment.
Switch to a voting comment New
thxsgod
Highly Voted 1 year, 2 months ago
Selected Answer: A
You need: - Batch operation since it is at most once a day - Append, since you need to keep track of past predictions A is the correct answer. You don't need to specify "format" when you use saveAsTable.
upvoted 12 times
...
benni_ale
Most Recent 1 month, 2 weeks ago
Selected Answer: A
Batch, Append
upvoted 1 times
...
coercion
6 months, 1 week ago
Selected Answer: A
default table format is delta so no need to specify the format. As per the requirement, "append" mode is required to maintain the history. Default mode is "ErrorIfExists"
upvoted 1 times
...
Jay_98_11
10 months, 2 weeks ago
Selected Answer: A
A is correct
upvoted 1 times
...
kz_data
11 months, 1 week ago
Selected Answer: A
A is correct
upvoted 1 times
...
sturcu
1 year, 1 month ago
Selected Answer: A
Correct
upvoted 1 times
...
sturcu
1 year, 1 month ago
Correct
upvoted 1 times
...
Eertyy
1 year, 2 months ago
answer is B
upvoted 2 times
Starvosxant
1 year, 1 month ago
First: the default node Databricks saves tables IS Delta Format. So no reason why you say it wouldn't benefit from Lakehouse features. Second: the default write mode is Error, means that if you try to write to a location and that already exists there, it will prone a Error. And the question specify that you gonna write once a day. You better revisit basic topics before continue to the professional level certification, or buy the dump entirely.
upvoted 4 times
...
Eertyy
1 year, 2 months ago
Here's why: A. saves the data as a managed table, which may not be efficient for large-scale data or frequent updates. It doesn't utilize Delta Lake capabilities. C.is used for streaming operations, not batch processing. Also, using "overwrite" as output mode will replace the existing data each time, which is not suitable for keeping historical predictions. D.is similar to option A but with "overwrite" mode. It will replace the entire table each time, which is not suitable for maintaining a historical record of predictions. E. is also for streaming operations and not for batch processing. Additionally, it uses the "table" method, which is not typically used for writing batch data into Delta Lake tables. Option B is suitable for batch processing, writes data in Delta Lake format, and allows you to efficiently maintain a historical record of predictions while minimizing compute costs.
upvoted 3 times
pradyumn9999
1 year, 1 month ago
Its also said they want to compare past values as well, so mode needs to be append. By default is error mode.
upvoted 4 times
...
...
...
buggumaster
1 year, 2 months ago
Selected answer is wrong, not write Format is specified in A.
upvoted 1 times
...
buggumaster
1 year, 2 months ago
Selected answer is wrong, not writeMode is specified in A.
upvoted 1 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...