Exam Professional Machine Learning Engineer topic 1 question 73 discussion

Actual exam question from Google's Professional Machine Learning Engineer
Question #: 73
Topic #: 1

You work for a global footwear retailer and need to predict when an item will be out of stock based on historical inventory data. Customer behavior is highly dynamic since footwear demand is influenced by many different factors. You want to serve models that are trained on all available data, but track your performance on specific subsets of data before pushing to production. What is the most streamlined and reliable way to perform this validation?

  • A. Use the TFX ModelValidator tools to specify performance metrics for production readiness.
  • B. Use k-fold cross-validation as a validation strategy to ensure that your model is ready for production.
  • C. Use the last relevant week of data as a validation set to ensure that your model is performing accurately on current data.
  • D. Use the entire dataset and treat the area under the receiver operating characteristics curve (AUC ROC) as the main metric.
Suggested Answer: A

Comments

John_Pongthorn
Highly Voted 1 year, 9 months ago
https://www.tensorflow.org/tfx/guide/evaluator
upvoted 14 times
...
hiromi
Highly Voted 1 year, 10 months ago
Selected Answer: C
It seems like C to me. B is wrong because "Many machine learning techniques don’t work well here due to the sequential nature and temporal correlation of time series. For example, k-fold cross validation can cause data leakage; models need to be retrained to generate new forecasts" (https://cloud.google.com/learn/what-is-time-series).
upvoted 9 times
...
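Editor's note: the leakage concern quoted above can be made concrete with a small sketch. It uses only the standard library and treats row indices as days in chronological order; the 30-day dataset and all function names are hypothetical, chosen for illustration. It counts how many folds of a plain (contiguous, unshuffled) k-fold split train on rows that come after the start of the validation fold, and contrasts that with a time-ordered holdout.

```python
def kfold_splits(n_samples, k):
    """Yield (train_idx, val_idx) for contiguous k-fold; assumes k divides n_samples."""
    fold_size = n_samples // k
    for i in range(k):
        start, stop = i * fold_size, (i + 1) * fold_size
        val = list(range(start, stop))
        train = [j for j in range(n_samples) if j < start or j >= stop]
        yield train, val


def time_ordered_split(n_samples, n_val):
    """Hold out the last n_val time steps; train only on what came before."""
    cut = n_samples - n_val
    return list(range(cut)), list(range(cut, n_samples))


def trains_on_future(train_idx, val_idx):
    """True if any training row is later than the earliest validation row."""
    return max(train_idx) > min(val_idx)


# Hypothetical setup: 30 days of inventory history, 5 folds.
leaky_folds = sum(trains_on_future(tr, va) for tr, va in kfold_splits(30, k=5))

# Time-ordered alternative: validate on the final 7 days only.
train_idx, val_idx = time_ordered_split(30, n_val=7)
holdout_is_leaky = trains_on_future(train_idx, val_idx)
```

In this setup every fold except the last one trains on data from after the validation window, which is the situation the quoted passage warns about. The time-ordered holdout never does.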
PhilipKoku
Most Recent 4 months, 3 weeks ago
Selected Answer: A
A) TFX ModelValidator is designed to handle the exact needs described in the scenario: training on all data, validating on specific subsets, and ensuring production readiness with comprehensive performance metrics. This makes it the most streamlined and reliable method compared to other options, which either lack specificity in production readiness (B), are too narrow in scope (C), or risk overfitting and inadequate validation (D).
upvoted 3 times
...
gscharly
6 months, 1 week ago
Selected Answer: A
The TFX Evaluator lets you evaluate performance on different subsets of data: https://www.tensorflow.org/tfx/guide/evaluator
upvoted 3 times
...
pinimichele01
6 months, 2 weeks ago
Selected Answer: A
The Evaluator TFX pipeline component performs deep analysis on the training results for your models, to help you understand how your model performs on subsets of your data.
upvoted 3 times
...
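Editor's note: to show what "performance on subsets of your data" means in practice, here is a minimal plain-Python sketch of sliced evaluation. It is not the TFX or TFMA API; the `region` feature, the example rows, and the helper name are assumptions made up for illustration. The point is that an aggregate metric can look acceptable while one slice performs poorly.

```python
from collections import defaultdict


def accuracy_by_slice(examples, slice_key):
    """Return {slice_value: accuracy} for rows carrying 'label' and 'prediction'."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ex in examples:
        totals[ex[slice_key]] += 1
        hits[ex[slice_key]] += int(ex["label"] == ex["prediction"])
    return {value: hits[value] / totals[value] for value in totals}


# Hypothetical evaluation rows: label 1 means "item will go out of stock".
examples = [
    {"region": "EU", "label": 1, "prediction": 1},
    {"region": "EU", "label": 0, "prediction": 0},
    {"region": "EU", "label": 1, "prediction": 1},
    {"region": "EU", "label": 0, "prediction": 0},
    {"region": "APAC", "label": 1, "prediction": 0},
    {"region": "APAC", "label": 0, "prediction": 0},
]

# Aggregate accuracy over all rows versus accuracy per region.
overall = sum(ex["label"] == ex["prediction"] for ex in examples) / len(examples)
per_region = accuracy_by_slice(examples, "region")
```

Here the overall accuracy is 5/6, yet the APAC slice sits at 0.5. A model trained on all data can therefore still be checked slice by slice before it is pushed.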
edoo
7 months, 3 weeks ago
Selected Answer: A
I prefer A to C because 1 week of data may be insufficient to generalize the model and could lead to overfitting on the validation subset.
upvoted 4 times
...
pmle_nintendo
8 months ago
Selected Answer: C
Option C provides a streamlined and reliable approach that focuses on evaluating the model's performance on the most relevant and recent data, which is essential for predicting out-of-stock events in a dynamic retail setting.
upvoted 1 times
...
Mickey321
11 months, 2 weeks ago
Selected Answer: A
Either A or C, but C covers only the last week, which is not the same as specific subsets of data.
upvoted 2 times
...
AdiML
1 year, 1 month ago
The answer should be C. We are dealing with dynamic data, and the "last" data is more relevant for getting an idea of future performance.
upvoted 1 times
...
joaquinmenendez
1 year, 1 month ago
Selected Answer: C
Option C, because it allows you to track your model's performance on the most *recent* data, which is the most relevant data for predicting stockout risk. Given that the preferences are dynamic, the most important thing is that the model WORKS correctly with the newest data
upvoted 1 times
...
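Editor's note: for readers weighing option C, this is a minimal sketch of a "last week as validation set" split, using only the standard library. The row layout (`day`, `units_in_stock`) and the 28-day history are hypothetical. Note that a model trained only on the rows before the cutoff has not seen the final week, which is one reason the question's "trained on all available data" requirement is debated in this thread.

```python
from datetime import date, timedelta


def split_last_days(rows, days=7):
    """Split rows into (train, validation) using the most recent `days` days."""
    latest = max(row["day"] for row in rows)
    cutoff = latest - timedelta(days=days)
    train = [row for row in rows if row["day"] <= cutoff]
    val = [row for row in rows if row["day"] > cutoff]
    return train, val


# Hypothetical daily stock snapshots for one item over 28 days.
start = date(2024, 1, 1)
rows = [
    {"day": start + timedelta(days=i), "units_in_stock": 100 - 3 * i}
    for i in range(28)
]

train_rows, val_rows = split_last_days(rows, days=7)
first_val_day = min(row["day"] for row in val_rows)
last_train_day = max(row["day"] for row in train_rows)
```

The validation set holds the final 7 days and the training set the 21 days before them, so no training row is later than any validation row.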
[Removed]
1 year, 3 months ago
Selected Answer: A
The answer is A. Performance on specific subsets of data before pushing to production == TFX ModelValidator with custom performance metrics for production readiness. C is wrong because performance in the last relevant week of data != performance on specific subsets of data.
upvoted 2 times
tavva_prudhvi
1 year, 2 months ago
The ModelValidator TFX Pipeline Component (Deprecated)
upvoted 2 times
...
...
atlas_lyon
1 year, 3 months ago
Selected Answer: A
I will go for A. I don't think the aim of the question is to test whether candidates know that a component is deprecated. Note that ModelValidator has been merged into Evaluator, so we can imagine the question has been updated in recent exams. Evaluator enables testing on specific subsets with the metrics we want, then signals the Pusher component to push the new model to production if the "model is good enough". This makes the pipeline quite streamlined (https://www.tensorflow.org/tfx/guide/evaluator).
B: wrong. With historical data, one should watch out for data leakage.
C: wrong. We want to track performance on specific subsets of data (not necessarily the last week), maybe to do some targeting/segmentation. Who knows?
D: wrong, because we want to track performance on specific subsets of data, not the entire dataset.
upvoted 3 times
tavva_prudhvi
1 year, 3 months ago
Bro, that's not TFXModelValidator, it's Evaluator. Are they the same?
upvoted 1 times
TFXModelValidator is deprecated, but its behaviour can be replicated using the Evaluator object - which is the point he tried to make. See the docs here: https://www.tensorflow.org/tfx/guide/modelval
upvoted 1 times
...
...
...
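Editor's note: the "push only if the model is good enough" idea discussed in this sub-thread can be sketched as a simple gate. This is a plain-Python illustration under stated assumptions, not the Evaluator or Pusher implementation: the function name, the slice names, the metric values, and both thresholds are hypothetical. It checks each slice against an absolute lower bound and against a maximum tolerated drop relative to the currently serving (baseline) model.

```python
def is_blessed(candidate, baseline, lower_bound, max_drop):
    """Decide whether a candidate model may be pushed.

    candidate / baseline: {slice_name: metric_value}, higher is better.
    lower_bound: minimum acceptable value on every slice.
    max_drop: largest tolerated decrease versus the baseline on any slice.
    """
    for slice_name, value in candidate.items():
        if value < lower_bound:
            return False
        if slice_name in baseline and baseline[slice_name] - value > max_drop:
            return False
    return True


# Hypothetical per-slice accuracy for the serving model and two candidates.
baseline = {"overall": 0.80, "EU": 0.85, "APAC": 0.70}
candidate_ok = {"overall": 0.82, "EU": 0.85, "APAC": 0.74}
candidate_bad = {"overall": 0.84, "EU": 0.92, "APAC": 0.55}

push_ok = is_blessed(candidate_ok, baseline, lower_bound=0.6, max_drop=0.01)
push_bad = is_blessed(candidate_bad, baseline, lower_bound=0.6, max_drop=0.01)
```

The second candidate has the better overall number but falls below the lower bound on one slice, so the gate rejects it. That is the kind of check the thread has in mind when it talks about validating on specific subsets before production.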
Liting
1 year, 3 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
Voyager2
1 year, 4 months ago
Selected Answer: C
I think it should be C because of this key point: "but track your performance on specific subsets of data before pushing to production". So the question is which subset of data you should use.
upvoted 1 times
...
julliet
1 year, 4 months ago
Could someone explain why A is a better option than C? C is the correct one in terms of evaluation overall, no doubt. But do we choose TFX because it understands we are dealing with time series? Or is it the "specific subset" in the question that makes us think we have already chosen the data from the last period and just need to push it into TFX?
upvoted 1 times
...
aw_49
1 year, 5 months ago
Selected Answer: C
A is deprecated, so C.
upvoted 1 times
...
M25
1 year, 5 months ago
Selected Answer: A
Went with A
upvoted 3 times
...
Community vote distribution: A (35%, most voted), C (25%), B (20%), Other