Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 73 discussion

Actual exam question from Google's Professional Machine Learning Engineer

Question #: 73
Topic #: 1

You work for a global footwear retailer and need to predict when an item will be out of stock based on historical inventory data. Customer behavior is highly dynamic since footwear demand is influenced by many different factors. You want to serve models that are trained on all available data, but track your performance on specific subsets of data before pushing to production. What is the most streamlined and reliable way to perform this validation?

A. Use the TFX ModelValidator tools to specify performance metrics for production readiness.
B. Use k-fold cross-validation as a validation strategy to ensure that your model is ready for production.
C. Use the last relevant week of data as a validation set to ensure that your model is performing accurately on current data.
D. Use the entire dataset and treat the area under the receiver operating characteristics curve (AUC ROC) as the main metric.

Suggested Answer: A 🗳️

by LearnSodas at Dec. 11, 2022, 1:18 p.m.

Comments

John_Pongthorn

Highly Voted 1 year, 9 months ago

https://www.tensorflow.org/tfx/guide/evaluator

upvoted 14 times

...

hiromi

Highly Voted 1 year, 10 months ago

Selected Answer: C

It seems like C to me. B is wrong because "Many machine learning techniques don’t work well here due to the sequential nature and temporal correlation of time series. For example, k-fold cross validation can cause data leakage; models need to be retrained to generate new forecasts" - https://cloud.google.com/learn/what-is-time-series
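To make the leakage point concrete, here is a minimal plain-Python sketch (not from the linked page). It assumes rows are indexed in time order and that folds are contiguous and unshuffled; the helper names `k_fold_indices`, `leaks_future`, and `time_ordered_split` are hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train, validation) index lists for contiguous, unshuffled k-fold."""
    fold = n // k
    for i in range(k):
        start = i * fold
        stop = n if i == k - 1 else start + fold
        val = list(range(start, stop))
        train = [j for j in range(n) if j < start or j >= stop]
        yield train, val


def leaks_future(train, val):
    """True if some training row is later in time than the earliest validation row."""
    return max(train) > min(val)


def time_ordered_split(n, n_val):
    """Train on the first n - n_val time steps, validate on the last n_val."""
    return list(range(n - n_val)), list(range(n - n_val, n))


n_days = 10  # assumed toy series: one row per day, index = time order
leaky = [leaks_future(tr, va) for tr, va in k_fold_indices(n_days, 5)]
train, val = time_ordered_split(n_days, 2)

print(leaky)                     # which folds train on data from the "future"
print(leaks_future(train, val))  # the time-ordered split never does
```

In this toy setup every fold except the last one trains on rows that come after its validation rows, which is the leakage the quote warns about.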

upvoted 9 times

...

PhilipKoku

Most Recent 4 months, 3 weeks ago

Selected Answer: A

A) TFX ModelValidator is designed to handle the exact needs described in the scenario: training on all data, validating on specific subsets, and ensuring production readiness with comprehensive performance metrics. This makes it the most streamlined and reliable method compared to other options, which either lack specificity in production readiness (B), are too narrow in scope (C), or risk overfitting and inadequate validation (D).

upvoted 3 times

...

gscharly

6 months, 1 week ago

Selected Answer: A

The TFX Evaluator lets you evaluate model performance on different subsets of data: https://www.tensorflow.org/tfx/guide/evaluator

upvoted 3 times

...

pinimichele01

6 months, 2 weeks ago

Selected Answer: A

The Evaluator TFX pipeline component performs deep analysis on the training results for your models, to help you understand how your model performs on subsets of your data.
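As an illustration of what "performance on subsets of your data" means, here is a rough plain-Python sketch. It is not the TFX or TFMA API; the `accuracy_by_slice` helper, the `region` feature, and the toy examples are all assumptions made for the example.

```python
from collections import defaultdict


def accuracy_by_slice(examples, slice_key):
    """Group examples by the value of slice_key and compute accuracy per group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ex in examples:
        s = ex[slice_key]
        totals[s] += 1
        hits[s] += int(ex["label"] == ex["prediction"])
    return {s: hits[s] / totals[s] for s in totals}


# Hypothetical evaluation rows: label = item went out of stock (1) or not (0).
examples = [
    {"region": "EU", "label": 1, "prediction": 1},
    {"region": "EU", "label": 0, "prediction": 0},
    {"region": "US", "label": 1, "prediction": 0},
    {"region": "US", "label": 0, "prediction": 0},
]

per_region = accuracy_by_slice(examples, "region")
overall = sum(ex["label"] == ex["prediction"] for ex in examples) / len(examples)

print(per_region)
print(overall)
```

The overall number can look acceptable while one slice lags behind, which is why slice-level metrics matter for the scenario in the question.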

upvoted 3 times

...

edoo

7 months, 3 weeks ago

Selected Answer: A

I prefer A to C because 1 week of data may be insufficient to generalize the model and could lead to overfitting on the validation subset.

upvoted 4 times

...

pmle_nintendo

8 months ago

Selected Answer: C

option C provides a streamlined and reliable approach that focuses on evaluating the model's performance on the most relevant and recent data, which is essential for predicting out-of-stock events in a dynamic retail setting.

upvoted 1 times

...

Mickey321

11 months, 2 weeks ago

Selected Answer: A

Either A or C, but C covers only the last week, which is not the same as specific subsets of data.

upvoted 2 times

...

AdiML

1 year, 1 month ago

Answer should be C. We are dealing with dynamic data, and the "last" data is more relevant for getting an idea of future performance.

upvoted 1 times

...

joaquinmenendez

1 year, 1 month ago

Selected Answer: C

Option C, because it allows you to track your model's performance on the most *recent* data, which is the most relevant data for predicting stockout risk. Given that the preferences are dynamic, the most important thing is that the model WORKS correctly with the newest data
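For reference, a minimal sketch of the split that option C describes, in plain Python. The `split_last_week` helper, the `day` and `out_of_stock` fields, and the toy dates are assumptions for illustration only.

```python
from datetime import date, timedelta


def split_last_week(rows, date_key="day"):
    """Hold out the most recent 7 days as validation; train on everything earlier."""
    latest = max(r[date_key] for r in rows)
    cutoff = latest - timedelta(days=7)
    train = [r for r in rows if r[date_key] <= cutoff]
    val = [r for r in rows if r[date_key] > cutoff]
    return train, val


# Hypothetical inventory history: 21 consecutive days, one row per day.
rows = [
    {"day": date(2022, 12, 1) + timedelta(days=i), "out_of_stock": i % 3 == 0}
    for i in range(21)
]

train, val = split_last_week(rows)
print(len(train), len(val))
```

Note that the held-out week is excluded from training here, so a model validated this way has not yet been trained on all available data.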

upvoted 1 times

...

[Removed]

1 year, 3 months ago

Selected Answer: A

The answer is A. Performance on specific subsets of data before pushing to production == TFX ModelValidator with custom performance metrics for production readiness. C is wrong because performance in the last relevant week of data != performance on specific subsets of data.

upvoted 2 times

tavva_prudhvi

1 year, 2 months ago

The ModelValidator TFX Pipeline Component (Deprecated)

upvoted 2 times

...

atlas_lyon

1 year, 3 months ago

Selected Answer: A

I will go for A. I don't think the aim of the question is to test whether candidates know if a component is deprecated. Note that ModelValidator has been fused with Evaluator, so we can imagine the question has been updated in recent exams. Evaluator enables testing on specific subsets with the metrics we want, then tells the Pusher component to push the new model to production if the "model is good enough". This would make the pipeline quite streamlined (https://www.tensorflow.org/tfx/guide/evaluator).
B: wrong. Using historical data, one should watch out for data leakage.
C: wrong. We want to track performance on specific subsets of data (not necessarily the last week), maybe to do some targeting/segmentation? Who knows.
D: wrong, because we want to track performance on specific subsets of data, not the entire dataset.
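The "good enough" gate described above can be sketched in plain Python as follows. This is only an assumed illustration of the idea, not the Evaluator or Pusher API; `is_blessed`, the slice names, and the 0.80 threshold are hypothetical.

```python
def is_blessed(slice_metrics, min_value):
    """Approve a candidate only if every tracked slice meets the threshold.

    Returns (blessed, failing_slices) so the caller can see what blocked the push.
    """
    failing = sorted(s for s, v in slice_metrics.items() if v < min_value)
    return (not failing), failing


# Hypothetical per-slice metric values for a candidate model.
candidate = {"overall": 0.91, "region=EU": 0.93, "region=APAC": 0.78}

blessed, failing = is_blessed(candidate, 0.80)
if blessed:
    print("push candidate to production")
else:
    print("hold back, failing slices:", failing)
```

Here the candidate is held back even though the overall metric is high, because one slice falls below the assumed threshold.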

upvoted 3 times

tavva_prudhvi

1 year, 3 months ago

Bro, that's not TFX ModelValidator, it's Evaluator. Are they the same?

upvoted 1 times

MultipleWorkerMirroredStrategy

1 year ago

TFX ModelValidator is deprecated, but its behaviour can be replicated using the Evaluator component, which is the point he tried to make. See the docs here: https://www.tensorflow.org/tfx/guide/modelval

upvoted 1 times

...

Liting

1 year, 3 months ago

Selected Answer: C

Went with C

upvoted 1 times

...

Voyager2

1 year, 4 months ago

Selected Answer: C

I think that it should be C for the following key point ", but track your performance on specific subsets of data before pushing to production" So the ask is which subset of data you should use.

upvoted 1 times

...

julliet

1 year, 4 months ago

Could someone explain why A is a better option than C? C is the correct one in terms of evaluation overall, no doubt. But do we choose TFX because it understands we are dealing with time series? Or is it the "specific subset" in the question that makes us think we have already chosen the data from the last period and just need to push it into TFX?

upvoted 1 times

...

aw_49

1 year, 5 months ago

Selected Answer: C

A is deprecated, so C.

upvoted 1 times

...

M25

1 year, 5 months ago

Selected Answer: A

Went with A

upvoted 3 times

...
