Exam Professional Machine Learning Engineer topic 1 question 76 discussion

Actual exam question from Google's Professional Machine Learning Engineer
Question #: 76
Topic #: 1

You are working on a classification problem with time series data. After conducting just a few experiments using random cross-validation, you achieved an Area Under the Receiver Operating Characteristic Curve (AUC ROC) value of 99% on the training data. You haven’t explored using any sophisticated algorithms or spent any time on hyperparameter tuning. What should your next step be to identify and fix the problem?

  • A. Address the model overfitting by using a less complex algorithm and use k-fold cross-validation.
  • B. Address data leakage by applying nested cross-validation during model training.
  • C. Address data leakage by removing features highly correlated with the target value.
  • D. Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.
Suggested Answer: B

Comments

desertlotus1211
2 months, 2 weeks ago
Selected Answer: C
You are working with time series data yet used random cross-validation, and you immediately achieved an extremely high AUC (99%) with little effort. This is a red flag for data leakage, meaning information from the future (or directly from the target) is leaking into the training process. C is the better answer.
upvoted 2 times
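The point about random cross-validation on time series can be checked directly. The sketch below assumes scikit-learn and NumPy are available and uses a hypothetical 100-row dataset already sorted by timestamp. For each fold it counts training rows that come later in time than the earliest validation row: shuffled `KFold` trains on the future in every fold, while `TimeSeriesSplit` never does.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

# Hypothetical example: 100 observations already sorted by timestamp,
# so the row index doubles as the time order.
n_samples = 100
X = np.arange(n_samples).reshape(-1, 1)


def future_rows_in_training(splitter):
    """For each fold, count training rows that occur after the first validation row."""
    counts = []
    for train_idx, val_idx in splitter.split(X):
        counts.append(int(np.sum(train_idx > val_idx.min())))
    return counts


random_cv = KFold(n_splits=5, shuffle=True, random_state=0)
time_cv = TimeSeriesSplit(n_splits=5)

random_counts = future_rows_in_training(random_cv)
time_counts = future_rows_in_training(time_cv)

print("random KFold:   ", random_counts)  # nonzero: training folds contain later rows
print("TimeSeriesSplit:", time_counts)    # zero: training always precedes validation
```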
pinimichele01
11 months, 2 weeks ago
Selected Answer: B
random cross-validation time series data -> B
upvoted 3 times
gscharly
11 months, 3 weeks ago
Selected Answer: B
B with nested cross validation.
upvoted 2 times
pinimichele01
11 months, 2 weeks ago
Can you explain why?
upvoted 1 time
Werner123
1 year, 1 month ago
Selected Answer: B
"99% on training data" -> data leakage; "random cross-validation" -> not suitable for time series, use "nested cross-validation".
upvoted 3 times
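Nested cross-validation with time-ordered splits, as suggested here, might look like the sketch below. It assumes scikit-learn, uses synthetic data with no real temporal structure (it only shows the wiring), and the logistic regression model and `C` grid are arbitrary choices for illustration. The inner `TimeSeriesSplit` tunes hyperparameters; the outer one scores the whole tuning procedure on later periods it never saw.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit, cross_val_score

# Hypothetical synthetic data, assumed to be sorted by time.
rng = np.random.default_rng(0)
n_samples = 600
X = rng.normal(size=(n_samples, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=n_samples) > 0).astype(int)

# Inner loop: hyperparameter search that only validates on data later than its training data.
inner_cv = TimeSeriesSplit(n_splits=3)
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # illustrative grid
    scoring="roc_auc",
    cv=inner_cv,
)

# Outer loop: each outer training window is tuned independently,
# then scored on the following, unseen period.
outer_cv = TimeSeriesSplit(n_splits=5)
outer_scores = cross_val_score(search, X, y, scoring="roc_auc", cv=outer_cv)

print("outer-fold AUC:", np.round(outer_scores, 3))
print("mean AUC:", outer_scores.mean())
```

Using `TimeSeriesSplit` in both loops, rather than shuffled folds, is what keeps later observations out of every training fold.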
pmle_nintendo
1 year, 1 month ago
Selected Answer: D
Options B and C (Address data leakage by applying nested cross-validation during model training; Address data leakage by removing features highly correlated with the target value) are less relevant in this scenario because the primary concern appears to be overfitting rather than data leakage. Data leakage typically involves inadvertent inclusion of information from the test set in the training process, which may lead to overly optimistic performance metrics. However, there is no indication that data leakage is the cause of the high AUC ROC value in this case.
upvoted 1 time
503b759
4 months, 2 weeks ago
Data leakage is occurring owing to the use of k-fold cross-validation, because of the time series nature of the data.
upvoted 1 time
pico
1 year, 4 months ago
Selected Answer: D
Options A and B also address overfitting, but they involve different strategies. Option A suggests using a less complex algorithm and k-fold cross-validation. While this can be effective, it might be premature to change the algorithm without first exploring hyperparameter tuning. Option B suggests addressing data leakage, which is a different issue and may not be the primary cause of overfitting in this scenario.
upvoted 3 times
humancomputation
1 year, 6 months ago
Selected Answer: B
B with nested cross validation.
upvoted 1 time
M25
1 year, 10 months ago
Selected Answer: B
Went with B
upvoted 2 times
BenMS
2 years, 1 month ago
Selected Answer: B
Nested cross-validation to reduce data leakage - same as a previous question.
upvoted 1 time
Alexarr6
2 years, 1 month ago
Selected Answer: B
It's B
upvoted 1 time
hiromi
2 years, 3 months ago
Selected Answer: B
B (same question 48) - https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9
upvoted 3 times
ares81
2 years, 3 months ago
To call it overfitting, I would need results on test data, so it's data leakage. Common sense excludes C, so it's B.
upvoted 1 time
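The distinction drawn here (overfitting can only be judged against held-out results) can be sketched as a comparison of training AUC with AUC on a later, time-ordered holdout. The data, the random forest, and the 80/20 split are hypothetical choices, assuming scikit-learn. A large gap points to overfitting; a holdout AUC that stays implausibly high alongside the training AUC would instead make leakage the more likely suspect.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical noisy data, assumed to be sorted by time.
rng = np.random.default_rng(1)
n_samples = 1000
X = rng.normal(size=(n_samples, 10))
y = (X[:, 0] + rng.normal(size=n_samples) > 0).astype(int)

# shuffle=False keeps the last 20% of rows (the most recent period) as the holdout.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# A training AUC near 1.0 with a clearly lower holdout AUC is the overfitting signature.
print(f"train AUC: {train_auc:.3f}  holdout AUC: {test_auc:.3f}  gap: {train_auc - test_auc:.3f}")
```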
Community vote distribution: A (35%), C (25%), B (20%), Other