
Exam DP-100 topic 3 question 63 discussion

Actual exam question from Microsoft's DP-100
Question #: 63
Topic #: 3
[All DP-100 Questions]

HOTSPOT -
You are tuning a hyperparameter for an algorithm. The following table shows a data set with different hyperparameter values and the corresponding training and validation errors.

H = 1: training error 105, validation error 95
H = 2: training error 200, validation error 85
H = 3: training error 250, validation error 100
H = 4: training error 105, validation error 100
H = 5: training error 400, validation error 50

Use the drop-down menus to select the answer choice that answers each question based on the information presented in the graphic.
Hot Area:

Suggested Answer:
Box 1: 4 -
Choose the H value with low training and validation error and the closest match between the two.
Minimize variance (the difference between validation error and training error).

Box 2: 5 -
H = 5 has the highest training error (400) and the largest gap between training and validation error, so it shows the poorest training result.
Reference:
https://medium.com/comet-ml/organizing-machine-learning-projects-project-management-guidelines-2d2b85651bbd
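The two competing selection rules debated in the comments below can be sketched in a few lines of Python. The (training error, validation error) pairs are reconstructed from the figures quoted in the discussion, not from the original graphic, so treat them as assumptions:

```python
# (TE, VE) pairs per hyperparameter value, reconstructed from the discussion.
errors = {1: (105, 95), 2: (200, 85), 3: (250, 100), 4: (105, 100), 5: (400, 50)}

# Suggested-answer logic: minimize the gap between validation and training error.
best_by_gap = min(errors, key=lambda h: abs(errors[h][1] - errors[h][0]))

# Lowest validation error alone: note this actually picks H = 5 here, which is
# why commenters arguing for H = 1 also weigh the training error.
best_by_val = min(errors, key=lambda h: errors[h][1])

# Poorest training result: the highest training error.
worst_training = max(errors, key=lambda h: errors[h][0])

print(best_by_gap, best_by_val, worst_training)  # 4 5 5
```

Under the gap criterion the result matches the suggested answer of 4; under pure validation error it does not, which is exactly the disagreement running through the thread.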

Comments

pepmir
Highly Voted 4 years, 4 months ago
The answers look correct to me. Differences between errors: 105 - 95 = 10; 200 - 85 = 115; 250 - 100 = 150; 105 - 100 = 5 -> this is the best H value, so I agree with #4 for Q1. 400 - 50 = 350 -> the highest difference, so poorest for Q2.
upvoted 41 times
snegnik
1 year, 5 months ago
It depends on the main measure we use. If it is bias, we should look for low error numbers; if it is variance, we should look for a small difference. The best way to find a good trade-off between bias and variance is to have low error numbers and a small difference between training and validation errors.
upvoted 1 times
...
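One possible way to encode that bias-variance trade-off as a single score is sketched below. Both the weighting (validation error plus the train/validation gap) and the error values are assumptions reconstructed from the discussion, not a definitive criterion:

```python
# Hypothetical trade-off score: validation error (bias proxy) plus the
# train/validation gap (variance proxy). The formula and the error values
# are illustrative assumptions, not from the original graphic.
errors = {1: (105, 95), 2: (200, 85), 3: (250, 100), 4: (105, 100), 5: (400, 50)}

def tradeoff_score(te, ve):
    return ve + abs(ve - te)

scores = {h: tradeoff_score(te, ve) for h, (te, ve) in errors.items()}
print(scores)  # {1: 105, 2: 200, 3: 250, 4: 105, 5: 400}
```

Under this particular (arbitrary) weighting, H = 1 and H = 4 tie, which mirrors how the thread splits between those two answers.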
Yoshizn
1 year, 8 months ago
Taking the difference makes no sense here, since H = 1 and H = 4 both have a training error of 105. So we should look at the smaller validation error between them: 100 > 95, so we should take H = 1 as the value to choose.
upvoted 1 times
...
akgarg00
3 years, 8 months ago
This depends on the trade-off curve the validation error and training error are making. If they intersect, then we cannot use this logic.
upvoted 1 times
...
HkIsCrazY
3 years, 8 months ago
Why would you take the difference? It makes no sense! The best H value should be option A (105 and 95): the validation error for option A is 95, whereas for option D it is 100, and the training error is the same in both cases.
upvoted 15 times
...
...
Yilu
Highly Voted 4 years, 5 months ago
Why not 1 with lowest value in both training and validation?
upvoted 22 times
swatidorge
3 years, 11 months ago
Exactly. Normally the training set isn't more than 70% of the data; only if we had a 50/50 train/test split would preferring the closest match be fine.
upvoted 2 times
...
nato16
4 years, 1 month ago
Yes, why not 1?
upvoted 2 times
...
...
sl_mslconsulting
Most Recent 5 months, 1 week ago
I used ChatGPT-4 and got this explanation, which I agree with: The best hyperparameter to select would be the one with the lowest validation error, as this indicates how well the model is likely to perform on unseen data; in this case, that would be the one with a validation error of 50. The poorest training result would be the one with the highest training error; in this case, that would be the one with a training error of 400. In general, the goal of hyperparameter tuning is to minimize the validation error. The model with the lowest variance isn't necessarily the best model: a model with high bias can have low variance but still be inaccurate, and a model with low bias can have high variance but still be accurate. This is known as the bias-variance tradeoff.
upvoted 1 times
...
phdykd
1 year, 3 months ago
Based on these values, the optimal hyperparameter setting seems to be H1. It has the lowest total error when you consider both the training and validation errors, which suggests it may be the best compromise between underfitting and overfitting. The setting that displays the poorest training result would be H5, as it has the highest training error (TE = 400), suggesting it might be underfitting the training data.
upvoted 2 times
...
Gferreira
1 year, 9 months ago
ChatGPT said: The best results are those that have a low training error and a low validation error. In the first case, the training error is 105 and the validation error is 95, while in the second case the training error is 105 and the validation error is 100. Therefore, the first case is better, as the validation error is lower. This indicates that the model is generalizing well and is not "memorizing" the training data.
upvoted 2 times
...
Mckay_
2 years ago
The answer should be 1 and 5. When training/testing a model, the problems of overfitting and underfitting need to be considered. As for the best H value, H = 1 clearly produced the best model, with the minimum validation error on the test dataset (which is the dataset we care about).
upvoted 7 times
...
ning
2 years, 4 months ago
Poorest training result -> 5. For the best H parameter, this question does not have enough information: we do not know the sample sizes for the training and test data. If both are in the millions, then no one cares about 100 errors vs. 500 errors; if they are only in the thousands, then I would only consider 1 and 4. In this case I guess 4 gives slightly better results in testing, so I will go with 4.
upvoted 1 times
...
David_Tadeu
2 years, 6 months ago
The question is on stack exchange https://stats.stackexchange.com/questions/570322/how-to-choose-a-models-hyperparameters-in-terms-of-the-variance/570485#570485
upvoted 1 times
...
synapse
2 years, 7 months ago
The answer is 1 and 5... Why would you choose the option with the two closest errors? Would you choose 300 and 299 as the best?
upvoted 3 times
...
TheCyanideLancer
2 years, 9 months ago
Agree with pepmir. 4 has the least difference between validation and training results, and box 2 is about the "poorest training result", which, by the data given, is 5.
upvoted 1 times
...
dija123
2 years, 10 months ago
Underfitting – validation and training errors both high
Overfitting – validation error high, training error low
Good fit – validation error low, slightly higher than the training error
Unknown fit – validation error low, training error 'high'
upvoted 2 times
...
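The rules of thumb above can be sketched as a tiny classifier. The numeric thresholds are hypothetical (the comment gives only qualitative rules), and `classify_fit` is an illustrative name, not an Azure ML API:

```python
# Rule-of-thumb fit taxonomy; the thresholds `high` and `gap` are
# hypothetical values chosen only for illustration.
def classify_fit(train_err, val_err, high=150, gap=30):
    if train_err >= high and val_err >= high:
        return "underfitting"   # both errors high
    if val_err >= high:
        return "overfitting"    # validation high, training low
    if train_err <= val_err <= train_err + gap:
        return "good fit"       # validation slightly above training
    return "unknown fit"        # e.g. validation below training

print(classify_fit(300, 310))  # underfitting
print(classify_fit(100, 260))  # overfitting
print(classify_fit(100, 115))  # good fit
print(classify_fit(105, 95))   # unknown fit
```

Note that every row in this question's table has a validation error below the training error, which lands in the "unknown fit" bucket; that mismatch is exactly what several commenters find odd about the question.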
nit687
3 years, 4 months ago
We have to see which model generalizes well on test data. Clearly, in option 1 the difference between train and test error is 10, while in option 4 the difference is only 5, so the 4th one may generalize better. When we do a train/test split, our target is to have the train and test errors as close as possible along with a minimum error.
upvoted 2 times
...
kty
3 years, 7 months ago
The answer is 1 and 5. For those who calculate the difference between losses: if we had 500 and 498, would we then choose that option?
upvoted 17 times
...
adbush
3 years, 8 months ago
The best model is not 4, it is 1. Looking at the difference between training and validation errors is not helpful: by this logic, a model with TE 105 and VE 110 would also be better than model 1. This is clearly not the case.
upvoted 3 times
...
fredgu
3 years, 11 months ago
Pepmir's explanation is correct.
upvoted 1 times
...
Pucha
3 years, 11 months ago
Why not option 2?
upvoted 1 times
...
CleMue
4 years, 3 months ago
This question is a weird one. The training error here is much higher than the validation error; usually it's the other way around. Depending on the degree of overfitting, the VE can be a lot higher than the TE, but it is almost never smaller than the TE. Still, the general rule for such a question is: 1) go for the H with the smallest VE; 2) the H with the highest VE is the worst. Unfortunately, here H = 3 and H = 4 are equally bad, so it doesn't make sense to choose only one of them.
upvoted 6 times
Paa_Kwesi
3 years, 11 months ago
So this is rather a case of underfitting.
upvoted 2 times
...
...
Community vote distribution: A (35%), C (25%), B (20%), Other