Exam AWS Certified Machine Learning - Specialty topic 1 question 158 discussion

Exam question from Amazon's AWS Certified Machine Learning - Specialty

Question #: 158
Topic #: 1

A data scientist is training a text classification model by using the Amazon SageMaker built-in BlazingText algorithm. There are 5 classes in the dataset, with 300 samples for category A, 292 samples for category B, 240 samples for category C, 258 samples for category D, and 310 samples for category E.
The data scientist shuffles the data and splits off 10% for testing. After training the model, the data scientist generates confusion matrices for the training and test sets.

What could the data scientist conclude from these results?

A. Classes C and D are too similar.
B. The dataset is too small for holdout cross-validation.
C. The data distribution is skewed.
D. The model is overfitting for classes B and E.

Suggested Answer: A

by bluer1 at May 2, 2022, 4:17 p.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments

LydiaGom

Highly Voted 2 years, 5 months ago

Isn't it A? The model doesn't classify C and D well.

upvoted 8 times

dolorez

Highly Voted 2 years, 5 months ago

Selected Answer: A

The correct answer should be A: the model is clearly unable to tell C and D apart. The reason B is incorrect is subtle: there is holdout validation and there is cross-validation, but there is no "holdout cross-validation". While I think it would be more reasonable to use CV rather than a single holdout split with such a small dataset, the option mixes terms and should therefore be considered wrong. Also, the test set confusion matrix is still fairly comparable to the training set one, so I wouldn't say there is objective evidence that holdout is the wrong choice here.

upvoted 7 times
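To make the holdout vs. cross-validation distinction concrete: a holdout split evaluates on one fixed 10% slice (140 of the 1,400 samples in this question), while k-fold cross-validation rotates the validation slice so every sample is validated exactly once. Below is a minimal stdlib sketch of plain (non-stratified) k-fold index generation; the function name `k_fold_indices`, k=5, and the seed are illustrative assumptions, not anything SageMaker or BlazingText provides.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for plain k-fold cross-validation.

    Hypothetical helper for illustration only.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# Total sample count from the question: 300 + 292 + 240 + 258 + 310 = 1,400.
n = 300 + 292 + 240 + 258 + 310
folds = list(k_fold_indices(n, k=5))
for i, (tr, va) in enumerate(folds):
    print(f"fold {i}: train={len(tr)}, validation={len(va)}")
```

With 1,400 samples and k=5, each fold validates on 280 samples and trains on 1,120, instead of a single 140-sample test set.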

Antoh1978

Most Recent 4 months, 2 weeks ago

Selected Answer: A

I would go for A as well.

upvoted 1 times

tueo

6 months, 1 week ago

Selected Answer: A

I think option A is correct as C & D are behaving similarly.

upvoted 1 times

vkbajoria

7 months, 3 weeks ago

I think the answer is D. As for A: C and D look similar in training, but the test results contradict that. In the test set, there are many A and B predictions for class C.

upvoted 2 times

kyuhuck

8 months, 2 weeks ago

Selected Answer: D

These results indicate that the model is overfitting for classes B and E, meaning that it memorizes the specific features of these classes in the training data but fails to capture the general features that apply to the test data. Overfitting is a common problem in machine learning, where the model performs well on the training data but poorly on the test data. One possible cause of overfitting is a model that is too complex or has too many parameters for the given data. This makes the model flexible enough to fit the noise and outliers in the training data, but reduces its ability to generalize to new data.

upvoted 1 times

DimLam

1 year ago

Selected Answer: B

Actually, both A and D are true. It would be an easy one if we had to choose two answers, but we need to choose only one. So how can we be sure that the person who wrote this question had only A in mind? Also, if we look at the test confusion matrix, we can see that class A is confused with class C at the same rate as C and D are confused with each other. I would even say that the model is generally overfitted. I would go for B.

upvoted 3 times

DimLam

1 year ago

Also, because the test set entries were picked at random, the label proportions differ between the training and test sets. So the answer could even be C.

upvoted 1 times
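The point about label proportions can be checked with a small sketch: a plain shuffle-and-cut 10% split (what the question describes) does not guarantee the per-class test counts, whereas a stratified split cuts 10% out of each class separately. This is a minimal stdlib illustration using only the class sizes from the question; the helper names and the seed are hypothetical, and it does not reproduce the actual exam matrices. The question does not say whether the split was stratified.

```python
import random
from collections import Counter

# Class sizes given in the question.
CLASS_COUNTS = {"A": 300, "B": 292, "C": 240, "D": 258, "E": 310}
labels = [c for c, n in CLASS_COUNTS.items() for _ in range(n)]


def random_split(labels, test_frac=0.1, seed=0):
    """Shuffle all samples, then cut off test_frac of them (hypothetical helper)."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(labels) * test_frac)
    return idx[n_test:], idx[:n_test]


def stratified_split(labels, test_frac=0.1, seed=0):
    """Cut off test_frac of each class separately, preserving class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test


for name, split in [("random", random_split), ("stratified", stratified_split)]:
    _, test = split(labels)
    print(name, sorted(Counter(labels[i] for i in test).items()))
```

Both splits produce 140 test samples, but only the stratified one is guaranteed to contain 30/29/24/26/31 samples of A through E; the random split's per-class counts depend on the shuffle.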

kaike_reis

1 year, 2 months ago

Selected Answer: A

Option A is correct. The model confuses classes C and D in both training and testing.

upvoted 1 times
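A quick way to read "the model can't tell C and D apart" off a confusion matrix is to sum the two off-diagonal cells for each pair of classes (C predicted as D plus D predicted as C) and rank the pairs. The sketch below assumes rows are true classes and columns are predicted classes; the function name is hypothetical, and the matrix in the usage example is made-up toy data, not the matrices from the exam question.

```python
from itertools import combinations


def pairwise_confusion(matrix, labels):
    """Rank unordered class pairs by how often they are mistaken for each other.

    Assumes matrix[i][j] = number of samples with true class i predicted as class j.
    Returns (class_a, class_b, count) tuples, most-confused pair first.
    """
    pairs = [
        (labels[i], labels[j], matrix[i][j] + matrix[j][i])
        for i, j in combinations(range(len(labels)), 2)
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical toy matrix for illustration only (NOT the exam's data).
class_labels = ["A", "B", "C", "D", "E"]
toy = [
    [28, 1, 1, 0, 0],   # true A
    [1, 27, 0, 1, 0],   # true B
    [0, 0, 14, 10, 0],  # true C
    [0, 0, 9, 17, 0],   # true D
    [1, 0, 0, 0, 30],   # true E
]
print(pairwise_confusion(toy, class_labels)[:3])
```

In this toy matrix, C and D are mixed up 19 times in total, far more than any other pair, which is the pattern option A describes.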

DimLam

1 year ago

But on the test set, it even confuses classes A and C.

upvoted 1 times

rockyykrish

1 year, 2 months ago

Selected Answer: B

Hold-out is when you split up your dataset into a ‘train’ and ‘test’ set. The training set is what the model is trained on, and the test set is used to see how well that model performs on unseen data. A common split when using the hold-out method is using 80% of data for training and the remaining 20% of the data for testing. Reference: https://medium.com/@eijaz/holdout-vs-cross-validation-in-machine-learning-7637112d3f8f

upvoted 1 times

Mickey321

1 year, 2 months ago

Selected Answer: A

The model is unable to tell C and D apart.

upvoted 1 times

DD4

2 years, 1 month ago

D. The training accuracies of B and E are higher than their test accuracies, whereas A has similar accuracy in both. For C and D, test accuracy has actually improved.

upvoted 3 times

ZSun

1 year, 5 months ago

B and C have below 50% accuracy. D has 98% accuracy in training and 86% in testing. And you are telling me the takeaway is overfitting of D? Seriously???

upvoted 1 times
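The disagreement above comes down to comparing per-class accuracy (recall: correct predictions divided by the row total) between the training and test confusion matrices. A large positive train-minus-test gap for a class suggests overfitting on that class, while a class that scores poorly on both sets points to confusion with another class rather than overfitting. This is a minimal sketch assuming rows are true classes; the helper names are hypothetical and the 2-class matrices in the usage example are made-up toy numbers, not the exam's data.

```python
def per_class_recall(matrix):
    """Diagonal count divided by row total (rows = true class, by assumption)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


def recall_gaps(train_matrix, test_matrix, labels):
    """Return (label, train recall, test recall, train - test gap) per class."""
    train_r = per_class_recall(train_matrix)
    test_r = per_class_recall(test_matrix)
    return [(lab, tr, te, tr - te) for lab, tr, te in zip(labels, train_r, test_r)]


# Hypothetical 2-class toy matrices for illustration only (NOT the exam's data).
train_cm = [[90, 10], [5, 95]]
test_cm = [[8, 2], [4, 6]]
for lab, tr, te, gap in recall_gaps(train_cm, test_cm, ["X", "Y"]):
    print(f"{lab}: train={tr:.2f} test={te:.2f} gap={gap:+.2f}")
```

Here class Y drops from 0.95 to 0.60 (a gap of about 0.35) while X drops only about 0.10, so Y is the class whose behavior looks more like overfitting.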

tgaos

2 years, 4 months ago

I think the answer is A. The model doesn't perform well on classes C and D in either the training or the test dataset. I don't think B is relevant to the question (cross-validation is not mentioned in the question).

upvoted 3 times

exam887

2 years, 4 months ago

Selected Answer: A

What does "holdout cross-validation" mean? It should be either holdout validation or cross-validation.

upvoted 2 times

bluer1

2 years, 5 months ago

B should be the correct answer

upvoted 2 times
