Exam AWS Certified Machine Learning - Specialty topic 1 question 158 discussion

A data scientist is training a text classification model by using the Amazon SageMaker built-in BlazingText algorithm. There are 5 classes in the dataset, with 300 samples for category A, 292 samples for category B, 240 samples for category C, 258 samples for category D, and 310 samples for category E.
The data scientist shuffles the data and splits off 10% for testing. After training the model, the data scientist generates confusion matrices for the training and test sets.


What could the data scientist conclude from these results?

  • A. Classes C and D are too similar.
  • B. The dataset is too small for holdout cross-validation.
  • C. The data distribution is skewed.
  • D. The model is overfitting for classes B and E.
Suggested Answer: A

Comments

LydiaGom
Highly Voted 2 years, 5 months ago
Isn't it A? The model doesn't classify C and D well.
upvoted 8 times
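The claim that two classes are "too similar" can be checked by ranking class pairs by how often they are mistaken for each other. The sketch below is only an illustration: the function name `most_confused_pairs` and the example counts are hypothetical, since the actual confusion matrices from the question are not reproduced on this page.

```python
from itertools import combinations


def most_confused_pairs(matrix, labels):
    """Rank class pairs by mutual confusion, highest first.

    matrix[i][j] is the number of samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        # Count both directions: i predicted as j, and j predicted as i.
        mutual = matrix[i][j] + matrix[j][i]
        pairs.append((mutual, labels[i], labels[j]))
    return sorted(pairs, reverse=True)


# Hypothetical counts for illustration, NOT the matrices from the question.
labels = ["A", "B", "C", "D", "E"]
example = [
    [90, 2, 3, 3, 2],
    [1, 92, 2, 2, 3],
    [2, 1, 55, 40, 2],
    [3, 2, 38, 54, 3],
    [2, 3, 2, 2, 91],
]

print(most_confused_pairs(example, labels)[0])  # (78, 'C', 'D')
```

If one pair dominates in both the training and the test matrix, that supports answer A; if the ranking changes between the two, the evidence is weaker.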
...
dolorez
Highly Voted 2 years, 5 months ago
Selected Answer: A
The correct answer should be A: the model is clearly unable to tell C and D apart. The reason B is incorrect is subtle. There is holdout validation and there is cross-validation, but there is no such thing as holdout cross-validation. While I think it would be more reasonable to use CV than holdout with such a small dataset, the answer mixes terms and should therefore be wrong. Also, the test set confusion matrix is still pretty comparable to the train set one, so I wouldn't say there is objective evidence that holdout is the wrong choice here.
upvoted 7 times
...
Antoh1978
Most Recent 4 months, 2 weeks ago
Selected Answer: A
I would go for A as well.
upvoted 1 times
...
tueo
6 months, 1 week ago
Selected Answer: A
I think option A is correct as C & D are behaving similarly.
upvoted 1 times
...
vkbajoria
7 months, 3 weeks ago
I think the answer is D. Option A says C and D are similar; that holds in training, but the test results contradict it. There are many As and Bs for C.
upvoted 2 times
...
kyuhuck
8 months, 2 weeks ago
Selected Answer: D
These results indicate that the model is overfitting for classes B and E, meaning that it is memorizing the specific features of these classes in the training data but failing to capture the general features that apply to the test data. Overfitting is a common problem in machine learning, where the model performs well on the training data but poorly on the test data. One possible cause of overfitting is that the model is too complex or has too many parameters for the given data. This makes the model flexible enough to fit the noise and outliers in the training data, but reduces its ability to generalize to new data.
upvoted 1 times
...
DimLam
1 year ago
Selected Answer: B
Actually, both A and D are true. It would be an easy one if we had to choose two answers, but we need to choose only one. So how can we be sure that the person who created this question had only A in mind? Also, if we look at the test confusion matrix, we can see that class A is confused with class C at the same rate as classes C and D are with each other. I would even say that the model is overfitted in general. I would go for B.
upvoted 3 times
DimLam
1 year ago
Also, because the test set entries were picked at random, we got the wrong proportions of labels between the train and test sets. So the answer could even be C.
upvoted 1 times
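The point about label proportions drifting between train and test can be illustrated with a stratified split, which takes the same fraction from every class. This is a minimal sketch using only the class counts stated in the question; the function names are hypothetical, and the question itself describes a plain shuffle rather than stratification.

```python
import random
from collections import Counter

# Class counts as stated in the question.
CLASS_COUNTS = {"A": 300, "B": 292, "C": 240, "D": 258, "E": 310}


def random_split(labels, test_fraction, seed):
    """Plain shuffle of all indices, then cut off the test portion."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    return indices[n_test:], indices[:n_test]


def stratified_split(labels, test_fraction, seed):
    """Take test_fraction of each class separately to keep proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, cls in enumerate(labels):
        by_class.setdefault(cls, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


labels = [cls for cls, n in CLASS_COUNTS.items() for _ in range(n)]

# Stratified: per-class test counts are fixed regardless of the seed.
_, strat_test = stratified_split(labels, 0.1, seed=0)
print(dict(Counter(labels[i] for i in strat_test)))
# {'A': 30, 'B': 29, 'C': 24, 'D': 26, 'E': 31}

# Plain shuffle: per-class test counts vary from seed to seed.
_, rand_test = random_split(labels, 0.1, seed=0)
print(dict(Counter(labels[i] for i in rand_test)))
```

With 1400 samples, a 10% test set has only 140 entries, so a plain shuffle can move the per-class test counts a few samples away from the stratified values.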
...
...
kaike_reis
1 year, 2 months ago
Selected Answer: A
Letter A is correct. The model gets confused between (C) and (D) in training and testing.
upvoted 1 times
DimLam
1 year ago
But on the test set it's even confused between A and C classes
upvoted 1 times
...
...
rockyykrish
1 year, 2 months ago
Selected Answer: B
Hold-out is when you split up your dataset into a ‘train’ and ‘test’ set. The training set is what the model is trained on, and the test set is used to see how well that model performs on unseen data. A common split when using the hold-out method is using 80% of data for training and the remaining 20% of the data for testing. Reference: https://medium.com/@eijaz/holdout-vs-cross-validation-in-machine-learning-7637112d3f8f
upvoted 1 times
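The difference between hold-out and k-fold cross-validation comes down to how the sample indices are partitioned. The sketch below uses hypothetical helper names and the 1400-sample total from the question; it shows the index bookkeeping only and is not how SageMaker or BlazingText performs the split.

```python
import random


def holdout_split(n_samples, test_fraction, seed=0):
    """One shuffle, one cut: a single train set and a single test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_splits(n_samples, k, seed=0):
    """Yield k (train, test) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


N = 300 + 292 + 240 + 258 + 310  # 1400 samples, from the question

train, test = holdout_split(N, 0.10)
print(len(train), len(test))  # 1260 140

for train, test in k_fold_splits(N, 5):
    print(len(train), len(test))  # 1120 280 for each of the 5 folds
```

Hold-out evaluates on 140 samples once, while 5-fold cross-validation evaluates on all 1400 samples across five training runs, which is why cross-validation is often preferred for small datasets.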
...
Mickey321
1 year, 2 months ago
Selected Answer: A
The model is unable to tell C and D apart.
upvoted 1 times
...
DD4
2 years, 1 month ago
D - Training accuracies of B and E are higher than those of test, whereas A has similar accuracy in both. For C and D, test accuracy has actually improved.
upvoted 3 times
ZSun
1 year, 5 months ago
B and C have below 50% accuracy. D has 98% accuracy in train and 86% in test. And you are telling me the takeaway is overfitting of D? Seriously???
upvoted 1 times
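The per-class accuracies argued over in this thread can be read off a confusion matrix as the diagonal entry divided by the row total (per-class recall, assuming rows are true classes). The sketch below uses hypothetical function names and small made-up matrices, not the ones from the question; a large positive train-minus-test gap for a class is the usual sign of overfitting on that class.

```python
def per_class_recall(matrix):
    """Fraction of each true class predicted correctly (rows = true class)."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def recall_gap(train_matrix, test_matrix):
    """Train recall minus test recall for each class."""
    train = per_class_recall(train_matrix)
    test = per_class_recall(test_matrix)
    return [tr - te for tr, te in zip(train, test)]


# Hypothetical 3-class matrices for illustration only.
train_cm = [
    [90, 5, 5],
    [10, 80, 10],
    [0, 0, 100],
]
test_cm = [
    [8, 1, 1],
    [3, 6, 1],
    [0, 0, 10],
]

print([round(g, 2) for g in recall_gap(train_cm, test_cm)])  # [0.1, 0.2, 0.0]
```

Note that with only about 24 to 31 test samples per class in the question's setup, a single misclassified sample shifts a class's test accuracy by roughly three to four percentage points, so small gaps should be read with caution.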
...
...
tgaos
2 years, 4 months ago
I think the answer is A. The model doesn't perform well on classes C and D in either the training or the test dataset. I don't think B is relevant to the question (cross-validation is not mentioned in the question).
upvoted 3 times
...
exam887
2 years, 4 months ago
Selected Answer: A
What does holdout cross-validation mean? It should be either holdout validation or cross-validation.
upvoted 2 times
...
bluer1
2 years, 5 months ago
B should be the correct answer
upvoted 2 times
...
Community vote distribution: A (35%, most voted), C (25%), B (20%), Other