Exam AWS Certified Machine Learning - Specialty topic 1 question 310 discussion

A machine learning engineer is building a bird classification model. The engineer randomly separates a dataset into a training dataset and a validation dataset. During the training phase, the model achieves very high accuracy. However, the model does not generalize well when it is evaluated on the validation dataset. The engineer realizes that the original dataset was imbalanced.

What should the engineer do to improve the validation accuracy of the model?

  • A. Perform stratified sampling on the original dataset.
  • B. Acquire additional data about the majority classes in the original dataset.
  • C. Use a smaller, randomly sampled version of the training dataset.
  • D. Perform systematic sampling on the original dataset.
Suggested Answer: A

Comments

Peter_Hsieh
5 months, 4 weeks ago
Selected Answer: A
https://aws.amazon.com/about-aws/whats-new/2022/04/amazon-sagemaker-data-wrangler-supports-random-sampling-stratified-sampling/
upvoted 2 times
...
F1Fan
7 months ago
A.
Balanced class representation: Stratified sampling divides the original dataset into strata (groups) based on the class labels. It then selects instances from each stratum proportionally, so the class distribution in both the training and validation datasets reflects the original class distribution.
Improved generalization: Because every class is represented in proportion in both the training and validation datasets, the model sees examples of each class during training, including the rare ones. This helps the model learn the distinguishing features of each class more effectively, leading to better generalization on the validation dataset.
Addressing imbalanced data: Stratified sampling directly addresses the imbalanced data, which was identified as the root cause of the model's poor generalization on the validation dataset.
upvoted 1 times
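
For anyone who wants to see the stratified split described above in code, here is a minimal sketch in plain Python. The function name `stratified_split`, the `val_fraction` parameter, and the 90/10 sparrow/owl dataset are all made up for illustration; they are not from the question or from any AWS service.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) with each class split in the same ratio."""
    rng = random.Random(seed)

    # Group record indices by class label (one stratum per class).
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every stratum.
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Hypothetical imbalanced dataset: 90 sparrows and 10 owls.
labels = ["sparrow"] * 90 + ["owl"] * 10
train_idx, val_idx = stratified_split(labels, val_fraction=0.2, seed=42)

train_counts = Counter(labels[i] for i in train_idx)
val_counts = Counter(labels[i] for i in val_idx)
print("train:", dict(train_counts))
print("validation:", dict(val_counts))
```

Both splits keep the original 9:1 ratio, so the rare class is never missing from validation by bad luck. Note that this preserves the class proportions rather than equalizing them. If scikit-learn is available, `train_test_split(X, y, stratify=y)` should give the same kind of split.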
...
vkbajoria
7 months, 1 week ago
Selected Answer: A
Stratified sampling
upvoted 1 times
...
AIWave
7 months, 2 weeks ago
Selected Answer: A
A: Yes - Stratified sampling ensures that each class is proportionally represented and mitigates the impact of class imbalance on model performance.
B: No - Additional data about the majority classes does not solve the class imbalance issue.
C: No - Does not solve the class imbalance issue and may worsen the situation.
D: No - Selecting data points at regular intervals does not solve the class imbalance issue.
upvoted 3 times
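
To illustrate the point about option D, here is a small sketch of systematic sampling (taking every k-th record). The dataset ordering is a deliberately contrived assumption, and `systematic_sample` is a hypothetical helper name; the aim is only to show that picking at regular intervals gives no guarantee about class proportions.

```python
from collections import Counter


def systematic_sample(items, step, start=0):
    """Pick every `step`-th item, beginning at index `start`."""
    return items[start::step]


# Hypothetical ordering: 100 records where every 10th record is an owl.
labels = (["sparrow"] * 9 + ["owl"]) * 10

sample = systematic_sample(labels, step=5)
counts = Counter(sample)
print(dict(counts))
```

With this ordering the sample lands on indices 0, 5, 10, and so on, and never on an owl, so the minority class disappears completely. Real data will rarely be this regular, but the sample still just inherits whatever the ordering happens to give it.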
...