Exam AWS Certified Machine Learning - Specialty topic 1 question 310 discussion

Exam question from Amazon's AWS Certified Machine Learning - Specialty

Question #: 310
Topic #: 1

A machine learning engineer is building a bird classification model. The engineer randomly separates a dataset into a training dataset and a validation dataset. During the training phase, the model achieves very high accuracy. However, the model does not generalize well on the validation dataset. The engineer realizes that the original dataset was imbalanced.

What should the engineer do to improve the validation accuracy of the model?

A. Perform stratified sampling on the original dataset.
B. Acquire additional data about the majority classes in the original dataset.
C. Use a smaller, randomly sampled version of the training dataset.
D. Perform systematic sampling on the original dataset.

Suggested Answer: A

by AIWave at March 9, 2024, 9:57 p.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments

Peter_Hsieh

5 months, 4 weeks ago

Selected Answer: A

https://aws.amazon.com/about-aws/whats-new/2022/04/amazon-sagemaker-data-wrangler-supports-random-sampling-stratified-sampling/

upvoted 2 times

F1Fan

7 months ago

A. Balanced class representation: Stratified sampling divides the original dataset into strata (groups) based on the class labels. It then selects instances from each stratum proportionally, so the class distribution in both the training and validation datasets reflects the original class distribution.

Improved generalization: Because every class, including the rare ones, is represented in both the training and validation datasets, the model is exposed to a diverse range of instances during training. This helps the model learn the distinguishing features of each class more effectively, leading to better generalization on the validation dataset.

Addressing imbalanced data: Stratified sampling directly addresses the imbalanced data that was identified as the root cause of the model's poor generalization on the validation dataset.
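The proportional split described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `stratified_split` helper, the bird labels, and the 90/8/2 class counts are all hypothetical and do not come from the question. In practice a library routine (for example, a stratified option in a train/validation split utility) would normally be used instead.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Hypothetical helper: split indices so each class keeps roughly the same share."""
    rng = random.Random(seed)

    # Group record indices by class label (one stratum per class).
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        if len(indices) > 1:
            # Take the same fraction from every stratum, but at least one record,
            # so that rare classes still appear in the validation set.
            n_val = max(1, round(len(indices) * val_fraction))
        else:
            # A single record cannot be in both sets; keep it for training.
            n_val = 0
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Assumed imbalanced dataset: 90 sparrows, 8 hawks, 2 owls.
labels = ["sparrow"] * 90 + ["hawk"] * 8 + ["owl"] * 2
train_idx, val_idx = stratified_split(labels, val_fraction=0.2, seed=42)

train_counts = Counter(labels[i] for i in train_idx)
val_counts = Counter(labels[i] for i in val_idx)
print("train:", dict(train_counts))
print("validation:", dict(val_counts))
```

Note that this keeps the class proportions consistent between the two sets; it does not rebalance the classes themselves.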

upvoted 1 times

vkbajoria

7 months, 1 week ago

Selected Answer: A

Stratified sampling

upvoted 1 times

AIWave

7 months, 2 weeks ago

Selected Answer: A

A: Yes - Stratified sampling ensures that each class is proportionally represented and mitigates the impact of class imbalance on model performance.
B: No - Acquiring additional data about the majority classes does not solve the class imbalance issue.
C: No - A smaller random sample does not solve the class imbalance issue and may worsen the situation.
D: No - Selecting data points at regular intervals does not solve the class imbalance issue.
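To illustrate the point about option D, here is a small sketch of systematic sampling (every k-th record) on an assumed dataset. The labels, the 90/8/2 class counts, the grouped ordering, and the step size are all hypothetical choices made for the example, and the outcome depends on how the records happen to be ordered.

```python
from collections import Counter

# Assumed dataset stored grouped by species: 90 sparrows, 8 hawks, 2 owls.
labels = ["sparrow"] * 90 + ["hawk"] * 8 + ["owl"] * 2

# Systematic sampling: take every 5th record for validation (about 20%).
step = 5
val_idx = list(range(0, len(labels), step))
val_set = set(val_idx)
train_idx = [i for i in range(len(labels)) if i not in val_set]

systematic_val_counts = Counter(labels[i] for i in val_idx)
systematic_train_counts = Counter(labels[i] for i in train_idx)

# Classes that never landed on a sampled position are absent from validation.
missing_from_validation = sorted(set(labels) - set(systematic_val_counts))
print("validation:", dict(systematic_val_counts))
print("missing from validation:", missing_from_validation)
```

Because the interval ignores the class labels, nothing guarantees that a rare class falls on a sampled position, which is why regular-interval selection does not by itself address the imbalance.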

upvoted 3 times
