Exam AWS Certified Machine Learning - Specialty All Questions

Exam AWS Certified Machine Learning - Specialty topic 1 question 33 discussion

A gaming company has launched an online game where people can start playing for free, but they need to pay if they choose to use certain features. The company needs to build an automated system to predict whether or not a new user will become a paid user within 1 year. The company has gathered a labeled dataset from 1 million users.
The training dataset consists of 1,000 positive samples (from users who ended up paying within 1 year) and 999,000 negative samples (from users who did not use any paid features). Each data sample consists of 200 features including user age, device, location, and play patterns.
Using this dataset for training, the Data Science team trained a random forest model that converged with over 99% accuracy on the training set. However, the prediction results on a test dataset were not satisfactory.
Which of the following approaches should the Data Science team take to mitigate this issue? (Choose two.)

  • A. Add more deep trees to the random forest to enable the model to learn more features.
  • B. Include a copy of the samples in the test dataset in the training dataset.
  • C. Generate more positive samples by duplicating the positive samples and adding a small amount of noise to the duplicated data.
  • D. Change the cost function so that false negatives have a higher impact on the cost value than false positives.
  • E. Change the cost function so that false positives have a higher impact on the cost value than false negatives.
Suggested Answer: CD

Comments

Phong
Highly Voted 3 years, 1 month ago
I think it should be CD. C: because we need a balanced dataset. D: the number of negative samples is so large that the model tends to predict 0 (negative) for all cases, leading to a false negative problem. We should minimize that. My opinion.
upvoted 30 times
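A rough sketch of what option C could look like in practice: duplicate each positive row a few times and jitter the copies with small Gaussian noise. The function name `oversample_with_noise`, the `copies_per_sample` and `noise_scale` parameters, and the toy feature values are all made up for illustration, and the sketch assumes purely numeric features (categorical ones such as device or location would need different handling).

```python
import random


def oversample_with_noise(positives, copies_per_sample, noise_scale=0.01, seed=0):
    """Return the original positive rows plus noisy duplicates of each row.

    Noise is Gaussian with a standard deviation proportional to the
    magnitude of each feature value (hypothetical choice for this sketch).
    """
    rng = random.Random(seed)
    # Keep the untouched originals first.
    augmented = [list(row) for row in positives]
    for row in positives:
        for _ in range(copies_per_sample):
            noisy = [
                # Fall back to a unit scale when the value is exactly zero.
                x + rng.gauss(0.0, noise_scale * (abs(x) or 1.0))
                for x in row
            ]
            augmented.append(noisy)
    return augmented


# Toy positives: [age, hours played per day] (invented values).
positives = [[25.0, 3.5], [40.0, 1.2]]
augmented = oversample_with_noise(positives, copies_per_sample=3)
print(len(positives), "->", len(augmented))
```

Only the training set should be augmented this way; the test set has to keep the real class ratio so the evaluation stays honest.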
...
Phong
Highly Voted 3 years, 1 month ago
I think it should be CD. C: because we need a balanced dataset. D: the number of negative samples is so large that the model tends to predict 0 (negative) for all cases, leading to a false negative problem. We should minimize that. My opinion.
upvoted 23 times
...
dinITExam
Most Recent 2 weeks, 1 day ago
Think C and D
upvoted 1 times
...
John_Pongthorn
2 years, 8 months ago
Selected Answer: CD
C, D is correct. The share of the positive class is the key to deciding which error we care about. In this question the positive class (pay) is 0.1%, compared with 99.9% (not pay), so we have to pay attention to the paying users: every one we miss is lost revenue, and that is a false negative. By contrast, in questions where the positive class (pay) is 40% and the negative class (not pay) is 60%, the emphasis can shift to false positives (the model predicts payment but the customer never pays), because we would not get the revenue we expected from them.
upvoted 5 times
...
apprehensive_scar
2 years, 9 months ago
I think it is CD
upvoted 1 times
...
cloud_trail
3 years ago
C and D. Hopefully, no one honestly thinks that B is a good answer. Never expose test data to the training set or vice versa. C is right because of the highly imbalanced training set. D is right because you want to minimize false negatives, maximize true positives, maximize recall of the positive class. I'm not sure why anyone's worried about precision in this case.
upvoted 4 times
...
felbuch
3 years ago
CD The model has 99% accuracy because it's simply predicting that everyone's a negative. Since almost everyone's a negative, it will get almost everyone right. So we need to penalize the model for predicting that someone is a negative when it is not (i.e. penalize false negatives). So that's D. Also, it would be really nice to have more positives -- one way to do that is to follow option C.
upvoted 7 times
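To put numbers on the point above, here is a small sketch using the class counts from the question (1,000 positives, 999,000 negatives) and a hypothetical model that predicts negative for everyone. The helper `confusion_counts` is written for this example only.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = pays)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


# Class counts taken from the question.
y_true = [1] * 1_000 + [0] * 999_000
# Hypothetical degenerate model: always predicts "will not pay".
y_pred = [0] * len(y_true)

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.3f} recall={recall:.3f} false_negatives={fn}")
```

Accuracy comes out at 99.9% while recall on the paying users is zero, which is why accuracy alone says little on a dataset this skewed.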
...
engomaradel
3 years ago
CD 100%
upvoted 1 times
...
ybad
3 years ago
CD. C: the training set is imbalanced (1,000 positive, 999,000 negative = 0.1% positive), so C increases the positive share. D: it also keeps the model from defaulting to the majority class. Since almost every sample says no, the model learns to always predict no, but increasing the penalty for a false negative counteracts that.
upvoted 2 times
...
Omar_Cascudo
3 years ago
We need to reduce the FPs, because they are players predicted to pay who in reality will not pay. So FPs should impact the cost metric more. CE should be the answer.
upvoted 2 times
...
bidds
3 years ago
CD are correct for sure.
upvoted 3 times
...
hans1234
3 years, 1 month ago
It is C,E... we want to find all paying customers, which are positives, so we have to punish incorrectly finding negatives, which is E
upvoted 2 times
...
Wira
3 years, 1 month ago
CD, although I am worried about the noise being introduced, as it could skew the data. Nevertheless, no better answer is given.
upvoted 2 times
...
aws_razor
3 years, 1 month ago
CD. We need high recall so that we do not miss many positive cases. For that we need fewer false negatives (FN), so they should have a high impact on the cost function.
upvoted 3 times
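One way to picture option D is a weighted log loss in which errors on true positives (false negatives) are multiplied by a larger factor than errors on true negatives (false positives). This is only a sketch: the function `weighted_log_loss`, its `fn_weight` and `fp_weight` parameters, and the weight of 999 (roughly the negative-to-positive ratio in the question) are assumptions for illustration, not something stated in the question.

```python
import math


def weighted_log_loss(y_true, p_pred, fn_weight=1.0, fp_weight=1.0, eps=1e-12):
    """Binary cross-entropy with separate weights per class.

    fn_weight scales the loss on actual positives (cost of missing a payer);
    fp_weight scales the loss on actual negatives (cost of a false alarm).
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clamp probabilities to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -fn_weight * math.log(p)
        else:
            total += -fp_weight * math.log(1.0 - p)
    return total / len(y_true)


# The same size of mistake on each class: 0.1 for a payer, 0.9 for a non-payer.
missed_payer = weighted_log_loss([1], [0.1], fn_weight=999.0)
false_alarm = weighted_log_loss([0], [0.9], fn_weight=999.0)
print(missed_payer > false_alarm)
```

With equal weights the two mistakes cost the same; raising `fn_weight` makes a missed payer far more expensive, which pushes a model trained on this loss toward higher recall. For tree ensembles the same idea is usually expressed through per-class or per-sample weights rather than an explicit loss function.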
...
roytruong
3 years, 1 month ago
In my view, CD are the answers. C: of course, to handle the imbalanced dataset. D: right now model accuracy is 99%, which means the model predicts everything as negative, leading to an FN problem, so we need to penalize FNs more in the cost function.
upvoted 3 times
...
wuha5086
3 years, 1 month ago
CD. FNs are valuable players, so we should care more about FNs.
upvoted 8 times
...
VB
3 years, 1 month ago
Is my assumption right here?
                     ACTUAL
                  PAY     NPAY
PREDICTED  PAY    TP      FP
           NPAY   FN      TN
upvoted 1 times
...