A gaming company has launched an online game where people can start playing for free, but they need to pay if they choose to use certain features. The company needs to build an automated system to predict whether or not a new user will become a paid user within 1 year. The company has gathered a labeled dataset from 1 million users.
The training dataset consists of 1,000 positive samples (from users who ended up paying within 1 year) and 999,000 negative samples (from users who did not use any paid features). Each data sample consists of 200 features including user age, device, location, and play patterns.
Using this dataset for training, the Data Science team trained a random forest model that converged with over 99% accuracy on the training set. However, the prediction results on a test dataset were not satisfactory.
Which of the following approaches should the Data Science team take to mitigate this issue? (Choose two.)
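For context on why the 99% training accuracy says little here, the class counts in the question can be plugged into a quick calculation. This is only a sketch: the "always predict non-paying" baseline is a hypothetical comparison point, not something the question states the team built, and all variable names are illustrative.

```python
# Class counts taken directly from the question text.
n_pos = 1_000      # users who paid within 1 year
n_neg = 999_000    # users who did not pay
total = n_pos + n_neg

# Hypothetical baseline (an assumption for illustration):
# a model that predicts "will not pay" for every user.
tp, fp = 0, 0
tn, fn = n_neg, n_pos

accuracy = (tp + tn) / total   # fraction of all users labeled correctly
recall = tp / (tp + fn)        # fraction of paying users that were found

print(f"positive rate:     {n_pos / total:.4%}")  # 0.1000%
print(f"baseline accuracy: {accuracy:.2%}")       # 99.90%
print(f"baseline recall:   {recall:.2%}")         # 0.00%
```

With only 0.1% positives, a model that never predicts a paying user still scores 99.9% accuracy while finding none of them, so accuracy alone cannot tell a useful model from a useless one on this dataset.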