Exam AWS Certified Machine Learning - Specialty All Questions

View all questions & answers for the AWS Certified Machine Learning - Specialty exam

Exam AWS Certified Machine Learning - Specialty topic 1 question 34 discussion

Exam question from Amazon's AWS Certified Machine Learning - Specialty

Question #: 34
Topic #: 1

A Data Scientist is developing a machine learning model to predict future patient outcomes based on information collected about each patient and their treatment plans. The model should output a continuous value as its prediction. The data available includes labeled outcomes for a set of 4,000 patients. The study was conducted on a group of individuals over the age of 65 who have a particular disease that is known to worsen with age.
Initial models have performed poorly. While reviewing the underlying data, the Data Scientist notices that, out of 4,000 patient observations, there are 450 where the patient age has been input as 0. The other features for these observations appear normal compared to the rest of the sample population.
How should the Data Scientist correct this issue?

A. Drop all records from the dataset where age has been set to 0.
B. Replace the age field value for records with a value of 0 with the mean or median value from the dataset.
C. Drop the age feature from the dataset and train the model using the rest of the features.
D. Use k-means clustering to handle missing features.

Suggested Answer: B 🗳️

by rsimham at Dec. 9, 2019, 8:58 p.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments

rajs

Highly Voted 3 years, 6 months ago

Dropping the age feature is NOT AT ALL a good idea, as age plays a critical role in this disease as per the question. Dropping more than 10% of the data is NOT a good idea either, considering that the number of observations is already low. The mean or median are potential solutions. But the question says the disease is known to worsen with age, so there is a correlation between age and the other symptom-related features. That means that using unsupervised learning we can make a pretty good prediction of "Age". So the answer is D: use k-means clustering.

upvoted 39 times

L2007

3 years, 6 months ago

https://www.displayr.com/5-ways-deal-missing-data-cluster-analysis/ B is correct

upvoted 7 times

Shakespeare

4 months, 2 weeks ago

If it was KNN it would be more accurate, but we don't have that option.
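For readers curious what the KNN alternative mentioned here could look like, below is a minimal sketch using scikit-learn's `KNNImputer`. The feature columns (`marker_a`, `marker_b`) and all values are made up for illustration; the question does not describe the actual features, and in practice the features would usually be scaled before computing distances.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical columns: age, marker_a, marker_b (values invented for the sketch).
X = np.array([
    [70.0, 1.00, 10.0],
    [72.0, 1.10, 11.0],
    [88.0, 3.00, 30.0],
    [90.0, 3.10, 31.0],
    [0.0,  3.05, 30.5],  # age was entered as 0
])

# Mark the placeholder 0 as missing so the imputer knows what to fill.
X[X[:, 0] == 0, 0] = np.nan

# Each missing age becomes the average age of the 2 most similar patients,
# where similarity is measured on the features that are present.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)

imputed_age = X_filled[4, 0]
```

In this toy sample the last patient's two nearest neighbours are the patients aged 88 and 90, so the filled value is their average rather than the overall mean.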

upvoted 1 times

...

vetal

Highly Voted 3 years, 7 months ago

Replacing the age with the mean or median might bring a bias to the dataset. Using k-means clustering to estimate the missing age based on other features might get better results. Removing 10% of the available data looks odd. Why not D?
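One possible reading of "use k-means to estimate the missing age" is sketched below: cluster patients on the other features, then fill each missing age with the mean age of the known patients in the same cluster. This is only an illustration of that interpretation, not something the question or AWS prescribes; the helper name `impute_age_by_cluster` and the sample values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def impute_age_by_cluster(age, other_features, n_clusters=2, seed=0):
    """Fill age == 0 with the mean known age of the patient's k-means cluster."""
    age = np.asarray(age, dtype=float).copy()
    missing = age == 0

    # Cluster on the non-age features only, since age itself is unreliable.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(np.asarray(other_features, dtype=float))

    for c in np.unique(labels[missing]):
        known = (labels == c) & ~missing
        # Fall back to the global mean if a cluster has no known ages.
        fill = age[known].mean() if known.any() else age[~missing].mean()
        age[missing & (labels == c)] = fill
    return age

# Made-up sample: two clearly separated groups of patients.
features = [[1.00, 10.0], [1.10, 11.0], [3.00, 30.0], [3.10, 31.0], [3.05, 30.5]]
ages = [70, 72, 88, 90, 0]
filled = impute_age_by_cluster(ages, features)
```

Note that this is a cluster-wise mean imputation, so it still depends on how well the clusters track age.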

upvoted 20 times

...

JonSno

Most Recent 2 months, 1 week ago

Selected Answer: B

The issue arises from incorrect age values (age = 0) in a dataset where all patients are supposed to be over 65 years old. Since age is an important predictor for the disease's progression, removing or ignoring this feature may negatively impact model performance. The best approach is imputing missing or incorrect values with a reasonable estimate (e.g., mean or median age of the dataset), ensuring that:

- The dataset remains intact without losing valuable patient records.
- The model still benefits from age as a feature.
- The imputed values are realistic and do not introduce bias.
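As a rough illustration of option B, the sketch below replaces age values of 0 with the median of the valid ages using pandas. The column names and the six-row sample are assumptions made for the example; the helper `impute_age_with_median` is hypothetical.

```python
import pandas as pd

def impute_age_with_median(df: pd.DataFrame, col: str = "age") -> pd.DataFrame:
    """Return a copy of df where col == 0 is replaced by the median of the valid values."""
    out = df.copy()
    # Turn the placeholder 0 into NaN so it does not distort the median.
    out[col] = out[col].mask(out[col] == 0)
    out[col] = out[col].fillna(out[col].median())
    return out

# Tiny made-up sample; the dataset in the question has 4,000 rows.
patients = pd.DataFrame({
    "age": [70, 0, 82, 66, 0, 75],
    "marker": [1.2, 1.9, 2.4, 0.9, 1.1, 1.7],
})
cleaned = impute_age_with_median(patients)
```

The key detail is that the median is computed after the zeros are masked out, so the fill value comes only from plausible ages.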

upvoted 2 times

...

growe

3 months, 4 weeks ago

Selected Answer: B

Preserves data, maintains model integrity, and corrects anomalies effectively.

upvoted 1 times

...

imymoco

9 months, 3 weeks ago

B. Replace the age field value for records with a value of 0 with the mean or median value from the dataset: This method allows for retaining all patient records while addressing the anomaly. It is a standard approach for dealing with missing or incorrect values in a way that preserves the integrity of the dataset. B. GPT answer

upvoted 1 times

...

pn12345

11 months, 3 weeks ago

B-chatgpt

upvoted 1 times

...

rookiee1111

12 months ago

The question tries to mislead by adding information about the feature correlation. K-means clustering is not meant for imputing data. Hence the answer should be B; that would be the right way of handling the missing value.

upvoted 1 times

...

3eb0542

1 year ago

Selected Answer: B

Using k-means clustering to handle missing features is not directly applicable to this scenario. K-means clustering is a method for grouping data points into clusters based on similarity, and it's not typically used for imputing missing values.

upvoted 4 times

...

kyuhuck

1 year, 2 months ago

Selected Answer: B

Why B? Replacing the age field value for records with a value of 0 with the mean or median value from the dataset is generally the best approach among the given options. It allows the preservation of the dataset size and leverages the remaining correct data points, assuming age is a crucial predictor in this context. However, it's vital to perform this imputation carefully to avoid introducing bias. Median is often preferred in this scenario to mitigate the impact of outliers.
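To make the point about outliers concrete, here is a small sketch with synthetic ages (450 zeros plus 3,550 invented ages between 66 and 90). It compares the mean and median when the erroneous zeros are left in versus excluded. The age distribution is an assumption for illustration only.

```python
from statistics import mean, median

# Synthetic valid ages: 66..90, each value repeated equally often (3,550 rows).
valid_ages = [66 + (i % 25) for i in range(3550)]
# The 450 erroneous records where age was entered as 0.
recorded = [0] * 450 + valid_ages

mean_all = mean(recorded)          # 69.225: dragged down by the zeros
median_all = median(recorded)      # 76.0: shifted less than the mean
mean_valid = mean(valid_ages)      # 78
median_valid = median(valid_ages)  # 78.0
```

In this sample the zeros pull the mean down by almost nine years but the median by only two, which is the robustness the comment refers to. Either way, the fill value is better computed from the valid ages only.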

upvoted 3 times

...

kyuhuck

1 year, 2 months ago

Selected Answer: B

The best way to handle the missing values in the patient age feature is to replace them with the mean or median value from the dataset. This is a common technique for imputing missing values that preserves the overall distribution of the data and avoids introducing bias or reducing the sample size. Dropping the records or the feature would result in losing valuable information and reducing the accuracy of the model. Using k-means clustering would not be appropriate for handling missing values in a single feature, as it is a method for grouping similar data points based on multiple

upvoted 2 times

...

Topg4u

1 year, 2 months ago

mean or median is for outliers so D

upvoted 1 times

...

endeesa

1 year, 5 months ago

Selected Answer: B

Obviously B, why would you use a clustering algorithm to predict a value? D just doesn't make sense

upvoted 4 times

...

geoan13

1 year, 5 months ago

B is correct. K-means is unsupervised and used mainly for clustering. KNN would have been more accurate, since it can be used to predict a value. Since KNN is not among the options, I think the answer is the mean or median value.

upvoted 4 times

...

elvin_ml_qayiran25091992razor

1 year, 5 months ago

Selected Answer: B

B is correct, or KNN, but not k-means.

upvoted 4 times

...

loict

1 year, 7 months ago

Selected Answer: D

A. NO - unless we want to lose 10% of the data
B. NO - age is predictive, so using the mean would introduce a bias
C. NO - age is predictive
D. YES - better quality than B; it is likely that other physiological values can help predict the age

upvoted 2 times

...

FloKo

1 year, 9 months ago

Selected Answer: D

k-means should give the best estimation of the age. Using mean would reduce the correlation between outcome and age for the model.
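The claim that mean imputation weakens the age/outcome correlation can be checked on synthetic data. The sketch below assumes a made-up linear relation between age and outcome plus noise; the coefficients, age range, and seed are arbitrary and not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_bad = 4000, 450

# Synthetic ages over 65 and an outcome that depends on age plus noise.
age = rng.uniform(66, 95, size=n)
outcome = 2.0 * age + rng.normal(0, 5, size=n)

# Pick 450 records whose age is "lost", then fill them with the mean of the rest.
bad = np.zeros(n, dtype=bool)
bad[rng.choice(n, size=n_bad, replace=False)] = True
age_imputed = age.copy()
age_imputed[bad] = age[~bad].mean()

corr_true = np.corrcoef(age, outcome)[0, 1]
corr_imputed = np.corrcoef(age_imputed, outcome)[0, 1]
```

Comparing `corr_imputed` with `corr_true` shows the attenuation: the imputed rows all share one age value, so they carry no information about how outcome varies with age.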

upvoted 1 times

...

jyrajan69

1 year, 9 months ago

How can it be D when there is a labelled outcome? That means this is supervised, and k-means is for unsupervised learning. So the only possible answer should be B.

upvoted 3 times

...
