Exam AWS Certified Machine Learning - Specialty topic 1 question 34 discussion

A Data Scientist is developing a machine learning model to predict future patient outcomes based on information collected about each patient and their treatment plans. The model should output a continuous value as its prediction. The data available includes labeled outcomes for a set of 4,000 patients. The study was conducted on a group of individuals over the age of 65 who have a particular disease that is known to worsen with age.
Initial models have performed poorly. While reviewing the underlying data, the Data Scientist notices that, out of 4,000 patient observations, there are 450 where the patient age has been input as 0. The other features for these observations appear normal compared to the rest of the sample population.
How should the Data Scientist correct this issue?

  • A. Drop all records from the dataset where age has been set to 0.
  • B. Replace the age field value for records with a value of 0 with the mean or median value from the dataset.
  • C. Drop the age feature from the dataset and train the model using the rest of the features.
  • D. Use k-means clustering to handle missing features.
Suggested Answer: B

Comments

rajs
Highly Voted 3 years, 6 months ago
Dropping the age feature is not at all a good idea, as age plays a critical role in this disease according to the question. Dropping 10% of the data is not a good idea either, considering that the number of observations is already low. The mean or median are potential solutions, but the question says the disease worsens with age, so there is a correlation between age and the other symptom-related features. That means that with unsupervised learning we can make a pretty good prediction of age. So the answer is D: use k-means clustering.
upvoted 39 times
L2007
3 years, 6 months ago
https://www.displayr.com/5-ways-deal-missing-data-cluster-analysis/ B is correct
upvoted 7 times
Shakespeare
4 months, 2 weeks ago
If it was KNN it would be more accurate, but we don't have that option.
upvoted 1 times
...
vetal
Highly Voted 3 years, 7 months ago
Replacing the age with the mean or median might introduce bias into the dataset. Using k-means clustering to estimate the missing age from the other features might give better results. Removing 10% of the available data looks odd. Why not D?
upvoted 20 times
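One way to read option D is cluster-based imputation: group the records by their other features, then fill each bad age with the mean valid age of its cluster. The sketch below is only an illustration of that idea under stated assumptions. The function names `kmeans` and `cluster_impute_age`, the two-feature rows, and the choice of k = 2 are all hypothetical, and the centroids are seeded with the first k points so the result is deterministic.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[idx].append(p)
        for c, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(g) for col in zip(*g))
    return centroids

def cluster_impute_age(rows, k=2, invalid=0):
    """rows: list of (age, features). Fill invalid ages with the cluster's mean valid age."""
    centroids = kmeans([f for _, f in rows], k)

    def assign(f):
        return min(range(k), key=lambda c: math.dist(f, centroids[c]))

    sums, counts = [0.0] * k, [0] * k
    for age, f in rows:
        if age != invalid:
            c = assign(f)
            sums[c] += age
            counts[c] += 1
    valid_ages = [a for a, _ in rows if a != invalid]
    overall = sum(valid_ages) / len(valid_ages)  # fallback for clusters with no valid age

    out = []
    for age, f in rows:
        if age == invalid:
            c = assign(f)
            age = sums[c] / counts[c] if counts[c] else overall
        out.append((age, f))
    return out

# Hypothetical records: (age, (feature_1, feature_2)); age 0 marks a bad entry.
rows = [
    (66, (1.0, 2.0)), (88, (5.0, 6.0)),
    (68, (1.2, 2.1)), (90, (5.2, 6.1)),
    (0, (1.1, 2.0)), (0, (5.1, 6.0)),
]
imputed = cluster_impute_age(rows)
print([age for age, _ in imputed])
```

Every record in a cluster gets the same fill value, so this is still a coarse estimate. It differs from a global mean only to the extent that the clusters actually separate patients by age.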
...
JonSno
Most Recent 2 months, 1 week ago
Selected Answer: B
The issue arises from incorrect age values (age = 0) in a dataset where all patients are supposed to be over 65 years old. Since age is an important predictor for the disease's progression, removing or ignoring this feature may negatively impact model performance. The best approach is imputing missing or incorrect values with a reasonable estimate (e.g., mean or median age of the dataset), ensuring that:
  • The dataset remains intact without losing valuable patient records.
  • The model still benefits from age as a feature.
  • The imputed values are realistic and do not introduce bias.
upvoted 2 times
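For reference, the median imputation described in answer B can be sketched in a few lines of Python. This is a minimal illustration, not part of the question: the function name `impute_zero_ages` and the sample ages are hypothetical, and it assumes 0 is the only marker for an invalid age.

```python
from statistics import median

def impute_zero_ages(ages, invalid=0):
    """Replace invalid age entries with the median of the valid ones."""
    valid = [a for a in ages if a != invalid]
    fill = median(valid)
    return [fill if a == invalid else a for a in ages]

# Hypothetical sample; 0 marks an incorrectly entered age.
ages = [67, 0, 72, 81, 0, 69, 90]
cleaned = impute_zero_ages(ages)
print(cleaned)
```

The median is computed from the valid entries only. Including the zeros would pull the fill value down toward an age that no patient in this study can have.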
...
growe
3 months, 4 weeks ago
Selected Answer: B
Preserves data, maintains model integrity, and corrects anomalies effectively.
upvoted 1 times
...
imymoco
9 months, 3 weeks ago
B. Replace the age field value for records with a value of 0 with the mean or median value from the dataset: This method allows for retaining all patient records while addressing the anomaly. It is a standard approach for dealing with missing or incorrect values in a way that preserves the integrity of the dataset. B. GPT answer
upvoted 1 times
...
pn12345
11 months, 3 weeks ago
B-chatgpt
upvoted 1 times
...
rookiee1111
12 months ago
The question tries to mislead by adding information around the feature correlation. K-means clustering is not meant for imputing data. Hence answer should be B, that would be the right way of handling the missing value.
upvoted 1 times
...
3eb0542
1 year ago
Selected Answer: B
Using k-means clustering to handle missing features is not directly applicable to this scenario. K-means clustering is a method for grouping data points into clusters based on similarity, and it's not typically used for imputing missing values.
upvoted 4 times
...
kyuhuck
1 year, 2 months ago
Selected Answer: B
Why B? Replacing the age field value for records with a value of 0 with the mean or median value from the dataset is generally the best approach among the given options. It preserves the dataset size and leverages the remaining correct data points, assuming age is a crucial predictor in this context. However, it's vital to perform this imputation carefully to avoid introducing bias. The median is often preferred in this scenario to mitigate the impact of outliers.
upvoted 3 times
...
kyuhuck
1 year, 2 months ago
Selected Answer: B
The best way to handle the missing values in the patient age feature is to replace them with the mean or median value from the dataset. This is a common technique for imputing missing values that preserves the overall distribution of the data and avoids introducing bias or reducing the sample size. Dropping the records or the feature would result in losing valuable information and reducing the accuracy of the model. Using k-means clustering would not be appropriate for handling missing values in a single feature, as it is a method for grouping similar data points based on multiple
upvoted 2 times
...
Topg4u
1 year, 2 months ago
mean or median is for outliers so D
upvoted 1 times
...
endeesa
1 year, 5 months ago
Selected Answer: B
Obviously B, why would you use a clustering algorithm to predict a value? D just doesn't make sense
upvoted 4 times
...
geoan13
1 year, 5 months ago
B is correct. K-means is unsupervised and used mainly for clustering. KNN would have been more accurate, since it can be used to predict a value. Since KNN is not among the options, I think the answer is the mean or median value.
upvoted 4 times
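Several comments mention KNN as the more accurate alternative that is not offered. A minimal sketch of KNN-style imputation follows, assuming each bad age is replaced by the mean age of the k nearest valid records in feature space. The function name `knn_impute_age`, the feature tuples, and k = 3 are hypothetical choices for illustration only.

```python
import math

def knn_impute_age(rows, k=3, invalid=0):
    """rows: list of (age, features) with features as a tuple of floats.
    Invalid ages are replaced by the mean age of the k nearest valid rows."""
    valid = [(age, f) for age, f in rows if age != invalid]
    out = []
    for age, feats in rows:
        if age == invalid:
            # Sort valid rows by Euclidean distance on the other features.
            nearest = sorted(valid, key=lambda r: math.dist(r[1], feats))[:k]
            age = sum(a for a, _ in nearest) / len(nearest)
        out.append((age, feats))
    return out

# Hypothetical records: (age, (feature_1, feature_2)); age 0 marks a bad entry.
rows = [
    (66, (1.0, 2.0)),
    (68, (1.2, 2.1)),
    (70, (1.1, 1.9)),
    (88, (5.0, 6.0)),
    (90, (5.2, 6.1)),
    (0, (1.1, 2.0)),
]
imputed = knn_impute_age(rows)
print(imputed[-1][0])
```

In practice the features would normally be scaled before computing distances, otherwise the feature with the largest numeric range dominates the neighbor search.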
...
elvin_ml_qayiran25091992razor
1 year, 5 months ago
Selected Answer: B
B is correct, or KNN, but not k-means.
upvoted 4 times
...
loict
1 year, 7 months ago
Selected Answer: D
A. NO - unless we want to lose 10% of the data.
B. NO - age is predictive, so using the mean would introduce a bias.
C. NO - age is predictive.
D. YES - better quality than B; it is likely that other physiological values can help predict the age.
upvoted 2 times
...
FloKo
1 year, 9 months ago
Selected Answer: D
k-means should give the best estimation of the age. Using mean would reduce the correlation between outcome and age for the model.
upvoted 1 times
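The claim that mean imputation weakens the correlation between age and the outcome can be checked on a toy example. The sketch below is purely illustrative: the ages, the perfectly linear outcome, and the helper `pearson` are hypothetical, and the two corrupted entries were deliberately placed at the extremes, which exaggerates the effect.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Population Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))

true_ages = [66, 70, 74, 78, 82, 86, 90, 94]
outcome = [2 * a + 1 for a in true_ages]   # hypothetical, perfectly linear in age

# Pretend the first and last ages were recorded as 0.
observed = [0, 70, 74, 78, 82, 86, 90, 0]
fill = mean(a for a in observed if a != 0)
imputed = [fill if a == 0 else a for a in observed]

r_true = pearson(true_ages, outcome)
r_imputed = pearson(imputed, outcome)
print(r_true, r_imputed)
print(pstdev(true_ages), pstdev(imputed))
```

Both the spread of the age column and its correlation with the outcome shrink after the fill, because every imputed record is pulled to the center of the distribution regardless of its outcome.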
...
jyrajan69
1 year, 9 months ago
How can it be D when there is a labeled outcome? That means this is supervised learning, and k-means is for unsupervised learning. So the only possible answer should be B.
upvoted 3 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other