exam questions

Exam AWS Certified Machine Learning - Specialty All Questions

View all questions & answers for the AWS Certified Machine Learning - Specialty exam

Exam AWS Certified Machine Learning - Specialty topic 1 question 46 discussion

An online reseller has a large, multi-column dataset with one column missing 30% of its data. A Machine Learning Specialist believes that certain columns in the dataset could be used to reconstruct the missing data.
Which reconstruction approach should the Specialist use to preserve the integrity of the dataset?

  • A. Listwise deletion
  • B. Last observation carried forward
  • C. Multiple imputation
  • D. Mean substitution
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️

Comments

Chosen Answer:
This is a voting comment (?). It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
rsimham
Highly Voted 2 years, 6 months ago
C looks correct since multiple imputation can be performed based on the related variable as given in the question
upvoted 26 times
...
harmanbirstudy
Highly Voted 2 years, 5 months ago
Multiple Imputation by Chained Equations or MICE, as per udemy this is always the best answer of all
upvoted 8 times
...
sonoluminescence
Most Recent 6 months ago
Why not D: Doesn't Account for Relationships: Mean substitution doesn't take into account the potential relationships between variables. In the scenario you provided, it's believed that other columns could help in reconstructing the missing data. Using only the mean of the missing column doesn't leverage this potential inter-column relationship. Assumption of Missing Completely at Random (MCAR): Mean substitution often operates under the assumption that the data is Missing Completely at Random (MCAR). In reality, data might be missing for a reason, and that reason might relate to other observed variables. Using mean substitution in such cases can introduce biases.
upvoted 2 times
...
loict
7 months, 2 weeks ago
Selected Answer: C
A. NO - Listwise deletion is just dropping rows B. NO - does not reconstruct the data based on other fields C. YES - by definition D. NO - does not reconstruct the data based on other fields
upvoted 2 times
...
DavidRou
7 months, 2 weeks ago
Selected Answer: C
MICE is the algorithm to choose here
upvoted 1 times
...
Mickey321
8 months ago
Selected Answer: C
Option C
upvoted 1 times
...
AjoseO
1 year, 2 months ago
Selected Answer: C
Multiple imputation is a statistical technique for handling missing data that involves generating multiple versions of the dataset with missing values filled in, and then combining the results to produce a single, complete dataset. This approach takes into account the relationship between variables in the dataset, and uses statistical models to predict missing values based on the information in other columns. This helps to preserve the integrity of the dataset by avoiding the introduction of bias or systematic error into the results.
upvoted 5 times
...
[Removed]
2 years, 5 months ago
I am trying to understand why Mean Substitution is not the solution. Imputation typically uses the mean if the missing data is random, implying the substitution is not biased.
upvoted 2 times
cpal012
1 year, 1 month ago
Mean substitution is limited to the current column. In this case, the requirement is to impute missing data from other columns
upvoted 3 times
...
rhuanca
1 year, 11 months ago
Reason is if you replace 30% of the missing values , likely you will bias the variable.
upvoted 1 times
...
...
syu31svc
2 years, 5 months ago
If it's handling missing data then imputation comes into play Answer is C 100%
upvoted 1 times
...
Wira
2 years, 6 months ago
https://www.countants.com/blogs/heres-how-you-can-configure-automatic-imputation-of-missing-data/ C
upvoted 1 times
...
roytruong
2 years, 6 months ago
it's C
upvoted 1 times
...
dhs227
2 years, 6 months ago
A common strategy used to impute missing values is to replace missing values with the mean or median value. It is important to understand your data before choosing a strategy for replacing missing values. https://docs.aws.amazon.com/machine-learning/latest/dg/feature-processing.html
upvoted 2 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...
exam
Someone Bought Contributor Access for:
SY0-701
London, 1 minute ago