Exam DP-100 All Questions

View all questions & answers for the DP-100 exam

Exam DP-100 topic 2 question 38 discussion

Actual exam question from Microsoft's DP-100

Question #: 38
Topic #: 2

You are creating a new experiment in Azure Machine Learning Studio. You have a small dataset that has missing values in many columns. The data does not require the application of predictors for each column. You plan to use the Clean Missing Data module.
You need to select a data cleaning method.
Which method should you use?

A. Replace using Probabilistic PCA
B. Normalization
C. Synthetic Minority Oversampling Technique (SMOTE)
D. Replace using MICE

Suggested Answer: A 🗳️

by Andrexx at Nov. 13, 2020, 12:51 a.m.

Comments

ajaysdr

4 months ago

Selected Answer: D

The MICE (Multiple Imputation by Chained Equations) method imputes missing values by modeling the relationships between columns, which makes it suitable for datasets with many missing values.

upvoted 1 times

...

raidenstrike1945

4 months, 2 weeks ago

Selected Answer: D

Copilot gave me this answer: D. Replace using MICE (Multiple Imputation by Chained Equations). MICE is an effective imputation technique that can handle multiple columns with missing values by using regression models to iteratively impute the missing data, making it suitable for your needs.

upvoted 1 times

...

Hisayuki

5 months, 3 weeks ago

Selected Answer: A

The point is "The data does not require the application of predictors for each column." So it means reducing the dimensionality and using PCA (Principal Component Analysis).

upvoted 3 times

...

PI_Team

9 months, 2 weeks ago

The question is outdated in my opinion. In the current Clean Missing Data component you can only choose between replacing with mean/median/mode and removing the entire row/column: https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/clean-missing-data?view=azureml-api-2
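To make the options named in the comment above concrete, here is a minimal plain-Python sketch of mean/median/mode replacement and row removal. The function names `replace_missing` and `drop_incomplete_rows` are hypothetical and are not part of any Azure ML API; missing values are assumed to be represented as `None`.

```python
from statistics import mean, median, mode

def replace_missing(column, strategy="mean"):
    """Replace None entries in a list with a statistic of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        return list(column)  # nothing to compute a statistic from
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in column]

def drop_incomplete_rows(rows):
    """Remove every row that contains at least one missing (None) value."""
    return [row for row in rows if all(v is not None for v in row)]

ages = [20, None, 30, 30, None]
print(replace_missing(ages, "mean"))
print(replace_missing(ages, "median"))
print(replace_missing(ages, "mode"))
print(drop_incomplete_rows([[1, 2], [None, 4], [5, 6]]))
```

None of these strategies looks at the other columns, which is the main difference from MICE and Probabilistic PCA.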

upvoted 4 times

...

phdykd

10 months, 1 week ago

A is the answer. It is in the classic version.

upvoted 1 times

...

krishna1818

11 months ago

Selected Answer: A

As a predictor is not required, we can use the PPCA method.

upvoted 2 times

...

ajay0011

1 year ago

The answer is PPCA. MICE is totally wrong; check the documentation.

upvoted 1 times

...

phdykd

1 year, 2 months ago

D. Replace using MICE (Multiple Imputation by Chained Equations) is the method that should be used to clean missing data in this scenario. It is commonly used to impute missing values while preserving the relationships among variables in the data. A. Replace using Probabilistic PCA (Principal Component Analysis) is not the most suitable method here, as PCA is typically used for dimensionality reduction and feature extraction rather than for imputing missing values.

upvoted 4 times

...

ruggerofreddi

1 year, 11 months ago

PCA is for dimensionality reduction: it diagonalizes the covariance matrix (being symmetric, by the spectral theorem you can always diagonalize it) and then cuts off the dimensions with small eigenvalues/variance... I am not aware of any variant of this algorithm to impute missing values. Do you have any reference? Thank you.
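The procedure described in the comment above can be sketched with NumPy roughly as follows. This is plain PCA for dimensionality reduction only, not the Probabilistic PCA option from the question; the function name `pca_reduce` and the toy data are assumptions made for illustration.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X (rows = samples) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)         # symmetric covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigh handles symmetric matrices
    order = np.argsort(eigvals)[::-1]              # largest variance first
    components = eigvecs[:, order[:n_components]]  # drop small-variance directions
    return X_centered @ components, eigvals[order]

# Toy data: three columns, but almost all variance lies along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t,
                     2 * t + 0.01 * rng.normal(size=200),
                     0.01 * rng.normal(size=200)])
Z, variances = pca_reduce(X, n_components=1)
print(Z.shape)
print(variances / variances.sum())  # share of variance per component
```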

upvoted 1 times

lewitt

1 year, 7 months ago

Only did it once in uni, but PCA is a legit method for imputing missing values. If I remember well the whole idea was that you generate the missing values through a linear regression using the features z generated by the PCA process. Either way, I might be very wrong and this link seems to explain better than I do: https://stats.stackexchange.com/a/43125
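One common way to turn PCA into an imputation method is to alternate between fitting a low-rank model and overwriting the missing cells with its reconstruction. The sketch below illustrates that general idea under stated assumptions (missing values are `NaN`, the data is close to low rank, `pca_impute` is a hypothetical name). It is a simplification, not the Probabilistic PCA algorithm used by the classic Clean Missing Data module, and not necessarily the exact method from the linked answer.

```python
import numpy as np

def pca_impute(X, rank=1, n_iter=100):
    """Fill NaNs in X by repeatedly reconstructing X from its top `rank` components."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        filled[missing] = low_rank[missing]               # only overwrite missing cells
    return filled

# Toy data: all three columns are linear functions of x, so one component suffices.
x = np.arange(20, dtype=float)
truth = np.column_stack([x, 2 * x + 1, -x])
X = truth.copy()
X[3, 0] = np.nan
X[10, 2] = np.nan
completed = pca_impute(X, rank=1)
print(np.abs(completed - truth).max())  # should be close to zero on this toy data
```

Note that no per-column regression model is fitted here; every missing cell is filled from the same shared low-dimensional representation.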

upvoted 1 times

...

ning

1 year, 11 months ago

Selected Answer: A

MICE vs PPCA: this is not so easy to answer in practice, but for exam purposes I agree with A.

upvoted 1 times

...

MohammadKhubeb

2 years, 2 months ago

A is the correct answer. PCA is widely used for dimensionality reduction.

upvoted 1 times

...

adamwar

2 years, 6 months ago

What does "application of predictors" for each column mean?

upvoted 2 times

Padilha

1 year, 3 months ago

It means you will not need to use all the other columns to predict (or replace) the missing values in one column. Basically, it's saying that you will not apply a method like linear regression using all the other columns to fill in the column with missing values. That's what MICE does, so they included that sentence so you can eliminate that option.
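The "one predictor per column" idea from the comment above can be sketched like this: each column with missing values gets its own linear regression on all the other columns, and the columns are cycled through several times. This is a simplified, deterministic, single-imputation illustration; real MICE also adds randomness and produces multiple imputed datasets. The name `chained_regression_impute` is hypothetical.

```python
import numpy as np

def chained_regression_impute(X, n_rounds=10):
    """Impute NaNs column by column, regressing each column on all the others."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)  # start from column means
    n_rows, n_cols = X.shape
    for _ in range(n_rounds):
        for j in range(n_cols):
            if not missing[:, j].any():
                continue                                   # nothing to impute here
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(n_rows), others]) # intercept + predictors
            obs = ~missing[:, j]
            coef, *_ = np.linalg.lstsq(A[obs], filled[obs, j], rcond=None)
            filled[missing[:, j], j] = A[missing[:, j]] @ coef
    return filled

# Toy data: the third column is exactly the sum of the first two.
a = np.arange(30.0)
b = (a ** 2) % 7
truth = np.column_stack([a, b, a + b])
X = truth.copy()
X[5, 0] = np.nan
X[20, 2] = np.nan
completed = chained_regression_impute(X)
print(np.abs(completed - truth).max())  # should be close to zero on this toy data
```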

upvoted 1 times

...

Samuela

2 years, 5 months ago

I have the same question, could someone plz explain?

upvoted 1 times

Sichlis

2 years, 4 months ago

I think this just means that MICE uses the last value before a NULL value to calculate a good representative for this NULL value, but when there are a lot of NULL values this technique isn't a good solution, and therefore Probabilistic PCA (which doesn't need the predecessor values) is the better choice.

upvoted 3 times

DingDongSingSong

2 years ago

You're describing Last Observation Carried Forward, not MICE. The "application of predictors" reference makes no sense here with respect to data cleansing.
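For comparison, Last Observation Carried Forward is simple enough to sketch in a few lines of plain Python. The function name `locf` is hypothetical, and missing values are assumed to be `None`.

```python
def locf(values):
    """Last Observation Carried Forward: fill each None with the previous observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if no earlier observation exists
    return filled

print(locf([None, 3, None, None, 7, None]))  # [None, 3, 3, 3, 7, 7]
```

Unlike MICE, this uses only the ordering of the rows within a single column and never looks at the other columns.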

upvoted 1 times

...

Vipuls

3 years, 4 months ago

Yes, given Answer is right

upvoted 4 times

...

Andrexx

3 years, 5 months ago

Agree with the answer

upvoted 3 times

...
