Exam DP-100 topic 1 question 21 discussion

Actual exam question from Microsoft's DP-100
Question #: 21
Topic #: 1

HOTSPOT -
Complete the sentence by selecting the correct option in the answer area.
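Hot Area: [image not preserved. As quoted in the comments, the sentence reads "..... is a data cleaning option of the clean missing data module that does not require predictors for each column", with options A. Probabilistic PCA, B. Median, C. SMOTE, D. Custom substitution value.]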

Suggested Answer:
Replace using Probabilistic PCA: Compared to other options, such as Multiple Imputation using Chained Equations (MICE), this option has the advantage of not requiring the application of predictors for each column. Instead, it approximates the covariance for the full dataset. Therefore, it might offer better performance for datasets that have missing values in many columns.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data
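The idea of using one model for the whole dataset, rather than one predictor per column, can be illustrated with a minimal Python sketch. This is not the module's Probabilistic PCA implementation. It is a simplified stand-in that fits a single rank-1 factorization (no mean-centering, no noise model) to all observed cells at once and reconstructs the missing cells from it. The function name `rank1_impute` and the toy data are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def rank1_impute(rows: Matrix, iterations: int = 200) -> List[List[float]]:
    """Fill None cells with u[i] * v[j] from a rank-1 fit to the observed cells.

    Simplified stand-in for PCA-style imputation: one shared low-dimensional
    model for the full table, fitted by alternating least squares.
    """
    n, d = len(rows), len(rows[0])
    u = [1.0] * n  # one score per row
    v = [1.0] * d  # one loading per column

    for _ in range(iterations):
        # Update each row score using only that row's observed cells.
        for i in range(n):
            num = sum(rows[i][j] * v[j] for j in range(d) if rows[i][j] is not None)
            den = sum(v[j] ** 2 for j in range(d) if rows[i][j] is not None)
            if den > 0:
                u[i] = num / den
        # Update each column loading using only that column's observed cells.
        for j in range(d):
            num = sum(rows[i][j] * u[i] for i in range(n) if rows[i][j] is not None)
            den = sum(u[i] ** 2 for i in range(n) if rows[i][j] is not None)
            if den > 0:
                v[j] = num / den

    # Keep observed values; reconstruct only the missing ones.
    return [
        [rows[i][j] if rows[i][j] is not None else u[i] * v[j] for j in range(d)]
        for i in range(n)
    ]


# Toy data (hypothetical): the second column is twice the first.
data: Matrix = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, None]]
filled = rank1_impute(data)
print(filled[3])
```

Because the fit uses the correlation between columns, the missing cell is reconstructed from the row's other value rather than from a column-wide statistic.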

Comments

pancman
Highly Voted 3 years ago
I don't think that this is a real exam question. Median and custom substitution techniques don't require a predictor either.
upvoted 19 times
...
rishi_ram
Highly Voted 1 year, 11 months ago
Replace using Probabilistic PCA: Replaces the missing values by using a linear model that analyzes the correlations between the columns and estimates a low-dimensional approximation of the data, from which the full data is reconstructed. The underlying dimensionality reduction is a probabilistic form of Principal Component Analysis (PCA), and it implements a variant of the model proposed in the Journal of the Royal Statistical Society, Series B 21(3), 611–622 by Tipping and Bishop. Compared to other options, such as Multiple Imputation using Chained Equations (MICE), this option has the advantage of not requiring the application of predictors for each column. Instead, it approximates the covariance for the full dataset. Therefore, it might offer better performance for datasets that have missing values in many columns. https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/clean-missing-data
upvoted 5 times
...
geethavkr
Most Recent 8 months, 2 weeks ago
Correct. The question is outdated, but the previous version of the documentation says: "Replace using Probabilistic PCA: Replaces the missing values by using a linear model that analyzes the correlations between the columns and estimates a low-dimensional approximation of the data, from which the full data is reconstructed. The underlying dimensionality reduction is a probabilistic form of Principal Component Analysis (PCA), and it implements a variant of the model proposed in the Journal of the Royal Statistical Society, Series B 21(3), 611–622 by Tipping and Bishop. Compared to other options, such as Multiple Imputation using Chained Equations (MICE), this option has the advantage of not requiring the application of predictors for each column. Instead, it approximates the covariance for the full dataset. Therefore, it might offer better performance for datasets that have missing values in many columns."
upvoted 1 times
...
kay1101
11 months ago
I think this is an outdated question. As of May 2024, PCA is no longer in the Clean Missing Data module. Reference: https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/clean-missing-data?view=azureml-api-2 However, in the past, PCA was in the Clean Missing Data module. Reference: https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/clean-missing-data At the time the question was created, PCA may have been correct, but now I think it is either median or custom substitution value.
upvoted 1 times
...
InversaRadice
1 year, 4 months ago
answer is 100% correct ... Replace using Probabilistic PCA: ... Compared to other options, such as Multiple Imputation using Chained Equations (MICE), this option has the advantage of not requiring the application of predictors for each column. https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/clean-missing-data
upvoted 2 times
...
eternaleclipse
1 year, 6 months ago
What pancman said. Outdated question.
upvoted 1 times
...
IvanTT
1 year, 6 months ago
It can't be "A. Probabilistic PCA" because it isn't an option for the Clean Missing Data module. Here is the reference: https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/clean-missing-data?view=azureml-api-2 It could be "D. Custom Substitution Value". The option "B. Median" doesn't exactly match the module's option name, which is "Replace with median".
upvoted 1 times
...
james2033
1 year, 6 months ago
Quote: "Replace using Probabilistic PCA: Replaces the missing values by using a linear model that analyzes the correlations between the columns and estimates a low-dimensional approximation of the data, from which the full data is reconstructed. The underlying dimensionality reduction is a probabilistic form of Principal Component Analysis (PCA), and it implements a variant of the model proposed in the Journal of the Royal Statistical Society, Series B 21(3), 611–622 by Tipping and Bishop. Compared to other options, such as Multiple Imputation using Chained Equations (MICE), this option has the advantage of not requiring the application of predictors for each column." Reference: https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/clean-missing-data#:~:text=this%20option%20has%20the%20advantage%20of%20not%20requiring%20the%20application%20of%20predictors%20for%20each%20column.
upvoted 1 times
...
rakeshmk
1 year, 7 months ago
PCA is a dimensionality reduction technique. Median can be the answer.
upvoted 3 times
...
PradhanManva
1 year, 7 months ago
PCA - this is the answer.
upvoted 1 times
...
MarinaMijailovic
2 years ago
Correct answer is median - it only calculates the median from the given column; no other columns are required. PCA - needs predictors to calculate the probabilities. SMOTE - needs predictors to generate synthetic samples for the minority class. CSV (custom substitution value) - doesn't really need predictors per se, but still requires some knowledge about the data to pick the right value.
upvoted 3 times
...
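The per-column median replacement mentioned in this thread can be sketched in Python as follows. This is only an illustration of the general technique, not the Clean Missing Data module's code. The function name `impute_column_median` and the sample rows are hypothetical, and the sketch assumes every column has at least one observed value.

```python
from statistics import median
from typing import List, Optional


def impute_column_median(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Replace each None with the median of the observed values in its own column."""
    d = len(rows[0])
    # Each column's median is computed independently; no other column is consulted.
    medians = [
        median([row[j] for row in rows if row[j] is not None]) for j in range(d)
    ]
    return [
        [row[j] if row[j] is not None else medians[j] for j in range(d)]
        for row in rows
    ]


# Hypothetical sample with one missing value per column.
sample = [[1.0, 10.0], [None, 20.0], [3.0, None], [100.0, 30.0]]
cleaned = impute_column_median(sample)
print(cleaned)
```

Each missing cell depends only on its own column, which is why this approach needs no predictors built from other columns.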
Truman
2 years ago
One data cleaning option that does not require predictors for each column in the Clean Missing Data module is the "Replace with mean" option. This option replaces missing values in a column with the mean of the available values in that column. All these options are false.
upvoted 1 times
...
Vic9
2 years ago
A https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/clean-missing-data "Replace using Probabilistic PCA: Replaces the missing values by using a linear model that analyzes the correlations between the columns and estimates a low-dimensional approximation of the data, from which the full data is reconstructed. The underlying dimensionality reduction is a probabilistic form of Principal Component Analysis (PCA), and it implements a variant of the model proposed in the Journal of the Royal Statistical Society, Series B 21(3), 611–622 by Tipping and Bishop. Compared to other options, such as Multiple Imputation using Chained Equations (MICE), this option has the advantage of not requiring the application of predictors for each column. Instead, it approximates the covariance for the full dataset. Therefore, it might offer better performance for datasets that have missing values in many columns."
upvoted 2 times
...
phdykd
2 years, 2 months ago
A) Probabilistic PCA and C) SMOTE are not data cleaning options in the clean missing data module. Probabilistic PCA is a technique used for dimensionality reduction and feature extraction in machine learning, and it is not specifically designed to handle missing data. SMOTE (Synthetic Minority Over-sampling Technique) is a technique used for dealing with imbalanced datasets in machine learning, and it is not designed to handle missing data. Therefore, the correct answer to the question "..... is a data cleaning option of the clean missing data module that does not require predictors for each column" is either B) Median or D) Custom substitution value.
upvoted 2 times
...
Peeking
2 years, 2 months ago
PCA is wrong.
upvoted 2 times
...
ranjsi01
3 years, 2 months ago
correct
upvoted 3 times
...
Community vote distribution
A (35%, most voted)
C (25%)
B (20%)
Other