Exam AWS Certified Machine Learning - Specialty topic 1 question 287 discussion

A data engineer is evaluating customer data in Amazon SageMaker Data Wrangler. The data engineer will use the customer data to create a new model to predict customer behavior.

The engineer needs to increase the model performance by checking for multicollinearity in the dataset.

Which steps can the data engineer take to accomplish this with the LEAST operational effort? (Choose two.)

  • A. Use SageMaker Data Wrangler to refit and transform the dataset by applying one-hot encoding to category-based variables.
  • B. Use SageMaker Data Wrangler diagnostic visualization. Use principal components analysis (PCA) and singular value decomposition (SVD) to calculate singular values.
  • C. Use the SageMaker Data Wrangler Quick Model visualization to quickly evaluate the dataset and to produce importance scores for each feature.
  • D. Use the SageMaker Data Wrangler Min Max Scaler transform to normalize the data.
  • E. Use SageMaker Data Wrangler diagnostic visualization. Use least absolute shrinkage and selection operator (LASSO) to plot coefficient values from a LASSO model that is trained on the dataset.
Suggested Answer: BE

Comments

spinatram
1 week, 5 days ago
B,E https://aws.amazon.com/about-aws/whats-new/2021/08/detect-multicollinearity-amazon-sagemaker-data-wrangler/
upvoted 1 times
...
MultiCloudIronMan
1 month, 3 weeks ago
Selected Answer: BE
Option B: Use SageMaker Data Wrangler diagnostic visualization. Use principal components analysis (PCA) and singular value decomposition (SVD) to calculate singular values. PCA and SVD are effective techniques for identifying multicollinearity: they reduce the dimensionality of the data and highlight the relationships between variables.
Option E: Use SageMaker Data Wrangler diagnostic visualization. Use least absolute shrinkage and selection operator (LASSO) to plot coefficient values from a LASSO model that is trained on the dataset. LASSO helps in identifying and mitigating multicollinearity by shrinking some coefficients to zero, effectively selecting a subset of predictors.
upvoted 1 times
...
JonSno
6 months, 1 week ago
B. Use SageMaker Data Wrangler diagnostic visualization. Use principal components analysis (PCA) and singular value decomposition (SVD) to calculate singular values.
PCA and SVD: These methods help identify multicollinearity by reducing the dataset's dimensionality, revealing relationships among variables. Multicollinear features often become evident through high correlations in principal components or singular values.
C. Use the SageMaker Data Wrangler Quick Model visualization to quickly evaluate the dataset and to produce importance scores for each feature.
Quick Model Visualization: This feature enables rapid evaluation of feature importance scores, which can help detect multicollinearity by identifying features that may be overly correlated and thus less impactful independently.
upvoted 1 times
...
vkbajoria
8 months, 2 weeks ago
Selected Answer: BE
B and E make sense
upvoted 1 times
...
prash_vz
10 months ago
Selected Answer: BE
https://aws.amazon.com/about-aws/whats-new/2021/08/detect-multicollinearity-amazon-sagemaker-data-wrangler/
upvoted 2 times
...
taustin2
11 months ago
Selected Answer: BE
PCA and SVD calculate singular values, which indicate how much of the overall variance each component captures. Singular values close to zero point to near-linear dependence among features, that is, multicollinearity. LASSO regularization shrinks the coefficients of redundant, highly correlated features towards zero, so the relative sizes of the plotted coefficients highlight potential multicollinearity.
upvoted 2 times
...
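The singular-value check discussed in this thread can be sketched outside Data Wrangler. The snippet below is only an illustration, not Data Wrangler's own implementation: it assumes NumPy, uses synthetic data, and the helper name `singular_values` is made up for this example. One column is built as a near-exact linear combination of the other two, so the smallest singular value of the standardized matrix should come out close to zero and the condition number should be large.

```python
import numpy as np

def singular_values(X):
    """Singular values of the column-standardized feature matrix (hypothetical helper)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.linalg.svd(Z, compute_uv=False)  # sorted from largest to smallest

# Synthetic data (an assumption for illustration): x3 is almost x1 + x2.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

s = singular_values(X)
condition_number = s[0] / s[-1]  # large ratio suggests near-linear dependence

print("singular values:", s)
print("condition number:", condition_number)
```

A smallest singular value that is tiny relative to the largest is the signal to look for; which features are involved can then be read from the corresponding right singular vector.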
aquanaveen
11 months ago
Selected Answer: BD
B. Use SageMaker Data Wrangler diagnostic visualization. Use principal components analysis (PCA) and singular value decomposition (SVD) to calculate singular values.
PCA and SVD can help in identifying multicollinearity by analyzing the correlation structure of the variables. High condition numbers or small singular values may indicate multicollinearity issues.
D. Use the SageMaker Data Wrangler Min Max Scaler transform to normalize the data.
Normalizing the data using techniques like Min-Max scaling can mitigate the impact of multicollinearity. Normalization helps in bringing the features to a similar scale, reducing the sensitivity to differences in magnitudes.
upvoted 1 times
...
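Whether option D helps with detecting multicollinearity can be checked with a small experiment. Min-max scaling is a per-column affine map with a positive slope, and Pearson correlation is invariant under such maps, so the correlation between two features should be the same before and after scaling. The sketch below assumes NumPy and synthetic data; `min_max_scale` and the `income`/`spend` columns are hypothetical names for illustration and are not the Data Wrangler transform itself.

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column to the [0, 1] range (hypothetical stand-in for a Min Max Scaler)."""
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    return (X - lo) / (hi - lo)

# Synthetic, strongly correlated columns (an assumption for illustration).
rng = np.random.default_rng(1)
income = rng.normal(50_000, 10_000, size=300)
spend = 0.3 * income + rng.normal(0, 500, size=300)
X = np.column_stack([income, spend])

corr_before = np.corrcoef(X, rowvar=False)
corr_after = np.corrcoef(min_max_scale(X), rowvar=False)

print("correlation before scaling:", corr_before[0, 1])
print("correlation after scaling: ", corr_after[0, 1])
```

The two printed correlations agree up to floating-point error, which is consistent with the suggested answer treating scaling as a normalization step rather than a multicollinearity check.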
xiaoeason
11 months ago
B and E
Explanation:
Option B: Principal components analysis (PCA) and singular value decomposition (SVD) are techniques used to identify multicollinearity in a dataset. By visualizing the singular values, the data engineer can assess the level of multicollinearity present in the features. This approach is effective for detecting relationships among variables.
Option E: LASSO (Least Absolute Shrinkage and Selection Operator) is a regularization technique that can be used to penalize certain coefficients and, in turn, highlight the most important features. By plotting the coefficient values from a LASSO model, the data engineer can identify variables that contribute the most to the model. This can be useful for identifying and mitigating multicollinearity.
upvoted 1 times
...
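The LASSO idea behind option E can be illustrated with a minimal sketch. It assumes NumPy and synthetic data, and it uses a hand-written cyclic coordinate descent solver (the function names are hypothetical) rather than whatever Data Wrangler runs internally. The feature `x3` is correlated with `x1` but carries no extra signal, so with a moderate penalty its coefficient is expected to be driven to zero while `x1` and `x2` keep large coefficients.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly 0 when |value| <= threshold."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_sweeps=200):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            # Residual with feature j's current contribution added back in.
            partial_residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual / n
            w[j] = soft_threshold(rho, alpha) / col_scale[j]
    return w

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.7 * rng.normal(size=n)  # correlated with x1, adds no new signal
y = 3.0 * x1 + 2.0 * x2 + 0.1 * rng.normal(size=n)

# Standardize features and center the target so coefficients are comparable.
X = np.column_stack([x1, x2, x3])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = y - y.mean()

w = lasso_coordinate_descent(X, y, alpha=0.3)
for name, coef in zip(["x1", "x2", "x3"], w):
    print(f"{name}: {coef:.3f}")
```

Plotting these coefficients as a bar chart gives the kind of view the option describes: features whose bars collapse to zero are candidates for being redundant with other features. When two features are nearly identical, which one LASSO keeps can depend on noise, so the plot is a hint rather than a proof.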