Exam Certified Machine Learning Associate topic 1 question 20 discussion

Actual exam question from Databricks's Certified Machine Learning Associate
Question #: 20
Topic #: 1

A data scientist is using Spark ML to engineer features for an exploratory machine learning project.
They decide they want to standardize their features using the following code block:

Upon code review, a colleague expressed concern with the features being standardized prior to splitting the data into a training set and a test set.
Which of the following changes can the data scientist make to address the concern?

  • A. Utilize the MinMaxScaler object to standardize the training data according to global minimum and maximum values
  • B. Utilize the MinMaxScaler object to standardize the test data according to global minimum and maximum values
  • C. Utilize a cross-validation process rather than a train-test split process to remove the need for standardizing data
  • D. Utilize the Pipeline API to standardize the training data according to the test data's summary statistics
  • E. Utilize the Pipeline API to standardize the test data according to the training data's summary statistics
Suggested Answer: E

Comments

2d84e25
4 months, 2 weeks ago
The concern raised by the colleague is valid. Standardizing the entire dataset before splitting into training and test sets can cause data leakage, where information from the test set influences the training process. To avoid this, the data should be standardized based on the training set statistics only, and then those statistics should be applied to the test set.
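To make the leakage point concrete, here is a small plain-Python sketch with made-up numbers (the values and helper names are hypothetical, not taken from the question). It compares the statistics learned from the training values alone with those learned from training and test values combined, then standardizes the test values using only the training statistics.

```python
from statistics import mean, stdev

def fit_standardizer(values):
    """Return the mean and sample standard deviation of the given values."""
    return mean(values), stdev(values)

def standardize(values, mu, sigma):
    """Center and scale values using previously fitted statistics."""
    return [(v - mu) / sigma for v in values]

# Hypothetical feature values; the test values are deliberately much larger.
train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]

# Correct: statistics come from the training set only.
mu_train, sigma_train = fit_standardizer(train)

# Leaky: statistics are computed over training and test values together.
mu_all, sigma_all = fit_standardizer(train + test)

train_scaled = standardize(train, mu_train, sigma_train)
test_scaled = standardize(test, mu_train, sigma_train)

print("train-only mean:", mu_train)
print("global mean:", mu_all)
```

In the leaky variant the test values pull the mean upward, so the scaled training features would already encode information about the test set before any model is trained.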
upvoted 1 time
Community vote distribution: A (35%), C (25%), B (20%), Other