A machine learning engineer is trying to scale a machine learning pipeline by distributing its feature engineering process. Which of the following feature engineering tasks will be the least efficient to distribute?
A. One-hot encoding categorical features
B. Target encoding categorical features
C. Imputing missing feature values with the mean
D. Imputing missing feature values with the true median
E. Creating binary indicator features for missing values
B. Target encoding involves replacing each category of a categorical variable with a statistic related to the target variable (like the mean of the target for that category).
I would argue that the answer is B. Target encoding (also known as mean encoding) replaces each category in a categorical feature with the mean of the target variable for that category. This process is more complex and harder to distribute efficiently because it requires calculating the mean target value for each category and then applying it to every row.
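To make the two steps in the comment above concrete (calculating the per-category mean, then applying it), here is a minimal plain-Python sketch. The toy data, the two-partition split, and the function names `partial_stats`, `merge_stats`, and `target_encode` are all assumptions made for illustration; none of them come from the question or from any particular library.

```python
from collections import defaultdict

def partial_stats(rows):
    """One pass over a single partition: (target sum, row count) per category."""
    stats = defaultdict(lambda: [0.0, 0])
    for category, target in rows:
        stats[category][0] += target
        stats[category][1] += 1
    return stats

def merge_stats(all_stats):
    """Combine the per-partition sums and counts into one mean per category."""
    merged = defaultdict(lambda: [0.0, 0])
    for stats in all_stats:
        for category, (total, count) in stats.items():
            merged[category][0] += total
            merged[category][1] += count
    return {c: total / count for c, (total, count) in merged.items()}

def target_encode(rows, means):
    """Replace each row's category with that category's mean target value."""
    return [means[category] for category, _ in rows]

# Hypothetical toy data: (category, binary target), split into two partitions.
partitions = [
    [("red", 1), ("blue", 0), ("red", 0)],
    [("blue", 1), ("red", 1), ("green", 0)],
]

category_means = merge_stats(partial_stats(p) for p in partitions)
encoded = [target_encode(p, category_means) for p in partitions]
```

In this sketch the only data that moves between partitions is one (sum, count) pair per category, plus the final table of means that is sent back for the apply step.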
ricorosol
1 month, 3 weeks ago
EricP99
5 months, 1 week ago