Welcome to ExamTopics


Exam Certified Machine Learning Associate topic 1 question 7 discussion

Actual exam question from Databricks's Certified Machine Learning Associate
Question #: 7
Topic #: 1

An organization is developing a feature repository and is electing to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository.
Which of the following explanations justifies this suggestion?

  • A. One-hot encoding is not supported by most machine learning libraries.
  • B. One-hot encoding is dependent on the target variable’s values, which differ for each application.
  • C. One-hot encoding is computationally intensive and should only be performed on small samples of training sets for individual machine learning problems.
  • D. One-hot encoding is not a common strategy for representing categorical feature variables numerically.
  • E. One-hot encoding is a potentially problematic categorical variable strategy for some machine learning algorithms.
Suggested Answer: E
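Why baking one-hot encoding into a shared feature repository is problematic can be sketched with a minimal pandas example (the table and column names here are hypothetical, not from the exam):

```python
import pandas as pd

# Hypothetical feature-repository table with a raw categorical column.
features = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "country": ["US", "DE", "US", "JP"],
})

# One-hot encoding expands the single column into one binary column
# per distinct category (3 here, but potentially thousands in practice).
encoded = pd.get_dummies(features, columns=["country"])
print(sorted(encoded.columns))
# Tree-based learners can consume the raw "country" column directly,
# so storing only the encoded form forces a wide, sparse representation
# on every downstream consumer, whether their algorithm benefits or not.
```

Keeping the raw categorical column in the repository lets each consuming model choose the encoding that suits its algorithm.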

Comments

jaydip650
1 month ago
B. One-hot encoding is dependent on the target variable’s values, which differ for each application. This one doesn't hold up because one-hot encoding isn't dependent on the target variable at all; it only represents categorical features and their individual categories as binary vectors.

E. One-hot encoding is a potentially problematic categorical variable strategy for some machine learning algorithms. Certain algorithms, like tree-based models, are less sensitive to one-hot encoding and might perform better with other encoding techniques. However, the primary issue often boils down to the high dimensionality that one-hot encoding can introduce, which can affect algorithms that don't handle sparse data well. So you're still left with E as the best justification.
upvoted 2 times
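The high-dimensionality and sparsity issues raised in the comment above can be sketched with scikit-learn's OneHotEncoder (the toy category names are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical high-cardinality categorical feature: 1000 distinct values.
train = np.array([["cat_%d" % i] for i in range(1000)])

enc = OneHotEncoder(handle_unknown="ignore")
X = enc.fit_transform(train)  # sparse matrix, one column per category

# 1000 rows explode into a 1000-column sparse matrix; algorithms that
# assume dense, low-dimensional inputs degrade on this representation.
print(X.shape)  # (1000, 1000)

# A category unseen at fit time encodes as an all-zero row, so an
# encoding frozen into the feature repository silently drops new
# categories for every downstream model.
unseen = enc.transform([["cat_unseen"]])
print(unseen.sum())
```

This is why the encoding choice is usually deferred to each individual training pipeline rather than fixed in the shared repository.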
8605246
5 months, 1 week ago
It might actually be E. According to these docs, the reason the change was introduced was to allow algorithms that expect continuous features, such as logistic regression, to use categorical features.
upvoted 3 times
EricP99
5 months, 1 week ago
Correct answer B
upvoted 1 times
Community vote distribution: A (35%), C (25%), B (20%), Other