Exam AWS Certified Machine Learning - Specialty topic 1 question 163 discussion

A machine learning specialist is developing a regression model to predict rental rates from rental listings. A variable named Wall_Color represents the most prominent exterior wall color of the property. The following is the sample data, excluding all other variables:

The specialist chose a model that needs numerical input data.
Which feature engineering approaches should the specialist use to allow the regression model to learn from the Wall_Color data? (Choose two.)

  • A. Apply integer transformation and set Red = 1, White = 5, and Green = 10.
  • B. Add new columns that store one-hot representation of colors.
  • C. Replace the color name string by its length.
  • D. Create three columns to encode the color in RGB format.
  • E. Replace each color name by its training set frequency.
Suggested Answer: BE

Comments

edvardo
Highly Voted 2 years, 11 months ago
B, and E (frequency encoding)
upvoted 13 times
...
bluer1
Highly Voted 2 years, 11 months ago
BD? any thought?
upvoted 8 times
vanluigi
2 years, 11 months ago
I think D cannot be right, because distances in RGB format are not representative of perceived color differences. CIELAB, by contrast, correlates numerical color values consistently with human visual perception.
upvoted 1 times
...
...
MultiCloudIronMan
Most Recent 6 months, 1 week ago
Selected Answer: BD
These methods ensure that the color data is represented numerically while preserving the information’s integrity and relevance for the regression model.
upvoted 2 times
...
pandkast
10 months, 1 week ago
Selected Answer: BD
Using frequency encoding may help in some contexts but can introduce bias, especially if the frequency of a color is not related to the rental rate. This method does not leverage the actual differences between colors.
upvoted 1 times
...
kyuhuck
1 year, 2 months ago
Selected Answer: BD
In this scenario, the specialist should use one-hot encoding and RGB encoding to allow the regression model to learn from the Wall_Color data. One-hot encoding is a technique used to convert categorical data into numerical data. It creates new columns that store a one-hot representation of the colors. For example, if a variable named color has three categories (red, green, and blue), one-hot encoding produces one binary column per category. RGB encoding can capture the intensity and hue of a color, but it may also introduce correlation among the three columns. Therefore, using both one-hot encoding and RGB encoding can provide more information to the regression model than using either one alone.
upvoted 2 times
...
kaike_reis
1 year, 8 months ago
Selected Answer: BE
Here we have a non-ordinal categorical variable to receive a numerical conversion for the model. Letter A is wrong as it is not an ordinal variable. Letter C is wrong as we are not going to retain any significant information for the model. The best solutions would be: Letter B and E. Letter D would be very interesting, but it would generate a problem of information fragmentation: most models consider the variables as being independent of each other, and these 3 columns by definition would not be independent.
upvoted 3 times
...
Mickey321
1 year, 8 months ago
Selected Answer: BE
B, and E (frequency encoding)
upvoted 2 times
...
ADVIT
1 year, 9 months ago
A+B make sense to me
upvoted 1 times
...
WilianCB
1 year, 12 months ago
Selected Answer: BD
B for sure, plus D. This approach (D) involves breaking down each color into its Red, Green, and Blue components and creating separate columns for each component. This allows the model to capture the information about the intensity of each color component, which can be useful in predicting the target variable. A, C, and E are not suitable for encoding color data in a way that can be used by a regression model. The integer transformation approach in option A arbitrarily assigns values to colors without any meaningful relationship between them. The approach in option C replaces the color names with their length, which does not provide any useful information for the model. Option E replaces each color name with its frequency in the training set, which does not capture any information about the color itself.
upvoted 1 times
...
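A minimal sketch of the RGB encoding discussed for option D, for readers who want to see what the three columns would look like. The RGB triples are assumed web-color values (they are not given in the question), and `encode_rgb` is a hypothetical helper name.

```python
# Hypothetical mapping from color name to (R, G, B).
# These triples are assumed web-color values, not data from the question.
RGB = {
    "Red": (255, 0, 0),
    "White": (255, 255, 255),
    "Green": (0, 128, 0),
}

def encode_rgb(colors):
    """Turn each color name into three numeric columns scaled to [0, 1]."""
    return [tuple(channel / 255 for channel in RGB[name]) for name in colors]

rows = encode_rgb(["Red", "White", "Green"])
for name, row in zip(["Red", "White", "Green"], rows):
    print(name, row)
```

Note that this only works for colors with a known RGB value, and the three resulting columns are not independent of each other, which is the concern raised elsewhere in this thread.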
alp_ileri
2 years, 1 month ago
I think frequency encoding cannot be right. What if some colors have the same frequency?
upvoted 7 times
...
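The collision concern above can be illustrated with a small sketch. The training column is made up for illustration (the question's sample data is not reproduced here), and `frequency_encode` is a hypothetical helper.

```python
from collections import Counter

def frequency_encode(train_colors):
    """Map each color to its relative frequency in the training column."""
    counts = Counter(train_colors)
    total = len(train_colors)
    return {color: n / total for color, n in counts.items()}

# Hypothetical training column: Red and White each appear twice, Green three times.
train = ["Red", "White", "Green", "Red", "White", "Green", "Green"]
mapping = frequency_encode(train)

# Two distinct colors with equal counts receive the same encoded value,
# so the model can no longer tell them apart from this feature alone.
collision = mapping["Red"] == mapping["White"]
print(mapping, collision)
```

Whether such collisions matter depends on the data; with many categories and distinct counts they may be rare, but with three colors they are quite plausible.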
AjoseO
2 years, 2 months ago
Selected Answer: BE
B. Add new columns that store one-hot representation of colors. One-hot encoding is a common approach to represent categorical variables as numerical values. This approach creates new binary variables for each category and assigns a value of 1 to the corresponding category and 0 to the others. In this case, the specialist can create three new binary variables, one for each color (Red, White, and Green) and use them as input to the regression model.

E. Replace each color name by its training set frequency. Another approach to convert categorical variables into numerical ones is to replace each category with its frequency of occurrence in the training set. In this case, the specialist can replace the color names with their respective frequencies (1/3 for Red, 1/3 for White, and 1/3 for Green) to represent them numerically.
upvoted 7 times
AjoseO
2 years, 2 months ago
Frequency encoding is a feature engineering technique used to convert categorical variables into numerical ones by replacing each category with the frequency of its occurrence in the training set. This approach can be useful when dealing with high-cardinality categorical variables, which are categorical variables with a large number of distinct categories.
upvoted 5 times
...
...
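As a rough sketch of the one-hot encoding described in this comment, using plain Python rather than any particular library. The category list comes from the three colors named above; the `sample` listings and the `one_hot` helper are hypothetical.

```python
def one_hot(colors, categories):
    """Return one row per input color, with a 0/1 column per category."""
    return [[1 if color == cat else 0 for cat in categories] for color in colors]

categories = ["Red", "White", "Green"]       # the three colors named in the comment
sample = ["Red", "Green", "White", "Green"]  # hypothetical listings

encoded = one_hot(sample, categories)
for color, row in zip(sample, encoded):
    print(color, row)
```

Each row contains exactly one 1, so no ordering or magnitude is implied between colors.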
maxkm
2 years, 2 months ago
Selected Answer: BD
These are the only options preserving what "color" is. One-hot encoding is a default standard for any categorical data to be fed to a model that takes in numeric input. RGB format is a good numeric representation of any color by preserving its nature.
upvoted 2 times
...
Sonoko
2 years, 4 months ago
Selected Answer: AB
A&B https://victorzhou.com/blog/one-hot/#:~:text=One-Hot%20Encoding%20takes%20a%20single%20integer%20and%20produces,of%20colors%20are%20possible%3A%20red%2C%20green%2C%20or%20blue.
upvoted 2 times
wolfsong
2 years, 2 months ago
B & E. It cannot be A because your URL specifically states that: "This is known as integer encoding. For Machine Learning, this encoding can be problematic - in this example, we’re essentially saying “green” is the average of “red” and “blue”, which can lead to weird unexpected outcomes."
upvoted 3 times
...
...
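The objection to integer encoding can be made concrete with a short sketch. The integer codes are the ones from option A; the one-hot vectors and the helper names `int_gap` and `one_hot_gap` are assumptions for illustration.

```python
import math

# Integer codes exactly as proposed in option A.
integer_code = {"Red": 1, "White": 5, "Green": 10}

# One-hot vectors, with an assumed column order of (Red, White, Green).
one_hot_code = {"Red": (1, 0, 0), "White": (0, 1, 0), "Green": (0, 0, 1)}

def int_gap(a, b):
    """Absolute difference between two colors under integer encoding."""
    return abs(integer_code[a] - integer_code[b])

def one_hot_gap(a, b):
    """Euclidean distance between two colors under one-hot encoding."""
    return math.dist(one_hot_code[a], one_hot_code[b])

pairs = [("Red", "White"), ("White", "Green"), ("Red", "Green")]
int_gaps = [int_gap(a, b) for a, b in pairs]
one_hot_gaps = [one_hot_gap(a, b) for a, b in pairs]
print(int_gaps)      # gaps differ per pair, implying an ordering of colors
print(one_hot_gaps)  # every pair is equally far apart
```

Under the integer codes, the model sees White as lying between Red and Green, which is an artifact of the arbitrary numbers. Under one-hot encoding, all colors are equidistant.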
ovokpus
2 years, 10 months ago
Selected Answer: BE
Frequency encoding
upvoted 3 times
...
[Removed]
2 years, 10 months ago
B and E
upvoted 3 times
...
tgaos
2 years, 10 months ago
BE is correct. For E, please refer to: https://medium.com/analytics-vidhya/different-type-of-feature-engineering-encoding-techniques-for-categorical-variable-encoding-214363a016fb#:~:text=One%20Hot%20Encoding%3A%20%E2%80%94%20In%20this,slows%20down%20the%20learning%20significantly.
upvoted 5 times
...