exam questions

Exam AWS Certified Machine Learning - Specialty All Questions

View all questions & answers for the AWS Certified Machine Learning - Specialty exam

Exam AWS Certified Machine Learning - Specialty topic 1 question 156 discussion

A company supplies wholesale clothing to thousands of retail stores. A data scientist must create a model that predicts the daily sales volume for each item for each store. The data scientist discovers that more than half of the stores have been in business for less than 6 months. Sales data is highly consistent from week to week. Daily data from the database has been aggregated weekly, and weeks with no sales are omitted from the current dataset. Five years (100 MB) of sales data is available in Amazon S3.
Which factors will adversely impact the performance of the forecast model to be developed, and which actions should the data scientist take to mitigate them?
(Choose two.)

  • A. Detecting seasonality for the majority of stores will be an issue. Request categorical data to relate new stores with similar stores that have more historical data.
  • B. The sales data does not have enough variance. Request external sales data from other industries to improve the model's ability to generalize.
  • C. Sales data is aggregated by week. Request daily sales data from the source database to enable building a daily model.
  • D. The sales data is missing zero entries for item sales. Request that item sales data from the source database include zero entries to enable building the model.
  • E. Only 100 MB of sales data is available in Amazon S3. Request 10 years of sales data, which would provide 200 MB of training data for the model.
Show Suggested Answer Hide Answer
Suggested Answer: AC 🗳️

Comments

Chosen Answer:
This is a voting comment (?). It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
cron0001
Highly Voted 3 years ago
Selected Answer: AC
AC would be my answer. As half the stores have only been open for 6 months, no seasonality would be captured. The aggregation of the daily also removes trends we see during the week which is also not great when we are looking for the daily predicated sales figure
upvoted 30 times
rb39
2 years, 7 months ago
B - no reason to assume there is not enough variance D - missing data can be assumed to be 0, no need to ask for empty data E - no reason to ask for two years of data having one already
upvoted 12 times
...
...
MultiCloudIronMan
Most Recent 6 months ago
Selected Answer: CD
Missing data and achieving daily predictions with weekly data will be issues.
upvoted 1 times
...
Antoh1978
10 months, 2 weeks ago
Selected Answer: AD
I would go for AD A : Many stores have been in business for < 6 months --> unable to capture seasonality D : Zero sales are also sales records and will result in bias if omitted.
upvoted 2 times
...
vkbajoria
1 year ago
Selected Answer: AD
Since half of the stores are 6 months old seasonality would be a problem for them. instead of omitting weeks with no sales could lead to bias, requesting zero entries will help in predicting better
upvoted 1 times
vkbajoria
1 year ago
I changed my mind. It should be C and D. Since both of them foundation aspect of training.
upvoted 1 times
...
...
Stokvisss
1 year, 1 month ago
Selected Answer: AD
A as missing seasonality is an issue for the majority of the stores. D as we need to impute zeros as we would otherwise miss data. C won't do anything on performance.
upvoted 2 times
...
kyuhuck
1 year, 2 months ago
Selected Answer: CD
The factors that will adversely impact the performance of the forecast model are: Sales data is aggregated by week. This will reduce the granularity and resolution of the data, and make it harder to capture the daily patterns and variations in sales volume. The data scientist should request daily sales data from the source database to enable building a daily model, which will be more accurate and useful for the prediction task. Sales data is missing zero entries for item sales. This will introduce bias and incompleteness in the data, and make it difficult to account for the items that have no demand or are out of stock. The data scientist should request that item sales data from the source database include zero entries to enable building the model, which will be more robust and realistic
upvoted 2 times
...
CloudHandsOn
1 year, 3 months ago
Selected Answer: CD
C. Aggregated Weekly Data: Since the objective is to predict daily sales volume, weekly aggregated data might mask important daily trends and variations. Requesting daily sales data will provide a finer granularity of information that is crucial for building an accurate daily sales prediction model. D. Missing Zero Entries for Item Sales: The omission of weeks with no sales can lead to biased predictions, as the model might not correctly account for periods of no sales. Including zero entries for item sales would provide a more accurate representation of sales patterns, including the absence of sales, which is valuable information for the model. Based on this analysis, the factors that would most adversely impact the model's performance are the aggregated weekly data (Option C) and the omission of weeks with no sales (Option D).
upvoted 2 times
...
endeesa
1 year, 4 months ago
Selected Answer: AC
A - six months is likely not enough to detect clear seasonality C - Can do weekly from daily but cant reliably do daily from weekly
upvoted 1 times
...
kaike_reis
1 year, 8 months ago
Selected Answer: AC
Letters A and C are correct: we want to do a daily model (our base is on weeks) and we need to deal with new stores VS old stores. It is important to emphasize that the letter D also makes sense: we need to know the days when there were no sales, however the way it is written means saving lines (days of sales) with zero in the database, which is not practical.
upvoted 1 times
...
Mickey321
1 year, 9 months ago
Selected Answer: AC
the two factors that will adversely impact the forecast model's performance are seasonality detection for new stores and the aggregation of sales data on a weekly basis. The data scientist should request categorical data to relate new stores with historical data and request daily sales data from the source database to build a daily model, respectively, to mitigate these issues effectively.
upvoted 2 times
...
SRB1337
1 year, 10 months ago
Selected Answer: AD
AD. rest makes no sense.
upvoted 1 times
...
AjoseO
2 years, 2 months ago
Selected Answer: AC
A. Since more than half of the stores have been in business for less than 6 months, it will be challenging to detect seasonality patterns for these new stores. Therefore, one solution is to request categorical data to relate new stores with similar stores that have more historical data. This will help the model to identify common patterns and accurately forecast sales for new stores. C. Since the sales data is aggregated by week, it may not be possible to identify daily patterns or trends. Hence, one solution is to request daily sales data from the source database to enable building a daily model. This will help the model to identify daily patterns and improve its forecasting accuracy.
upvoted 2 times
...
peterfish
2 years, 9 months ago
Selected Answer: CD
I go with CD. How could we ignore the days with 0 sales? The model should be trained so that it can predict 0 sales days as well.
upvoted 4 times
...
KlaudYu
2 years, 10 months ago
B, C, D are possible. A couldn't be an answer because the model must predict daily sales volumes while A says 'Request categorical data'.
upvoted 1 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...
exam
Someone Bought Contributor Access for:
SY0-701
London, 1 minute ago