Exam AWS Certified Machine Learning - Specialty topic 1 question 108 discussion

A data scientist is developing a pipeline to ingest streaming web traffic data. The data scientist needs to implement a process to identify unusual web traffic patterns as part of the pipeline. The patterns will be used downstream for alerting and incident response. The data scientist has access to unlabeled historic data to use, if needed.
The solution needs to do the following:
✑ Calculate an anomaly score for each web traffic entry.
✑ Adapt unusual event identification to changing web patterns over time.

Which approach should the data scientist implement to meet these requirements?

  • A. Use historic web traffic data to train an anomaly detection model using the Amazon SageMaker Random Cut Forest (RCF) built-in model. Use an Amazon Kinesis Data Stream to process the incoming web traffic data. Attach a preprocessing AWS Lambda function to perform data enrichment by calling the RCF model to calculate the anomaly score for each record.
  • B. Use historic web traffic data to train an anomaly detection model using the Amazon SageMaker built-in XGBoost model. Use an Amazon Kinesis Data Stream to process the incoming web traffic data. Attach a preprocessing AWS Lambda function to perform data enrichment by calling the XGBoost model to calculate the anomaly score for each record.
  • C. Collect the streaming data using Amazon Kinesis Data Firehose. Map the delivery stream as an input source for Amazon Kinesis Data Analytics. Write a SQL query to run in real time against the streaming data with the k-Nearest Neighbors (kNN) SQL extension to calculate anomaly scores for each record using a tumbling window.
  • D. Collect the streaming data using Amazon Kinesis Data Firehose. Map the delivery stream as an input source for Amazon Kinesis Data Analytics. Write a SQL query to run in real time against the streaming data with the Amazon Random Cut Forest (RCF) SQL extension to calculate anomaly scores for each record using a sliding window.
Suggested Answer: D
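Options C and D differ partly in the window type they name. As a rough illustration only (plain Python, not Kinesis Data Analytics SQL; the function names and the count-based window size are assumptions made for this sketch), a tumbling window cuts the stream into non-overlapping batches, while a sliding window is refreshed on every new record:

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List


def tumbling_windows(records: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield non-overlapping batches of `size` records (a tumbling window)."""
    batch: List[float] = []
    for value in records:
        batch.append(value)
        if len(batch) == size:
            yield batch
            batch = []  # start a fresh window; no record is shared between windows


def sliding_windows(records: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield the most recent `size` records after every arrival (a sliding window)."""
    recent: Deque[float] = deque(maxlen=size)
    for value in records:
        recent.append(value)  # the oldest record drops out automatically
        yield list(recent)


stream = [1, 2, 3, 4, 5, 6]
tumbling = list(tumbling_windows(stream, 3))
sliding = list(sliding_windows(stream, 3))
```

With the sliding version, every incoming record produces an updated window, which is why it pairs naturally with a per-record anomaly score.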

Comments

jiadong
Highly Voted 2 years, 7 months ago
I think the answer is D. RCF works together with Kinesis Data Analytics, and the sliding window helps it pick up new information.
upvoted 24 times
SophieSu
2 years, 6 months ago
better to say "RCF is a built-in algorithm/function in Kinesis Data Analytics"
upvoted 3 times
...
...
Mickey321
Most Recent 8 months ago
Selected Answer: D
D uses the built-in RCF algorithm, which is designed for anomaly detection on streaming data and can adapt to changing patterns over time. It does not require any training data or preprocessing steps, as the RCF algorithm can learn from the streaming data directly. It uses a sliding window, which allows for continuous updating of the anomaly scores based on the most recent data points. It leverages the Amazon Kinesis Data Analytics service, which provides a scalable and managed platform for running SQL queries on streaming data. Option A requires training an RCF model on historic data, which may not reflect the current web traffic patterns. It also adds complexity and latency by invoking a Lambda function for each record.
upvoted 4 times
...
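The adaptation argument in the comment above can be illustrated with a toy example. This is only a sketch: it uses a simple z-score as a stand-in for an anomaly score, not Random Cut Forest, and the data, window size, and function names are all made up for illustration. A scorer frozen on historic data keeps flagging the new traffic level forever, while a sliding-window scorer flags the shift and then settles:

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, List, Sequence


def z_score(value: float, reference: Sequence[float]) -> float:
    """Distance of `value` from the reference mean, in standard deviations."""
    spread = pstdev(reference) or 1.0  # avoid dividing by zero on flat data
    return abs(value - mean(reference)) / spread


def frozen_scores(history: Sequence[float], stream: Sequence[float]) -> List[float]:
    """Score every record against historic data only (trained once, never updated)."""
    return [z_score(x, history) for x in stream]


def sliding_scores(stream: Sequence[float], window: int = 20) -> List[float]:
    """Score every record against the most recent `window` records."""
    recent: Deque[float] = deque(maxlen=window)
    scores: List[float] = []
    for x in stream:
        # Score against earlier records only, then learn from the new one.
        scores.append(z_score(x, recent) if len(recent) >= 2 else 0.0)
        recent.append(x)
    return scores


# Hypothetical traffic: roughly 100 requests per interval, then a lasting shift to roughly 200.
history = [100.0 + (i % 5) for i in range(50)]
stream = [100.0 + (i % 5) for i in range(30)] + [200.0 + (i % 5) for i in range(30)]

frozen = frozen_scores(history, stream)
adaptive = sliding_scores(stream)

print(f"score at the shift:  frozen={frozen[30]:.1f}  sliding={adaptive[30]:.1f}")
print(f"score 30 records on: frozen={frozen[-1]:.1f}  sliding={adaptive[-1]:.1f}")
```

Both scorers react strongly at the moment of the shift, but only the sliding one returns to low scores once the new level becomes normal.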
Mickey321
8 months ago
Selected Answer: D
Answer D
upvoted 1 times
...
kaike_reis
8 months, 3 weeks ago
Selected Answer: D
Option B is ruled out because it applies a supervised classification model to an unsupervised problem. Option C uses another model that is also not recommended compared with RCF. The easiest solution to implement that meets the stated requirements is option D. Option A is wrong because KDS is used for ingestion only.
upvoted 1 times
...
ccpmad
8 months, 3 weeks ago
Selected Answer: D
The data scientist needs to identify unusual web traffic patterns in real time and adapt to changing web patterns over time. Amazon Kinesis Data Analytics provides real-time analytics capabilities on streaming data. The Amazon Random Cut Forest (RCF) SQL extension is designed for anomaly detection in streaming data, which fits the requirement to calculate an anomaly score for each web traffic entry.
upvoted 2 times
...
Sidekick
1 year, 8 months ago
The answer is D. From the documentation: "The algorithm starts developing the machine learning model using current records in the stream when you start the application. The algorithm does not use older records in the stream for machine learning, nor does it use statistics from previous executions of the application." https://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/sqlrf-random-cut-forest.html
upvoted 4 times
...
apprehensive_scar
2 years, 2 months ago
Selected Answer: D
D it is. Easy one.
upvoted 2 times
...
vetaal
2 years, 2 months ago
Selected Answer: D
RCF is dynamic and adapts with time. D seems more appropriate.
upvoted 3 times
...
hess
2 years, 3 months ago
It is A. The only way to handle the historic data is to use SageMaker, and you can preprocess a data stream using a Lambda function.
upvoted 2 times
ZSun
1 year ago
But the question does not require using historical data. By the way, only unlabeled historic data is available, and unlabeled data is not really useful for training a detection model.
upvoted 1 times
...
...
AMEJack
2 years, 5 months ago
Definitely D. Data Analytics uses RCF, with a window for selecting data in SQL.
upvoted 1 times
...
Huy
2 years, 5 months ago
One more reason to select D, not A, is that there is no Lambda function to preprocess records in a Kinesis Data Stream.
upvoted 1 times
DimLam
6 months, 1 week ago
That's not true: https://docs.aws.amazon.com/kinesisanalytics/latest/dev/lambda-preprocessing.html
upvoted 1 times
...
...
gbrnq
2 years, 5 months ago
“Adapt unusual event identification to changing web patterns over time.” -> Option A does not satisfy this; it only mentions building the model once.
upvoted 2 times
...
randomnamer
2 years, 5 months ago
The data scientist has access to unlabeled historic data to use, if needed. D has no mention of this. Also, A says the lambda function provides data enrichment. For me it's A.
upvoted 4 times
...
seanLu
2 years, 6 months ago
A and D both seem to work, but A does not satisfy requirement 2 (adapt to patterns over time), since the model is only trained on old data. So D may be better.
upvoted 3 times
...
astonm13
2 years, 6 months ago
It is definitely D
upvoted 1 times
...