
Exam AWS Certified Machine Learning - Specialty topic 1 question 108 discussion

Exam question from Amazon's AWS Certified Machine Learning - Specialty

Question #: 108
Topic #: 1


A data scientist is developing a pipeline to ingest streaming web traffic data. The data scientist needs to implement a process to identify unusual web traffic patterns as part of the pipeline. The patterns will be used downstream for alerting and incident response. The data scientist has access to unlabeled historic data to use, if needed.
The solution needs to do the following:
✑ Calculate an anomaly score for each web traffic entry.
✑ Adapt unusual event identification to changing web patterns over time.

Which approach should the data scientist implement to meet these requirements?

A. Use historic web traffic data to train an anomaly detection model using the Amazon SageMaker Random Cut Forest (RCF) built-in model. Use an Amazon Kinesis Data Stream to process the incoming web traffic data. Attach a preprocessing AWS Lambda function to perform data enrichment by calling the RCF model to calculate the anomaly score for each record.
B. Use historic web traffic data to train an anomaly detection model using the Amazon SageMaker built-in XGBoost model. Use an Amazon Kinesis Data Stream to process the incoming web traffic data. Attach a preprocessing AWS Lambda function to perform data enrichment by calling the XGBoost model to calculate the anomaly score for each record.
C. Collect the streaming data using Amazon Kinesis Data Firehose. Map the delivery stream as an input source for Amazon Kinesis Data Analytics. Write a SQL query to run in real time against the streaming data with the k-Nearest Neighbors (kNN) SQL extension to calculate anomaly scores for each record using a tumbling window.
D. Collect the streaming data using Amazon Kinesis Data Firehose. Map the delivery stream as an input source for Amazon Kinesis Data Analytics. Write a SQL query to run in real time against the streaming data with the Amazon Random Cut Forest (RCF) SQL extension to calculate anomaly scores for each record using a sliding window.


Suggested Answer: D

by jiadong at Feb. 12, 2021, 11:08 a.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments


jiadong

Highly Voted 2 years, 9 months ago

I think the answer is D. RCF works together with Kinesis Data Analytics, and the sliding window helps the model adapt to new information.

upvoted 24 times
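Options C and D differ mainly in the window type. The count-based Python sketch below is purely illustrative and is not the Kinesis Data Analytics windowing semantics (which can also be time-based); the function names `tumbling_windows` and `sliding_windows` are hypothetical. It shows that a tumbling window splits the stream into non-overlapping batches, while a sliding window yields one window per record that always ends at the newest record, so each record is evaluated against the most recent context.

```python
from collections import deque
from typing import Iterable, List


def tumbling_windows(records: Iterable[int], size: int) -> List[List[int]]:
    """Non-overlapping windows: each record belongs to exactly one window."""
    windows: List[List[int]] = []
    current: List[int] = []
    for r in records:
        current.append(r)
        if len(current) == size:
            windows.append(current)
            current = []
    if current:
        windows.append(current)  # trailing partial window
    return windows


def sliding_windows(records: Iterable[int], size: int) -> List[List[int]]:
    """One window per record: the record plus up to size-1 preceding records."""
    buf: deque = deque(maxlen=size)  # oldest record drops out automatically
    windows: List[List[int]] = []
    for r in records:
        buf.append(r)
        windows.append(list(buf))
    return windows


print(tumbling_windows(range(1, 7), 3))  # [[1, 2, 3], [4, 5, 6]]
print(sliding_windows(range(1, 7), 3))   # [[1], [1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
```

With the tumbling window, record 4 is grouped only with 5 and 6; with the sliding window, it is grouped with the two records immediately before it.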

SophieSu

2 years, 9 months ago

Better to say "RCF is a built-in algorithm/function in Kinesis Data Analytics."

upvoted 3 times


Mickey321

Most Recent 10 months, 3 weeks ago

Selected Answer: D

D uses the built-in RCF algorithm, which is designed for anomaly detection on streaming data and can adapt to changing patterns over time. It does not require any training data or preprocessing steps, because the RCF algorithm learns from the streaming data directly. It uses a sliding window, which allows the anomaly scores to be updated continuously based on the most recent data points. It also leverages Amazon Kinesis Data Analytics, which provides a scalable, managed platform for running SQL queries on streaming data. Option A requires training an RCF model on historic data, which may not reflect current web traffic patterns. It also adds complexity and latency by invoking a Lambda function for each record.

upvoted 4 times
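The "learns from the streaming data directly" behavior can be illustrated with a small Python sketch. It is not the RCF algorithm that Kinesis Data Analytics runs: to stay short it swaps in a simplified isolation-depth score (closer to Isolation Forest than to RCF's displacement-based score) over a single numeric feature, assumed here to be something like requests per minute. The class `SlidingWindowIsolationScorer` and its parameters are hypothetical. It shows the two properties the comment relies on: there is no offline training step, and each record is scored only against the most recent `window` values.

```python
import random
from collections import deque
from typing import List


def _isolation_depth(x: float, sample: List[float], rng: random.Random,
                     depth: int = 0, max_depth: int = 16) -> int:
    """Random cuts needed to separate x from every value in sample (capped at max_depth)."""
    if not sample or depth >= max_depth:
        return depth  # x is isolated, or the depth cap was reached
    lo, hi = min(min(sample), x), max(max(sample), x)
    if lo == hi:
        return max_depth  # x cannot be separated from its neighbours: treat as normal
    cut = rng.uniform(lo, hi)
    # Keep only the neighbours that fall on the same side of the cut as x.
    same_side = [s for s in sample if (s <= cut) == (x <= cut)]
    return _isolation_depth(x, same_side, rng, depth + 1, max_depth)


class SlidingWindowIsolationScorer:
    """Hypothetical streaming scorer: no offline training, only recent values count."""

    def __init__(self, window: int = 128, trees: int = 40, seed: int = 0):
        self.recent: deque = deque(maxlen=window)  # oldest values fall out automatically
        self.trees = trees
        self.rng = random.Random(seed)

    def score(self, x: float) -> float:
        """Higher means x is easier to isolate (more anomalous); 0.0 during cold start."""
        if not self.recent:
            result = 0.0  # cold start: nothing seen yet, so nothing to compare against
        else:
            sample = list(self.recent)
            total = sum(_isolation_depth(x, sample, self.rng) for _ in range(self.trees))
            result = 1.0 / (1.0 + total / self.trees)
        self.recent.append(x)  # the model keeps adapting to the newest traffic
        return result


gen = random.Random(42)
scorer = SlidingWindowIsolationScorer(window=128, trees=40, seed=1)
for _ in range(200):
    scorer.score(gen.gauss(10.0, 1.0))  # warm-up on "normal" traffic
normal = scorer.score(10.2)
spike = scorer.score(100.0)
print(spike > normal)  # True
```

A value far from the recent window is usually cut off by the first random split, so its average isolation depth is small and its score is high; a typical value needs many splits.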


Mickey321

10 months, 3 weeks ago

Selected Answer: D

Answer D

upvoted 1 times


kaike_reis

11 months, 2 weeks ago

Selected Answer: D

Option B is ruled out because it applies a supervised classification model to an unsupervised problem. Option C uses another model that is also not recommended compared with RCF. The easiest solution to implement that meets the stated criteria is option D. Option A is wrong because we use KDS only for ingestion.

upvoted 1 times


ccpmad

11 months, 2 weeks ago

Selected Answer: D

The data scientist needs to identify unusual web traffic patterns in real time and adapt to changing web patterns over time. Amazon Kinesis Data Analytics provides real-time analytics capabilities on streaming data. The Amazon Random Cut Forest (RCF) SQL extension is designed for anomaly detection in streaming data, which fits the requirement to calculate an anomaly score for each web traffic entry.

upvoted 2 times


Sidekick

1 year, 11 months ago

Answer is D "The algorithm starts developing the machine learning model using current records in the stream when you start the application. The algorithm does not use older records in the stream for machine learning, nor does it use statistics from previous executions of the application." https://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/sqlrf-random-cut-forest.html

upvoted 4 times


apprehensive_scar

2 years, 5 months ago

Selected Answer: D

D it is. Easy one.

upvoted 2 times


vetaal

2 years, 5 months ago

Selected Answer: D

RCF is dynamic and adapts with time. D seems more appropriate.

upvoted 3 times


hess

2 years, 6 months ago

It is A. The only way to handle the historic data is using SageMaker, and you can preprocess a data stream using a Lambda function.

upvoted 2 times

ZSun

1 year, 2 months ago

But the question does not require using historical data. BTW, it only has unlabeled historic data, and unlabeled data is not really useful for training a detection model.

upvoted 1 times


AMEJack

2 years, 8 months ago

Definitely D. Kinesis Data Analytics uses RCF, with a window for selecting data in SQL.

upvoted 1 times


Huy

2 years, 8 months ago

One more reason to select D, not A, is that there is no Lambda function to preprocess records in Kinesis Data Stream.

upvoted 1 times

DimLam

9 months ago

That's not true: https://docs.aws.amazon.com/kinesisanalytics/latest/dev/lambda-preprocessing.html

upvoted 1 times
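The page DimLam links describes Lambda preprocessing for Kinesis Data Analytics (SQL) applications. Below is a minimal Python sketch of such a function, assuming the record envelope described there: each input record carries a `recordId` and base64-encoded `data`, and each output record returns the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and base64-encoded `data`. The JSON field `path` and the derived `path_depth` feature are hypothetical, chosen only to show enrichment; check the linked page for the exact event fields before relying on this.

```python
import base64
import json


def lambda_handler(event, context):
    """Sketch of a preprocessing Lambda that enriches each JSON web-traffic record."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical enrichment: derive a numeric feature from the URL path.
            path = payload.get("path", "/")
            payload["path_depth"] = len([p for p in path.split("/") if p])
            data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("utf-8")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, KeyError, AttributeError):
            # Return the original bytes and flag the record instead of dropping it silently.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record.get("data", "")})
    return {"records": output}


raw = base64.b64encode(json.dumps({"path": "/shop/cart/checkout"}).encode()).decode()
resp = lambda_handler({"records": [{"recordId": "1", "data": raw}]}, None)
print(json.loads(base64.b64decode(resp["records"][0]["data"])))
# {'path': '/shop/cart/checkout', 'path_depth': 3}
```

Invalid base64 or JSON raises a `ValueError` subclass, so malformed records come back as `ProcessingFailed` rather than crashing the whole batch.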


gbrnq

2 years, 8 months ago

"Adapt unusual event identification to changing web patterns over time." -> option A does not satisfy this; it only mentions building the model once.

upvoted 2 times


randomnamer

2 years, 8 months ago

The data scientist has access to unlabeled historic data to use, if needed. D makes no mention of this. Also, A says the Lambda function provides data enrichment. For me it's A.

upvoted 4 times


seanLu

2 years, 8 months ago

A and D both seem to work, but A does not satisfy requirement 2 (adapt to patterns over time), since the model is only trained on old data. So D may be better.

upvoted 3 times
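The point about requirement 2 can be made concrete with a toy Python comparison. Everything here is hypothetical: a plain z-score detector stands in for the anomaly model (it is not RCF), and the traffic numbers are invented. A baseline fitted once on historic data keeps flagging a legitimate, lasting shift in traffic, while a baseline recomputed over a sliding window of the latest 20 records stops flagging it after a few records.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable


def zscore(x: float, baseline: Iterable[float]) -> float:
    """Distance of x from the baseline mean, in baseline standard deviations."""
    values = list(baseline)
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return 0.0 if x == mu else float("inf")
    return abs(x - mu) / sd


THRESHOLD = 3.0
# Hypothetical requests per minute: ~100 historically, then a lasting shift to ~200.
historic = [100 + (i % 5) for i in range(50)]
new_regime = [200 + (i % 5) for i in range(50)]

static_baseline = historic           # like option A: fitted once, never updated
window = deque(historic, maxlen=20)  # sliding window: keeps only the latest 20 records

static_flags = adaptive_flags = 0
for x in new_regime:
    static_flags += int(zscore(x, static_baseline) > THRESHOLD)
    adaptive_flags += int(zscore(x, window) > THRESHOLD)
    window.append(x)  # the adaptive baseline absorbs the new pattern

print(static_flags, adaptive_flags)  # 50 3
```

All 50 shifted records exceed the threshold against the static baseline, whereas the windowed baseline flags only the first three, before the new level starts to dominate the window.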


astonm13

2 years, 9 months ago

It is definitely D

upvoted 1 times
