Exam AWS Certified Solutions Architect - Associate SAA-C03 topic 1 question 59 discussion

Exam question from Amazon's AWS Certified Solutions Architect - Associate SAA-C03

Question #: 59
Topic #: 1

A company hosts more than 300 global websites and applications. The company requires a platform to analyze more than 30 TB of clickstream data each day.
What should a solutions architect do to transmit and process the clickstream data?

A. Design an AWS Data Pipeline to archive the data to an Amazon S3 bucket and run an Amazon EMR cluster with the data to generate analytics.
B. Create an Auto Scaling group of Amazon EC2 instances to process the data and send it to an Amazon S3 data lake for Amazon Redshift to use for analysis.
C. Cache the data to Amazon CloudFront. Store the data in an Amazon S3 bucket. When an object is added to the S3 bucket, run an AWS Lambda function to process the data for analysis.
D. Collect the data from Amazon Kinesis Data Streams. Use Amazon Kinesis Data Firehose to transmit the data to an Amazon S3 data lake. Load the data in Amazon Redshift for analysis.

Suggested Answer: D

by ArielSchivo at Oct. 18, 2022, 1:29 p.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments

Buruguduystunstugudunstuy

Highly Voted 2 years, 7 months ago

Selected Answer: D

Option D is the most appropriate solution for transmitting and processing the clickstream data in this scenario. Amazon Kinesis Data Streams is a highly scalable and durable service that enables real-time processing of streaming data at high volume and high rate. You can use Kinesis Data Streams to collect and process the clickstream data in real time. Amazon Kinesis Data Firehose is a fully managed service that loads streaming data into data stores and analytics tools. You can use Kinesis Data Firehose to transmit the data from Kinesis Data Streams to an Amazon S3 data lake. Once the data is in the data lake, you can load it into Amazon Redshift and analyze it there. Amazon Redshift is a fully managed, petabyte-scale data warehouse service that lets you analyze data using SQL and your existing business intelligence tools.

upvoted 40 times
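The last step in that comment (loading from the S3 data lake into Redshift) is typically done with a Redshift `COPY` statement. A minimal sketch follows, assuming Firehose delivered gzip-compressed JSON objects; the table name, bucket prefix, and IAM role ARN are hypothetical placeholders, not values from the question.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement for gzip-compressed JSON objects in S3.

    Values are interpolated without escaping, so this sketch is only
    suitable for trusted, hard-coded inputs.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS JSON 'auto'\n"
        "GZIP;"
    )


if __name__ == "__main__":
    # All three arguments are hypothetical example values.
    print(
        build_copy_statement(
            table="clickstream_events",
            s3_prefix="s3://example-clickstream-lake/events/",
            iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
        )
    )
```

The statement would then be run against the cluster with whatever SQL client is already in use; scheduling and error handling are left out of this sketch.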

Buruguduystunstugudunstuy

2 years, 7 months ago

Option A, which uses AWS Data Pipeline to archive the data to an Amazon S3 bucket and runs an Amazon EMR cluster over the data to generate analytics, is not the most appropriate solution because it does not process the data in real time.

Option B, which creates an Auto Scaling group of Amazon EC2 instances to process the data and sends it to an Amazon S3 data lake for Amazon Redshift to use for analysis, is not the most appropriate solution because it does not use a fully managed service to transmit the data from the processing layer to the data lake.

Option C, which caches the data in Amazon CloudFront, stores it in an Amazon S3 bucket, and runs an AWS Lambda function to process the data whenever an object is added to the bucket, is not the most appropriate solution because it does not use a scalable and durable service to collect and process the data in real time.

upvoted 10 times

MutiverseAgent

2 years ago

The question does not say that real-time is needed here

upvoted 3 times

pentium75

1 year, 6 months ago

Question asks how to "transmit and process the clickstream data", NOT how to analyze it. Thus D.

upvoted 2 times

...

ArielSchivo

Highly Voted 2 years, 9 months ago

Selected Answer: D

Option D. https://aws.amazon.com/es/blogs/big-data/real-time-analytics-with-amazon-redshift-streaming-ingestion/

upvoted 17 times

RBSK

2 years, 7 months ago

Unsure if this is the right URL for this scenario. Option D refers to S3 and then Redshift, whereas the URL discusses eliminating S3: "We’re excited to launch Amazon Redshift streaming ingestion for Amazon Kinesis Data Streams, which enables you to ingest data directly from the Kinesis data stream without having to stage the data in Amazon Simple Storage Service (Amazon S3). Streaming ingestion allows you to achieve low latency in the order of seconds while ingesting hundreds of megabytes of data into your Amazon Redshift cluster."

upvoted 5 times

...

satyaammm

Most Recent 6 months, 2 weeks ago

Selected Answer: D

Kinesis Data Firehose is the most suitable for streaming data, and Redshift is the most suitable for large data sets.

upvoted 1 times

...

PaulGa

10 months, 1 week ago

Selected Answer: D

Ans D - using Kinesis Streams / Firehose (data in/out) is fast and reliable. Using Redshift allows all sorts of permutations of data analysis and interfacing with user apps.

upvoted 2 times

...

effiecancode

1 year ago

D is the best option

upvoted 1 times

...

awsgeek75

1 year, 6 months ago

Selected Answer: D

A: Not sure how recent this question is, but Data Pipeline is not really a product AWS is recommending anymore: https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html

B: Processing 30 TB of clickstream data could be done with EC2, but it would be challenging.

C: CloudFront is for CDN and caching, and mostly for outgoing data, not incoming.

D: Kinesis, an S3 data lake and Redshift will work perfectly for this case.

upvoted 4 times
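To put the 30 TB/day figure in perspective, here is a rough back-of-the-envelope calculation. It assumes decimal terabytes, perfectly even traffic over 24 hours, and a per-shard write limit of 1 MB/s for provisioned Kinesis Data Streams shards (treated here as 10^6 bytes/s; check the current quotas). Real clickstream traffic is bursty, so the result is a floor, not a sizing recommendation.

```python
import math

BYTES_PER_TB = 10**12                   # assumption: decimal terabytes
SECONDS_PER_DAY = 24 * 60 * 60
SHARD_WRITE_BYTES_PER_SEC = 1_000_000   # assumption: 1 MB/s per provisioned shard


def average_bytes_per_second(tb_per_day: float) -> float:
    """Average ingest rate if the daily volume were spread evenly."""
    return tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY


def minimum_shards(tb_per_day: float) -> int:
    """Lower bound on shard count for the average rate (ignores peaks)."""
    rate = average_bytes_per_second(tb_per_day)
    return math.ceil(rate / SHARD_WRITE_BYTES_PER_SEC)


if __name__ == "__main__":
    rate_mb = average_bytes_per_second(30) / 1_000_000
    print(f"average rate: {rate_mb:.1f} MB/s")       # about 347.2 MB/s
    print(f"minimum shards: {minimum_shards(30)}")   # 348 under these assumptions
```

Sustaining several hundred MB/s around the clock is what makes a self-managed EC2 fleet challenging here, and it is also why peak headroom (or on-demand capacity mode) matters for the streaming option.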

...

clumsyninja4life

1 year, 6 months ago

Selected Answer: A

The answer should be A. Clickstream does not mean real time; it just means capturing user interactions on the web page. Kinesis Data Streams is not required. Furthermore, Redshift is a data warehousing solution; it can't run complex analysis as well as EMR. My vote goes to A.

upvoted 1 times

pentium75

1 year, 6 months ago

Question asks how to "transmit and process the clickstream data", NOT how to analyze it. Also question does NOT ask how to archive the data (as is mentioned in A). Thus D.

upvoted 1 times

...

Reckless_Jas

1 year, 11 months ago

When you see clickstream data, think about Kinesis Data Streams.

upvoted 6 times
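On the producer side, clickstream events are usually sent to Kinesis Data Streams in batches rather than one at a time. The sketch below only prepares the batches; it assumes each event is a dict with a hypothetical `session_id` field used as the partition key, and it uses the 500-records-per-request cap of `PutRecords` as the default batch size (verify against current quotas).

```python
import json
from typing import Dict, Iterable, Iterator, List

MAX_RECORDS_PER_REQUEST = 500  # PutRecords record cap per request; verify current quotas


def to_entry(event: Dict) -> Dict:
    """Convert one event into the entry shape used by PutRecords."""
    return {
        "Data": (json.dumps(event) + "\n").encode("utf-8"),
        # Hypothetical choice: keep one session's events on the same shard.
        "PartitionKey": str(event["session_id"]),
    }


def batched_entries(
    events: Iterable[Dict], batch_size: int = MAX_RECORDS_PER_REQUEST
) -> Iterator[List[Dict]]:
    """Yield lists of at most batch_size entries, preserving event order."""
    batch: List[Dict] = []
    for event in events:
        batch.append(to_entry(event))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


# Usage sketch (not executed here; needs boto3 and AWS credentials, and the
# stream name is a placeholder):
#   kinesis = boto3.client("kinesis")
#   for batch in batched_entries(events):
#       response = kinesis.put_records(StreamName="example-clickstream", Records=batch)
#       # response["FailedRecordCount"] > 0 means some entries need a retry.
```

Per-request byte limits are not enforced in this sketch; a production producer would also need to cap batch size in bytes and retry failed entries.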

...

Guru4Cloud

1 year, 11 months ago

Selected Answer: D

The key reasons are:

- Kinesis Data Streams can continuously capture and ingest high volumes of clickstream data in real time. This handles the large 30 TB daily data intake.
- Kinesis Firehose can automatically load the streaming data into S3. This creates a data lake for further analysis.
- Firehose can transform and analyze the data in flight, using Lambda, before loading it into S3. This enables real-time processing.
- The data in S3 can be easily loaded into Amazon Redshift for interactive analysis at scale.
- Kinesis scales automatically to handle the high data volumes. Minimal effort is needed for infrastructure management.

upvoted 2 times
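The in-flight transformation mentioned above can be sketched as a Lambda handler. Firehose passes a batch of base64-encoded records and expects each one back with the same `recordId` and a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. The event fields used here (`event_type`, `session_id`, `page`, `ts`) are hypothetical; only the request/response envelope follows the Firehose transformation contract.

```python
import base64
import json


def lambda_handler(event, context):
    """Trim clickstream events and drop heartbeats before delivery to S3."""
    output = []
    for record in event["records"]:
        record_id = record["recordId"]
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Undecodable input is returned unchanged and marked as failed.
            output.append(
                {"recordId": record_id, "result": "ProcessingFailed", "data": record["data"]}
            )
            continue

        if payload.get("event_type") == "heartbeat":  # hypothetical field
            output.append(
                {"recordId": record_id, "result": "Dropped", "data": record["data"]}
            )
            continue

        # Keep only the (hypothetical) fields needed downstream.
        slim = {key: payload.get(key) for key in ("session_id", "page", "ts")}
        encoded = base64.b64encode((json.dumps(slim) + "\n").encode("utf-8"))
        output.append(
            {"recordId": record_id, "result": "Ok", "data": encoded.decode("ascii")}
        )
    return {"records": output}
```

Appending a newline to each record keeps the delivered S3 objects line-delimited, which tends to make later loading simpler.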

...

miki111

1 year, 12 months ago

Option D is the correct answer

upvoted 2 times

...

cookieMr

2 years ago

Selected Answer: D

A. This option uses S3 for data storage and EMR for analytics, but Data Pipeline is not an ideal service for real-time streaming data ingestion and processing. It is better suited to batch processing scenarios.

B. This option involves managing and scaling EC2, which adds operational overhead. It is also not a real-time streaming solution. Additionally, using Redshift to analyze clickstream data might not be the most efficient or cost-effective approach.

C. CloudFront is a CDN service and is not designed for real-time data processing or analytics. While using Lambda to process data can be an option, it may not be the most efficient solution for processing large volumes of clickstream data.

Therefore, collecting the data from Kinesis Data Streams, using Kinesis Data Firehose to transmit it to an S3 data lake, and loading it into Redshift for analysis is the recommended approach. This combination provides a scalable, real-time streaming solution with storage and analytics capabilities that can handle a high volume of clickstream data.

upvoted 2 times

...

Rahulbit34

2 years, 2 months ago

Clickstream is the key - Answer is D

upvoted 1 times

...

PaoloRoma

2 years, 3 months ago

Selected Answer: A

I am going to be unpopular here and go for A. Even if there are other services that offer a better experience, Data Pipeline can do the job here. "you can use AWS Data Pipeline to archive your web server's logs to Amazon Simple Storage Service (Amazon S3) each day and then run a weekly Amazon EMR (Amazon EMR) cluster over those logs to generate traffic reports" https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html In the question there is no specific timing requirement for analytics. Also, the EMR cluster job can be scheduled to run daily. Option D is a valid answer too; however, with Amazon Redshift Streaming Ingestion "you can connect to Amazon Kinesis Data Streams data streams and pull data directly to Amazon Redshift without staging data in S3" https://aws.amazon.com/redshift/redshift-streaming-ingestion. So in this scenario Kinesis Data Firehose and S3 are redundant.

upvoted 6 times

MutiverseAgent

2 years ago

I think I agree with you. It does not make sense in option D to use Amazon Kinesis Data Firehose to transmit the data to an Amazon S3 data lake and then to Redshift, as you can send the data directly from Firehose to Redshift.

upvoted 2 times

juanrasus2

1 year, 9 months ago

Also, the Kinesis family is for real-time or near-real-time services. This is not a requirement at all. We have to process data daily, but we do not need to do it in real time.

upvoted 2 times

...

pentium75

1 year, 6 months ago

Question asks how to "transmit and process the clickstream data", NOT how to analyze it. This picture shows exactly scenario D: Producer - Kinesis - Intermediate S3 bucket - Redshift https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2020/07/30/StreamTransformAnalyzeKinesisLambdaRedshift1.png

upvoted 1 times

...

career360guru

2 years, 7 months ago

Selected Answer: D

Option D

upvoted 1 times

...

studis

2 years, 7 months ago

It is C. The image here https://aws.amazon.com/kinesis/data-firehose/ shows how Kinesis can send collected data to Firehose, which can send it to Redshift. It is also possible to use an intermediary S3 bucket between Firehose and Redshift. See the image here https://aws.amazon.com/blogs/big-data/stream-transform-and-analyze-xml-data-in-real-time-with-amazon-kinesis-aws-lambda-and-amazon-redshift/

upvoted 1 times