Exam AWS Certified Solutions Architect - Associate SAA-C03 topic 1 question 59 discussion

A company hosts more than 300 global websites and applications. The company requires a platform to analyze more than 30 TB of clickstream data each day.
What should a solutions architect do to transmit and process the clickstream data?

  • A. Design an AWS Data Pipeline to archive the data to an Amazon S3 bucket and run an Amazon EMR cluster with the data to generate analytics.
  • B. Create an Auto Scaling group of Amazon EC2 instances to process the data and send it to an Amazon S3 data lake for Amazon Redshift to use for analysis.
  • C. Cache the data to Amazon CloudFront. Store the data in an Amazon S3 bucket. When an object is added to the S3 bucket, run an AWS Lambda function to process the data for analysis.
  • D. Collect the data from Amazon Kinesis Data Streams. Use Amazon Kinesis Data Firehose to transmit the data to an Amazon S3 data lake. Load the data in Amazon Redshift for analysis.
Suggested Answer: D

Comments

Buruguduystunstugudunstuy
Highly Voted 1 year, 9 months ago
Selected Answer: D
Option D is the most appropriate solution for transmitting and processing the clickstream data in this scenario. Amazon Kinesis Data Streams is a highly scalable and durable service that enables real-time processing of streaming data at a high volume and high rate. You can use Kinesis Data Streams to collect and process the clickstream data in real-time. Amazon Kinesis Data Firehose is a fully managed service that loads streaming data into data stores and analytics tools. You can use Kinesis Data Firehose to transmit the data from Kinesis Data Streams to an Amazon S3 data lake. Once the data is in the data lake, you can use Amazon Redshift to load the data and perform analysis on it. Amazon Redshift is a fully managed, petabyte-scale data warehouse service that allows you to quickly and efficiently analyze data using SQL and your existing business intelligence tools.
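To put the "high volume and high rate" claim in numbers, here is a rough back-of-the-envelope sketch of what 30 TB per day means for a provisioned Kinesis data stream. It assumes decimal units, traffic spread evenly over the day, and the documented write limit of 1 MB/s per shard; the `peak_factor` parameter is a hypothetical knob for bursty traffic and is not part of the question.

```python
import math

# Assumptions (not stated in the question): 1 TB = 10**12 bytes, traffic is
# spread evenly over 24 hours, and each provisioned shard accepts up to
# 1 MB/s of writes.
BYTES_PER_TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60
SHARD_WRITE_BYTES_PER_SEC = 1_000_000


def average_ingest_rate(tb_per_day: float) -> float:
    """Return the average ingest rate in bytes per second."""
    return tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY


def min_shards(tb_per_day: float, peak_factor: float = 1.0) -> int:
    """Return the minimum shard count for a daily volume.

    peak_factor is a hypothetical peak-over-average multiplier.
    """
    rate = average_ingest_rate(tb_per_day) * peak_factor
    return math.ceil(rate / SHARD_WRITE_BYTES_PER_SEC)


if __name__ == "__main__":
    print(f"average rate: {average_ingest_rate(30) / 1e6:.1f} MB/s")  # 347.2 MB/s
    print(f"shards at average load: {min_shards(30)}")                # 348
    print(f"shards at 2x peak: {min_shards(30, peak_factor=2.0)}")    # 695
```

In practice the number would depend on record sizes, the per-shard record-count limit, and whether on-demand capacity mode is used, so treat this as an order-of-magnitude estimate only.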
upvoted 35 times
Buruguduystunstugudunstuy
1 year, 9 months ago
Option A, which involves using AWS Data Pipeline to archive the data to an Amazon S3 bucket and running an Amazon EMR cluster with the data to generate analytics, is not the most appropriate solution because it does not involve real-time processing of the data.
Option B, which involves creating an Auto Scaling group of Amazon EC2 instances to process the data and sending it to an Amazon S3 data lake for Amazon Redshift to use for analysis, is not the most appropriate solution because it does not involve a fully managed service for transmitting the data from the processing layer to the data lake.
Option C, which involves caching the data to Amazon CloudFront, storing the data in an Amazon S3 bucket, and running an AWS Lambda function to process the data for analysis when an object is added to the S3 bucket, is not the most appropriate solution because it does not involve a scalable and durable service for collecting and processing the data in real-time.
upvoted 9 times
MutiverseAgent
1 year, 2 months ago
The question does not say that real-time is needed here
upvoted 3 times
pentium75
9 months ago
Question asks how to "transmit and process the clickstream data", NOT how to analyze it. Thus D.
upvoted 1 times
ArielSchivo
Highly Voted 1 year, 11 months ago
Selected Answer: D
Option D. https://aws.amazon.com/es/blogs/big-data/real-time-analytics-with-amazon-redshift-streaming-ingestion/
upvoted 16 times
RBSK
1 year, 9 months ago
Unsure if this is the right URL for this scenario. Option D refers to S3 and then Redshift, whereas the URL discusses eliminating S3: "We’re excited to launch Amazon Redshift streaming ingestion for Amazon Kinesis Data Streams, which enables you to ingest data directly from the Kinesis data stream without having to stage the data in Amazon Simple Storage Service (Amazon S3). Streaming ingestion allows you to achieve low latency in the order of seconds while ingesting hundreds of megabytes of data into your Amazon Redshift cluster."
upvoted 4 times
PaulGa
Most Recent 1 week, 2 days ago
Selected Answer: D
Ans D - using Kinesis Data Streams / Firehose (data in/out) is fast and reliable. Using Redshift allows all sorts of permutations of data analysis and interfacing with user apps.
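As an illustration of the "data in" side, the sketch below groups click events into batches suitable for the Kinesis `PutRecords` API. It assumes the documented limit of 500 records per call; the event fields (`session_id`, `url`) and the stream name in the comment are hypothetical. The actual network call is left as a comment so the example runs without AWS credentials.

```python
import json
from typing import Iterable, Iterator

# Assumption: PutRecords accepts at most 500 records per request.
MAX_RECORDS_PER_CALL = 500


def to_entry(event: dict) -> dict:
    """Convert one click event into a PutRecords entry.

    Using "session_id" as the partition key is a hypothetical choice.
    """
    return {
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": str(event["session_id"]),
    }


def batches(events: Iterable[dict], size: int = MAX_RECORDS_PER_CALL) -> Iterator[list]:
    """Yield lists of PutRecords entries, each at most `size` long."""
    batch = []
    for event in events:
        batch.append(to_entry(event))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    clicks = [{"session_id": i % 7, "url": f"/page/{i}"} for i in range(1200)]
    for batch in batches(clicks):
        # With boto3 installed and configured, each batch could be sent with:
        # boto3.client("kinesis").put_records(StreamName="clickstream", Records=batch)
        print(len(batch))
```

A real producer would also need to inspect the response for partially failed records and retry them.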
upvoted 1 times
effiecancode
2 months, 2 weeks ago
D is the best option
upvoted 1 times
awsgeek75
8 months, 1 week ago
Selected Answer: D
A: Not sure how recent this question is, but Data Pipeline is not really a product AWS is recommending anymore: https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html
B: 30 TB of clickstream data could be handled with EC2, but it would be challenging.
C: CloudFront is for CDN and caching, and mostly for outgoing data, not incoming.
D: Kinesis, an S3 data lake, and Redshift will work perfectly for this case.
upvoted 3 times
clumsyninja4life
9 months ago
Selected Answer: A
The answer should be A. Clickstream does not mean real time; it just means they capture user interactions on the web page. Kinesis Data Streams is not required. Furthermore, Redshift is a data warehousing solution; it can't run complex analysis as well as EMR. My vote goes for A.
upvoted 1 times
pentium75
9 months ago
Question asks how to "transmit and process the clickstream data", NOT how to analyze it. Also question does NOT ask how to archive the data (as is mentioned in A). Thus D.
upvoted 1 times
Reckless_Jas
1 year, 1 month ago
When you see clickstream data, think about Kinesis Data Streams.
upvoted 6 times
Guru4Cloud
1 year, 1 month ago
Selected Answer: D
The key reasons are:
  • Kinesis Data Streams can continuously capture and ingest high volumes of clickstream data in real time. This handles the large 30 TB daily data intake.
  • Kinesis Data Firehose can automatically load the streaming data into S3. This creates a data lake for further analysis.
  • Firehose can use Lambda to transform the data in flight before loading it to S3. This enables real-time processing.
  • The data in S3 can be easily loaded into Amazon Redshift for interactive analysis at scale.
  • Kinesis auto scales to handle the high data volumes. Minimal effort is needed for infrastructure management.
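The in-flight transformation step can be sketched as a Lambda handler that follows the record contract Firehose uses for data transformation: each incoming record carries a `recordId` and base64-encoded `data`, and each returned record reports a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. The click fields (`event_type`, `url`, `ts`) and the "keep only page views" rule are hypothetical, chosen just to show the shape of such a function.

```python
import base64
import json


def handler(event, context):
    """Firehose transformation sketch: keep page views, drop everything else."""
    output = []
    for record in event["records"]:
        record_id = record["recordId"]
        try:
            click = json.loads(base64.b64decode(record["data"]))
            if not isinstance(click, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Covers invalid base64, invalid UTF-8, and invalid JSON.
            output.append({"recordId": record_id, "result": "ProcessingFailed",
                           "data": record["data"]})
            continue

        if click.get("event_type") != "page_view":
            output.append({"recordId": record_id, "result": "Dropped",
                           "data": record["data"]})
            continue

        # Emit newline-delimited JSON so the objects in S3 are easy to load later.
        line = json.dumps({"url": click.get("url"), "ts": click.get("ts")}) + "\n"
        output.append({"recordId": record_id, "result": "Ok",
                       "data": base64.b64encode(line.encode("utf-8")).decode("ascii")})
    return {"records": output}
```

Every input record must appear in the response with the same `recordId`, which is why failed and dropped records are still returned rather than skipped.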
upvoted 2 times
miki111
1 year, 2 months ago
Option D is the correct answer
upvoted 2 times
cookieMr
1 year, 3 months ago
Selected Answer: D
A. This option uses S3 for data storage and EMR for analytics, but Data Pipeline is not an ideal service for real-time streaming data ingestion and processing. It is better suited to batch processing scenarios.
B. This option involves managing and scaling EC2, which adds operational overhead. It is also not a real-time streaming solution. Additionally, using Redshift to analyze clickstream data might not be the most efficient or cost-effective approach.
C. CloudFront is a CDN service and is not designed for real-time data processing or analytics. While using Lambda to process data can be an option, it may not be the most efficient solution for processing large volumes of clickstream data.
Therefore, collecting the data from Kinesis Data Streams, using Kinesis Data Firehose to transmit it to an S3 data lake, and loading it into Redshift for analysis is the recommended approach. This combination provides a scalable, real-time streaming solution with storage and analytics capabilities that can handle a high volume of clickstream data.
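For comparison, the per-object Lambda trigger described in option C would look roughly like the sketch below. It only parses the S3 event notification and reports which objects it would process; the actual processing step is left as a comment. With 30 TB arriving per day, a function like this would be invoked once per uploaded object, which is part of why it is a poor fit at this volume.

```python
from urllib.parse import unquote_plus


def objects_from_event(event: dict) -> list:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in the notification.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def handler(event, context):
    """Lambda entry point: report each newly added object."""
    pairs = objects_from_event(event)
    for bucket, key in pairs:
        # A real function would download and process the object here.
        print(f"would process s3://{bucket}/{key}")
    return {"processed": len(pairs)}
```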
upvoted 2 times
Rahulbit34
1 year, 4 months ago
Clickstream is the key - Answer is D
upvoted 1 times
PaoloRoma
1 year, 6 months ago
Selected Answer: A
I am going to be unpopular here and I'll go for A). Even if there are other services that offer a better experience, Data Pipeline can do the job here. "you can use AWS Data Pipeline to archive your web server's logs to Amazon Simple Storage Service (Amazon S3) each day and then run a weekly Amazon EMR (Amazon EMR) cluster over those logs to generate traffic reports" https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html In the question there is no specific timing requirement for analytics. Also, the EMR cluster job can be scheduled to run daily. Option D) is a valid answer too; however, with Amazon Redshift Streaming Ingestion "you can connect to Amazon Kinesis Data Streams data streams and pull data directly to Amazon Redshift without staging data in S3" https://aws.amazon.com/redshift/redshift-streaming-ingestion. So in this scenario Kinesis Data Firehose and S3 are redundant.
upvoted 6 times
MutiverseAgent
1 year, 2 months ago
I think I agree with you. It does not make sense in option D) to use Amazon Kinesis Data Firehose to transmit the data to an Amazon S3 data lake and then to Redshift, as you can send the data directly from Firehose to Redshift.
upvoted 2 times
juanrasus2
11 months, 1 week ago
Also, the Kinesis family is related to real-time or near-real-time services. This is not a requirement at all. We have to process data daily, but we do not need to do it in real time.
upvoted 2 times
pentium75
9 months ago
Question asks how to "transmit and process the clickstream data", NOT how to analyze it. This picture shows exactly scenario D: Producer - Kinesis - Intermediate S3 bucket - Redshift https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2020/07/30/StreamTransformAnalyzeKinesisLambdaRedshift1.png
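The last hop in that picture (intermediate S3 bucket to Redshift) is typically a Redshift `COPY` from an S3 prefix. The sketch below only builds the SQL text; the table name, bucket, and IAM role ARN are placeholders, and it assumes the staged files are gzip-compressed JSON, which the thread does not specify.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement for gzip-compressed JSON files in S3.

    All arguments are assumed to come from trusted configuration, not user input.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS JSON 'auto'\n"
        "GZIP;"
    )


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    print(build_copy_statement(
        table="clickstream_events",
        s3_prefix="s3://example-bucket/clicks/",
        iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-copy",
    ))
```

When Firehose is configured with Redshift as its destination, it stages the data in S3 and issues a `COPY` of this kind on your behalf, so the S3 bucket is still part of the path.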
upvoted 1 times
career360guru
1 year, 9 months ago
Selected Answer: D
Option D
upvoted 1 times
studis
1 year, 9 months ago
It is C. The image here https://aws.amazon.com/kinesis/data-firehose/ shows how Kinesis can send the collected data to Firehose, which can send it to Redshift. It is also possible to use an intermediary S3 bucket between Firehose and Redshift. See the image here https://aws.amazon.com/blogs/big-data/stream-transform-and-analyze-xml-data-in-real-time-with-amazon-kinesis-aws-lambda-and-amazon-redshift/
upvoted 1 times
pentium75
9 months ago
Makes sense, but this is D, not C
upvoted 1 times
sebasta
1 year, 9 months ago
Why not A? You can collect data with AWS Data Pipeline and then analyze it with EMR. What's wrong with this option?
upvoted 4 times
bearcandy
1 year, 9 months ago
It's not A, the wording is tricky! It says "to archive the data to S3" - there is no mention of archiving in the question, so it has to be D :)
upvoted 3 times
pentium75
9 months ago
And the question is not asking about analyzing the data at all, just about "transmitting and processing".
upvoted 1 times
Wpcorgan
1 year, 10 months ago
D is correct
upvoted 1 times
PS_R
1 year, 10 months ago
Clickstream & analyse/process: think KDS.
upvoted 3 times