Exam AWS Certified Solutions Architect - Associate SAA-C03 topic 1 question 320 discussion

A company is using a fleet of Amazon EC2 instances to ingest data from on-premises data sources. The data is in JSON format and ingestion rates can be as high as 1 MB/s. When an EC2 instance is rebooted, the data in-flight is lost. The company’s data science team wants to query ingested data in near-real time.

Which solution provides near-real-time data querying that is scalable with minimal data loss?

  • A. Publish data to Amazon Kinesis Data Streams. Use Kinesis Data Analytics to query the data.
  • B. Publish data to Amazon Kinesis Data Firehose with Amazon Redshift as the destination. Use Amazon Redshift to query the data.
  • C. Store ingested data in an EC2 instance store. Publish data to Amazon Kinesis Data Firehose with Amazon S3 as the destination. Use Amazon Athena to query the data.
  • D. Store ingested data in an Amazon Elastic Block Store (Amazon EBS) volume. Publish data to Amazon ElastiCache for Redis. Subscribe to the Redis channel to query the data.
Suggested Answer: A

Comments

LuckyAro
Highly Voted 1 year, 11 months ago
Selected Answer: A
A is the solution for the company's requirements. Publishing data to Amazon Kinesis Data Streams can support ingestion rates as high as 1 MB/s and provide real-time data processing. Kinesis Data Analytics can query the ingested data in real time with low latency, and the solution can scale as needed to accommodate increases in ingestion rates or querying needs. This solution also minimizes data loss in the event of an EC2 instance reboot, since Kinesis Data Streams retains records for 24 hours by default, extendable up to 365 days.
upvoted 14 times
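As a rough check on the 1 MB/s figure in the comment above, here is a minimal sizing sketch. It assumes a provisioned-mode stream with per-shard write limits of 1 MB/s and 1,000 records/s; the function name and the record rates are illustrative and do not come from the question.

```python
import math

# Assumed per-shard write limits for a provisioned Kinesis data stream.
SHARD_WRITE_MB_PER_S = 1.0
SHARD_WRITE_RECORDS_PER_S = 1000


def shards_needed(mb_per_s: float, records_per_s: float) -> int:
    """Return the smallest shard count that satisfies both write limits."""
    by_bytes = math.ceil(mb_per_s / SHARD_WRITE_MB_PER_S)
    by_records = math.ceil(records_per_s / SHARD_WRITE_RECORDS_PER_S)
    return max(1, by_bytes, by_records)


# 1 MB/s is the peak rate from the question; 500 records/s is hypothetical.
print(shards_needed(1.0, 500))    # 1
# Hypothetical growth case: 3.5 MB/s at 2,500 records/s.
print(shards_needed(3.5, 2500))   # 4
```

Under these assumptions the stated peak fits in a single shard, and scaling is a matter of adding shards as either limit is approached.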
bogobob
Highly Voted 1 year, 2 months ago
Selected Answer: B
The fact that they specifically mention "near-real time" twice tells me the correct answer is KDF. On top of that, it's easier to set up and maintain. KDS is really only needed if you need real time. Also, using Redshift means permanent data retention. The data in A could be lost after a year. Redshift queries are slow, but you're still querying near-real-time data.
upvoted 6 times
Ernestokoro
1 year, 1 month ago
You are correct. See the supporting link: https://jayendrapatil.com/aws-kinesis-data-streams-vs-kinesis-firehose/#:~:text=vs%20Kine...-,Purpose,into%20AWS%20products%20for%20processing.
upvoted 2 times
wwwxxch
Most Recent 3 weeks, 3 days ago
Selected Answer: B
Near-real time --> Kinesis Data Firehose. Also, the retention period of Kinesis Data Streams cannot be longer than 365 days.
upvoted 1 times
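The retention bound mentioned above can be written down as a small helper. This is only a sketch: it assumes the stream's retention period is expressed in hours, with a 24-hour minimum (the default) and a 365-day maximum, and the function name is hypothetical.

```python
MIN_RETENTION_HOURS = 24        # assumed default and minimum retention
MAX_RETENTION_HOURS = 365 * 24  # 8,760 hours, the one-year ceiling


def clamp_retention_hours(requested_hours: int) -> int:
    """Clamp a requested retention period to the allowed range."""
    return max(MIN_RETENTION_HOURS, min(requested_hours, MAX_RETENTION_HOURS))


print(clamp_retention_hours(168))     # 168 (seven days is within range)
print(clamp_retention_hours(10_000))  # 8760 (capped at one year)
```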
EllenLiu
1 month ago
Selected Answer: A
A focuses on performing complex data processing without losing in-flight data, and does not mention data persistence. B focuses on persisting data for later analysis.
upvoted 1 times
LeonSauveterre
1 month, 3 weeks ago
Selected Answer: A
Instance store and ElastiCache are both temporary storage, which cannot address data loss. That rules out C and D. B: Kinesis Data Firehose is optimized for batch delivery rather than real-time querying. It can indeed deliver data to S3 or Redshift, but there's a good chance the delay between ingestion and query availability cannot meet the "near-real-time" requirement.
upvoted 1 times
Lin878
7 months, 2 weeks ago
Selected Answer: B
https://aws.amazon.com/pm/kinesis/?gclid=CjwKCAjwvIWzBhAlEiwAHHWgvRQuJmBubZDnO2GasDWwc2iBapfVD6GBeIgj2JV6qkldm-K_CmMzmxoCdCwQAvD_BwE&trk=ee1218b7-7c10-4762-97df-274836a44566&sc_channel=ps&ef_id=CjwKCAjwvIWzBhAlEiwAHHWgvRQuJmBubZDnO2GasDWwc2iBapfVD6GBeIgj2JV6qkldm-K_CmMzmxoCdCwQAvD_BwE:G:s&s_kwcid=AL!4422!3!651510255264!p!!g!!kinesis%20stream!19836376690!149589222920
upvoted 2 times
ray320x
11 months, 3 weeks ago
Option A is actually correct. The question asks for minimal data loss and for the querying of data, not the ingestion, to be near-real time. Kinesis Data Analytics is near-real time. Recent changes to Redshift actually make B correct as well, but A is also correct.
upvoted 2 times
dkw2342
10 months, 3 weeks ago
"Streaming ingestion provides low-latency, high-speed ingestion of stream data from Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka into an Amazon Redshift provisioned or Amazon Redshift Serverless materialized view." [1] Option B mentions Kinesis Data Firehose (now just Firehose), so this won't work. Option A is the correct answer. [1] https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-streaming-ingestion.html
upvoted 2 times
farnamjam
12 months ago
Selected Answer: A
Comparison to other options:
B. Kinesis Data Firehose with Redshift: While Redshift is scalable, it doesn't offer real-time querying capabilities. Data needs to be loaded into Redshift from Firehose, introducing latency.
C. EC2 instance store with Kinesis Data Firehose and S3: Data in an EC2 instance store is not persistent and will be lost during reboots. EBS volumes are more appropriate for persistent storage, but the architecture becomes more complex.
D. EBS volume with ElastiCache for Redis: While ElastiCache offers fast in-memory storage, it's not designed for high-volume data ingestion like 1 MB/s. It might struggle with scalability and persistence.
upvoted 3 times
Firdous586
1 year ago
I don't understand why people are giving wrong information. The question clearly mentions near-real time. Kinesis Data Streams is for real time, whereas Kinesis Data Firehose is for near-real time. Therefore the answer is B only.
upvoted 5 times
Marco_St
1 year, 1 month ago
Selected Answer: A
Read the question: it asks for near-real-time querying of data. It is about querying the data soon after it is ingested, and it does not mention how long the data needs to be stored. A is the better option. B introduces a buffering delay before the data can be queried in Redshift.
upvoted 1 times
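The buffering delay described above can be modeled roughly as follows. This is a toy calculation, not Firehose code: it assumes a delivery stream flushes its buffer when either a size threshold or a time interval is reached, whichever comes first. The 5 MB and 60 s settings are illustrative values, not taken from the question, and the model ignores any further load time at the destination.

```python
def seconds_until_flush(rate_mb_per_s: float,
                        buffer_size_mb: float,
                        buffer_interval_s: float) -> float:
    """Time until the first flush: whichever buffering limit is hit first."""
    if rate_mb_per_s <= 0:
        # With no incoming data, only the time interval can trigger a flush.
        return buffer_interval_s
    time_to_fill = buffer_size_mb / rate_mb_per_s
    return min(time_to_fill, buffer_interval_s)


# At the question's 1 MB/s peak, a 5 MB buffer fills before 60 s elapse.
print(seconds_until_flush(1.0, 5.0, 60.0))   # 5.0
# At a trickle of 0.01 MB/s, the 60 s interval is what triggers the flush.
print(seconds_until_flush(0.01, 5.0, 60.0))  # 60.0
```

In this model the worst-case wait before data leaves the buffer is set by the interval, which is why low-traffic periods are where the "near-real-time" argument for B is weakest.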
practice_makes_perfect
1 year, 2 months ago
Selected Answer: B
A is not correct because Kinesis can only store data for up to 1 year. The solution needs to support querying ALL data instead of "recent" data.
upvoted 3 times
pentium75
1 year ago
Says who? They want to "query ingested data in near-real time", it does not say anything about historical data.
upvoted 2 times
Ruffyit
1 year, 2 months ago
A is the solution for the company's requirements. Publishing data to Amazon Kinesis Data Streams can support ingestion rates as high as 1 MB/s and provide real-time data processing. Kinesis Data Analytics can query the ingested data in real time with low latency, and the solution can scale as needed to accommodate increases in ingestion rates or querying needs. This solution also minimizes data loss in the event of an EC2 instance reboot, since Kinesis Data Streams retains records for 24 hours by default, extendable up to 365 days.
upvoted 2 times
TariqKipkemei
1 year, 3 months ago
Selected Answer: A
Publish data to Amazon Kinesis Data Streams. Use Kinesis Data Analytics to query the data.
upvoted 3 times
Guru4Cloud
1 year, 4 months ago
Selected Answer: A
• Provide near-real-time data ingestion into Kinesis Data Streams with the ability to handle the 1 MB/s ingestion rate. Data is stored durably, replicated across Availability Zones.
• Enable near-real-time querying of the data using Kinesis Data Analytics. SQL queries can be run directly against the Kinesis data stream.
• Minimize data loss, since records are stored durably once the stream accepts them. If an EC2 instance is rebooted, the data already in the stream is still accessible.
• Scale seamlessly to handle varying ingestion and query rates.
upvoted 4 times
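On the producer side, minimizing in-flight loss mostly means handing records to the stream promptly rather than holding them on the instance. A minimal sketch of batching JSON events for that purpose is below. It assumes the 500-records-per-request limit of the PutRecords API and ignores the per-request size limit; `to_batches` and the `sensor_id` field are hypothetical names used only for illustration.

```python
import json
from typing import Iterable, Iterator

MAX_RECORDS_PER_CALL = 500  # assumed PutRecords limit per request


def to_batches(events: Iterable[dict], key_field: str) -> Iterator[list[dict]]:
    """Group JSON events into PutRecords-shaped batches of at most 500 entries."""
    batch: list[dict] = []
    for event in events:
        batch.append({
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event[key_field]),
        })
        if len(batch) == MAX_RECORDS_PER_CALL:
            yield batch
            batch = []
    if batch:
        # Emit the final, partially filled batch.
        yield batch


# Hypothetical events; each batch could then be sent with boto3's
# kinesis client: client.put_records(StreamName=..., Records=batch)
events = [{"sensor_id": i % 4, "value": i} for i in range(1200)]
print([len(b) for b in to_batches(events, "sensor_id")])  # [500, 500, 200]
```

A real producer would also inspect the response for failed records and retry them, which this sketch leaves out.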
Nikki013
1 year, 4 months ago
Selected Answer: A
Answer is A as it will provide a more streamlined solution. Using B (Firehose + Redshift) will involve sending the data to an S3 bucket first and then copying the data to Redshift which will take more time. https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html
upvoted 5 times
nublit
1 year, 8 months ago
Selected Answer: B
Amazon Kinesis Data Firehose can deliver data in real-time to Amazon Redshift, making it immediately available for queries. Amazon Redshift, on the other hand, is a powerful data analytics service that allows fast and scalable querying of large volumes of data.
upvoted 2 times
pentium75
1 year ago
Redshift is a Data Warehouse in the first place, but the question says nothing about storing the data. They want to analyze it in near-real time, nobody says they need to store or access or analyze historical data.
upvoted 2 times
kruasan
1 year, 8 months ago
Selected Answer: A
• Provide near-real-time data ingestion into Kinesis Data Streams with the ability to handle the 1 MB/s ingestion rate. Data is stored durably, replicated across Availability Zones.
• Enable near-real-time querying of the data using Kinesis Data Analytics. SQL queries can be run directly against the Kinesis data stream.
• Minimize data loss, since records are stored durably once the stream accepts them. If an EC2 instance is rebooted, the data already in the stream is still accessible.
• Scale seamlessly to handle varying ingestion and query rates.
upvoted 3 times
kruasan
1 year, 8 months ago
The other options would not fully meet the requirements:
B) Kinesis Firehose + Redshift would introduce latency since data must be loaded from Firehose into Redshift before querying. Redshift would lack real-time capabilities.
C) An EC2 instance store and Kinesis Firehose to S3 with Athena querying would risk data loss from the instance store if an instance reboots. Athena querying data in S3 also lacks real-time capabilities.
D) Using EBS storage, publishing to ElastiCache for Redis, and subscribing to the Redis channel may provide near-real-time ingestion and querying but risks data loss if an EBS volume or EC2 instance fails. Recovery requires re-hydrating data from a backup, which impacts real-time needs.
upvoted 4 times
joechen2023
1 year, 7 months ago
I voted A as well, although I'm not 100% sure why B is not correct. I just selected what seemed the simplest solution between A and B. The reason kruasan gave, "Redshift would lack real-time capabilities," is not true. Redshift can do real time. Evidence: https://aws.amazon.com/blogs/big-data/real-time-analytics-with-amazon-redshift-streaming-ingestion/
upvoted 1 times
Community vote distribution
A (35%)
C (25%)
B (20%)
Other