Welcome to ExamTopics


Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 27 discussion

A company wants to implement real-time analytics capabilities. The company wants to use Amazon Kinesis Data Streams and Amazon Redshift to ingest and process streaming data at the rate of several gigabytes per second. The company wants to derive near real-time insights by using existing business intelligence (BI) and analytics tools.
Which solution will meet these requirements with the LEAST operational overhead?

  • A. Use Kinesis Data Streams to stage data in Amazon S3. Use the COPY command to load data from Amazon S3 directly into Amazon Redshift to make the data immediately available for real-time analysis.
  • B. Access the data from Kinesis Data Streams by using SQL queries. Create materialized views directly on top of the stream. Refresh the materialized views regularly to query the most recent stream data.
  • C. Create an external schema in Amazon Redshift to map the data from Kinesis Data Streams to an Amazon Redshift object. Create a materialized view to read data from the stream. Set the materialized view to auto refresh.
  • D. Connect Kinesis Data Streams to Amazon Kinesis Data Firehose. Use Kinesis Data Firehose to stage the data in Amazon S3. Use the COPY command to load the data from Amazon S3 to a table in Amazon Redshift.
Suggested Answer: C
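For context, option C describes Amazon Redshift streaming ingestion. A minimal sketch of the setup follows, assuming a hypothetical IAM role ARN, schema name, view name, and stream name (all placeholders, none taken from the question):

```sql
-- Map the Kinesis data stream into Redshift via an external schema.
-- The IAM role ARN is a placeholder; the role must allow Redshift
-- to read from the stream.
CREATE EXTERNAL SCHEMA kinesis_schema
FROM KINESIS
IAM_ROLE 'arn:aws:iam::111122223333:role/redshift-streaming-role';

-- Materialized view over the stream. AUTO REFRESH YES keeps it
-- near real-time without manual REFRESH MATERIALIZED VIEW calls.
CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       partition_key,
       shard_id,
       sequence_number,
       JSON_PARSE(FROM_VARBYTE(kinesis_data, 'utf-8')) AS payload
FROM kinesis_schema."my-data-stream";
```

Existing BI tools can then query `clickstream_mv` like any other Redshift relation, which is why this path avoids an S3 staging layer entirely.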

Comments

helpaws
Highly Voted 8 months ago
Selected Answer: C
The key phrase here is near real-time. If the solution involves S3 and the COPY command, it's not going to be near real-time.
upvoted 7 times
markill123
2 months ago
Redshift cannot create external schemas that map directly to Kinesis Data Streams. You would still need an intermediary step, such as Firehose or S3, to handle data ingestion. Additionally, maintaining auto-refreshing materialized views directly from a stream isn't feasible with Redshift.
upvoted 3 times
blackgamer
Highly Voted 7 months, 2 weeks ago
Selected Answer: C
The answer is C. It can provide near real-time insights. Refer to the article from AWS - https://aws.amazon.com/blogs/big-data/real-time-analytics-with-amazon-redshift-streaming-ingestion/
upvoted 5 times
Asen_Cat
Most Recent 1 week, 1 day ago
Selected Answer: D
D would be the most standard way to handle this case. How C would be implemented is questionable to me.
upvoted 2 times
heavenlypearl
1 week, 1 day ago
Selected Answer: C
Amazon Redshift can automatically refresh materialized views with up-to-date data from its base tables when materialized views are created with or altered to have the autorefresh option. Amazon Redshift autorefreshes materialized views as soon as possible after base tables changes. https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-refresh.html
upvoted 1 times
royalrum
1 week, 5 days ago
Firehose is near real-time; you can set your buffer size and stream to either Redshift or S3 directly. Since Redshift is not in the option, use S3...
upvoted 1 times
Shatheesh
3 weeks ago
Selected Answer: D
In option D, Kinesis Data Firehose is a fully managed service that automatically handles the ingestion of data from Kinesis Data Streams.
upvoted 1 times
markill123
2 months ago
Selected Answer: D
Here’s why D is the best choice:
  • Kinesis Data Firehose is a fully managed service that automatically handles the ingestion of data from Kinesis Data Streams and stages it in S3, which significantly reduces operational overhead compared to managing custom data ingestion pipelines.
  • S3 as a staging area: Using Amazon S3 as a staging location allows for flexible data management, high durability, and direct loading into Redshift without needing to manage complex buffering or data handling processes.
  • COPY command: The COPY command in Amazon Redshift is highly optimized for loading large datasets efficiently, making it a common and effective method to load bulk data from S3 into Redshift for near real-time analysis.
  • Firehose to Redshift: Firehose can automatically buffer, batch, and transform data before loading it into Redshift, reducing manual intervention and ensuring data is readily available for real-time analytics.
upvoted 3 times
shammous
2 months, 1 week ago
Selected Answer: D
Option C has an issue: Redshift does not natively support direct querying or mapping of Kinesis Data Streams. D is the only correct option.
upvoted 2 times
V0811
3 months, 1 week ago
Selected Answer: D
Option D
upvoted 2 times
bakarys
4 months, 1 week ago
Selected Answer: A
Option A (using Kinesis Data Streams to stage data in Amazon S3 and loading it directly into Amazon Redshift) is the most straightforward and efficient approach. It minimizes operational overhead and ensures immediate availability of data for analysis. Options B and C introduce additional complexity and may not provide the same level of efficiency.
upvoted 1 times
d8945a1
6 months, 1 week ago
Selected Answer: C
MVs in Redshift with auto refresh is the best option for near real time.
upvoted 2 times
Christina666
7 months ago
Selected Answer: C
Using materialized views with auto refresh directly on a Redshift external schema mapped to a Kinesis data stream offers the most streamlined and efficient approach for near real-time insights using existing BI tools.
upvoted 3 times
fceb2c1
7 months, 3 weeks ago
Selected Answer: C
https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-streaming-ingestion-getting-started.html
C is correct (KDS -> Redshift). D is wrong as it has more operational overhead (KDS -> KDF -> S3 -> Redshift).
upvoted 5 times
certplan
7 months, 4 weeks ago
1. Amazon Kinesis Data Firehose: It's designed to reliably load streaming data into data lakes and data stores with minimal configuration and management overhead. It handles tasks like buffering, scaling, and delivering data to destinations like Amazon S3 and Amazon Redshift automatically.
2. Amazon S3 as a staging area: Storing data in Amazon S3 provides a scalable and durable solution for data storage without needing to manage infrastructure. It also allows for easy integration with other AWS services and existing BI and analytics tools.
3. Amazon Redshift: While Redshift requires some setup and management, loading data from Amazon S3 using the COPY command is a straightforward process. Once data is loaded into Redshift, existing BI and analytics tools can query the data directly, enabling near real-time insights.
4. Minimal operational overhead: This solution minimizes operational overhead because much of the management tasks, such as scaling, buffering, and delivery of data, are handled by Amazon Kinesis Data Firehose. Additionally, using Amazon S3 as a staging area simplifies data storage and integration with other services.
upvoted 2 times
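For comparison, the S3-staging path described above (option D) ends with a COPY from S3 into Redshift. A minimal sketch, assuming a hypothetical bucket, prefix, table name, and IAM role ARN (all placeholders):

```sql
-- Bulk-load Firehose output staged in S3 into a Redshift table.
-- Bucket, prefix, table, and role ARN are illustrative placeholders.
COPY streaming_events
FROM 's3://example-bucket/firehose-prefix/'
IAM_ROLE 'arn:aws:iam::111122223333:role/redshift-copy-role'
FORMAT AS JSON 'auto';
```

Note that when Redshift is configured as a Firehose destination, Firehose issues this COPY on your behalf on each buffer flush, which is the basis of the "fully managed" claim.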
certplan
7 months, 4 weeks ago
By considering the characteristics and capabilities of each AWS service and approach, along with insights from AWS documentation, it becomes evident that option D offers the most streamlined and operationally efficient solution for the scenario described. This idea/concept is also straight out of the Amazon Solutions Architect course material.
upvoted 1 times
certplan
7 months, 4 weeks ago
Point: "Which solution will meet these requirements with the LEAST operational overhead?"
Regarding C:
  • This approach involves creating an external schema in Amazon Redshift to map data from Kinesis Data Streams, which adds complexity compared to directly loading data from Amazon S3 using Amazon Kinesis Data Firehose.
  • While materialized views with auto refresh can provide near real-time insights, managing them and ensuring proper synchronization with the streaming data source may require more operational effort.
  • AWS documentation for Amazon Redshift primarily focuses on traditional data loading methods and querying, with limited guidance on integrating with real-time data sources like Kinesis Data Streams.
upvoted 1 times
GiorgioGss
8 months, 1 week ago
Selected Answer: D
I think D. It could be C but because of "LEAST operational overhead" I will go with D.
upvoted 3 times
Community vote distribution: A (35%), C (25%), B (20%), Other
