Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 88 discussion

Exam question from Amazon's AWS Certified Data Engineer - Associate DEA-C01

Question #: 88
Topic #: 1


A manufacturing company has many IoT devices in facilities around the world. The company uses Amazon Kinesis Data Streams to collect data from the devices. The data includes device ID, capture date, measurement type, measurement value, and facility ID. The company uses facility ID as the partition key.

The company's operations team recently observed many WriteThroughputExceeded exceptions. The operations team found that some shards were heavily used but other shards were generally idle.

How should the company resolve the issues that the operations team observed?

A. Change the partition key from facility ID to a randomly generated key.
B. Increase the number of shards.
C. Archive the data on the producer's side.
D. Change the partition key from facility ID to capture date.


Suggested Answer: A

by tgv at June 15, 2024, 9:43 a.m.


Comments


tgv

Highly Voted 1 year ago

Selected Answer: A

The best solution to resolve the issue of uneven shard usage and WriteThroughputExceeded exceptions is to balance the load more evenly across the shards. This can be effectively achieved by changing the partition key to something that ensures a more uniform distribution of data across the shards.

upvoted 6 times


Tester_TKK

Most Recent 2 months, 1 week ago

Selected Answer: A

https://aws.amazon.com/blogs/big-data/under-the-hood-scaling-your-kinesis-data-streams/

upvoted 1 time


bakarys

12 months ago

Selected Answer: A

The correct answer is **A. Change the partition key from facility ID to a randomly generated key.**

Amazon Kinesis Data Streams uses the partition key that you specify to assign each data record to a shard. If the company uses facility ID as the partition key and some facilities produce more data than others, the records are unevenly distributed across the shards. Some shards become heavily used while others sit idle, and the overloaded shards raise `WriteThroughputExceeded` exceptions.

With a randomly generated partition key, records are far more likely to spread evenly across all the shards, which relieves the overloaded shards. This option also has the least operational overhead. Option B increases cost and does not by itself fix the skew, because all records from a busy facility still share one partition key and therefore one shard. Option C, archiving data on the producer's side, might not be desirable or feasible. Option D swaps in a partition key that can also distribute data unevenly.
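To make the skew concrete, here is a rough simulation, not production code. Kinesis maps a partition key to a shard by taking the MD5 hash of the key as a 128-bit integer and matching it against each shard's hash key range. The sketch assumes the shards split that range evenly; the facility names, the traffic weights, and the helper names `shard_for` and `simulate` are all made up for illustration.

```python
import hashlib
import random
from collections import Counter


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return a shard index for a key, assuming evenly split hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")  # 128-bit integer
    return hash_value * shard_count // 2**128


def simulate(shard_count: int = 8, records: int = 20000, seed: int = 7):
    """Count records per shard for facility-ID keys and for random keys."""
    rng = random.Random(seed)
    facilities = [f"facility-{i}" for i in range(10)]
    # Hypothetical skew: facility-0 produces most of the traffic.
    weights = [50] + [1] * 9

    by_facility = Counter()
    by_random = Counter()
    for _ in range(records):
        facility = rng.choices(facilities, weights=weights)[0]
        by_facility[shard_for(facility, shard_count)] += 1

        # A fresh random key per record, here a 128-bit hex string.
        random_key = f"{rng.getrandbits(128):032x}"
        by_random[shard_for(random_key, shard_count)] += 1
    return by_facility, by_random


if __name__ == "__main__":
    by_facility, by_random = simulate()
    print("facility ID key:", sorted(by_facility.items()))
    print("random key:     ", sorted(by_random.items()))
```

With facility ID as the key, whichever shard owns the busy facility's hash receives most of the records, while the random key spreads them across all shards. The trade-off is that a random key gives up per-facility ordering within a shard, so this only fits when consumers do not depend on that ordering.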

upvoted 2 times


didorins

1 year ago

Selected Answer: A

D is not a good choice: partitioning by capture date makes things worse, because every record captured on the same date shares one partition key and lands on the same shard. My answer is A.
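A quick sketch of why a date key is worse, assuming "capture date" means a calendar date with no time component and that the shards split the MD5 hash range evenly. The sample date, the shard count, and the helper names are arbitrary and only for illustration.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return a shard index for a key, assuming evenly split hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") * shard_count // 2**128


def shards_hit_in_one_day(shard_count: int = 8, records: int = 1000) -> set:
    """Collect the shards that receive records when the key is the capture date."""
    start = datetime(2024, 6, 15, tzinfo=timezone.utc)  # arbitrary sample day
    shards = set()
    for i in range(records):
        # One record per minute; 1000 minutes stays within the same day.
        captured_at = start + timedelta(minutes=i)
        partition_key = captured_at.date().isoformat()
        shards.add(shard_for(partition_key, shard_count))
    return shards


if __name__ == "__main__":
    print("shards used in one day:", len(shards_hit_in_one_day()))
```

Every record in the day carries the same key, so the whole stream's write traffic for that day goes to a single shard regardless of how many shards exist.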

upvoted 2 times
