
Exam AWS Certified Solutions Architect - Professional SAP-C02 topic 1 question 75 discussion

A solutions architect is designing the data storage and retrieval architecture for a new application that a company will be launching soon. The application is designed to ingest millions of small records per minute from devices all around the world. Each record is less than 4 KB in size and needs to be stored in a durable location where it can be retrieved with low latency. The data is ephemeral and the company is required to store the data for 120 days only, after which the data can be deleted.

The solutions architect calculates that, during the course of a year, the storage requirements would be about 10-15 TB.

Which storage strategy is the MOST cost-effective and meets the design requirements?

  • A. Design the application to store each incoming record as a single .csv file in an Amazon S3 bucket to allow for indexed retrieval. Configure a lifecycle policy to delete data older than 120 days.
  • B. Design the application to store each incoming record in an Amazon DynamoDB table properly configured for the scale. Configure the DynamoDB Time to Live (TTL) feature to delete records older than 120 days.
  • C. Design the application to store each incoming record in a single table in an Amazon RDS MySQL database. Run a nightly cron job that runs a query to delete any records older than 120 days.
  • D. Design the application to batch incoming records before writing them to an Amazon S3 bucket. Update the metadata for the object to contain the list of records in the batch and use the Amazon S3 metadata search feature to retrieve the data. Configure a lifecycle policy to delete the data after 120 days.
Suggested Answer: B
Community vote distribution
B (76%)
D (24%)

Comments

masetromain
Highly Voted 2 years, 2 months ago
Selected Answer: B
The most cost-effective and efficient solution that meets the design requirements is option B: store each incoming record in an Amazon DynamoDB table properly configured for the scale, and configure the DynamoDB Time to Live (TTL) feature to delete records older than 120 days. DynamoDB is a NoSQL key-value store designed for high scale and performance. It is fully managed by AWS and can easily handle millions of small records per minute. Additionally, with the TTL feature you can set an expiration time for each record, so the data is automatically deleted after the specified period.
upvoted 23 times
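A minimal sketch of the ingest-plus-TTL pattern described above, assuming boto3 and a hypothetical table named device-records with an expires_at attribute (none of these names come from the question):

```python
import time
import boto3

dynamodb = boto3.client("dynamodb")

# Enable TTL once per table; DynamoDB deletes expired items in the background
# at no extra request cost.
dynamodb.update_time_to_live(
    TableName="device-records",                      # hypothetical table name
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# On ingest, stamp each record with an epoch-seconds expiry 120 days out.
now = int(time.time())
dynamodb.put_item(
    TableName="device-records",
    Item={
        "device_id": {"S": "device-1234"},           # hypothetical partition key
        "recorded_at": {"N": str(now)},              # hypothetical sort key
        "payload": {"S": "<up to 4 KB of record data>"},
        "expires_at": {"N": str(now + 120 * 24 * 60 * 60)},
    },
)
```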
masetromain
2 years, 2 months ago
Option A, storing each incoming record as a single .csv file in an Amazon S3 bucket, would not be a good option because it would be difficult to retrieve individual records from the .csv files, and it would likely increase the cost of data retrieval. Option C, storing each incoming record in a single table in an Amazon RDS MySQL database, would be more expensive, as RDS typically costs more than DynamoDB; additionally, running a cron job to delete old data adds operational overhead. Option D, storing incoming records in batches in an S3 bucket, would be less efficient, as it would require additional processing and parsing of the data to retrieve individual records.
upvoted 7 times
...
...
dkx
Highly Voted 1 year, 8 months ago
A. No, because millions of writes to a single .csv file would cause read and write latency.
B. Yes, because DynamoDB can support peaks of more than 20 million requests per second.
C. No, because a nightly cron job is unnecessary, and a relational database isn't designed to ingest millions of small records per minute.
D. No, because S3 supports 210,000 PUT requests per minute (3,500 requests per second * 60 seconds per minute), which is far less than 1,000,000+ writes per minute.
upvoted 6 times
ahhatem
3 months ago
Actually, the limit you mentioned for point D is per prefix or path, not the whole bucket. With proper data distribution across prefixes, S3 can easily accommodate the load mentioned.
upvoted 2 times
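A quick back-of-the-envelope check of that point, using the 1,000,000 records/minute figure from the question and S3's published 3,500 PUT/s per-prefix limit (plain Python, nothing AWS-specific):

```python
import math

records_per_minute = 1_000_000
puts_per_second_needed = records_per_minute / 60          # ~16,667 if each record is its own object
s3_put_limit_per_prefix = 3_500                            # PUT/COPY/POST/DELETE per second, per prefix

prefixes_needed = math.ceil(puts_per_second_needed / s3_put_limit_per_prefix)
print(f"~{puts_per_second_needed:,.0f} PUTs/s needs at least {prefixes_needed} prefixes "
      "if records are written one object each (batching reduces the request rate entirely)")
```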
...
...
vmia159
Most Recent 3 days, 17 hours ago
Selected Answer: D
For those who said B, how many WCUs are needed for DynamoDB?

Given: 1 million records per minute, 4 KB per record. This translates to approximately 16,667 records per second (1,000,000 / 60).

For the DynamoDB WCU calculation: 1 WCU = 1 write per second for an item up to 1 KB. For larger items, the size is rounded up to the next 1 KB, so a 4 KB item consumes 4 WCUs per write.

Therefore: WCUs needed = (records per second) x (item size in KB, rounded up) = 16,667 x 4 = 66,668 WCUs.

First, you need to increase the quotas for that table by submitting a support ticket: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ServiceQuotas.html

Second, this is very expensive. Obviously, combining it with the Kinesis agent and Firehose to write to S3 would be a much more reliable option, but it increases the cost significantly. It is still cheaper than the DynamoDB option, though. https://calculator.aws/#/estimate?id=87f1df21449660b0b9d61a6c1153632b1983d2e4
upvoted 1 times
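The same arithmetic as a sketch; the records-per-minute and item-size figures come from the question, while the $0.00065 per-WCU-hour figure is an assumed us-east-1 provisioned rate used purely for illustration:

```python
import math

records_per_minute = 1_000_000
item_size_kb = 4

records_per_second = records_per_minute / 60                     # ~16,667
wcu_per_item = math.ceil(item_size_kb / 1)                        # 1 WCU covers a 1 KB write -> 4 WCUs
wcus_needed = records_per_second * wcu_per_item                   # ~66,667

assumed_price_per_wcu_hour = 0.00065                              # illustrative us-east-1 provisioned rate
monthly_write_cost = wcus_needed * assumed_price_per_wcu_hour * 24 * 30
print(f"~{wcus_needed:,.0f} WCUs -> roughly ${monthly_write_cost:,.0f}/month for provisioned writes alone")
```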
...
soulation
1 week, 5 days ago
Selected Answer: D
Option B is too expensive.
upvoted 1 times
...
sergza
3 months ago
Selected Answer: D
If you really think about being cost-effective, then option D is the right choice.
upvoted 1 times
...
Heman31in
3 months, 1 week ago
Selected Answer: D
Why option D might be cost-effective:
Lower storage costs: S3 storage is generally cheaper than DynamoDB when dealing with large amounts of data (e.g., $0.023/GB/month for S3 Standard vs. $0.25/GB/month for DynamoDB on-demand).
Batching reduces API call costs: by batching multiple records into a single object, you reduce the number of PUT requests to S3. This can lead to lower API costs compared to writing each record individually to DynamoDB.
Lifecycle policies for data expiry: S3 lifecycle policies automatically clean up data older than 120 days, similar to DynamoDB's TTL feature.
upvoted 1 times
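A minimal sketch of the lifecycle expiry mentioned above, assuming boto3 and a hypothetical bucket and prefix (neither is named in the question):

```python
import boto3

s3 = boto3.client("s3")

# Expire (delete) every object under the prefix 120 days after creation,
# the S3 counterpart to DynamoDB's TTL.
s3.put_bucket_lifecycle_configuration(
    Bucket="device-record-batches",                  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-after-120-days",
                "Filter": {"Prefix": "records/"},    # hypothetical prefix
                "Status": "Enabled",
                "Expiration": {"Days": 120},
            }
        ]
    },
)
```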
...
amministrazione
6 months, 2 weeks ago
D. Design the application to batch incoming records before writing them to an Amazon S3 bucket. Update the metadata for the object to contain the list of records in the batch and use the Amazon S3 metadata search feature to retrieve the data. Configure a lifecycle policy to delete the data after 120 days.
upvoted 1 times
...
ahhatem
9 months, 1 week ago
Selected Answer: B
Obviously it is DynamoDB. Although, as a side note, I would say it is probably a very bad choice, as it would be astronomically expensive for millions of writes per minute... Kinesis Data Streams would make much more sense, especially since the data is only needed for about four months (120 days)...
upvoted 2 times
ahhatem
3 months ago
After a second thought, I am not sure it is B. D would be much cheaper if it means that records are buffered and combined before the write. But the word "batch" doesn't make me comfortable; batching can just mean writing the objects in one go, and nothing implies the records would be combined...
upvoted 1 times
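For reference, the buffer-and-combine behaviour the comment above is hoping for is what Kinesis Data Firehose provides: producers send individual records, and Firehose batches them into larger objects in S3. A minimal producer-side sketch, assuming boto3 and a hypothetical delivery stream already configured with an S3 destination (Firehose is not one of the listed options):

```python
import json
import boto3

firehose = boto3.client("firehose")

# Each small record is handed to Firehose individually; the delivery stream
# buffers by size/time and writes combined objects to S3.
record = {"device_id": "device-1234", "reading": 42}     # hypothetical record shape
firehose.put_record(
    DeliveryStreamName="device-records-to-s3",           # hypothetical stream name
    Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
)
```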
...
...
gofavad926
1 year ago
Selected Answer: B
B, DynamoDB is the best option
upvoted 1 times
...
8608f25
1 year, 1 month ago
Selected Answer: B
For small records less than 4 KB, DynamoDB can efficiently handle the ingestion of millions of records per minute from devices around the world, meeting the application's design requirements for low-latency data access. Additionally, DynamoDB's Time to Live (TTL) feature allows for automatic deletion of items after a specific period, aligning with the requirement to store data for only 120 days.
upvoted 1 times
...
ninomfr64
1 year, 2 months ago
Selected Answer: B
A = S3 is not great with many small files or with searching for data based on an index (a common pattern is to store object metadata in a database like DDB, OpenSearch, or RDS/Aurora). Many small files can also lead to high retrieval costs.
B = correct.
C = a single-table design with high-volume writes/retrievals of small objects and no need for complex queries is better served, and costs less, with DDB rather than RDS.
D = more efficient than A, but the S3 metadata search feature is still limited.
upvoted 1 times
...
severlight
1 year, 4 months ago
Selected Answer: B
see uC6rW1aB's answer
upvoted 1 times
...
vjp_training
1 year, 5 months ago
Selected Answer: B
B is the best for cost-effectiveness. D costs more because of the S3 requests.
upvoted 1 times
...
uC6rW1aB
1 year, 6 months ago
Selected Answer: B
Ref: https://aws.amazon.com/dynamodb/pricing/on-demand/ DynamoDB read requests can be either strongly consistent, eventually consistent, or transactional. A strongly consistent read request of up to 4 KB requires one read request unit. For items larger than 4 KB, additional read request units are required.
upvoted 3 times
uC6rW1aB
1 year, 6 months ago
For US East write prices: S3 Standard PUT requests cost $0.005 per thousand, so 1 million PUTs cost $5 (per minute in this scenario). DynamoDB charges $1.25 per 1 million writes, which is a lot cheaper.
upvoted 4 times
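The same comparison as a sketch, using the prices quoted in this thread and factoring in that a 4 KB item consumes 4 write request units (per the 1-unit-per-1 KB write rule mentioned elsewhere on this page); the prices are illustrative us-east-1 figures, not from the question:

```python
records = 1_000_000
item_size_kb = 4

# S3: one PUT per record, $0.005 per 1,000 PUT requests (price quoted above)
s3_request_cost = records / 1_000 * 0.005

# DynamoDB on-demand: $1.25 per million write request units (price quoted above);
# a 4 KB item consumes 4 write request units.
wru_per_item = item_size_kb                                  # 1 WRU per 1 KB of item size
ddb_request_cost = records * wru_per_item / 1_000_000 * 1.25

print(f"S3 PUTs:            ${s3_request_cost:.2f} per million records (one object each)")
print(f"DynamoDB on-demand: ${ddb_request_cost:.2f} per million 4 KB records")
```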
...
...
Gmail78
1 year, 6 months ago
Selected Answer: D
DynamoDB is at least 5x more expensive than S3 for this use case. There are millions of writes, each of 4 KB, and the total storage is 10-15 TB.
upvoted 1 times
vn_thanhtung
1 year, 6 months ago
D - the S3 metadata search feature does not exist
upvoted 2 times
...
...
Soweetadad
1 year, 6 months ago
Selected Answer: D
Although both B and D are correct, Option D is more cost effective.
upvoted 1 times
...
YodaMaster
1 year, 8 months ago
Selected Answer: D
Going with D as it's more cost-effective. The question didn't ask for the most efficient solution.
upvoted 1 times
blackgamer
1 year, 4 months ago
B satisfies the requirement but D does not. The keyword here is low latency: "a durable location where it can be retrieved with low latency."
upvoted 3 times
...
...