I am not sure whether to go for B or C. Can anyone comment on this?
B: No problem in general, but S3 Select is not available if the gzip-compressed data is Parquet. The problem statement doesn't say that gzip-compressed Parquet is involved, though.
C: Correct if the gzip-compressed data is Parquet, but B is more cost-effective if it is gzip-compressed CSV or JSON.
I think the solution is either B or D, but I would go with B because they mention storing the data gzip-compressed rather than as Parquet, which is the format optimized for Athena queries.
B. Store the data in Amazon S3. Use Amazon S3 Select to query the data.
Amazon S3 is a cost-effective object storage service, and S3 Select allows you to retrieve only a subset of data from an object by using simple SQL expressions. S3 Select works on objects stored in CSV, JSON, or Apache Parquet format. For CSV and JSON objects it also supports GZIP and BZIP2 compression, which makes it suitable for the given scenario where the data is compressed with gzip.
While Amazon Athena is a powerful query service, it can be more expensive than S3 Select for occasional queries. Amazon Glacier and Glacier Select are designed for long-term archival storage rather than for frequent access or queries, so they might not be suitable for occasional audits. Therefore, option B is the most cost-effective choice for this scenario.
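For anyone who wants to see what option B looks like in practice, here is a minimal sketch of an S3 Select query against a gzip-compressed object using boto3's `select_object_content`. It assumes the data is CSV with a header row, which the question does not state; the bucket name, key, and column names are hypothetical placeholders.

```python
def build_select_request(bucket, key, sql):
    """Build the arguments for an S3 Select call on a gzip-compressed CSV object.

    Assumption: the object is CSV with a header row. For JSON, the "CSV"
    entry in InputSerialization would be replaced with a "JSON" entry.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "USE"},  # first row holds column names
            "CompressionType": "GZIP",         # object is gzip-compressed as a whole
        },
        "OutputSerialization": {"CSV": {}},
    }


def run_select(s3_client, request):
    """Run the query and join the streamed record chunks into one string."""
    response = s3_client.select_object_content(**request)
    chunks = []
    for event in response["Payload"]:
        # The event stream also carries Stats, Progress and End events;
        # only Records events contain result bytes.
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")


def main():
    import boto3  # imported here so the helpers above work without boto3 installed

    s3 = boto3.client("s3")
    request = build_select_request(
        "example-audit-bucket",                       # hypothetical bucket
        "logs/2023/records.csv.gz",                   # hypothetical key
        "SELECT s.id, s.amount FROM S3Object s WHERE s.status = 'FAILED'",
    )
    print(run_select(s3, request))
```

Calling `main()` requires AWS credentials and a real object. Only the rows matching the expression are returned, which is why the query stays cheap for occasional audits.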
B. Store the data in Amazon S3. Use Amazon S3 Select to query the data.
Amazon S3 is a cost-effective storage service, and S3 Select allows you to retrieve only a subset of data from an object by using simple SQL expressions. S3 Select works on objects stored in CSV, JSON, or Apache Parquet format. It also supports GZIP compression, which is the format used by the company. This makes it a cost-effective solution for occasional queries needed for audits.
Option B (Amazon S3 with S3 Select) is generally more cost-effective and operationally efficient for occasional audits of gzip-compressed data. It provides faster access to data and lower querying costs, which are critical factors for ad hoc and timely data retrievals. While Option A (Amazon Glacier Flexible Retrieval with S3 Glacier Select) offers cheaper storage, its longer retrieval times and potentially higher querying costs make it less suitable for use cases requiring timely access to data.