A company has stored 10 TB of log files in Apache Parquet format in an Amazon S3 bucket. The company occasionally needs to use SQL to analyze the log files.
Which solution will meet these requirements MOST cost-effectively?
A. Create an Amazon Aurora MySQL database. Migrate the data from the S3 bucket into Aurora by using AWS Database Migration Service (AWS DMS). Issue SQL statements to the Aurora database.
B. Create an Amazon Redshift cluster. Use Redshift Spectrum to run SQL statements directly on the data in the S3 bucket.
C. Create an AWS Glue crawler to store and retrieve table metadata from the S3 bucket. Use Amazon Athena to run SQL statements directly on the data in the S3 bucket.
D. Create an Amazon EMR cluster. Use Apache Spark SQL to run SQL statements directly on the data in the S3 bucket.
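For reference, running SQL from Athena against data in S3 (as option C describes) could look roughly like the sketch below. It assumes a Glue crawler has already cataloged the Parquet files; the database name `logs_db`, the table name `app_logs`, the `status` column, and the results bucket `s3://example-athena-results/` are all hypothetical placeholders, not values from the question.

```python
def build_athena_request(sql: str, database: str, output_location: str) -> dict:
    """Build the keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        # Database in the Glue Data Catalog that holds the crawled table.
        "QueryExecutionContext": {"Database": database},
        # S3 location where Athena writes the query results.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql: str, database: str, output_location: str) -> str:
    """Submit the query and return its execution ID (needs AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3 installed

    client = boto3.client("athena")
    response = client.start_query_execution(
        **build_athena_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]


# Hypothetical names, for illustration only.
request = build_athena_request(
    "SELECT status, COUNT(*) AS hits FROM app_logs GROUP BY status",
    "logs_db",
    "s3://example-athena-results/",
)
```

Nothing is provisioned ahead of time here: the query is submitted on demand, and the data stays in the S3 bucket.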
A - Aurora means migrating 10 TB of data out of S3 first, which adds migration cost, database instance and storage cost, and operational overhead.
B - Redshift Spectrum can query the data directly in S3 without loading it into Redshift, but it still needs a Redshift cluster, which is expensive to keep around for occasional queries.
C - Athena is serverless and charges only for the data scanned by queries. A Glue crawler automatically extracts the schema and metadata from the Parquet files. No need to migrate anything.
D - An EMR cluster means paying for and managing cluster instances just to run occasional queries, so it will cost more than Athena here.
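To put rough numbers on the pay-per-scan argument, here is a back-of-the-envelope sketch. The $5 per TB scanned figure is an assumed Athena list price (check the current pricing page for your Region), and the cluster hourly rate is a purely hypothetical placeholder, not a real Redshift or EMR price.

```python
# Assumed Athena price per TB scanned; verify against current AWS pricing.
ATHENA_USD_PER_TB_SCANNED = 5.00
# Approximate number of hours in a month, used for always-on resources.
HOURS_PER_MONTH = 730


def athena_query_cost(tb_scanned: float,
                      price_per_tb: float = ATHENA_USD_PER_TB_SCANNED) -> float:
    """Cost of one Athena query that scans `tb_scanned` terabytes."""
    return tb_scanned * price_per_tb


def monthly_athena_cost(queries_per_month: int,
                        avg_tb_scanned_per_query: float) -> float:
    """Monthly Athena cost for a given number of occasional queries."""
    return queries_per_month * athena_query_cost(avg_tb_scanned_per_query)


def monthly_cluster_cost(hourly_rate_usd: float,
                         hours: float = HOURS_PER_MONTH) -> float:
    """Monthly cost of a cluster that runs continuously, queried or not."""
    return hourly_rate_usd * hours


# Worst case for Athena: a single query scans the entire 10 TB data set.
full_scan = athena_query_cost(10)
# Four queries a month, each scanning 0.5 TB on average.
occasional = monthly_athena_cost(4, 0.5)
# Hypothetical cluster billed at 1 USD per hour, left running all month.
always_on = monthly_cluster_cost(1.0)
```

Because Parquet is columnar, a query that selects only a few columns usually scans far less than the full 10 TB, so the full-scan figure is an upper bound per query.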
LeonSauveterre (3 months ago)
sandordini (5 months ago)
Kezuko (6 months, 1 week ago)
asdfcdsxdfc (6 months, 3 weeks ago)
kempes (7 months, 3 weeks ago)
Andy_09 (7 months, 3 weeks ago)