A manufacturing company has structured and unstructured data stored in an Amazon S3 bucket. A Machine Learning Specialist wants to use SQL to run queries on this data. Which solution requires the LEAST effort to be able to query this data?
A. Use AWS Data Pipeline to transform the data and Amazon RDS to run queries.
B. Use AWS Glue to catalogue the data and Amazon Athena to run queries.
C. Use AWS Batch to run ETL on the data and Amazon Aurora to run the queries.
D. Use AWS Lambda to transform the data and Amazon Kinesis Data Analytics to run queries.
The correct answer HAS TO be B.
Using AWS Glue to catalogue the data and Amazon Athena to run queries against data on S3 are very typical use cases for those services.
D is not ideal: Lambda can do many things, but it requires development and testing effort, and Amazon Kinesis Data Analytics is not suited to ad-hoc queries.
B. Use AWS Glue to catalog the data and Amazon Athena to run queries.
Why is this the best choice?
AWS Glue can automatically catalog both structured and unstructured data in S3.
Amazon Athena is a serverless SQL query service that allows direct SQL queries on S3 data without moving it.
No infrastructure setup is required—just define a Glue Data Catalog and start querying with Athena.
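To make the "start querying with Athena" step concrete, here is a minimal sketch of running one SQL statement through the Athena API with boto3. The helper name `run_athena_query` and the database, table, and bucket names in the usage note are hypothetical; the `athena` argument is assumed to be a client created with `boto3.client("athena")`, and the table is assumed to already exist in the Glue Data Catalog.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run one SQL statement in Athena and return result rows as lists of strings.

    `athena` is assumed to be a boto3 Athena client; all names passed in are
    illustrative and not taken from the question.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Only the first page of results is read in this sketch.
    results = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```

A call might look like `run_athena_query(boto3.client("athena"), "SELECT * FROM sensor_readings LIMIT 10", "factory_db", "s3://example-bucket/athena-results/")`. Nothing is loaded into a database first; Athena reads the files where they sit in S3 and writes the result set to the output location.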
AWS Glue is a fully managed ETL service that makes it easy to move data between data stores. It can automatically crawl, catalogue, and classify data stored in Amazon S3, and make it available for querying and analysis. With AWS Glue, you don't have to worry about the underlying infrastructure and can focus on your data.
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. It integrates with AWS Glue, so you can use the catalogued data directly in Athena without any additional data movement or transformation.
The reason for this choice is that AWS Glue is a fully managed service that provides a data catalogue to make your data in S3 searchable and queryable. AWS Glue crawls your data sources, identifies data formats, and suggests schemas and transformations. You can use AWS Glue to catalogue both structured and unstructured data, such as relational data, JSON, XML, CSV files, images, or media files.
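For anyone who has not set up the crawling step before, a rough sketch with boto3 follows. The helper name `crawl_s3_prefix` and every argument value are hypothetical; `glue` is assumed to be a client from `boto3.client("glue")`, and the IAM role is assumed to already have read access to the bucket.

```python
def crawl_s3_prefix(glue, crawler_name, role_arn, database, s3_path):
    """Create a Glue crawler over one S3 prefix and start it.

    `glue` is assumed to be a boto3 Glue client. The crawler writes the tables
    it infers into `database` in the Glue Data Catalog.
    """
    glue.create_crawler(
        Name=crawler_name,
        Role=role_arn,
        DatabaseName=database,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    # The crawl itself runs asynchronously after this call returns.
    glue.start_crawler(Name=crawler_name)
```

Once the crawler finishes, the tables it created should show up in Athena under the same database name, with no ETL job in between.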
B. I don't think that you even need Glue to transform anything. Just use Glue to define the schemas and then use Athena to query based on those schemas.
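As an illustration of the "just define the schemas" point, a table over delimited files can also be declared by hand with a `CREATE EXTERNAL TABLE` statement run in Athena, which registers it in the Glue Data Catalog. The sketch below only renders that statement as a string; the function name, columns, and S3 path are hypothetical, and identifiers are not validated or escaped.

```python
def external_table_ddl(database, table, columns, s3_location, delimiter=","):
    """Render a CREATE EXTERNAL TABLE statement for delimited text files in S3.

    `columns` is a list of (name, type) pairs. This is an illustrative helper:
    it does no validation, so only pass trusted identifiers.
    """
    column_lines = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_lines}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"LOCATION '{s3_location}'"
    )
```

For example, `external_table_ddl("factory_db", "sensor_readings", [("machine_id", "string"), ("temperature", "double")], "s3://example-bucket/sensors/")` produces a statement that can be pasted into the Athena query editor. The data stays in S3 either way; only the schema is stored in the catalog.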
Answer is B.
Queries Against an Amazon S3 Data Lake
Data lakes are an increasingly popular way to store and analyze both structured and unstructured data. If you want to build your own custom Amazon S3 data lake, AWS Glue can make all your data immediately available for analytics without moving the data.
https://aws.amazon.com/glue/