Exam AWS Certified Solutions Architect - Professional SAP-C02 topic 1 question 92 discussion

Question #: 92
Topic #: 1

A company is running an application in the AWS Cloud. The application collects and stores a large amount of unstructured data in an Amazon S3 bucket. The S3 bucket contains several terabytes of data and uses the S3 Standard storage class. The data increases in size by several gigabytes every day.

The company needs to query and analyze the data. The company does not access data that is more than 1 year old. However, the company must retain all the data indefinitely for compliance reasons.

Which solution will meet these requirements MOST cost-effectively?

A. Use S3 Select to query the data. Create an S3 Lifecycle policy to transition data that is more than 1 year old to S3 Glacier Deep Archive.
B. Use Amazon Redshift Spectrum to query the data. Create an S3 Lifecycle policy to transition data that is more than 1 year old to S3 Glacier Deep Archive.
C. Use an AWS Glue Data Catalog and Amazon Athena to query the data. Create an S3 Lifecycle policy to transition data that is more than 1 year old to S3 Glacier Deep Archive.
D. Use Amazon Redshift Spectrum to query the data. Create an S3 Lifecycle policy to transition data that is more than 1 year old to S3 Intelligent-Tiering.

Suggested Answer: C

by masetromain at Jan. 15, 2023, 9:54 a.m.

Comments

masetromain

Highly Voted 2 years, 5 months ago

Selected Answer: C

The correct answer is C. Use an AWS Glue Data Catalog and Amazon Athena to query the data. Create an S3 Lifecycle policy to transition data that is more than 1 year old to S3 Glacier Deep Archive. This solution allows you to use Amazon Athena and the AWS Glue Data Catalog to query and analyze the data in an S3 bucket. Amazon Athena is a serverless, interactive query service that allows you to analyze data in S3 using SQL. The AWS Glue Data Catalog is a managed metadata repository that can be used to store and retrieve table definitions for data stored in S3. Together, these services can provide a cost-effective way to query and analyze large amounts of unstructured data. Additionally, by using an S3 Lifecycle policy to transition data that is more than 1 year old to S3 Glacier Deep Archive, you can retain the data indefinitely for compliance reasons while also reducing storage costs.
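
As a rough illustration of the lifecycle part of option C, here is a minimal Python sketch that builds the rule as a plain dictionary. The rule ID, the empty prefix, and the bucket name in the commented-out call are assumptions made for the example. Note that lifecycle transitions count days from object creation, not from last access, and that the rule has no expiration action, so the data is kept indefinitely.

```python
import json


def build_lifecycle_configuration(days: int = 365, prefix: str = "") -> dict:
    """Build an S3 lifecycle configuration that archives objects after `days`.

    The rule only transitions objects; it defines no Expiration action,
    so nothing is ever deleted.
    """
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix = whole bucket
                "Transitions": [
                    {"Days": days, "StorageClass": "DEEP_ARCHIVE"}
                ],
            }
        ]
    }


config = build_lifecycle_configuration()
print(json.dumps(config, indent=2))

# Applying it needs boto3 and AWS credentials; the bucket name is hypothetical:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-data-bucket", LifecycleConfiguration=config
# )
```

The dictionary is built separately from the API call so the rule can be inspected or reviewed before anything is changed on the bucket.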

upvoted 21 times

masetromain

2 years, 5 months ago

The other options are not correct because:
A. S3 Select is good for filtering data in S3, but it may not be suitable for querying and analyzing large amounts of data.
B. Amazon Redshift Spectrum can query data stored in S3, but it may not be as cost-effective as Amazon Athena for querying unstructured data.
D. Amazon Redshift Spectrum with S3 Intelligent-Tiering could be a good solution, but S3 Intelligent-Tiering is designed to optimize storage costs based on access patterns. It would not be the best solution for compliance reasons, because it moves data to other storage classes according to access patterns.

upvoted 9 times

Japanese1

1 year, 7 months ago

This is a nonsense explanation. In the first place, Redshift cannot handle unstructured data.

upvoted 4 times

dankositzke

1 year, 4 months ago

Amazon Redshift is designed for structured data. However, Amazon Redshift Spectrum enables you to run queries against exabytes of unstructured data in Amazon S3, with no loading or ETL required.

upvoted 3 times

...

Untamables

Highly Voted 2 years, 5 months ago

Selected Answer: C

Generally, unstructured data should be converted to structured data before it is queried. AWS Glue can do that. https://docs.aws.amazon.com/glue/latest/dg/schema-relationalize.html https://docs.aws.amazon.com/athena/latest/ug/glue-athena.html

upvoted 7 times

...

GabrielShiao

Most Recent 9 months ago

Selected Answer: C

B and C both seem acceptable. C is selected because Redshift Spectrum also needs a Glue Data Catalog, which option B does not mention.

upvoted 1 times

...

gofavad926

1 year, 3 months ago

Selected Answer: C

C, aws glue + amazon athena

upvoted 1 times

...

AimarLeo

1 year, 5 months ago

Many of the comments arguing against Redshift Spectrum were not convincing. The only reason I see to exclude that option is that Redshift Spectrum MUST have a Redshift cluster available to run the query against S3.

upvoted 1 times

djeong95

1 year, 4 months ago

This question is actually pretty difficult since both Redshift Spectrum and AWS Glue + Athena could query unstructured data. Redshift Spectrum and Athena actually cost about the same per TB. However, with Athena, you could lower the cost by compressing the data. Glue doesn't seem to cost that much either. https://aws.amazon.com/redshift/pricing/ https://aws.amazon.com/athena/pricing/ https://aws.amazon.com/glue/pricing/
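
The compression point can be made concrete with a little arithmetic. The sketch below assumes a flat price of 5 USD per TB scanned and a 4:1 compression ratio. Both numbers are placeholders for illustration, so check the pricing pages linked above for current figures.

```python
TB = 1024 ** 4  # bytes in one tebibyte

ASSUMED_PRICE_PER_TB_USD = 5.0  # placeholder price, not an authoritative figure


def scan_cost_usd(bytes_scanned: float,
                  price_per_tb: float = ASSUMED_PRICE_PER_TB_USD) -> float:
    """Cost of a query that is billed per TB of data scanned."""
    return bytes_scanned / TB * price_per_tb


def compressed_scan_cost_usd(raw_bytes: float, compression_ratio: float,
                             price_per_tb: float = ASSUMED_PRICE_PER_TB_USD) -> float:
    """Cost of the same query when the stored data is compressed.

    A compression_ratio of 4.0 means the files are a quarter of the raw size.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return scan_cost_usd(raw_bytes / compression_ratio, price_per_tb)


raw = 2 * TB  # hypothetical query that touches 2 TB of raw data
print(scan_cost_usd(raw))                  # 10.0
print(compressed_scan_cost_usd(raw, 4.0))  # 2.5
```

Under these assumptions the same query costs a quarter as much, because billing follows the bytes actually read from S3 and not the logical size of the data.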

upvoted 1 times

...

ninomfr64

1 year, 5 months ago

Selected Answer: C

A = S3 Select is good for filtering and retrieving a subset of data, but it is not enough for analysis.
B = needs a Redshift instance, which is expensive.
C = correct (Glue Data Catalog can help put some structure on the data, Athena is good for both querying and analytics, and data transitions to Deep Archive after 1 year).
D = see answer B, plus Intelligent-Tiering is not the best option here.

upvoted 2 times

...

nzin4x

1 year, 5 months ago

redshift spectrum vs athena: https://www.upsolver.com/blog/aws-serverless-redshift-spectrum-athena Both are good solutions to query s3 data. However, redshift spectrum is useful for joining S3 data with other data in Redshift, and if the data is only in S3, it would be preferable to choose athena.

upvoted 1 times

...

career360guru

1 year, 6 months ago

Selected Answer: C

C is the right answer as Data needs to be queried and Analyzed.

upvoted 2 times

...

subbupro

1 year, 7 months ago

Athena and AWS Glue cost more, so it is better to go with A. And what is the purpose of AWS Glue here? AWS Glue is for ETL, which is unnecessary.

upvoted 1 times

...

Andy16240

1 year, 7 months ago

C correct: S3 copy command in AWS CLI is less operational processes than the batch operation.

upvoted 1 times

...

uC6rW1aB

1 year, 10 months ago

Selected Answer: C

In this particular scenario, using Amazon Athena and AWS Glue Data Catalog might be a better fit due to the large amount of data stored in S3 buckets and growing every day. Athena can query data across an entire S3 bucket or across multiple buckets, which is useful when parsing multiple files and large amounts of data.

upvoted 2 times

...

chico2023

1 year, 11 months ago

Selected Answer: C

Answer: C. Criminally tricky question. S3 Select does the same thing as Athena, but there are some differences. The key here is "...a large amount of unstructured data...". If it weren't for this, S3 Select would win hands down.

upvoted 3 times

chico2023

1 year, 11 months ago

Using an Olabiba to explain the differences between the two: 1. Query Capability: Amazon Athena is a fully managed interactive query service that allows you to run SQL queries directly on your data in S3. It supports complex queries, joins, aggregations, and even nested data structures. Athena is designed for ad-hoc querying and analysis of large datasets. On the other hand, S3 Select is a feature of Amazon S3 that allows you to retrieve a subset of data from an object using SQL expressions. It is primarily used for selective retrieval of specific data within an object, rather than running complex queries across multiple objects.

upvoted 2 times

chico2023

1 year, 11 months ago

2. Data Format: Amazon Athena supports various data formats such as CSV, JSON, Parquet, Avro, and more. It can automatically infer the schema of your data or you can provide a schema explicitly. Athena can handle structured, semi-structured, and unstructured data. S3 Select, on the other hand, is limited to querying CSV, JSON, and Parquet files. It requires the data to be in a specific format and does not support nested data structures.

upvoted 2 times

chico2023

1 year, 11 months ago

3. Performance: Amazon Athena is optimized for running queries on large datasets and can parallelize the query execution across multiple nodes. It automatically scales resources based on the query complexity and data size, providing fast and efficient query performance. S3 Select, on the other hand, is designed for retrieving a subset of data from an object. It can significantly reduce the amount of data transferred over the network and improve query performance by only retrieving the necessary data. 4. Cost: Both Amazon Athena and S3 Select have different pricing models. Amazon Athena charges based on the amount of data scanned by your queries, while S3 Select charges based on the amount of data selected and returned by your queries. The cost will depend on the size of your data and the complexity of your queries.
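
To show what "running SQL directly on data in S3" looks like from a client, here is a minimal Python sketch that assembles the parameters for an Athena query. The table `app_events`, the database `example_catalog_db`, the `year` partition column, and the results bucket are all hypothetical names. The sketch only builds the request, and the commented-out line shows where boto3 would submit it.

```python
def build_athena_request(sql: str, database: str, output_location: str) -> dict:
    """Assemble keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        # Database registered in the Glue Data Catalog (hypothetical name below)
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location
        "ResultConfiguration": {"OutputLocation": output_location},
    }


request = build_athena_request(
    sql=(
        "SELECT event_type, COUNT(*) AS events "
        "FROM app_events "
        "WHERE year = '2023' "  # filtering on a partition limits data scanned
        "GROUP BY event_type"
    ),
    database="example_catalog_db",
    output_location="s3://example-athena-results/",
)
print(request["QueryString"])

# Submitting the query needs boto3 and AWS credentials:
# import boto3
# boto3.client("athena").start_query_execution(**request)
```

Unlike S3 Select, which works on a single object per request, a query like this aggregates across every object that backs the table.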

upvoted 3 times

...

Jonalb

1 year, 11 months ago

Selected Answer: C

It's C. True question!

upvoted 1 times

...

NikkyDicky

2 years ago

C for sure

upvoted 1 times

...

johnballs221

2 years, 1 month ago

Selected Answer: B

Redshift Spectrum can run SQL queries directly on S3.

upvoted 1 times

rxhan

2 years ago

Not the best for cost.

upvoted 1 times

...

mfsec

2 years, 3 months ago

Selected Answer: C

C is the best choice for unstructured data

upvoted 3 times

...

God_Is_Love

2 years, 4 months ago

Selected Answer: C

S3 Select is only for selecting a few parts of the data, and here there is a lot of unstructured data. So A is wrong. Use the Athena console to create a Glue crawler, as described here: https://docs.aws.amazon.com/athena/latest/ug/data-sources-glue.html

upvoted 4 times

...
