Exam AWS Certified Solutions Architect - Professional SAP-C02 topic 1 question 92 discussion

A company is running an application in the AWS Cloud. The application collects and stores a large amount of unstructured data in an Amazon S3 bucket. The S3 bucket contains several terabytes of data and uses the S3 Standard storage class. The data increases in size by several gigabytes every day.

The company needs to query and analyze the data. The company does not access data that is more than 1 year old. However, the company must retain all the data indefinitely for compliance reasons.

Which solution will meet these requirements MOST cost-effectively?

  • A. Use S3 Select to query the data. Create an S3 Lifecycle policy to transition data that is more than 1 year old to S3 Glacier Deep Archive.
  • B. Use Amazon Redshift Spectrum to query the data. Create an S3 Lifecycle policy to transition data that is more than 1 year old to S3 Glacier Deep Archive.
  • C. Use an AWS Glue Data Catalog and Amazon Athena to query the data. Create an S3 Lifecycle policy to transition data that is more than 1 year old to S3 Glacier Deep Archive.
  • D. Use Amazon Redshift Spectrum to query the data. Create an S3 Lifecycle policy to transition data that is more than 1 year old to S3 Intelligent-Tiering.
Suggested Answer: C
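
The lifecycle half of the suggested answer can be sketched as a plain configuration document. This is a minimal sketch, not a definitive policy: the rule ID and the empty prefix are assumptions made for illustration, and the dict only mirrors the shape that S3 lifecycle configurations use (a `Rules` list whose entries carry `Transitions` with `Days` and `StorageClass`).

```python
import json


def build_lifecycle_configuration(days=365, prefix=""):
    """Build a lifecycle configuration that archives objects after `days`."""
    return {
        "Rules": [
            {
                "ID": "archive-after-one-year",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix applies to every object
                "Transitions": [
                    # Move objects to S3 Glacier Deep Archive once they are `days` old.
                    {"Days": days, "StorageClass": "DEEP_ARCHIVE"}
                ],
                # No "Expiration" entry: the data must be retained indefinitely.
            }
        ]
    }


config = build_lifecycle_configuration()
print(json.dumps(config, indent=2))
```

With boto3, a dict of this shape could be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`, together with the bucket name. The transition is driven by object age, which matches the "more than 1 year old" wording of the question.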

Comments

masetromain
Highly Voted 2 years, 3 months ago
Selected Answer: C
The correct answer is C. Use an AWS Glue Data Catalog and Amazon Athena to query the data. Create an S3 Lifecycle policy to transition data that is more than 1 year old to S3 Glacier Deep Archive. This solution allows you to use Amazon Athena and the AWS Glue Data Catalog to query and analyze the data in an S3 bucket. Amazon Athena is a serverless, interactive query service that allows you to analyze data in S3 using SQL. The AWS Glue Data Catalog is a managed metadata repository that can be used to store and retrieve table definitions for data stored in S3. Together, these services can provide a cost-effective way to query and analyze large amounts of unstructured data. Additionally, by using an S3 Lifecycle policy to transition data that is more than 1 year old to S3 Glacier Deep Archive, you can retain the data indefinitely for compliance reasons while also reducing storage costs.
upvoted 21 times
masetromain
2 years, 3 months ago
The other options are not correct because:
  • A. S3 Select is good for filtering data in S3, but it may not be a suitable solution for querying and analyzing large amounts of data.
  • B. Amazon Redshift Spectrum can be used to query data stored in S3, but it may not be as cost-effective as Amazon Athena for querying unstructured data.
  • D. Amazon Redshift Spectrum with S3 Intelligent-Tiering could be a good solution, but S3 Intelligent-Tiering is designed to optimize storage costs based on access patterns. It would not be the best solution for compliance reasons, because it moves data between storage classes according to access patterns.
upvoted 9 times
Japanese1
1 year, 5 months ago
This is a nonsense explanation. In the first place, Redshift cannot handle unstructured data.
upvoted 4 times
dankositzke
1 year, 2 months ago
Amazon Redshift is designed for structured data. However, Amazon Redshift Spectrum enables you to run queries against exabytes of unstructured data in Amazon S3, with no loading or ETL required.
upvoted 2 times
Untamables
Highly Voted 2 years, 3 months ago
Selected Answer: C
Generally, unstructured data should be converted to structured data before it is queried. AWS Glue can do that. https://docs.aws.amazon.com/glue/latest/dg/schema-relationalize.html https://docs.aws.amazon.com/athena/latest/ug/glue-athena.html
upvoted 7 times
GabrielShiao
Most Recent 6 months, 3 weeks ago
Selected Answer: C
B and C both seem acceptable. C is selected because Redshift Spectrum also needs a Glue Data Catalog, which option B does not mention.
upvoted 1 times
gofavad926
1 year, 1 month ago
Selected Answer: C
C: AWS Glue + Amazon Athena
upvoted 1 times
AimarLeo
1 year, 2 months ago
Many comments were not convincing about why Redshift Spectrum should not be used. The only reason I see to exclude that option is that Redshift Spectrum MUST have a Redshift cluster available to start the query against S3.
upvoted 1 times
djeong95
1 year, 1 month ago
This question is actually pretty difficult since both Redshift Spectrum and AWS Glue + Athena could query unstructured data. Redshift Spectrum and Athena actually cost about the same per TB. However, with Athena, you could lower the cost by compressing the data. Glue doesn't seem to cost that much either. https://aws.amazon.com/redshift/pricing/ https://aws.amazon.com/athena/pricing/ https://aws.amazon.com/glue/pricing/
upvoted 1 times
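
The compression point in the comment above can be made concrete with a small worked calculation. This is only a sketch under stated assumptions: the price of 5.0 USD per TB scanned, the 2 TB query size, and the 4:1 compression ratio are placeholder values chosen for illustration, so check the pricing pages linked in the comment for current figures.

```python
def scan_cost_usd(scanned_tb, price_per_tb=5.0):
    """Cost of one query under a pay-per-TB-scanned pricing model.

    price_per_tb is an assumed placeholder, not a quoted price.
    """
    return scanned_tb * price_per_tb


raw_tb = 2.0             # hypothetical: the query scans 2 TB of uncompressed data
compression_ratio = 4.0  # hypothetical: the same data compresses 4:1

uncompressed_cost = scan_cost_usd(raw_tb)
compressed_cost = scan_cost_usd(raw_tb / compression_ratio)

print(f"uncompressed: ${uncompressed_cost:.2f}")
print(f"compressed:   ${compressed_cost:.2f}")
```

Because the charge scales with bytes scanned, anything that shrinks the scanned footprint (compression, columnar formats, partitioning) lowers the per-query cost by the same factor.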
ninomfr64
1 year, 3 months ago
Selected Answer: C
A = S3 Select is good for filtering and retrieving a subset of data, but not enough for analysis.
B = needs a Redshift instance, which is expensive.
C = correct (the Glue Data Catalog helps put some structure on the data, Athena is good for both query and analytics, and data transitions to Deep Archive after 1 year).
D = see answer B; also, Intelligent-Tiering is not the best option here.
upvoted 2 times
nzin4x
1 year, 3 months ago
Redshift Spectrum vs. Athena: https://www.upsolver.com/blog/aws-serverless-redshift-spectrum-athena Both are good solutions for querying S3 data. However, Redshift Spectrum is useful for joining S3 data with other data in Redshift; if the data is only in S3, Athena is preferable.
upvoted 1 times
career360guru
1 year, 4 months ago
Selected Answer: C
C is the right answer, as the data needs to be queried and analyzed.
upvoted 2 times
subbupro
1 year, 4 months ago
Athena and AWS Glue cost more, so it is better to go with A. And what is the purpose of AWS Glue here? AWS Glue is for ETL, which is unnecessary.
upvoted 1 times
Andy16240
1 year, 5 months ago
C correct: S3 copy command in AWS CLI is less operational processes than the batch operation.
upvoted 1 times
uC6rW1aB
1 year, 7 months ago
Selected Answer: C
In this particular scenario, using Amazon Athena and AWS Glue Data Catalog might be a better fit due to the large amount of data stored in S3 buckets and growing every day. Athena can query data across an entire S3 bucket or across multiple buckets, which is useful when parsing multiple files and large amounts of data.
upvoted 2 times
chico2023
1 year, 8 months ago
Selected Answer: C
Answer: C. Criminally tricky question. S3 Select does much the same thing as Athena, but there are some differences. The key here is "...a large amount of unstructured data...". If it weren't for that, S3 Select would win hands down.
upvoted 3 times
chico2023
1 year, 8 months ago
Using an Olabiba to explain the differences between the two: 1. Query Capability: Amazon Athena is a fully managed interactive query service that allows you to run SQL queries directly on your data in S3. It supports complex queries, joins, aggregations, and even nested data structures. Athena is designed for ad-hoc querying and analysis of large datasets. On the other hand, S3 Select is a feature of Amazon S3 that allows you to retrieve a subset of data from an object using SQL expressions. It is primarily used for selective retrieval of specific data within an object, rather than running complex queries across multiple objects.
upvoted 2 times
chico2023
1 year, 8 months ago
2. Data Format: Amazon Athena supports various data formats such as CSV, JSON, Parquet, Avro, and more. It can automatically infer the schema of your data or you can provide a schema explicitly. Athena can handle structured, semi-structured, and unstructured data. S3 Select, on the other hand, is limited to querying CSV, JSON, and Parquet files. It requires the data to be in a specific format and does not support nested data structures.
upvoted 2 times
chico2023
1 year, 8 months ago
3. Performance: Amazon Athena is optimized for running queries on large datasets and can parallelize the query execution across multiple nodes. It automatically scales resources based on the query complexity and data size, providing fast and efficient query performance. S3 Select, on the other hand, is designed for retrieving a subset of data from an object. It can significantly reduce the amount of data transferred over the network and improve query performance by only retrieving the necessary data.

4. Cost: Amazon Athena and S3 Select have different pricing models. Amazon Athena charges based on the amount of data scanned by your queries, while S3 Select charges based on the amount of data selected and returned by your queries. The cost will depend on the size of your data and the complexity of your queries.
upvoted 3 times
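
The scope difference described in this thread (S3 Select works on one object, Athena on a table spanning many objects) shows up in the shape of the two requests. The sketch below only builds the argument dicts and does not call AWS. The bucket, key, database, table, column, and output location names are hypothetical; the argument names follow boto3's `select_object_content` and `start_query_execution`.

```python
def s3_select_request(bucket, key):
    """Arguments for an S3 Select call: exactly one object per request."""
    return {
        "Bucket": bucket,
        "Key": key,  # a single object; there is no way to name a whole prefix
        "ExpressionType": "SQL",
        "Expression": "SELECT * FROM S3Object s WHERE s.level = 'ERROR'",
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


def athena_request(database, table, output_location):
    """Arguments for an Athena query: the table is resolved through the catalog."""
    return {
        # Aggregation across every object behind the table definition.
        "QueryString": f"SELECT level, COUNT(*) AS n FROM {table} GROUP BY level",
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical names, for illustration only.
select_args = s3_select_request("example-bucket", "logs/2024/01/01.csv")
athena_args = athena_request("example_db", "logs", "s3://example-results/")
print(sorted(select_args))
print(sorted(athena_args))
```

The S3 Select request has to name a `Key`, so analyzing a bucket means issuing one request per object and combining the results yourself. The Athena request names a database and table instead, which is where the Glue Data Catalog in answer C comes in.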
Jonalb
1 year, 9 months ago
Selected Answer: C
It's C, true question!
upvoted 1 times
NikkyDicky
1 year, 10 months ago
C for sure
upvoted 1 times
johnballs221
1 year, 11 months ago
Selected Answer: B
Redshift Spectrum can run SQL queries directly on S3.
upvoted 1 times
rxhan
1 year, 10 months ago
Not the best for cost.
upvoted 1 times
mfsec
2 years, 1 month ago
Selected Answer: C
C is the best choice for unstructured data
upvoted 3 times
God_Is_Love
2 years, 1 month ago
Selected Answer: C
S3 Select only selects a few parts of the data, and here there is a lot of unstructured data, so A is wrong. Use the Athena console to create a Glue crawler as described here: https://docs.aws.amazon.com/athena/latest/ug/data-sources-glue.html
upvoted 4 times