Exam AWS Certified Solutions Architect - Associate SAA-C03 All Questions

Exam AWS Certified Solutions Architect - Associate SAA-C03 topic 1 question 557 discussion

A solutions architect manages an analytics application. The application stores large amounts of semistructured data in an Amazon S3 bucket. The solutions architect wants to use parallel data processing to process the data more quickly. The solutions architect also wants to use information that is stored in an Amazon Redshift database to enrich the data.

Which solution will meet these requirements?

  • A. Use Amazon Athena to process the S3 data. Use AWS Glue with the Amazon Redshift data to enrich the S3 data.
  • B. Use Amazon EMR to process the S3 data. Use Amazon EMR with the Amazon Redshift data to enrich the S3 data.
  • C. Use Amazon EMR to process the S3 data. Use Amazon Kinesis Data Streams to move the S3 data into Amazon Redshift so that the data can be enriched.
  • D. Use AWS Glue to process the S3 data. Use AWS Lake Formation with the Amazon Redshift data to enrich the S3 data.
Suggested Answer: B

Comments

Guru4Cloud
Highly Voted 1 year, 1 month ago
Selected Answer: B
Option B is the correct solution that meets the requirements: Use Amazon EMR to process the semi-structured data in Amazon S3. EMR provides a managed Hadoop framework optimized for processing large datasets in S3. EMR supports parallel data processing across multiple nodes to speed up the processing. EMR can integrate directly with Amazon Redshift using the EMR-Redshift integration. This allows querying the Redshift data from EMR and joining it with the S3 data, which enriches the semi-structured S3 data with the information stored in Redshift.
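The pattern described above (split the data, process the pieces in parallel, join each record against reference data) can be sketched locally. This is only an illustration under stated assumptions, not EMR code: `CUSTOMER_LOOKUP` is a hypothetical stand-in for rows read from Redshift, `PARTITIONS` is a hypothetical stand-in for JSON lines stored in S3, and a thread pool stands in for the cluster nodes that EMR would use.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for reference rows that would come from Redshift.
CUSTOMER_LOOKUP = {
    "c1": {"segment": "enterprise"},
    "c2": {"segment": "startup"},
}

# Hypothetical stand-in for semistructured JSON lines stored in S3,
# already split into partitions (fields vary from record to record).
PARTITIONS = [
    [
        '{"customer_id": "c1", "event": "login"}',
        '{"customer_id": "c2", "event": "purchase", "amount": 40}',
    ],
    ['{"customer_id": "c3", "event": "login"}'],
]


def enrich_partition(lines):
    """Parse one partition and left-join each record with the lookup table."""
    enriched = []
    for line in lines:
        record = json.loads(line)
        extra = CUSTOMER_LOOKUP.get(record["customer_id"], {})
        # Records with no match keep their fields and get segment=None.
        enriched.append({**record, "segment": extra.get("segment")})
    return enriched


def process_all(partitions, workers=2):
    """Process partitions concurrently and flatten results in partition order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(enrich_partition, partitions))
    return [row for part in results for row in part]


if __name__ == "__main__":
    for row in process_all(PARTITIONS):
        print(row)
```

On a real EMR cluster the same shape would typically be a Spark job: the S3 data and the Redshift table are each loaded as a distributed dataset and joined on the shared key, with the work spread across nodes rather than threads.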
upvoted 15 times
...
zjcorpuz
Highly Voted 1 year, 2 months ago
By combining AWS Glue and Amazon Redshift, you can process the semistructured data in parallel using Glue ETL jobs and then store the processed and enriched data in a structured format in Amazon Redshift. This approach allows you to perform complex analytics efficiently and at scale.
upvoted 8 times
...
upliftinghut
Most Recent 8 months, 3 weeks ago
Selected Answer: B
D: not relevant; the data is semistructured, and Glue is geared more toward batch than streaming data.
A: not correct; Athena is for querying data.
B and C look OK, but C is out: Kinesis Data Streams is redundant, because EMR has already processed the data as input into Redshift for parallel processing.
Only B is logical.
upvoted 3 times
...
awsgeek75
9 months ago
Selected Answer: B
Key requirement: parallel data processing. Parallel data processing points to EMR (a kind of Apache Hadoop), so that leaves only B and C. C moves the data through Kinesis to Redshift, which makes no sense here. B uses EMR for the S3 data and EMR for the Redshift data, which gives maximum parallel processing.
upvoted 2 times
...
pentium75
9 months, 2 weeks ago
Selected Answer: B
A has a pitfall: "use Amazon Athena to PROCESS the data". With Athena you can query, not process, data. C is wrong because Kinesis has no place here. D is wrong because it does not process the Redshift data, and Glue does ETL, not analysis. Thus it's B. EMR can use semi-structured data from S3 and structured data from Redshift and is ideal for "parallel data processing" of "large amounts" of data.
upvoted 4 times
...
aws94
10 months, 1 week ago
Selected Answer: B
large amount of data + parallel data processing = EMR
upvoted 2 times
...
Wuhao
10 months, 1 week ago
Selected Answer: A
Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL.
upvoted 1 times
pentium75
9 months, 2 weeks ago
Yes, but A says to "process", not "query", data with Athena.
upvoted 1 times
...
...
SHAAHIBHUSHANAWS
10 months, 2 weeks ago
Selected Answer: D
Glue uses an Apache Spark (PySpark) cluster for parallel processing. EMR and Glue are both possible options. Glue is serverless, so it is the better choice; in addition, PySpark does in-memory parallel processing.
upvoted 1 times
...
aragornfsm
10 months, 3 weeks ago
I think A is correct: semistructured data ==> Athena
upvoted 1 times
pentium75
9 months, 2 weeks ago
"Hadoop [as used by EMR] helps you turn petabytes of un-structured or semi-structured data into useful insights" https://aws.amazon.com/emr/features/hadoop/
upvoted 1 times
...
...
riyasara
10 months, 3 weeks ago
Athena is not designed for parallel data processing. So it's B
upvoted 2 times
...
TariqKipkemei
11 months ago
Selected Answer: A
Answer is A
upvoted 1 times
...
TariqKipkemei
11 months ago
Selected Answer: B
From this documentation looks like EMR cannot interface with S3. https://aws.amazon.com/emr/ I will settle with option A.
upvoted 1 times
pentium75
9 months, 2 weeks ago
Of course EMR can access S3 https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-file-systems.html
upvoted 1 times
...
...
bogobob
11 months ago
Selected Answer: B
For those answering A: AWS Glue can directly query S3; it can't use Athena as a source of data. The question says the Redshift data should be used to "enrich", which means that the Redshift data needs to be "added" to the S3 data. A doesn't allow that.
upvoted 1 times
...
hungta
11 months ago
Selected Answer: B
Choose option B. Option A is not correct. Amazon Athena is suitable for querying data directly from S3 using SQL and allows parallel processing of S3 data. AWS Glue can be used for data preparation and enrichment but might not directly integrate with Amazon Redshift for enrichment.
upvoted 1 times
...
potomac
11 months, 2 weeks ago
Selected Answer: A
Athena and Redshift both support SQL queries.
upvoted 1 times
...
Sab123
1 year ago
Selected Answer: A
Semi-structured data is supported by Athena, not by EMR.
upvoted 4 times
pentium75
9 months, 2 weeks ago
"Hadoop helps you turn petabytes of un-structured or semi-structured data into useful insights about your applications or users." https://aws.amazon.com/emr/features/hadoop/?nc1=h_ls
upvoted 1 times
...
...
JKevin778
1 year ago
Selected Answer: A
Athena for S3
upvoted 1 times
...