Exam AWS Certified Data Analytics - Specialty All Questions

View all questions & answers for the AWS Certified Data Analytics - Specialty exam

Exam AWS Certified Data Analytics - Specialty topic 1 question 88 discussion

Exam question from Amazon's AWS Certified Data Analytics - Specialty

Question #: 88
Topic #: 1


An online retail company is migrating its reporting system to AWS. The company's legacy system runs data processing on online transactions using a complex series of nested Apache Hive queries. Transactional data is exported from the online system to the reporting system several times a day. Schemas in the files are stable between updates.
A data analyst wants to quickly migrate the data processing to AWS, so any code changes should be minimized. To keep storage costs low, the data analyst decides to store the data in Amazon S3. It is vital that the data from the reports and associated analytics is completely up to date based on the data in Amazon S3.
Which solution meets these requirements?

A. Create an AWS Glue Data Catalog to manage the Hive metadata. Create an AWS Glue crawler over Amazon S3 that runs when data is refreshed to ensure that data changes are updated. Create an Amazon EMR cluster and use the metadata in the AWS Glue Data Catalog to run Hive processing queries in Amazon EMR.
B. Create an AWS Glue Data Catalog to manage the Hive metadata. Create an Amazon EMR cluster with consistent view enabled. Run emrfs sync before each analytics step to ensure data changes are updated. Create an EMR cluster and use the metadata in the AWS Glue Data Catalog to run Hive processing queries in Amazon EMR.
C. Create an Amazon Athena table with CREATE TABLE AS SELECT (CTAS) to ensure data is refreshed from underlying queries against the raw dataset. Create an AWS Glue Data Catalog to manage the Hive metadata over the CTAS table. Create an Amazon EMR cluster and use the metadata in the AWS Glue Data Catalog to run Hive processing queries in Amazon EMR.
D. Use an S3 Select query to ensure that the data is properly updated. Create an AWS Glue Data Catalog to manage the Hive metadata over the S3 Select table. Create an Amazon EMR cluster and use the metadata in the AWS Glue Data Catalog to run Hive processing queries in Amazon EMR.


Suggested Answer: A 🗳️

by VikG12 at May 3, 2021, 5:56 p.m.
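Option A depends on Amazon EMR using the AWS Glue Data Catalog as its Hive metastore, so the existing nested Hive queries can run without code changes. The sketch below assumes you launch the cluster yourself. It builds the `hive-site` configuration classification that sets the metastore client factory AWS documents for this purpose. The function name is illustrative, not part of any AWS API.

```python
import json

# Client factory class that AWS documents for using the Glue Data Catalog
# as the Hive metastore on Amazon EMR.
GLUE_HIVE_CLIENT_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def hive_glue_catalog_configurations():
    """Return an EMR Configurations list that points Hive at the Glue Data Catalog."""
    return [
        {
            "Classification": "hive-site",
            "Properties": {
                "hive.metastore.client.factory.class": GLUE_HIVE_CLIENT_FACTORY
            },
        }
    ]


if __name__ == "__main__":
    # Emit JSON that can be saved to a file and passed to the EMR CLI.
    print(json.dumps(hive_glue_catalog_configurations(), indent=2))
```

You could save the printed JSON to a file and pass it to `aws emr create-cluster --configurations file://config.json`. Alternatively, you could pass the list as the `Configurations` argument of boto3's `run_job_flow`.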

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments


VikG12

Highly Voted 3 years, 7 months ago

Looks like it's A.

upvoted 25 times

Dr_Kiko

3 years, 6 months ago

Wrong. The question says the schema is stable, so you don't need to rerun crawlers every time. B.

upvoted 3 times

Ipc01

3 years, 2 months ago

B is wrong, dude. It doesn't even mention using S3.

upvoted 1 time


Thiya

Highly Voted 3 years, 4 months ago

Answer should be A: 1. Consistent view is no longer required. 2. Although the schema is stable, running a Glue crawler is one way to keep the partition metadata up to date.

upvoted 5 times
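Several commenters object that the schema is stable, so a full recrawl is unnecessary. One middle ground is a crawler whose recrawl policy visits only new S3 folders. Each refresh then registers new partitions without touching the table schema. This sketch rests on assumptions: the crawler name, IAM role ARN, database, and S3 path are hypothetical, and the function only builds keyword arguments for the Glue `CreateCrawler` API without calling AWS.

```python
def incremental_crawler_params(name, role_arn, database, s3_path):
    """Build CreateCrawler keyword arguments for a crawler that only visits new folders.

    All argument values are caller-supplied placeholders (hypothetical here).
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Crawl only S3 folders added since the last run (new partitions).
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_NEW_FOLDERS_ONLY"},
        # Log schema changes and deletions instead of applying them;
        # incremental crawls are paired with LOG for both behaviors.
        "SchemaChangePolicy": {"UpdateBehavior": "LOG", "DeleteBehavior": "LOG"},
    }
```

With boto3, this would be used as `boto3.client("glue").create_crawler(**params)`. As I understand the Glue documentation, `CRAWL_NEW_FOLDERS_ONLY` requires both schema-change behaviors to be `LOG`. That constraint suits a stable schema like the one described in the question.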


wally_1995

Most Recent 1 year, 9 months ago

I think this question is just outdated. Based on what the question says, B should be the answer. A is too broad: it doesn't mention what triggers the crawler (for example, a Lambda function). The question also states that the schema is stable, so there is no need to run the crawler every time a file is updated. B doesn't need to mention S3 because enabling consistent view already implies it. The question is outdated because Amazon has since dropped consistent view from EMR. But looking back three years, B would have been the perfect answer.

upvoted 2 times


pk349

1 year, 11 months ago

A: I passed the test

upvoted 1 time


rocky48

2 years, 9 months ago

Selected Answer: A

Answer is A

upvoted 1 time


Fazil_Cp

3 years, 5 months ago

I think the answer is A. It may also be that the question is a bit outdated. Now that S3 has strong read-after-write consistency, EMRFS consistent view may no longer make sense.

upvoted 2 times


carlosrochacardoso

3 years, 5 months ago

I think it's A. You no longer need to use EMRFS consistent view, because Amazon S3 supports strong read-after-write consistency (see "Strong read-after-write consistency"). This works with all Amazon EMR versions. https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-consistent-view.html

upvoted 3 times


uninit

3 years, 6 months ago

I believe it is B. The key point is: "It is vital that the data from the reports and associated analytics is completely up to date based on the data in Amazon S3."

EMRFS consistent view allows EMR clusters to check for list and read-after-write consistency for Amazon S3 objects written by or synced with EMRFS. If you directly delete objects from Amazon S3 that are tracked in EMRFS metadata, EMRFS treats the object as inconsistent and throws an exception after it has exhausted retries. Use EMRFS to delete objects in Amazon S3 that are tracked using consistent view. Alternatively, you can use the emrfs command line to purge metadata entries for objects that have been directly deleted, or you can sync the consistent view with Amazon S3 immediately after you delete the objects. https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-consistent-view.html

Running Glue crawlers whenever S3 is refreshed requires S3-triggered Lambda functions that start the crawler. That seems like too much overhead, especially when EMRFS consistent view and the emrfs sync command can maintain consistency on an already provisioned EMR cluster.

upvoted 4 times
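The S3-triggered Lambda function mentioned above can be small. This minimal sketch rests on several assumptions. An S3 event notification on the data prefix invokes the function. The crawler name comes from a hypothetical `CRAWLER_NAME` environment variable, with a placeholder default. The handler reads the S3 records and calls the Glue `StartCrawler` API. If a crawl is already running, it returns without error. The optional `glue_client` parameter exists only so the function can be exercised without AWS credentials.

```python
import os


def s3_keys_from_event(event):
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    return [
        (r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
        for r in event.get("Records", [])
        if r.get("eventSource") == "aws:s3"
    ]


def lambda_handler(event, context, glue_client=None):
    """Start the Glue crawler when new objects land in S3."""
    objects = s3_keys_from_event(event)
    if not objects:
        # Nothing from S3 in this event, so there is nothing to crawl.
        return {"started": False, "objects": 0}
    if glue_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        glue_client = boto3.client("glue")
    # Hypothetical crawler name; set CRAWLER_NAME on the function in practice.
    crawler = os.environ.get("CRAWLER_NAME", "orders-crawler")
    try:
        glue_client.start_crawler(Name=crawler)
    except glue_client.exceptions.CrawlerRunningException:
        # A crawl is already in progress and will pick up the new files.
        return {"started": False, "objects": len(objects)}
    return {"started": True, "objects": len(objects)}
```

Because exports arrive several times a day, tolerating `CrawlerRunningException` keeps overlapping uploads from failing the function. Whether this counts as "too much overhead" compared with running `emrfs sync` is the judgment call debated in this thread.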


kc1982

3 years, 6 months ago

A Glue crawler is responsible for the schema definition, which the question says is stable. A Glue ETL job could have moved the data. Having said that, I'll go for B.

upvoted 4 times


Huy

3 years, 6 months ago

B is correct. Because the schema is stable, you don't need to run the Glue crawler again. Moreover, the data is stored in S3, so EMRFS is needed, and with consistent view the data stays up to date.

upvoted 1 time

Huy

3 years, 6 months ago

Sorry, maybe A is correct. When you use the AWS Glue Data Catalog as the metastore for Hive, there is no need to configure EMRFS.

upvoted 1 time


Monika14Sharma

3 years, 6 months ago

The correct answer is A.

upvoted 1 time


AjithkumarSL

3 years, 7 months ago

Agree with A

upvoted 1 time
