Exam AWS Certified Data Analytics - Specialty topic 1 question 88 discussion

An online retail company is migrating its reporting system to AWS. The company's legacy system runs data processing on online transactions using a complex series of nested Apache Hive queries. Transactional data is exported from the online system to the reporting system several times a day. Schemas in the files are stable between updates.
A data analyst wants to quickly migrate the data processing to AWS, so any code changes should be minimized. To keep storage costs low, the data analyst decides to store the data in Amazon S3. It is vital that the data from the reports and associated analytics is completely up to date based on the data in Amazon S3.
Which solution meets these requirements?

  • A. Create an AWS Glue Data Catalog to manage the Hive metadata. Create an AWS Glue crawler over Amazon S3 that runs when data is refreshed to ensure that data changes are updated. Create an Amazon EMR cluster and use the metadata in the AWS Glue Data Catalog to run Hive processing queries in Amazon EMR.
  • B. Create an AWS Glue Data Catalog to manage the Hive metadata. Create an Amazon EMR cluster with consistent view enabled. Run emrfs sync before each analytics step to ensure data changes are updated. Create an EMR cluster and use the metadata in the AWS Glue Data Catalog to run Hive processing queries in Amazon EMR.
  • C. Create an Amazon Athena table with CREATE TABLE AS SELECT (CTAS) to ensure data is refreshed from underlying queries against the raw dataset. Create an AWS Glue Data Catalog to manage the Hive metadata over the CTAS table. Create an Amazon EMR cluster and use the metadata in the AWS Glue Data Catalog to run Hive processing queries in Amazon EMR.
  • D. Use an S3 Select query to ensure that the data is properly updated. Create an AWS Glue Data Catalog to manage the Hive metadata over the S3 Select table. Create an Amazon EMR cluster and use the metadata in the AWS Glue Data Catalog to run Hive processing queries in Amazon EMR.
Suggested Answer: A

Comments

VikG12
Highly Voted 3 years, 7 months ago
looks like 'A' it is.
upvoted 25 times
Dr_Kiko
3 years, 6 months ago
Wrong; it says the schema is stable, so you don't need to rerun crawlers every time. B.
upvoted 3 times
Ipc01
3 years, 2 months ago
B is wrong; it does not even mention the use of S3.
upvoted 1 times
...
...
...
Thiya
Highly Voted 3 years, 4 months ago
The answer should be A. 1. Consistent view is no longer required. 2. Although the schema is stable in this case, running a Glue crawler is one way to keep the partition metadata updated.
upvoted 5 times
...
wally_1995
Most Recent 1 year, 9 months ago
I think this question is just outdated. Going by what the question says, B should be the answer. A is too broad, as it doesn't mention that something is needed to trigger the crawler (like a Lambda function). The question also states that the schema is stable, so there is no need to run the crawler every time a file is updated. B doesn't need to mention S3 because "consistent view enabled" already implies it. It's outdated because Amazon got rid of consistent view in EMR. But looking three years back, B would have been the perfect answer.
upvoted 2 times
...
pk349
1 year, 11 months ago
A: I passed the test
upvoted 1 times
...
rocky48
2 years, 9 months ago
Selected Answer: A
Answer is A
upvoted 1 times
...
Fazil_Cp
3 years, 5 months ago
I think the answer is A. It may also be that the question is a bit outdated: now that S3 has strong read-after-write consistency, EMRFS consistent view might no longer make sense.
upvoted 2 times
...
carlosrochacardoso
3 years, 5 months ago
I think it's A. From the docs: "You no longer need to use EMRFS Consistent View as Amazon S3 supports strong read-after-write Consistency. See Strong read-after-write consistency. This works with all Amazon EMR versions." https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-consistent-view.html
upvoted 3 times
...
uninit
3 years, 6 months ago
I believe it is B. The key point is "It is vital that the data from the reports and associated analytics is completely up to date based on the data in Amazon S3." EMRFS consistent view allows EMR clusters to check for list and read-after-write consistency for Amazon S3 objects written by or synced with EMRFS. If you directly delete objects from Amazon S3 that are tracked in EMRFS metadata, EMRFS treats the object as inconsistent and throws an exception after it has exhausted retries. Use EMRFS to delete objects in Amazon S3 that are tracked using consistent view. Alternatively, you can use the emrfs command line to purge metadata entries for objects that have been directly deleted, or you can sync the consistent view with Amazon S3 immediately after you delete the objects. https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-consistent-view.html Running Glue crawlers whenever S3 is refreshed requires S3-triggered Lambda functions that start the crawler. That seems like too much overhead, especially when EMRFS consistent view and the sync command can maintain consistency with an already provisioned EMR cluster.
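For a sense of how much overhead the Lambda route involves, here is a minimal sketch of such a function in Python. It assumes an S3 event notification is wired to the function and that a crawler already exists; the crawler name and the optional `glue` parameter (added so the handler can be exercised without AWS access) are hypothetical and not part of the question.

```python
import urllib.parse

CRAWLER_NAME = "reporting-transactions-crawler"  # hypothetical crawler name


def changed_keys(event):
    """Return the S3 object keys named in an S3 event notification."""
    keys = []
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        # S3 URL-encodes keys in event notifications (spaces arrive as '+').
        keys.append(urllib.parse.unquote_plus(key))
    return keys


def handler(event, context, glue=None):
    """Start the crawler once per notification that names at least one object."""
    keys = changed_keys(event)
    if not keys:
        return {"started": False, "keys": []}
    if glue is None:
        import boto3  # imported lazily so the helper above works without boto3
        glue = boto3.client("glue")
    glue.start_crawler(Name=CRAWLER_NAME)
    return {"started": True, "keys": keys}
```

Note that `start_crawler` raises `CrawlerRunningException` if the crawler is already running, so a real function would need to decide how to handle several exports arriving close together; this sketch leaves that out.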
upvoted 4 times
...
kc1982
3 years, 6 months ago
The Glue crawler is responsible for the schema definition, which is described as stable in this case. A Glue ETL job could have moved the data. Having said that, I go for B.
upvoted 4 times
...
Huy
3 years, 6 months ago
B is correct. Because the schema is stable, you don't need to run the Glue crawler again. Moreover, the data is stored in S3, so EMRFS is needed, and with consistent view the data stays up to date.
upvoted 1 times
Huy
3 years, 6 months ago
Sorry, maybe A is correct. When you use the AWS Glue Data Catalog as the metastore for Hive, there is no need to configure EMRFS.
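As a rough illustration of what "Glue Data Catalog as the metastore for Hive" means on EMR, the sketch below builds the `hive-site` configuration that points Hive at the Glue Data Catalog. The function name is made up for this example; the classification and property are the ones described in the EMR documentation, and the resulting list is the shape expected by the `Configurations` parameter when creating a cluster (for example with boto3's `run_job_flow`).

```python
# Hive client factory class that makes Hive on EMR use the Glue Data Catalog.
GLUE_HIVE_CLIENT_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_metastore_configurations():
    """Return EMR configuration objects that set Glue as the Hive metastore.

    The function name is hypothetical; only the classification and the
    property key/value come from the EMR documentation.
    """
    return [
        {
            "Classification": "hive-site",
            "Properties": {
                "hive.metastore.client.factory.class": GLUE_HIVE_CLIENT_FACTORY,
            },
        }
    ]
```

With this in place, the existing Hive queries can run against tables defined in the catalog without code changes, which matches the "minimize code changes" requirement in the question.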
upvoted 1 times
...
...
Monika14Sharma
3 years, 6 months ago
Correct Answer is A
upvoted 1 times
...
AjithkumarSL
3 years, 7 months ago
Agree with A
upvoted 1 times
...