Exam AWS Certified Data Engineer - Associate DEA-C01 All Questions

View all questions & answers for the AWS Certified Data Engineer - Associate DEA-C01 exam

Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 56 discussion

Exam question from Amazon's AWS Certified Data Engineer - Associate DEA-C01

Question #: 56
Topic #: 1

A security company stores IoT data that is in JSON format in an Amazon S3 bucket. The data structure can change when the company upgrades the IoT devices. The company wants to create a data catalog that includes the IoT data. The company's analytics department will use the data catalog to index the data.
Which solution will meet these requirements MOST cost-effectively?

A. Create an AWS Glue Data Catalog. Configure an AWS Glue Schema Registry. Create a new AWS Glue workload to orchestrate the ingestion of the data that the analytics department will use into Amazon Redshift Serverless.
B. Create an Amazon Redshift provisioned cluster. Create an Amazon Redshift Spectrum database for the analytics department to explore the data that is in Amazon S3. Create Redshift stored procedures to load the data into Amazon Redshift.
C. Create an Amazon Athena workgroup. Explore the data that is in Amazon S3 by using Apache Spark through Athena. Provide the Athena workgroup schema and tables to the analytics department.
D. Create an AWS Glue Data Catalog. Configure an AWS Glue Schema Registry. Create AWS Lambda user defined functions (UDFs) by using the Amazon Redshift Data API. Create an AWS Step Functions job to orchestrate the ingestion of the data that the analytics department will use into Amazon Redshift Serverless.

Suggested Answer: A

by rralucard_ at Feb. 2, 2024, 11:43 a.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments

rralucard_

Highly Voted 1 year, 2 months ago

Selected Answer: A

Option A, creating an AWS Glue Data Catalog with Glue Schema Registry and orchestrating data ingestion into Amazon Redshift Serverless using AWS Glue, appears to be the most cost-effective and suitable solution. It offers a serverless approach to manage the evolving data schema of the IoT data and efficiently supports data analytics needs without the overhead of managing a provisioned database cluster or complex orchestration setups.
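
To make the "evolving data schema" point concrete, here is a minimal sketch in plain Python of the kind of drift a catalog has to absorb when devices are upgraded. It is not the Glue Schema Registry API; the payloads, field names, and the helpers `infer_schema` and `diff_schemas` are all hypothetical and exist only for illustration.

```python
import json


def infer_schema(record, prefix=""):
    """Map each dotted field path in a JSON object to its Python type name."""
    schema = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects so "status.online" becomes its own field.
            schema.update(infer_schema(value, prefix=path + "."))
        else:
            schema[path] = type(value).__name__
    return schema


def diff_schemas(old, new):
    """Report fields that were added, removed, or changed type."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical payloads before and after a device firmware upgrade.
v1 = json.loads('{"device_id": "cam-01", "temp": 21.5, "status": {"online": true}}')
v2 = json.loads(
    '{"device_id": "cam-01", "temp": "21.5C", "status": {"online": true, "battery": 87}}'
)

drift = diff_schemas(infer_schema(v1), infer_schema(v2))
print(drift)
```

In this made-up case the upgrade adds `status.battery` and changes `temp` from a number to a string, which is the sort of change a schema registry or crawler would need to track.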

upvoted 9 times

nyaopoko

1 year ago

Selected Answer: A

Amazon Redshift Serverless is a serverless option for Amazon Redshift, which means you don't have to provision and manage clusters. This makes it a cost-effective choice for the analytics department's use case.

upvoted 2 times

VerRi

Most Recent 11 months, 1 week ago

Selected Answer: A

Athena is not able to create a new data catalog.

upvoted 1 times

sdas1

11 months, 3 weeks ago

Option C

Cost-effectiveness: Amazon Athena allows you to query data directly from Amazon S3 without the need for any infrastructure setup or management. You pay only for the queries you run, making it cost-effective, especially for sporadic or exploratory analysis.

Flexibility: Since the data structure can change with IoT device upgrades, using Athena allows for flexibility in querying and analyzing the data regardless of its structure. You don't need to define a fixed schema upfront, enabling you to adapt to changes seamlessly.

Apache Spark Support: Athena supports querying data using Apache Spark, which is powerful for processing and analyzing large datasets. This capability ensures that the analytics department can leverage Spark for more advanced analytics if needed.

https://www.youtube.com/watch?v=Q93NZJBFSWw

upvoted 1 times

khchan123

12 months ago

Selected Answer: A

The correct solution is A. Create an AWS Glue Data Catalog. Configure an AWS Glue Schema Registry. Create a new AWS Glue workload to orchestrate the ingestion of the data that the analytics department will use into Amazon Redshift Serverless. Option C (Amazon Athena and Apache Spark) is suitable for ad-hoc querying and exploration but may not be the best choice for the analytics department's ongoing data analysis needs, as Athena is designed for interactive querying rather than complex data transformations.

upvoted 2 times

altonh

4 months, 3 weeks ago

However, combined with a notebook, Athena + Spark can be a powerful tool for analytics.

upvoted 1 times

chris_spencer

1 year ago

Selected Answer: A

The objective is to create a data catalog that includes the IoT data, and AWS Glue Data Catalog is the best option for this requirement: https://docs.aws.amazon.com/glue/latest/dg/catalog-and-crawler.html

C is incorrect. While Athena makes it easy to read from S3 using SQL, it does not crawl the data source and create a data catalog.
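
For anyone who wants to see what "crawl the data source and create a data catalog" could look like in practice, here is a rough sketch using the boto3 Glue client's `create_crawler` and `start_crawler` calls. The crawler name, role ARN, database name, and bucket path are placeholders I made up, and the schema change policy shown is just one possible choice, not something the question prescribes.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for the Glue CreateCrawler API call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Assumption: update catalog tables in place when the JSON structure
        # changes, and only log objects that disappear from S3.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


def create_and_start_crawler(request):
    """Send the request to AWS. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without AWS access

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# All values below are hypothetical placeholders.
request = build_crawler_request(
    name="iot-json-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    database="iot_catalog",
    s3_path="s3://example-iot-bucket/raw/",
)
print(request)
```

Calling `create_and_start_crawler(request)` would submit it. The crawler then infers table definitions from the JSON files and writes them to the Data Catalog.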

upvoted 4 times

Christina666

1 year ago

Selected Answer: C

Why Option C is the most cost-effective:

Serverless and pay-as-you-go: Athena is a serverless query service, meaning you only pay for the queries the analytics department runs. No need to provision and manage always-running clusters.

Flexible schema handling: Athena works well with semi-structured data like JSON and can handle schema evolution on the fly. This is perfect for the scenario where IoT data structures might change.

Spark integration: Integrating Apache Spark with Athena provides rich capabilities for data processing and transformation.

Ease of use for analytics: Athena's familiar SQL-like interface and ability to directly query S3 data make it convenient for the analytics department.
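
The pay-per-query argument can be put into rough numbers. This is a back-of-the-envelope sketch only: the default rate of 5 USD per TB scanned, the 10 MB per-query minimum, and the decimal units are assumptions to check against the current Athena pricing page for your region. It covers SQL queries only; as far as I know, Spark sessions in Athena are billed on compute time rather than bytes scanned.

```python
def athena_sql_cost(bytes_per_query, queries, usd_per_tb=5.0, min_bytes=10 * 1000**2):
    """Estimate the scan cost in USD for a batch of identical Athena SQL queries.

    usd_per_tb and min_bytes are assumed values, not authoritative pricing.
    """
    # Each query is billed for at least min_bytes, even if it scans less.
    billed_bytes = max(bytes_per_query, min_bytes) * queries
    return billed_bytes / 1000**4 * usd_per_tb


# Hypothetical workloads: 200 queries scanning 2 GB each, and one tiny query.
monthly = athena_sql_cost(bytes_per_query=2 * 1000**3, queries=200)
tiny = athena_sql_cost(bytes_per_query=1_000, queries=1)

print(f"200 queries x 2 GB: ${monthly:.2f}")
print(f"1 query x 1 KB:     ${tiny:.5f}")
```

Under these assumptions the cost scales with the volume of JSON scanned, so partitioning or converting the data to a columnar format would change the picture considerably.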

upvoted 2 times

lucas_rfsb

1 year ago

Selected Answer: C

Options A, B, and D involve setting up additional infrastructure (e.g., AWS Glue, Redshift clusters, Lambda functions) which may incur unnecessary costs and complexity for the given requirements. Option C, on the other hand, utilizes a serverless and scalable solution directly querying data in S3, making it the most cost-effective choice.

upvoted 2 times
