Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 56 discussion

A security company stores IoT data that is in JSON format in an Amazon S3 bucket. The data structure can change when the company upgrades the IoT devices. The company wants to create a data catalog that includes the IoT data. The company's analytics department will use the data catalog to index the data.
Which solution will meet these requirements MOST cost-effectively?

  • A. Create an AWS Glue Data Catalog. Configure an AWS Glue Schema Registry. Create a new AWS Glue workload to orchestrate the ingestion of the data that the analytics department will use into Amazon Redshift Serverless.
  • B. Create an Amazon Redshift provisioned cluster. Create an Amazon Redshift Spectrum database for the analytics department to explore the data that is in Amazon S3. Create Redshift stored procedures to load the data into Amazon Redshift.
  • C. Create an Amazon Athena workgroup. Explore the data that is in Amazon S3 by using Apache Spark through Athena. Provide the Athena workgroup schema and tables to the analytics department.
  • D. Create an AWS Glue Data Catalog. Configure an AWS Glue Schema Registry. Create AWS Lambda user defined functions (UDFs) by using the Amazon Redshift Data API. Create an AWS Step Functions job to orchestrate the ingestion of the data that the analytics department will use into Amazon Redshift Serverless.
Suggested Answer: A
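
As an illustrative aside on the requirement that the data structure can change, the sketch below shows in plain Python how a new field appears when a device upgrade changes the JSON payload. The payloads and field names are hypothetical, and this is not how AWS Glue infers schemas internally; it only shows the kind of drift a data catalog has to absorb.

```python
import json
from typing import Any


def infer_schema(record: dict[str, Any]) -> dict[str, str]:
    """Map each top-level field to the Python type name of its value."""
    return {key: type(value).__name__ for key, value in record.items()}


def merge_schemas(schemas: list[dict[str, str]]) -> dict[str, set[str]]:
    """Union the fields of several schemas, collecting every type seen per field."""
    merged: dict[str, set[str]] = {}
    for schema in schemas:
        for field, type_name in schema.items():
            merged.setdefault(field, set()).add(type_name)
    return merged


# Hypothetical payloads from the same device before and after an upgrade.
old_payload = '{"device_id": "cam-1", "temp": 21.5}'
new_payload = '{"device_id": "cam-1", "temp": 21.5, "battery": 87}'

schemas = [infer_schema(json.loads(p)) for p in (old_payload, new_payload)]
catalog_view = merge_schemas(schemas)

# Fields introduced by the upgraded payload.
new_fields = set(schemas[1]) - set(schemas[0])
```

Here `new_fields` holds the fields that only the upgraded payload carries, which is the change a catalog needs to pick up for the analytics department.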

Comments

rralucard_
Highly Voted 1 year, 2 months ago
Selected Answer: A
Option A, creating an AWS Glue Data Catalog with Glue Schema Registry and orchestrating data ingestion into Amazon Redshift Serverless using AWS Glue, appears to be the most cost-effective and suitable solution. It offers a serverless approach to manage the evolving data schema of the IoT data and efficiently supports data analytics needs without the overhead of managing a provisioned database cluster or complex orchestration setups.
upvoted 9 times
nyaopoko
1 year ago
Selected Answer: A
Amazon Redshift Serverless is a serverless option for Amazon Redshift, which means you don't have to provision and manage clusters. This makes it a cost-effective choice for the analytics department's use case.
upvoted 2 times
VerRi
Most Recent 11 months, 1 week ago
Selected Answer: A
Athena cannot create a new data catalog.
upvoted 1 time
sdas1
11 months, 3 weeks ago
Option C
Cost-effectiveness: Amazon Athena allows you to query data directly from Amazon S3 without the need for any infrastructure setup or management. You pay only for the queries you run, making it cost-effective, especially for sporadic or exploratory analysis.
Flexibility: Since the data structure can change with IoT device upgrades, using Athena allows for flexibility in querying and analyzing the data regardless of its structure. You don't need to define a fixed schema upfront, enabling you to adapt to changes seamlessly.
Apache Spark Support: Athena supports querying data using Apache Spark, which is powerful for processing and analyzing large datasets. This capability ensures that the analytics department can leverage Spark for more advanced analytics if needed.
https://www.youtube.com/watch?v=Q93NZJBFSWw
upvoted 1 time
khchan123
12 months ago
Selected Answer: A
The correct solution is A. Create an AWS Glue Data Catalog. Configure an AWS Glue Schema Registry. Create a new AWS Glue workload to orchestrate the ingestion of the data that the analytics department will use into Amazon Redshift Serverless. Option C (Amazon Athena and Apache Spark) is suitable for ad-hoc querying and exploration but may not be the best choice for the analytics department's ongoing data analysis needs, as Athena is designed for interactive querying rather than complex data transformations.
upvoted 2 times
altonh
4 months, 3 weeks ago
However, combined with a notebook, Athena with Spark can be a powerful tool for analytics.
upvoted 1 time
chris_spencer
1 year ago
Selected Answer: A
The objective is to create a data catalog that includes the IoT data, and the AWS Glue Data Catalog is the best option for this requirement. https://docs.aws.amazon.com/glue/latest/dg/catalog-and-crawler.html
C is incorrect. While Athena makes it easy to read from S3 using SQL, it does not crawl the data source and create a data catalog.
upvoted 4 times
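
A minimal sketch of the crawler approach mentioned in the comment above, assuming boto3 and hypothetical names (crawler name, IAM role ARN, database, and bucket path are all placeholders). The parameters are built separately so they can be inspected without AWS credentials; `create_crawler` is only called if `create_iot_crawler` is invoked.

```python
from typing import Any


def build_crawler_params(
    name: str, role_arn: str, database: str, s3_path: str
) -> dict[str, Any]:
    """Build keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Update catalog tables when the crawler detects a changed schema,
        # and only log (rather than delete) objects that disappear.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


def create_iot_crawler(params: dict[str, Any]) -> None:
    """Create and start the crawler. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above runs without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**params)
    glue.start_crawler(Name=params["Name"])


# Hypothetical values for illustration only.
params = build_crawler_params(
    name="iot-json-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    database="iot_catalog",
    s3_path="s3://example-iot-bucket/raw/",
)
```

Choosing `UPDATE_IN_DATABASE` is one way to let catalog tables follow device upgrades; whether that is the right policy depends on how much schema drift the analytics department wants to accept automatically.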
Christina666
1 year ago
Selected Answer: C
Why Option C is the Most Cost-Effective
Serverless and Pay-as-you-go: Athena is a serverless query service, meaning you only pay for the queries the analytics department runs. No need to provision and manage always-running clusters.
Flexible Schema Handling: Athena works well with semi-structured data like JSON and can handle schema evolution on the fly. This is perfect for the scenario where IoT data structures might change.
Spark Integration: Integrating Apache Spark with Athena provides rich capabilities for data processing and transformation.
Ease of Use for Analytics: Athena's familiar SQL-like interface and ability to directly query S3 data make it convenient for the analytics department.
upvoted 2 times
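
For reference, a hedged sketch of what a SQL query against the S3 data through Athena might look like with boto3. The table, database, workgroup, and bucket names are hypothetical. This uses Athena's SQL engine rather than the Spark option that answer C mentions, and the query names a table in a database, so it assumes that table has already been defined in a data catalog.

```python
from typing import Any


def build_athena_query(
    sql: str, database: str, workgroup: str, output_location: str
) -> dict[str, Any]:
    """Build keyword arguments for the Athena start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "WorkGroup": workgroup,
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request: dict[str, Any]) -> str:
    """Submit the query and return its execution ID. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above runs without boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical values for illustration only.
request = build_athena_query(
    sql="SELECT device_id, temp FROM iot_events LIMIT 10",
    database="iot_catalog",
    workgroup="analytics",
    output_location="s3://example-athena-results/",
)
```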
lucas_rfsb
1 year ago
Selected Answer: C
Options A, B, and D involve setting up additional infrastructure (e.g., AWS Glue, Redshift clusters, Lambda functions) which may incur unnecessary costs and complexity for the given requirements. Option C, on the other hand, utilizes a serverless and scalable solution directly querying data in S3, making it the most cost-effective choice.
upvoted 2 times
Community vote distribution: A (35%, most voted), C (25%), B (20%), Other