
Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 39 discussion

A company is migrating its database servers from Amazon EC2 instances that run Microsoft SQL Server to Amazon RDS for Microsoft SQL Server DB instances. The company's analytics team must export large data elements every day until the migration is complete. The data elements are the result of SQL joins across multiple tables. The data must be in Apache Parquet format. The analytics team must store the data in Amazon S3.
Which solution will meet these requirements in the MOST operationally efficient way?

  • A. Create a view in the EC2 instance-based SQL Server databases that contains the required data elements. Create an AWS Glue job that selects the data directly from the view and transfers the data in Parquet format to an S3 bucket. Schedule the AWS Glue job to run every day.
  • B. Schedule SQL Server Agent to run a daily SQL query that selects the desired data elements from the EC2 instance-based SQL Server databases. Configure the query to direct the output .csv objects to an S3 bucket. Create an S3 event that invokes an AWS Lambda function to transform the output format from .csv to Parquet.
  • C. Use a SQL query to create a view in the EC2 instance-based SQL Server databases that contains the required data elements. Create and run an AWS Glue crawler to read the view. Create an AWS Glue job that retrieves the data and transfers the data in Parquet format to an S3 bucket. Schedule the AWS Glue job to run every day.
  • D. Create an AWS Lambda function that queries the EC2 instance-based databases by using Java Database Connectivity (JDBC). Configure the Lambda function to retrieve the required data, transform the data into Parquet format, and transfer the data into an S3 bucket. Use Amazon EventBridge to schedule the Lambda function to run every day.
Suggested Answer: A 🗳️
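For illustration, a minimal sketch of the Glue job the suggested answer (option A) describes: read the SQL Server view over a pre-created Glue JDBC connection and write the result to S3 as Parquet. The connection, view, and bucket names below are placeholders, not taken from the question.

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read directly from the view through a Glue JDBC connection
# ("sqlserver-ec2-conn" and the view name are placeholder values).
source = glue_context.create_dynamic_frame.from_options(
    connection_type="sqlserver",
    connection_options={
        "useConnectionProperties": "true",
        "connectionName": "sqlserver-ec2-conn",
        "dbtable": "dbo.analytics_export_view",  # view with the joined data
    },
)

# Write the data set to S3 in Apache Parquet format (placeholder bucket/prefix).
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://example-analytics-bucket/daily-export/"},
    format="parquet",
)

job.commit()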

Comments

taka5094
Highly Voted 11 months ago
Selected Answer: C
Choice A is almost the same approach, but it doesn't use the AWS Glue crawler, so you have to manage the view's metadata manually.
upvoted 7 times
Christina666
Highly Voted 10 months ago
Selected Answer: C
  • Leveraging SQL views: creating a view on the source database simplifies the data extraction process and keeps your SQL logic centralized.
  • Glue crawler efficiency: using a Glue crawler to automatically discover and catalog the view's metadata reduces manual setup.
  • Glue job for ETL: a dedicated Glue job is well suited for the data transformation (to Parquet) and loading into S3, and Glue jobs offer built-in scheduling capabilities.
  • Operational efficiency: this approach minimizes custom code and leverages native AWS services for data movement and cataloging.
upvoted 7 times
Dummy92yash
5 months, 3 weeks ago
A Glue crawler is used to catalog data and discover its schema. In this requirement the data is already stored in MS SQL Server, which is a relational database. Hence I think A is correct.
upvoted 3 times
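For reference, the crawler step that option C adds on top of option A is roughly one boto3 call that catalogs the view through the JDBC connection; every name below (role ARN, connection, catalog database, JDBC path) is a placeholder.

import boto3

glue = boto3.client("glue")

# Create a crawler whose target is the SQL Server view, reached through an
# existing Glue JDBC connection; it writes the discovered schema to the
# Glue Data Catalog.
glue.create_crawler(
    Name="sqlserver-view-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="analytics_catalog",
    Targets={
        "JdbcTargets": [
            {
                "ConnectionName": "sqlserver-ec2-conn",
                "Path": "salesdb/dbo/analytics_export_view",  # database/schema/view
            }
        ]
    },
)

glue.start_crawler(Name="sqlserver-view-crawler")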
Certified101
Most Recent 6 hours, 40 minutes ago
Selected Answer: A
A is correct - no need for a crawler.
upvoted 1 times
plutonash
1 month ago
Selected Answer: A
The crawler is not necessary; using a Glue job to read the data from SQL Server and transfer it to S3 in Apache Parquet format is enough.
upvoted 3 times
mtrianac
2 months, 1 week ago
Selected Answer: A
No, in this case, using an AWS Glue Crawler is not necessary. The schema is already defined in the SQL Server database, as the created view contains the required structure (columns and data types). AWS Glue can directly connect to the database via JDBC, extract the data, transform it into Parquet format, and store it in S3 without additional steps. A crawler is useful if you're working with data that doesn't have a predefined schema (e.g., files in S3) or if you need the data to be cataloged for services like Amazon Athena. However, for this ETL flow, using just a Glue Job simplifies the process and reduces operational complexity.
upvoted 2 times
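To make the A-versus-C difference concrete, the read step inside the Glue job is essentially the only part that changes. A rough fragment, reusing the glue_context from the sketch above; the catalog database, table, and connection names are placeholders.

# Option C: read the view through the Data Catalog entry a crawler created.
catalog_frame = glue_context.create_dynamic_frame.from_catalog(
    database="analytics_catalog",
    table_name="salesdb_dbo_analytics_export_view",
)

# Option A: skip the catalog and read the view directly over JDBC.
jdbc_frame = glue_context.create_dynamic_frame.from_options(
    connection_type="sqlserver",
    connection_options={
        "useConnectionProperties": "true",
        "connectionName": "sqlserver-ec2-conn",
        "dbtable": "dbo.analytics_export_view",
    },
)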
michele_scar
3 months, 1 week ago
Selected Answer: A
A Glue crawler is useless here because the schema is already in place in the SQL database.
upvoted 1 times
michele_scar
3 months, 1 week ago
My fault: A is correct because, during a migration, it is more efficient to have a crawler that can catch eventual schema changes.
upvoted 1 times
michele_scar
3 months, 1 week ago
My fault: C is correct because, during a migration, it is more efficient to have a crawler that can catch eventual schema changes.
upvoted 1 times
leonardoFelipe
3 months, 1 week ago
Selected Answer: A
Usually, views aren't true objects in a DBMS; they're just a "nickname" for a specific query string, unlike materialized views. So my question is: can a Glue crawler understand their metadata? I'd go with A.
upvoted 3 times
bakarys
7 months, 2 weeks ago
Selected Answer: A
Option A involves creating a view in the EC2 instance-based SQL Server databases that contains the required data elements. An AWS Glue job is then created to select the data directly from the view and transfer the data in Parquet format to an S3 bucket. This job is scheduled to run every day. This approach is operationally efficient as it leverages managed services (AWS Glue) and does not require additional transformation steps.

Option D involves creating an AWS Lambda function that queries the EC2 instance-based databases using JDBC. The Lambda function is configured to retrieve the required data, transform the data into Parquet format, and transfer the data into an S3 bucket. This approach could work, but managing and scheduling Lambda functions could add operational overhead compared to using managed services like AWS Glue.
upvoted 3 times
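For completeness, the "schedule the AWS Glue job to run every day" part of options A and C can be a Glue scheduled trigger rather than a separate EventBridge rule; a minimal boto3 sketch with placeholder names.

import boto3

glue = boto3.client("glue")

# Scheduled trigger that starts the export job once a day (the trigger name,
# job name, and cron expression are placeholders).
glue.create_trigger(
    Name="daily-sqlserver-export",
    Type="SCHEDULED",
    Schedule="cron(0 4 * * ? *)",  # every day at 04:00 UTC
    Actions=[{"JobName": "sqlserver-view-to-parquet"}],
    StartOnCreation=True,
)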
GiorgioGss
11 months ago
Selected Answer: C
Just because it decouples the whole architecture, I will go with C.
upvoted 2 times
Felix_G
11 months, 2 weeks ago
Option C seems to be the most operationally efficient: It leverages Glue for both schema discovery (via the crawler) and data transfer (via the Glue job). The Glue job can directly handle the Parquet format conversion. Scheduling the Glue job ensures regular data export without manual intervention.
upvoted 1 times
helpaws
11 months ago
you're right: https://aws.amazon.com/blogs/big-data/extracting-multidimensional-data-from-microsoft-sql-server-analysis-services-using-aws-glue/
upvoted 1 times
taka5094
11 months ago
Is this right? https://aws.amazon.com/jp/blogs/big-data/extracting-multidimensional-data-from-microsoft-sql-server-analysis-services-using-aws-glue/
upvoted 1 times
rralucard_
1 year ago
Selected Answer: A
Option A (Creating a view in the EC2 instance-based SQL Server databases and creating an AWS Glue job that selects data from the view, transfers it in Parquet format to S3, and schedules the job to run every day) seems to be the most operationally efficient solution. It leverages AWS Glue’s ETL capabilities for direct data extraction and transformation, minimizes manual steps, and effectively automates the process.
upvoted 3 times
evntdrvn76
1 year ago
A. Create a view in the EC2 instance-based SQL Server databases that contains the required data elements. Create an AWS Glue job that selects the data directly from the view and transfers the data in Parquet format to an S3 bucket. Schedule the AWS Glue job to run every day. This solution is operationally efficient for exporting data in the required format.
upvoted 2 times
Community vote distribution: A (35%), C (25%), B (20%), Other