
Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 39 discussion

A company is migrating its database servers from Amazon EC2 instances that run Microsoft SQL Server to Amazon RDS for Microsoft SQL Server DB instances. The company's analytics team must export large data elements every day until the migration is complete. The data elements are the result of SQL joins across multiple tables. The data must be in Apache Parquet format. The analytics team must store the data in Amazon S3.
Which solution will meet these requirements in the MOST operationally efficient way?

  • A. Create a view in the EC2 instance-based SQL Server databases that contains the required data elements. Create an AWS Glue job that selects the data directly from the view and transfers the data in Parquet format to an S3 bucket. Schedule the AWS Glue job to run every day.
  • B. Schedule SQL Server Agent to run a daily SQL query that selects the desired data elements from the EC2 instance-based SQL Server databases. Configure the query to direct the output .csv objects to an S3 bucket. Create an S3 event that invokes an AWS Lambda function to transform the output format from .csv to Parquet.
  • C. Use a SQL query to create a view in the EC2 instance-based SQL Server databases that contains the required data elements. Create and run an AWS Glue crawler to read the view. Create an AWS Glue job that retrieves the data and transfers the data in Parquet format to an S3 bucket. Schedule the AWS Glue job to run every day.
  • D. Create an AWS Lambda function that queries the EC2 instance-based databases by using Java Database Connectivity (JDBC). Configure the Lambda function to retrieve the required data, transform the data into Parquet format, and transfer the data into an S3 bucket. Use Amazon EventBridge to schedule the Lambda function to run every day.
Suggested Answer: A 🗳️
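For illustration, a minimal sketch of the Glue job the suggested answer (option A) describes: read the SQL Server view over a pre-created Glue JDBC connection and write the result to S3 as Parquet. The connection, view, and bucket names below are placeholders, not taken from the question.

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read directly from the view through a Glue JDBC connection
# ("sqlserver-ec2-conn" and the view name are placeholder values).
source = glue_context.create_dynamic_frame.from_options(
    connection_type="sqlserver",
    connection_options={
        "useConnectionProperties": "true",
        "connectionName": "sqlserver-ec2-conn",
        "dbtable": "dbo.analytics_export_view",  # view with the joined data
    },
)

# Write the data set to S3 in Apache Parquet format (placeholder bucket/prefix).
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://example-analytics-bucket/daily-export/"},
    format="parquet",
)

job.commit()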

Comments

taka5094
Highly Voted 11 months ago
Selected Answer: C
Choice A is almost the same approach, but it doesn't use the AWS Glue crawler, so you have to manage the view's metadata manually.
upvoted 7 times
Christina666
Highly Voted 10 months ago
Selected Answer: C
  • Leveraging SQL views: creating a view on the source database simplifies the data extraction process and keeps your SQL logic centralized.
  • Glue crawler efficiency: using a Glue crawler to automatically discover and catalog the view's metadata reduces manual setup.
  • Glue job for ETL: a dedicated Glue job is well suited for the data transformation (to Parquet) and loading into S3, and Glue jobs offer built-in scheduling capabilities.
  • Operational efficiency: this approach minimizes custom code and leverages native AWS services for data movement and cataloging.
upvoted 7 times
Dummy92yash
5 months, 3 weeks ago
A Glue crawler is used to catalog data and discover its schema. In this requirement the data is already stored in MS SQL Server, which is a relational database. Hence I think A is correct.
upvoted 3 times
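For reference, the crawler step that option C adds on top of option A is roughly one boto3 call that catalogs the view through the JDBC connection; every name below (role ARN, connection, catalog database, JDBC path) is a placeholder.

import boto3

glue = boto3.client("glue")

# Create a crawler whose target is the SQL Server view, reached through an
# existing Glue JDBC connection; it writes the discovered schema to the
# Glue Data Catalog.
glue.create_crawler(
    Name="sqlserver-view-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="analytics_catalog",
    Targets={
        "JdbcTargets": [
            {
                "ConnectionName": "sqlserver-ec2-conn",
                "Path": "salesdb/dbo/analytics_export_view",  # database/schema/view
            }
        ]
    },
)

glue.start_crawler(Name="sqlserver-view-crawler")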
Certified101
Most Recent 6 hours, 40 minutes ago
Selected Answer: A
A is correct - no need for a crawler.
upvoted 1 times
plutonash
1 month ago
Selected Answer: A
The crawler is not necessary; using a Glue job to read the data from SQL Server and transfer it to S3 in Apache Parquet format is enough.
upvoted 3 times
mtrianac
2 months, 1 week ago
Selected Answer: A
No, in this case, using an AWS Glue Crawler is not necessary. The schema is already defined in the SQL Server database, as the created view contains the required structure (columns and data types). AWS Glue can directly connect to the database via JDBC, extract the data, transform it into Parquet format, and store it in S3 without additional steps. A crawler is useful if you're working with data that doesn't have a predefined schema (e.g., files in S3) or if you need the data to be cataloged for services like Amazon Athena. However, for this ETL flow, using just a Glue Job simplifies the process and reduces operational complexity.
upvoted 2 times
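To make the A-versus-C difference concrete, the read step inside the Glue job is essentially the only part that changes. A rough fragment, reusing the glue_context from the sketch above; the catalog database, table, and connection names are placeholders.

# Option C: read the view through the Data Catalog entry a crawler created.
catalog_frame = glue_context.create_dynamic_frame.from_catalog(
    database="analytics_catalog",
    table_name="salesdb_dbo_analytics_export_view",
)

# Option A: skip the catalog and read the view directly over JDBC.
jdbc_frame = glue_context.create_dynamic_frame.from_options(
    connection_type="sqlserver",
    connection_options={
        "useConnectionProperties": "true",
        "connectionName": "sqlserver-ec2-conn",
        "dbtable": "dbo.analytics_export_view",
    },
)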
michele_scar
3 months, 1 week ago
Selected Answer: A
A Glue crawler is useless here because the schema is already in place in the SQL database.
upvoted 1 times
michele_scar
3 months, 1 week ago
My fault: A is correct because, during a migration, it is more efficient to have a crawler that can catch eventual schema changes.
upvoted 1 times
michele_scar
3 months, 1 week ago
My fault: C is correct because, during a migration, it is more efficient to have a crawler that can catch eventual schema changes.
upvoted 1 times
leonardoFelipe
3 months, 1 week ago
Selected Answer: A
Usually, views aren't true objects in a DBMS; they're just a "nickname" for a specific query string, unlike materialized views. So my question is: can a Glue crawler understand their metadata? I'd go with A.
upvoted 3 times
bakarys
7 months, 2 weeks ago
Selected Answer: A
Option A involves creating a view in the EC2 instance-based SQL Server databases that contains the required data elements. An AWS Glue job is then created to select the data directly from the view and transfer the data in Parquet format to an S3 bucket. This job is scheduled to run every day. This approach is operationally efficient as it leverages managed services (AWS Glue) and does not require additional transformation steps.

Option D involves creating an AWS Lambda function that queries the EC2 instance-based databases using JDBC. The Lambda function is configured to retrieve the required data, transform the data into Parquet format, and transfer the data into an S3 bucket. This approach could work, but managing and scheduling Lambda functions could add operational overhead compared to using managed services like AWS Glue.
upvoted 3 times
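For completeness, the "schedule the AWS Glue job to run every day" part of options A and C can be a Glue scheduled trigger rather than a separate EventBridge rule; a minimal boto3 sketch with placeholder names.

import boto3

glue = boto3.client("glue")

# Scheduled trigger that starts the export job once a day (the trigger name,
# job name, and cron expression are placeholders).
glue.create_trigger(
    Name="daily-sqlserver-export",
    Type="SCHEDULED",
    Schedule="cron(0 4 * * ? *)",  # every day at 04:00 UTC
    Actions=[{"JobName": "sqlserver-view-to-parquet"}],
    StartOnCreation=True,
)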
GiorgioGss
11 months ago
Selected Answer: C
Just because it decouples the whole architecture, I will go with C.
upvoted 2 times
Felix_G
11 months, 2 weeks ago
Option C seems to be the most operationally efficient: It leverages Glue for both schema discovery (via the crawler) and data transfer (via the Glue job). The Glue job can directly handle the Parquet format conversion. Scheduling the Glue job ensures regular data export without manual intervention.
upvoted 1 times
helpaws
11 months ago
you're right: https://aws.amazon.com/blogs/big-data/extracting-multidimensional-data-from-microsoft-sql-server-analysis-services-using-aws-glue/
upvoted 1 times
taka5094
11 months ago
Is this right? https://aws.amazon.com/jp/blogs/big-data/extracting-multidimensional-data-from-microsoft-sql-server-analysis-services-using-aws-glue/
upvoted 1 times
rralucard_
1 year ago
Selected Answer: A
Option A (Creating a view in the EC2 instance-based SQL Server databases and creating an AWS Glue job that selects data from the view, transfers it in Parquet format to S3, and schedules the job to run every day) seems to be the most operationally efficient solution. It leverages AWS Glue’s ETL capabilities for direct data extraction and transformation, minimizes manual steps, and effectively automates the process.
upvoted 3 times
evntdrvn76
1 year ago
A. Create a view in the EC2 instance-based SQL Server databases that contains the required data elements. Create an AWS Glue job that selects the data directly from the view and transfers the data in Parquet format to an S3 bucket. Schedule the AWS Glue job to run every day. This solution is operationally efficient for exporting data in the required format.
upvoted 2 times
Community vote distribution: A (35%), C (25%), B (20%), Other