Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 19 discussion

A data engineer must orchestrate a series of Amazon Athena queries that will run every day. Each query can run for more than 15 minutes.
Which combination of steps will meet these requirements MOST cost-effectively? (Choose two.)

  • A. Use an AWS Lambda function and the Athena Boto3 client start_query_execution API call to invoke the Athena queries programmatically.
  • B. Create an AWS Step Functions workflow and add two states. Add the first state before the Lambda function. Configure the second state as a Wait state to periodically check whether the Athena query has finished using the Athena Boto3 get_query_execution API call. Configure the workflow to invoke the next query when the current query has finished running.
  • C. Use an AWS Glue Python shell job and the Athena Boto3 client start_query_execution API call to invoke the Athena queries programmatically.
  • D. Use an AWS Glue Python shell script to run a sleep timer that checks every 5 minutes to determine whether the current Athena query has finished running successfully. Configure the Python shell script to invoke the next query when the current query has finished running.
  • E. Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the Athena queries in AWS Batch.
Suggested Answer: AB

Comments

rralucard_
Highly Voted 1 year, 2 months ago
Selected Answer: AB
AWS Lambda can be effectively used to trigger Athena queries. By using the start_query_execution API from the Athena Boto3 client, you can programmatically start Athena queries. Lambda functions are cost-effective as they charge based on the compute time used, and there's no charge when the code is not running. However, Lambda has a maximum execution timeout of 15 minutes, which means it's not suitable for long-running operations but can be used to trigger or start queries. AWS Step Functions can orchestrate multiple AWS services in workflows. By using a Wait state, the workflow can periodically check the status of the Athena query, and proceed to the next step once the query is complete. This approach is more scalable and reliable compared to continuously running a Lambda function, as Step Functions can handle long-running processes better and can maintain the state of each step in the workflow.
upvoted 10 times
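A minimal sketch of the split described above, assuming two small Lambda-style handlers: one that only submits the query and one that only reports its state. The handler names and the event keys (`query`, `database`, `output_location`) are hypothetical; `start_query_execution` and `get_query_execution` are the Boto3 Athena client calls named in the question. Neither handler waits for the query, so neither comes near the 15-minute Lambda limit.

```python
"""Sketch: submit an Athena query from one short call, check on it from another."""


def _athena_client():
    # Imported lazily so the handlers can be exercised with a stub client.
    import boto3
    return boto3.client("athena")


def start_query(event, context=None, client=None):
    """Submit the query and return right away; Athena runs it asynchronously."""
    client = client or _athena_client()
    response = client.start_query_execution(
        QueryString=event["query"],  # hypothetical event key
        QueryExecutionContext={"Database": event["database"]},  # hypothetical event key
        ResultConfiguration={"OutputLocation": event["output_location"]},  # hypothetical event key
    )
    return {"QueryExecutionId": response["QueryExecutionId"]}


def check_query(event, context=None, client=None):
    """Report the current state: QUEUED, RUNNING, SUCCEEDED, FAILED or CANCELLED."""
    client = client or _athena_client()
    response = client.get_query_execution(QueryExecutionId=event["QueryExecutionId"])
    state = response["QueryExecution"]["Status"]["State"]
    return {"QueryExecutionId": event["QueryExecutionId"], "State": state}
```

Each call lasts only as long as one API round trip; something outside the function (a workflow, for example) decides when to call `check_query` again.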
San_Juan
8 months ago
The Lambda maximum timeout is 15 minutes, and the query takes more than 15 minutes, so the Lambda function would end before the Athena query finishes.
upvoted 3 times
GiorgioGss
Highly Voted 1 year, 1 month ago
Selected Answer: BE
B - because https://docs.aws.amazon.com/step-functions/latest/dg/sample-athena-query.html
E - because https://aws.amazon.com/blogs/big-data/orchestrate-amazon-emr-serverless-spark-jobs-with-amazon-mwaa-and-data-validation-using-amazon-athena/
upvoted 8 times
San_Juan
8 months ago
I discarded E because Airflow is more expensive than Glue/Step Functions. So B (Step Functions) and D (Glue Python shell).
upvoted 2 times
DevoteamAnalytix
11 months, 2 weeks ago
The question is about a "combination of steps" - MWAA and Step Functions are different options, so I would prefer AB
upvoted 3 times
sachin
8 months, 2 weeks ago
BE is right. A only gives the option to invoke the Athena query. What about the response if the execution goes beyond 15 minutes?
upvoted 1 times
Evan_Lin
Most Recent 2 months, 2 weeks ago
Selected Answer: AB
After real-world testing, A is a valid answer. This is because the Lambda only sends the API request to Athena, which runs the query. Even if the Lambda times out, the query result is still stored in the designated S3 bucket.
upvoted 2 times
Udyan
3 months, 2 weeks ago
Selected Answer: AB
Why?
B (Step Functions): Step Functions are ideal for orchestrating long-running workflows, including polling the Athena query status and invoking the next query when ready.
A (Lambda): Lambda is used to programmatically trigger Athena queries within Step Functions, despite its 15-minute limitation, because Step Functions can manage the long runtime using Wait states.
Why not C, D, or E?
C and D involve Glue, which is better suited for ETL jobs than orchestration, making them less efficient and cost-effective.
E (Amazon MWAA) introduces unnecessary cost and complexity for a straightforward workflow.
upvoted 2 times
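The Wait-state polling loop described above could look roughly like the following Amazon States Language definition, built here as a Python dict. This is a sketch under assumptions: the two Lambda ARNs are hypothetical placeholders for a function that starts the query and returns its `QueryExecutionId`, and a function that returns the query's `State`. The state names and the 60-second default wait are also illustrative.

```python
import json


def build_definition(start_fn_arn, check_fn_arn, wait_seconds=60):
    """Return a state machine definition that starts a query and polls until it ends."""
    return {
        "Comment": "Start one Athena query, then wait and poll until it finishes",
        "StartAt": "StartQuery",
        "States": {
            # Hypothetical Lambda that calls start_query_execution.
            "StartQuery": {"Type": "Task", "Resource": start_fn_arn, "Next": "WaitForQuery"},
            # The workflow pauses here between status checks.
            "WaitForQuery": {"Type": "Wait", "Seconds": wait_seconds, "Next": "CheckQuery"},
            # Hypothetical Lambda that calls get_query_execution and returns {"State": ...}.
            "CheckQuery": {"Type": "Task", "Resource": check_fn_arn, "Next": "IsQueryDone"},
            "IsQueryDone": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.State", "StringEquals": "SUCCEEDED", "Next": "QuerySucceeded"},
                    {"Variable": "$.State", "StringEquals": "FAILED", "Next": "QueryFailed"},
                    {"Variable": "$.State", "StringEquals": "CANCELLED", "Next": "QueryFailed"},
                ],
                # QUEUED or RUNNING: go back and wait again.
                "Default": "WaitForQuery",
            },
            "QuerySucceeded": {"Type": "Succeed"},
            "QueryFailed": {"Type": "Fail", "Error": "AthenaQueryFailed"},
        },
    }


if __name__ == "__main__":
    # Placeholder ARNs, for printing the JSON only.
    print(json.dumps(build_definition("arn:start-fn", "arn:check-fn"), indent=2))
```

To chain several queries, `QuerySucceeded` would be replaced by a `Next` pointing at the start state of the following query.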
haby
4 months, 1 week ago
Selected Answer: BC
BC for me.
A - The Lambda function will stop at 900 s, so it will stop before the query finishes (more than 15 minutes).
E - Airflow is way more complex and expensive than Step Functions.
upvoted 4 times
altonh
4 months, 3 weeks ago
Selected Answer: CE
Not AB, because of the Lambda timeout. CE is correct. The query will be executed by a Glue job, which will be orchestrated by Airflow. The job will be scheduled using AWS Batch.
upvoted 1 times
Eleftheriia
4 months, 3 weeks ago
Selected Answer: AB
Not E because "You should use Step Functions if you prioritize cost and performance" (https://aws.amazon.com/managed-workflows-for-apache-airflow/faqs/). Also, the fact that the queries take longer than 15 minutes can be handled with Step Functions, therefore AB.
upvoted 2 times
truongnguyen86
5 months, 1 week ago
A. Why it's correct: AWS Lambda is a cost-effective, serverless option for invoking Athena queries using the Boto3 API. Lambda charges are based on execution time and memory usage, making it an efficient solution for periodic query execution.
B. Why it's correct: Step Functions provides a serverless orchestration option with a pay-per-use pricing model. Adding a Wait state prevents excessive API calls and ensures queries are executed in sequence, making it a cost-effective and scalable solution.
Why the other options are less optimal:
E. Use Amazon Managed Workflows for Apache Airflow (MWAA): MWAA is powerful for complex workflows, but its pricing includes environment uptime costs, which can be higher than Lambda and Step Functions for simple tasks like orchestrating Athena queries.
By choosing A and B, you balance cost-effectiveness and simplicity for orchestrating daily Athena queries.
upvoted 1 times
San_Juan
8 months ago
Selected Answer: BD
The Lambda maximum timeout is 15 minutes, so the query takes longer than Lambda can manage, and you cannot use Lambda. Use Step Functions (answer B) or a Glue Python shell script (answer D). Airflow is more expensive than Glue/Step Functions, so E is discarded as well.
upvoted 2 times
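For comparison, the sleep-timer approach from options C and D can be sketched as a single long-running script. This is an illustration only, assuming a Boto3 Athena client is passed in; the function name and its parameters are hypothetical. Note that the job keeps running, and therefore keeps being billed, for the whole duration of every query.

```python
import time

# States after which Athena will not change the query status again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_queries_in_sequence(client, queries, database, output_location,
                            poll_seconds=300, sleep=time.sleep):
    """Run each query in turn, sleeping between status checks (5 minutes by default)."""
    finished = []
    for query in queries:
        query_id = client.start_query_execution(
            QueryString=query,
            QueryExecutionContext={"Database": database},
            ResultConfiguration={"OutputLocation": output_location},
        )["QueryExecutionId"]

        while True:
            execution = client.get_query_execution(QueryExecutionId=query_id)
            state = execution["QueryExecution"]["Status"]["State"]
            if state in TERMINAL_STATES:
                break
            sleep(poll_seconds)  # the job sits idle here until the next check

        if state != "SUCCEEDED":
            raise RuntimeError(f"Query {query_id} ended in state {state}")
        finished.append(query_id)  # only now is the next query started
    return finished
```

In a real job, `client` would be `boto3.client("athena")`; the `sleep` parameter exists only so the loop can be tried without actually waiting.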
V0811
8 months, 3 weeks ago
Selected Answer: AB
It should be AB
upvoted 2 times
alex1991
9 months, 4 weeks ago
Selected Answer: AB
Since the Athena API is asynchronous, users can split the work into two steps: trigger the query, then get the results later, even when the query takes more than 15 minutes.
upvoted 2 times
pypelyncar
10 months, 2 weeks ago
Selected Answer: BE
Tricky. A is valid. Still, on cost-effectiveness, no one doubts B. Then why E? MWAA offers a managed Apache Airflow environment for orchestrating complex workflows. It can handle long-running tasks like Athena queries efficiently. Batch processing: leveraging AWS Batch within the Airflow workflow allows for distributed and scalable execution of the Athena queries, improving overall processing efficiency.
upvoted 1 times
San_Juan
8 months ago
A might not be valid, as the queries take more than 15 minutes and the Lambda maximum timeout is 15 minutes. The Lambda function would end before the query finishes.
upvoted 1 times
JoeAWSOCM
4 months, 3 weeks ago
Lambda is just for triggering the query. It's not waiting for the query to finish. The status of the query will be checked using Step Functions.
upvoted 1 times
valuedate
11 months ago
Selected Answer: AB
My opinion.
upvoted 2 times
valuedate
11 months, 1 week ago
Selected Answer: AB
I would prefer AB
upvoted 2 times
VerRi
11 months, 1 week ago
Selected Answer: AB
Lambda to kick-start Athena; Step Functions for orchestration.
upvoted 3 times
sdas1
11 months, 3 weeks ago
Options C and D involve using an AWS Glue Python shell job to start the queries and a script that runs a sleep timer and periodically checks whether the current Athena query has finished running. While this approach might seem cost-effective in terms of using AWS Glue, it's not the most efficient way to manage the execution of Athena queries. AWS Glue is primarily designed for ETL (Extract, Transform, Load) tasks rather than orchestrating long-running query execution. Therefore, while options B, C, and D could technically work, they might not be the most cost-effective or efficient solutions for orchestrating long-running Athena queries. Instead, options A and E would likely be more cost-effective and suitable for this scenario.
upvoted 1 times
sdas1
11 months, 3 weeks ago
Option B, utilizing AWS Step Functions, can be a cost-effective solution for orchestrating the execution of Athena queries, but it might not be the most cost-effective in this scenario because Step Functions are billed based on state transitions and the duration of state execution. Since each query can run for more than 15 minutes, using Step Functions to wait and periodically check the status of the queries could potentially result in higher costs, especially if the queries frequently take a long time to complete.
upvoted 1 times
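The cost concern above can be put into rough numbers. The sketch below assumes a Standard workflow, which is billed per state transition rather than per second of waiting, and a polling loop of three states per cycle (Wait, status check, Choice) plus two fixed states (start and end). The price passed in is an assumed figure for illustration only and should be checked against current AWS pricing.

```python
import math


def polling_transitions(query_minutes, wait_seconds, states_per_cycle=3, fixed_states=2):
    """Estimate state transitions for one query that is polled until it finishes."""
    cycles = math.ceil(query_minutes * 60 / wait_seconds)
    return fixed_states + cycles * states_per_cycle


def transition_cost(transitions, assumed_price_per_1000):
    """Cost of the transitions at an assumed price per 1,000 transitions."""
    return transitions / 1000 * assumed_price_per_1000


if __name__ == "__main__":
    # A 20-minute query polled every 60 seconds versus every 5 minutes.
    for wait in (60, 300):
        n = polling_transitions(query_minutes=20, wait_seconds=wait)
        # 0.025 per 1,000 transitions is an assumption, not a quoted price.
        print(wait, n, transition_cost(n, assumed_price_per_1000=0.025))
```

Under these assumptions, a longer Wait interval reduces the number of transitions, and time spent inside the Wait state adds no transitions of its own.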
Community vote distribution
A (35%)
C (25%)
B (20%)
Other