A data engineer must orchestrate a series of Amazon Athena queries that will run every day. Each query can run for more than 15 minutes. Which combination of steps will meet these requirements MOST cost-effectively? (Choose two.)
A. Use an AWS Lambda function and the Athena Boto3 client start_query_execution API call to invoke the Athena queries programmatically.
B. Create an AWS Step Functions workflow and add two states. Add the first state before the Lambda function. Configure the second state as a Wait state to periodically check whether the Athena query has finished using the Athena Boto3 get_query_execution API call. Configure the workflow to invoke the next query when the current query has finished running.
C. Use an AWS Glue Python shell job and the Athena Boto3 client start_query_execution API call to invoke the Athena queries programmatically.
D. Use an AWS Glue Python shell script to run a sleep timer that checks every 5 minutes to determine whether the current Athena query has finished running successfully. Configure the Python shell script to invoke the next query when the current query has finished running.
E. Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the Athena queries in AWS Batch.
AWS Lambda can be effectively used to trigger Athena queries. By using the start_query_execution API from the Athena Boto3 client, you can programmatically start Athena queries. Lambda functions are cost-effective as they charge based on the compute time used, and there's no charge when the code is not running. However, Lambda has a maximum execution timeout of 15 minutes, which means it's not suitable for long-running operations but can be used to trigger or start queries.
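To make the "trigger only" role concrete, here is a minimal sketch of a handler that submits a query and returns straight away. The helper name `start_query` and the event keys (`sql`, `database`, `output_location`) are hypothetical; only `start_query_execution` and its parameters come from Boto3.

```python
def start_query(athena, sql, database, output_location):
    """Submit a query to Athena and return its execution ID without waiting."""
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


def lambda_handler(event, context):
    # Imported here so start_query can be exercised with a stub client.
    import boto3

    athena = boto3.client("athena")
    # Event keys are assumptions for this sketch, not a fixed contract.
    query_id = start_query(
        athena, event["sql"], event["database"], event["output_location"]
    )
    # The function ends here; the query keeps running inside Athena.
    return {"QueryExecutionId": query_id}
```

Because the call only submits the query, the function's run time does not depend on how long the query itself takes.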
AWS Step Functions can orchestrate multiple AWS services in workflows. By using a Wait state, the workflow can periodically check the status of the Athena query, and proceed to the next step once the query is complete. This approach is more scalable and reliable compared to continuously running a Lambda function, as Step Functions can handle long-running processes better and can maintain the state of each step in the workflow.
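The status check that runs after each Wait could look roughly like the sketch below. The function name `check_query` and the shape of the returned dictionary are assumptions for illustration; `get_query_execution` and the state names are from the Athena API.

```python
# States after which Athena will not change the query status again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def check_query(athena, query_execution_id):
    """Look up the current state of one query execution."""
    response = athena.get_query_execution(QueryExecutionId=query_execution_id)
    state = response["QueryExecution"]["Status"]["State"]
    return {
        "QueryExecutionId": query_execution_id,
        "State": state,  # QUEUED, RUNNING, SUCCEEDED, FAILED or CANCELLED
        "Finished": state in TERMINAL_STATES,
    }
```

Each call is a single short API request, so whatever runs it (for example a Lambda function) finishes in well under the 15-minute limit no matter how long the query runs.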
B - because
https://docs.aws.amazon.com/step-functions/latest/dg/sample-athena-query.html
E - because
https://aws.amazon.com/blogs/big-data/orchestrate-amazon-emr-serverless-spark-jobs-with-amazon-mwaa-and-data-validation-using-amazon-athena/
After real-world testing, A is a valid answer. This is because the Lambda only sends the API request to Athena, which runs the query. Even if the Lambda times out, the query result is still stored in the designated S3 bucket.
Why?
B (Step Functions): Step Functions are ideal for orchestrating long-running workflows, including polling the Athena query status and invoking the next query when ready.
A (Lambda): Lambda is used to programmatically trigger Athena queries within Step Functions, despite its 15-minute limitation, because Step Functions can manage the long runtime using Wait states.
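A rough sketch of how A and B fit together, written as a Python dictionary that serializes to an Amazon States Language definition. The ARNs, state names, and the 60-second wait are placeholders, and the definition assumes the start function returns `QueryExecutionId` and the check function returns a `State` field.

```python
import json

# Placeholder ARNs (hypothetical function names); replace with real ones.
START_FN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:start-athena-query"
CHECK_FN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:check-athena-query"

definition = {
    "Comment": "Start one Athena query, then poll until it reaches a terminal state",
    "StartAt": "StartQuery",
    "States": {
        "StartQuery": {"Type": "Task", "Resource": START_FN, "Next": "WaitForQuery"},
        # The Wait state pauses the workflow between status checks.
        "WaitForQuery": {"Type": "Wait", "Seconds": 60, "Next": "CheckQuery"},
        "CheckQuery": {"Type": "Task", "Resource": CHECK_FN, "Next": "IsFinished"},
        "IsFinished": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.State", "StringEquals": "SUCCEEDED", "Next": "Done"},
                {"Variable": "$.State", "StringEquals": "FAILED", "Next": "QueryFailed"},
                {"Variable": "$.State", "StringEquals": "CANCELLED", "Next": "QueryFailed"},
            ],
            # Still QUEUED or RUNNING: go back and wait again.
            "Default": "WaitForQuery",
        },
        "QueryFailed": {"Type": "Fail", "Error": "AthenaQueryFailed"},
        "Done": {"Type": "Succeed"},
    },
}

print(json.dumps(definition, indent=2))
```

To chain several queries, `Done` would be replaced by the start state of the next query.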
Why Not C, D, or E?
C and D involve Glue, which is better suited for ETL jobs than orchestration, making them less efficient and cost-effective.
E (Amazon MWAA) introduces unnecessary cost and complexity for a straightforward workflow.
BC for me
A - The Lambda function will stop at 900 s, so it will stop before the query finishes (more than 15 minutes).
E - Airflow is way more complex and expensive than Step Functions.
AB - Because of the Lambda timeout
CE is correct. The query will be executed by a Glue job, which will be orchestrated by Airflow. The job will be scheduled using AWS Batch.
Not E because "You should use Step Functions if you prioritize cost and performance"
https://aws.amazon.com/managed-workflows-for-apache-airflow/faqs/
Also, the fact that the queries take longer than 15 minutes can be handled with Step Functions, therefore AB.
A. Why it's correct: AWS Lambda is a cost-effective, serverless option for invoking Athena queries using the Boto3 API. Lambda charges are based on execution time and memory usage, making it an efficient solution for periodic query execution.
B. Why it's correct: Step Functions provide a serverless orchestration option with a pay-per-use pricing model. Adding a Wait state prevents excessive API calls and ensures queries are executed in sequence, making it a cost-effective and scalable solution.
Why the other options are less optimal:
E. Use Amazon Managed Workflows for Apache Airflow (MWAA): MWAA is powerful for complex workflows, but its pricing includes environment uptime costs, which can be higher than Lambda and Step Functions for simple tasks like orchestrating Athena queries.
By choosing A and B, you balance cost-effectiveness and simplicity for orchestrating daily Athena queries.
Selected Answer: BD
Lambda's maximum timeout is 15 minutes, so the query runs longer than Lambda can manage and you cannot use Lambda. Use Step Functions (answer B) or Glue Python (answer D).
Airflow is more expensive than Glue/Step Functions, so E is discarded as well.
Tricky: A is valid. Still, on cost-effectiveness:
No one doubts B. Then why E?
MWAA offers a managed Apache Airflow environment for orchestrating complex workflows.
It can handle long-running tasks like Athena queries efficiently.
Batch Processing: Leveraging AWS Batch within the Airflow workflow allows for distributed and scalable execution of the Athena queries, improving overall processing efficiency.
A might not be valid: the queries take more than 15 minutes, and the Lambda maximum timeout is 15 minutes. The Lambda function would end before the query finishes.
Options C and D involve using an AWS Glue Python shell script to run a sleep timer and periodically check whether the current Athena query has finished running. While this approach might seem cost-effective in terms of using AWS Glue, it's not the most efficient way to manage the execution of Athena queries. AWS Glue is primarily designed for ETL (Extract, Transform, Load) tasks rather than orchestrating long-running query execution.
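For comparison, the sleep-timer approach described in C and D might look roughly like this inside a Python shell script. The function name `run_in_sequence` and its parameters are hypothetical; the 300-second default mirrors the "every 5 minutes" wording of option D.

```python
import time


def run_in_sequence(athena, queries, database, output_location,
                    poll_seconds=300, sleep=time.sleep):
    """Run queries one after another, sleeping between status checks."""
    finished = []
    for sql in queries:
        query_id = athena.start_query_execution(
            QueryString=sql,
            QueryExecutionContext={"Database": database},
            ResultConfiguration={"OutputLocation": output_location},
        )["QueryExecutionId"]
        while True:
            status = athena.get_query_execution(QueryExecutionId=query_id)
            state = status["QueryExecution"]["Status"]["State"]
            if state == "SUCCEEDED":
                break
            if state in ("FAILED", "CANCELLED"):
                raise RuntimeError(f"Query {query_id} ended in state {state}")
            # The job stays alive (and billed) while it sleeps.
            sleep(poll_seconds)
        finished.append(query_id)
    return finished
```

The script has to keep running for the whole duration of every query, which is the main difference from the Wait-state approach.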
Therefore, while options B, C and D could technically work, they might not be the most cost-effective or efficient solutions for orchestrating long-running Athena queries. Instead, options A and E would likely be more cost-effective and suitable for this scenario.
Option B, utilizing AWS Step Functions, can be a cost-effective solution for orchestrating the execution of Athena queries, but it might not be the most cost-effective in this scenario because Step Functions are billed based on state transitions and the duration of state execution. Since each query can run for more than 15 minutes, using Step Functions to wait and periodically check the status of the queries could potentially result in higher costs, especially if the queries frequently take a long time to complete.
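A back-of-the-envelope way to size this concern is to count state transitions per query. The sketch below assumes a Standard workflow billed per state transition, a polling loop of three states (wait, check, choice) plus one start state, and an illustrative price of $0.025 per 1,000 transitions; all of these are assumptions to check against the current pricing page for your region.

```python
import math


def transitions_per_query(query_seconds, wait_seconds,
                          states_per_poll=3, fixed_states=1):
    """Rough count of state transitions needed to wait out one query."""
    polls = math.ceil(query_seconds / wait_seconds)
    return fixed_states + polls * states_per_poll


def monthly_cost(queries_per_day, query_seconds, wait_seconds,
                 price_per_transition, days=30):
    """Rough monthly charge for polling, ignoring any free tier."""
    per_query = transitions_per_query(query_seconds, wait_seconds)
    return queries_per_day * days * per_query * price_per_transition


# Assumed price, for illustration only.
ASSUMED_PRICE = 0.025 / 1000

# A 20-minute query polled every 60 seconds versus every 5 minutes.
print(transitions_per_query(20 * 60, 60))
print(transitions_per_query(20 * 60, 300))
print(monthly_cost(5, 20 * 60, 60, ASSUMED_PRICE))
```

Under these assumptions the transition count depends on the polling interval rather than on the query duration alone, so a longer Wait reduces it.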