exam questions

Exam AWS Certified Data Engineer - Associate DEA-C01 All Questions

View all questions & answers for the AWS Certified Data Engineer - Associate DEA-C01 exam

Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 48 discussion

A data engineer is using Amazon Athena to analyze sales data that is in Amazon S3. The data engineer writes a query to retrieve sales amounts for 2023 for several products from a table named sales_data. However, the query does not return results for all of the products that are in the sales_data table. The data engineer needs to troubleshoot the query to resolve the issue.
The data engineer's original query is as follows:
SELECT product_name, sum(sales_amount)

FROM sales_data -

WHERE year = 2023 -

GROUP BY product_name -
How should the data engineer modify the Athena query to meet these requirements?

  • A. Replace sum(sales_amount) with count(*) for the aggregation.
  • B. Change WHERE year = 2023 to WHERE extract(year FROM sales_data) = 2023.
  • C. Add HAVING sum(sales_amount) > 0 after the GROUP BY clause.
  • D. Remove the GROUP BY clause.
Show Suggested Answer Hide Answer
Suggested Answer: B 🗳️

Comments

Chosen Answer:
This is a voting comment (?). It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
GiorgioGss
Highly Voted 10 months, 3 weeks ago
Selected Answer: B
"SELECT product_name, sum(sales_amount) FROM sales_data WHERE extract(year FROM sales_date) = 2023 GROUP BY product_name;" A. This would change the query to count the number of rows instead of summing sales. C. This would filter out products with zero sales amounts. D. Removing the GROUP BY clause would result in a single sum of all sales amounts without grouping by product_name.
upvoted 12 times
...
pikuantne
Highly Voted 3 months, 1 week ago
None of these options make sense. I think the question is worded incorrectly. I understand that the problem is supposed to be: the products that did not have any sales in 2023 should also be visible in the report with sum of sales_amount = 0. So, the WHERE condition should be deleted and replaced with a CASE WHEN. That way all of the products in the table will be visible, but only sales for 2023 will be summed. Which is what I think this question is asking. None of the provided options do that.
upvoted 6 times
...
YUICH
Most Recent 1 week, 1 day ago
Selected Answer: B
hy Option (B) Works If the underlying table field is a date or timestamp (rather than a numeric year column), using WHERE year = 2023 filters out all rows that do not literally match year = 2023. By using extract(year FROM sales_data) = 2023, you are correctly filtering rows whose date (or timestamp) in the sales_data column corresponds to the year 2023. Hence, (B) resolves the problem by filtering on the correct year value from the actual date/timestamp column, ensuring all qualifying products are included in the results.
upvoted 2 times
...
Udyan
3 weeks, 4 days ago
Selected Answer: C
The issue might be that some products have sales amounts of 0 or NULL, and those records are being excluded from the results because Athena may not include them in the final output when performing aggregation. By using the HAVING clause, you can filter the groups based on the aggregated sales amount (sum). This ensures that only products with a non-zero sum of sales are returned in the results. The HAVING clause is used to filter results after the aggregation.
upvoted 1 times
...
MLOPS_eng
1 month, 1 week ago
Selected Answer: C
The HAVING clause filters the results to include only products with an aggregated sales amount greater than zero.
upvoted 1 times
...
Assassin27
1 month, 1 week ago
Selected Answer: C
SELECT product_name, sum(sales_amount) FROM sales_data WHERE year = 2023 GROUP BY product_name HAVING sum(sales_amount) > 0 Explanation: The HAVING clause ensures that only products with a non-zero aggregated sales amount are included in the results. This will address cases where products exist in the table but have no sales data for 2023.
upvoted 1 times
...
kailu
1 month, 2 weeks ago
Selected Answer: C
There is no issue with the WHERE clause from the original query, so B is not the right option IMO.
upvoted 1 times
...
Shatheesh
4 months ago
C, query in the question is correct you just need to get amounts grater than Zero
upvoted 2 times
...
valuedate
8 months, 2 weeks ago
Selected Answer: B
year should be the partition in s3 so its necessary to extract. its not a column
upvoted 5 times
...
VerRi
8 months, 3 weeks ago
Selected Answer: C
No need to extract the year again
upvoted 2 times
...
Just_Ninja
8 months, 3 weeks ago
Selected Answer: C
https://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/sql-reference-having-clause.html
upvoted 1 times
...
Snape
9 months, 1 week ago
Selected Answer: C
Wrong answers A. Replace sum(sales_amount) with count(*) for the aggregation. This option will return the count of records for each product, not the sum of sales amounts, which is the desired result. B. Change WHERE year = 2023 to WHERE extract(year FROM sales_data) = 2023. The year column likely stores the year value directly, so there's no need to extract it from a date or timestamp column. D. Remove the GROUP BY clause. Removing the GROUP BY clause will cause an error because the sum(sales_amount) aggregation function requires a GROUP BY clause to specify the grouping column (product_name in this case).
upvoted 1 times
...
khchan123
9 months, 1 week ago
B B. Change `WHERE year = 2023` to `WHERE extract(year FROM sales_data) = 2023`. The issue with the original query is that it assumes there is a column named `year` in the `sales_data` table. However, it's more likely that the date or timestamp information is stored in a single column, for example, a column named `sales_date`. To extract the year from a date or timestamp column, you need to use the `extract()` function in Athena SQL.
upvoted 3 times
...
chris_spencer
9 months, 3 weeks ago
None of the answer makes senses. Option C will exclude any amount that is 0. This option would be correct if it is: Add HAVING sum(sales_amount) >= 0 after the GROUP BY clause.
upvoted 2 times
...
Christina666
9 months, 4 weeks ago
Selected Answer: C
Gemini: C. Add HAVING sum(sales_amount) > 0 after the GROUP BY clause. Zero Sales Products: The original query is likely missing products that had zero sales amount in 2023. This modification filters the grouped results, ensuring only products with positive sales are displayed. Why Other Options Don't Address the Core Issue: A. Replace sum(sales_amount) with count(*) for the aggregation. This would show how many sales transactions a product had, but not if it generated any revenue. It wouldn't solve the issue of missing products. B. Change WHERE year = 2023 to WHERE extract(year FROM sales_data) = 2023. This is functionally equivalent to the original WHERE clause if the year column is already an integer type. It wouldn't fix missing products. D. Remove the GROUP BY clause. This would aggregate all sales for 2023 with no product breakdown, losing the granularity needed.
upvoted 2 times
altonh
2 months ago
Why would the query miss out on products with zero sales when the condition is based on year?
upvoted 1 times
...
...
kj07
10 months, 4 weeks ago
Not A because the engineer wants a sum not the total count. Not C because it will filter out the data with sales_amount zero. Not D because it will return just one result and the engineer wants the sales for multiple products. B should be the right answer if the sales_data is a date field.
upvoted 4 times
DevoteamAnalytix
9 months ago
But this is the table name...
upvoted 1 times
...
Felix_G
10 months, 3 weeks ago
Add HAVING sum(sales_amount) > 0 this does NOT filter out the data with sales_mount zero. it can not be any negative value. The original query’s WHERE year = 2023 condition is already appropriate for filtering data by the year 2023, so that B is unnecessary.
upvoted 1 times
FuriouZ
10 months, 2 weeks ago
>0 filters out every product which is not sold. The question was about "some products are not displayed" so using the having argument can not be the right choice
upvoted 3 times
...
...
...
rralucard_
1 year ago
Selected Answer: C
https://docs.aws.amazon.com/athena/latest/ug/select.html
upvoted 2 times
nyaopoko
10 months, 1 week ago
answer is C!
upvoted 1 times
...
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...
exam
Someone Bought Contributor Access for:
SY0-701
London, 1 minute ago