Exam DP-203 topic 1 question 20 discussion

Actual exam question from Microsoft's DP-203
Question #: 20
Topic #: 1

You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL pool. The table contains purchases from suppliers for a retail store. FactPurchase will contain the following columns.

FactPurchase will have 1 million rows of data added daily and will contain three years of data.
Transact-SQL queries similar to the following query will be executed daily.

SELECT SupplierKey, StockItemKey, IsOrderFinalized, COUNT(*)
FROM FactPurchase
WHERE DateKey >= 20210101
AND DateKey <= 20210131
GROUP BY SupplierKey, StockItemKey, IsOrderFinalized
Which table distribution will minimize query times?

  • A. replicated
  • B. hash-distributed on PurchaseKey
  • C. round-robin
  • D. hash-distributed on IsOrderFinalized
Suggested Answer: B
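For reference, the suggested answer corresponds to DDL roughly like the sketch below. The question's column list is not reproduced above, so the column set and every data type here are assumptions; only the five column names that appear in the question and its query are used. The `dbo` schema and the clustered columnstore index are also assumptions.

```sql
-- Sketch only: data types, schema name, and index choice are assumed,
-- not taken from the question.
CREATE TABLE dbo.FactPurchase
(
    PurchaseKey      bigint NOT NULL,  -- assumed type; high-cardinality key
    DateKey          int    NOT NULL,  -- assumed type; values such as 20210101
    SupplierKey      int    NOT NULL,  -- assumed type
    StockItemKey     int    NOT NULL,  -- assumed type
    IsOrderFinalized bit    NOT NULL   -- assumed type; two possible values
)
WITH
(
    DISTRIBUTION = HASH(PurchaseKey),  -- option B
    CLUSTERED COLUMNSTORE INDEX        -- assumed index choice
);
```

Options A and C would swap the distribution clause for `DISTRIBUTION = REPLICATE` or `DISTRIBUTION = ROUND_ROBIN` respectively.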

Comments

FredNo
Highly Voted 3 years, 5 months ago
Selected Answer: B
Correct
upvoted 47 times
Deepshikha1228
2 years, 9 months ago
B is correct
upvoted 2 times
...
...
GameLift
Highly Voted 3 years, 7 months ago
Is it hash-distributed on PurchaseKey and not on IsOrderFinalized because IsOrderFinalized spreads rows over fewer distributions (rows contain only yes/no values) compared to PurchaseKey?
upvoted 31 times
Podavenna
3 years, 7 months ago
Yes, your logic is correct!
upvoted 9 times
...
saqib839
1 year, 2 months ago
Plus, it's better to use hash distribution on a column that is used in GROUP BY or joins.
upvoted 1 times
...
...
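The skew argument in this thread can be checked on a loaded table. The following is a minimal sketch, assuming the table exists as `dbo.FactPurchase` (the schema name is an assumption): the first statement reports row counts and space per distribution, and the second shows how few distinct values IsOrderFinalized offers as a hash key.

```sql
-- Rows and space used in each of the pool's distributions;
-- large differences in the ROWS column indicate skew.
DBCC PDW_SHOWSPACEUSED('dbo.FactPurchase');

-- A column with only two values can hash to at most two distributions.
SELECT IsOrderFinalized, COUNT(*) AS RowsPerValue
FROM dbo.FactPurchase
GROUP BY IsOrderFinalized;
```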
adaod14
Most Recent 1 month, 3 weeks ago
Selected Answer: B
Correct Answer: B. Hash-Distributed on PurchaseKey
💡 Explanation: Azure Synapse Analytics supports three table distribution methods: Replicated, Hash-Distributed, and Round-Robin. The best choice depends on the workload and query patterns.
1️⃣ Why Use Hash Distribution?
The FactPurchase table will have 1 million rows added daily and contain three years of data → This results in over 1 billion rows, making hash distribution the best option for efficient querying.
Queries GROUP BY SupplierKey, StockItemKey, and IsOrderFinalized → A hash-distributed table will ensure these columns are evenly distributed across compute nodes, reducing data movement and improving query performance.
2️⃣ Why Hash on PurchaseKey?
PurchaseKey is the primary key (unique for each purchase).
Ideal for distributing data evenly across compute nodes, avoiding data skew.
Improves aggregation queries where GROUP BY is used on multiple dimensions.
upvoted 4 times
...
RAG11
2 months ago
Selected Answer: D
D, Since the query doesn't reference PurchaseKey for filtering or grouping, hash distribution on PurchaseKey doesn't help with optimizing the query and would not be the best choice.
upvoted 1 times
...
thanglai
3 months, 2 weeks ago
Hash-Distributed: For large fact tables where joins or aggregations are common
upvoted 2 times
...
EmnCours
4 months, 4 weeks ago
Selected Answer: B
Correct Answer: B
upvoted 1 times
...
Okkier
9 months, 3 weeks ago
Selected Answer: C
Optimal distribution: given that the query performs a GROUP BY on SupplierKey, StockItemKey, and IsOrderFinalized, the most balanced approach is to use round-robin distribution. While it does not ensure that rows with the same key are stored together, it avoids data skew and ensures even distribution, which helps in achieving better performance for aggregate queries.
upvoted 3 times
JustImperius
3 months, 1 week ago
I hear where you are coming from. The fact that PurchaseKey is not used in the query also made me think this for a second. Even with the hash, data access for this specific query ends up resembling a round-robin distribution. But hash will generally be more efficient because of its predictable distribution structure, which is better than a completely random approach. Since we are using PurchaseKey, which I assume is unique, we don't need to worry about skew at all. So I still think B is the correct choice here.
upvoted 1 times
...
hypersam
3 months, 3 weeks ago
Hash distribution on a high-cardinality column can also avoid data skew.
upvoted 1 times
...
...
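Whether hash on PurchaseKey and round-robin really behave alike for this query can be inspected rather than argued. A minimal sketch, assuming the table exists as `dbo.FactPurchase` (the schema name is an assumption): in a dedicated SQL pool, `EXPLAIN` returns the distributed plan as XML without running the query, and any data movement steps (for example a shuffle) appear in it.

```sql
-- Returns the distributed query plan as XML; the query itself is not executed.
-- Compare the plans for a hash-distributed and a round-robin copy of the table.
EXPLAIN
SELECT SupplierKey, StockItemKey, IsOrderFinalized, COUNT(*)
FROM dbo.FactPurchase
WHERE DateKey >= 20210101
  AND DateKey <= 20210131
GROUP BY SupplierKey, StockItemKey, IsOrderFinalized;
```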
MBRSDG
1 year ago
Selected Answer: B
Almost exactly what's shown in an example in the official docs: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute#choose-a-distribution-column
upvoted 2 times
...
sdg2844
1 year, 3 months ago
Selected Answer: B
Correct. It's a column with many unique values. Also, it's USUALLY not a column that is used in WHERE clauses or groupings, which this isn't.
upvoted 1 times
...
pperf
1 year, 6 months ago
Selected Answer: B
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute#choosing-a-distribution-column
upvoted 1 times
...
jiajiani
1 year, 7 months ago
Why does the answer say it cannot be a date column?
upvoted 1 times
...
74gjd_37
1 year, 7 months ago
Selected Answer: B
Hash-distributed tables improve query performance on large fact tables. The PurchaseKey has many unique values, does not have NULLs and is not a date column.
upvoted 2 times
jiajiani
1 year, 7 months ago
Why can't we use a date column?
upvoted 3 times
...
...
kkk5566
1 year, 7 months ago
Selected Answer: B
B is correct
upvoted 1 times
...
SolutionA
1 year, 8 months ago
In this case the WHERE condition is on DateKey, so whether the table is hash-distributed on PurchaseKey or round-robin distributed, the query cost will be the same because it will be a full table scan.
upvoted 1 times
SolutionA
1 year, 8 months ago
On second thought, if PurchaseKey is not unique, what is the constraint and how is it created? As the question doesn't give more details, I would go with round-robin, not hash-distributed.
upvoted 3 times
...
...
mamahani
1 year, 11 months ago
Selected Answer: B
B is correct
upvoted 1 times
...
henryphchan
1 year, 11 months ago
Selected Answer: B
B. Hash on PurchaseKey to evenly distribute the data across the 60 distributions.
upvoted 1 times
...
SHENOOOO
2 years, 2 months ago
Selected Answer: B
B is the Correct Answer
upvoted 3 times
...