exam questions

Exam DP-203 All Questions

View all questions & answers for the DP-203 exam

Exam DP-203 topic 4 question 12 discussion

Actual exam question from Microsoft's DP-203
Question #: 12
Topic #: 4
[All DP-203 Questions]

You are designing an inventory updates table in an Azure Synapse Analytics dedicated SQL pool. The table will have a clustered columnstore index and will include the following columns:

You identify the following usage patterns:
✑ Analysts will most commonly analyze transactions for a warehouse.
✑ Queries will summarize by product category type, date, and/or inventory event type.
You need to recommend a partition strategy for the table to minimize query times.
On which column should you partition the table?

  • A. EventTypeID
  • B. ProductCategoryTypeID
  • C. EventDate
  • D. WarehouseID
Show Suggested Answer Hide Answer
Suggested Answer: D 🗳️

Comments

Chosen Answer:
This is a voting comment (?). It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
Lio95
Highly Voted 3 years, 6 months ago
It is recommended to have at least 1 million rows per partition and distribution. Since there are 60 distributions, the number of rows for each partition must exceed 60 millions. Answer is correct
upvoted 27 times
LiamRT
3 years, 4 months ago
Partitioning by EventDate does nott mean a partition for each day. Partitioning by quarter years would be effective.
upvoted 2 times
MBRSDG
1 year ago
Perfectly agree! This was exactly my way of reasoning.
upvoted 1 times
...
...
yassine70
3 years, 6 months ago
I fully Agree! Answer is correct Link below :https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition "When creating partitions on clustered columnstore tables, it is important to consider how many rows belong to each partition. For optimal compression and performance of clustered columnstore tables, a minimum of 1 million rows per distribution and partition is needed. Before partitions are created, dedicated SQL pool already divides each table into 60 distributed databases. Any partitioning added to a table is in addition to the distributions created behind the scenes. Using this example, if the sales fact table contained 36 monthly partitions, and given that a dedicated SQL pool has 60 distributions, then the sales fact table should contain 60 million rows per month, or 2.1 billion rows when all months are populated. If a table contains fewer than the recommended minimum number of rows per partition, consider using fewer partitions in order to increase the number of rows per partition."
upvoted 7 times
...
...
Canary_2021
Highly Voted 3 years, 3 months ago
Selected Answer: D
D is the correct answert. Analysts will most commonly analyze transactions for a warehouse. This mean that warehouseID is always in where clause. Partition filed should in where clause to improve query performace.
upvoted 18 times
Canary_2021
3 years, 3 months ago
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/cheat-sheet
upvoted 1 times
Matt2000
1 year, 8 months ago
However the cheat sheet says: "In 99 percent of cases, the partition key should be based on date"
upvoted 1 times
dakku987
1 year, 3 months ago
but only when there is more than 1 billion records "You might partition your table when you have a large fact table (greater than 1 billion rows). In 99 percent of cases, the partition key should be based on date."
upvoted 1 times
...
...
...
...
renan_ineu
Most Recent 7 months ago
Selected Answer: C
There are partitions and distribution. The question is about partition, but let's remember that if we consider distribution, WarehouseID would be the best option. It should not be used for partitioning, so my answer does not go here. As documents say (and ChatGPT, and people in other answers), partitioning by date is the way to go in almost all cases. It requires at least 1M records per partition and the question says we have 1M daily rows. Even if we don't want to partition up to the day, we can partition up to the quarter or even the year, if needed.
upvoted 2 times
...
kkk5566
1 year, 7 months ago
Selected Answer: D
is correct
upvoted 2 times
...
Karl_Cen
2 years, 2 months ago
Selected Answer: C
I don't think the answer is right, the answer should be C, EventDate . The total row number in this inventory updates table is determined before it’s created. And here the question is asking us to chose the partition column, not distribution column.
upvoted 2 times
...
dmitriypo
2 years, 5 months ago
Selected Answer: C
I would go for a date column since positions are most often created for a date column
upvoted 2 times
dmitriypo
2 years, 5 months ago
Forget it. I agree with the provided answer D
upvoted 2 times
...
...
dom271219
2 years, 7 months ago
Selected Answer: D
Tables ? These are the columns, aren't they ?
upvoted 1 times
...
Deeksha1234
2 years, 8 months ago
D is right
upvoted 2 times
...
nefarious_smalls
2 years, 10 months ago
Selected Answer: C
I will go C. We are querying about warehouses. Therefore I think the distribution column would have to be warehouse. If not then we would most likely have to do a shuffle to aggregate all the transactions for the same warehouse which would be spread out amongst the 60 distibutions.
upvoted 1 times
Aditya0891
2 years, 10 months ago
It's about partition not distribution. Read the question carefully first
upvoted 2 times
...
...
Dizzystar
3 years, 5 months ago
I agree on date column. "In most cases, table partitions are created on a date column." https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition
upvoted 1 times
ploer
3 years, 2 months ago
But only in most cases. In most cases old data is not needed so date column often shows up in the where clause. This is why partitioning often makes sense on date columns. In this case the "Analysts will most commonly analyze transactions for a warehouse", so WarehouseID will be in the where clause and therefore we should partition on this column.
upvoted 2 times
...
...
sreejani
3 years, 6 months ago
Aren't partition supposed to be done on columns of group by?. So here it's product type on which analysts summarise.so partition should be on productype
upvoted 2 times
Samanda
3 years, 5 months ago
are you thinking of hash distributions instead of partitions?
upvoted 5 times
...
...
rikku33
3 years, 6 months ago
For effective partitions its good to have one million rows per partitions for an ideal optimized scenario. This is also mentioned in the Microsoft documentation. C
upvoted 2 times
Samanda
3 years, 5 months ago
You don't have to put each warehouse into it's own partition though so the sizing argument doesnt make sense....Answer is D as you will benefit from partition elimination when you use the warehouseID in the where clause
upvoted 2 times
...
...
sachabess79
3 years, 6 months ago
WHERE is applied on the WarehouseID, so D
upvoted 6 times
YipingRuan
3 years, 6 months ago
Nope, don't use WHERE
upvoted 2 times
mbl
3 years, 5 months ago
it does : "Analysts will most commonly analyze transactions for a warehouse"
upvoted 3 times
...
...
...
AppleVan
3 years, 6 months ago
I think it faster to go by date (C).....Otherwise, the query time will be extremely long since it has wrangled here and there...
upvoted 2 times
...
Amalbenrebai
3 years, 7 months ago
can someone confirm this ?
upvoted 1 times
Samanda
3 years, 5 months ago
It's 100% D
upvoted 3 times
...
...
rav009
3 years, 7 months ago
I will go C
upvoted 3 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...
exam
Someone Bought Contributor Access for:
SY0-701
London, 1 minute ago