Exam DP-203 All Questions

View all questions & answers for the DP-203 exam

Exam DP-203 topic 4 question 12 discussion

Actual exam question from Microsoft's DP-203

Question #: 12
Topic #: 4

You are designing an inventory updates table in an Azure Synapse Analytics dedicated SQL pool. The table will have a clustered columnstore index and will include the following columns:

You identify the following usage patterns:
✑ Analysts will most commonly analyze transactions for a warehouse.
✑ Queries will summarize by product category type, date, and/or inventory event type.
You need to recommend a partition strategy for the table to minimize query times.
On which column should you partition the table?

A. EventTypeID
B. ProductCategoryTypeID
C. EventDate
D. WarehouseID

Show Suggested Answer

Suggested Answer: D 🗳️

by rav009 at Sept. 19, 2021, 3:26 p.m.

Comments

Submit Cancel

Lio95

Highly Voted 3 years, 6 months ago

It is recommended to have at least 1 million rows per partition and distribution. Since there are 60 distributions, the number of rows for each partition must exceed 60 millions. Answer is correct

upvoted 27 times

LiamRT

3 years, 4 months ago

Partitioning by EventDate does nott mean a partition for each day. Partitioning by quarter years would be effective.

upvoted 2 times

MBRSDG

1 year ago

Perfectly agree! This was exactly my way of reasoning.

upvoted 1 times

...

yassine70

3 years, 6 months ago

I fully Agree! Answer is correct Link below :https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition "When creating partitions on clustered columnstore tables, it is important to consider how many rows belong to each partition. For optimal compression and performance of clustered columnstore tables, a minimum of 1 million rows per distribution and partition is needed. Before partitions are created, dedicated SQL pool already divides each table into 60 distributed databases. Any partitioning added to a table is in addition to the distributions created behind the scenes. Using this example, if the sales fact table contained 36 monthly partitions, and given that a dedicated SQL pool has 60 distributions, then the sales fact table should contain 60 million rows per month, or 2.1 billion rows when all months are populated. If a table contains fewer than the recommended minimum number of rows per partition, consider using fewer partitions in order to increase the number of rows per partition."

upvoted 7 times

...

Canary_2021

Highly Voted 3 years, 3 months ago

Selected Answer: D

D is the correct answert. Analysts will most commonly analyze transactions for a warehouse. This mean that warehouseID is always in where clause. Partition filed should in where clause to improve query performace.

upvoted 18 times

Canary_2021

3 years, 3 months ago

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/cheat-sheet

upvoted 1 times

Matt2000

1 year, 8 months ago

However the cheat sheet says: "In 99 percent of cases, the partition key should be based on date"

upvoted 1 times

dakku987

1 year, 3 months ago

but only when there is more than 1 billion records "You might partition your table when you have a large fact table (greater than 1 billion rows). In 99 percent of cases, the partition key should be based on date."

upvoted 1 times

...

renan_ineu

Most Recent 7 months ago

Selected Answer: C

There are partitions and distribution. The question is about partition, but let's remember that if we consider distribution, WarehouseID would be the best option. It should not be used for partitioning, so my answer does not go here. As documents say (and ChatGPT, and people in other answers), partitioning by date is the way to go in almost all cases. It requires at least 1M records per partition and the question says we have 1M daily rows. Even if we don't want to partition up to the day, we can partition up to the quarter or even the year, if needed.

upvoted 2 times

...

kkk5566

1 year, 7 months ago

Selected Answer: D

is correct

upvoted 2 times

...

Karl_Cen

2 years, 2 months ago

Selected Answer: C

I don't think the answer is right, the answer should be C, EventDate . The total row number in this inventory updates table is determined before it’s created. And here the question is asking us to chose the partition column, not distribution column.

upvoted 2 times

...

dmitriypo

2 years, 5 months ago

Selected Answer: C

I would go for a date column since positions are most often created for a date column

upvoted 2 times

dmitriypo

2 years, 5 months ago

Forget it. I agree with the provided answer D

upvoted 2 times

...

dom271219

2 years, 7 months ago

Selected Answer: D

Tables ? These are the columns, aren't they ?

upvoted 1 times

...

Deeksha1234

2 years, 8 months ago

D is right

upvoted 2 times

...

nefarious_smalls

2 years, 10 months ago

Selected Answer: C

I will go C. We are querying about warehouses. Therefore I think the distribution column would have to be warehouse. If not then we would most likely have to do a shuffle to aggregate all the transactions for the same warehouse which would be spread out amongst the 60 distibutions.

upvoted 1 times

Aditya0891

2 years, 10 months ago

It's about partition not distribution. Read the question carefully first

upvoted 2 times

...

Dizzystar

3 years, 5 months ago

I agree on date column. "In most cases, table partitions are created on a date column." https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition

upvoted 1 times

ploer

3 years, 2 months ago

But only in most cases. In most cases old data is not needed so date column often shows up in the where clause. This is why partitioning often makes sense on date columns. In this case the "Analysts will most commonly analyze transactions for a warehouse", so WarehouseID will be in the where clause and therefore we should partition on this column.

upvoted 2 times

...

sreejani

3 years, 6 months ago

Aren't partition supposed to be done on columns of group by?. So here it's product type on which analysts summarise.so partition should be on productype

upvoted 2 times

Samanda

3 years, 5 months ago

are you thinking of hash distributions instead of partitions?

upvoted 5 times

...

rikku33

3 years, 6 months ago

For effective partitions its good to have one million rows per partitions for an ideal optimized scenario. This is also mentioned in the Microsoft documentation. C

upvoted 2 times

Samanda

3 years, 5 months ago

You don't have to put each warehouse into it's own partition though so the sizing argument doesnt make sense....Answer is D as you will benefit from partition elimination when you use the warehouseID in the where clause

upvoted 2 times

...

sachabess79

3 years, 6 months ago

WHERE is applied on the WarehouseID, so D

upvoted 6 times

YipingRuan

3 years, 6 months ago

Nope, don't use WHERE

upvoted 2 times

mbl

3 years, 5 months ago

it does : "Analysts will most commonly analyze transactions for a warehouse"

upvoted 3 times

...

AppleVan

3 years, 6 months ago

I think it faster to go by date (C).....Otherwise, the query time will be extremely long since it has wrangled here and there...

upvoted 2 times

...

Amalbenrebai

3 years, 7 months ago

can someone confirm this ?

upvoted 1 times

Samanda

3 years, 5 months ago

It's 100% D

upvoted 3 times

...

rav009

3 years, 7 months ago

I will go C

upvoted 3 times

...

Exam DP-203 All Questions

View all questions & answers for the DP-203 exam

Exam DP-203 topic 4 question 12 discussion

Comments

Lio95

LiamRT

MBRSDG

yassine70

Canary_2021

Canary_2021

Matt2000

dakku987

renan_ineu

kkk5566

Karl_Cen

dmitriypo

dmitriypo

dom271219

Deeksha1234

nefarious_smalls

Aditya0891

Dizzystar

ploer

sreejani

Samanda

rikku33

Samanda

sachabess79

YipingRuan

mbl

AppleVan

Amalbenrebai

Samanda

rav009

SY0-701