Exam DP-203 topic 1 question 62 discussion

Actual exam question from Microsoft's DP-203
Question #: 62
Topic #: 1

You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1. Table1 contains the following:
✑ One billion rows
✑ A clustered columnstore index
✑ A hash-distributed column named Product Key
✑ A column named Sales Date that is of the date data type and cannot be null
Thirty million rows will be added to Table1 each month.
You need to partition Table1 based on the Sales Date column. The solution must optimize query performance and data loading.
How often should you create a partition?

  • A. once per month
  • B. once per year
  • C. once per day
  • D. once per week
Suggested Answer: A
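For context, a monthly-partitioned version of a table like Table1 might be declared as follows. This is only a sketch: the table and column names come from the question, but the other columns and the boundary dates are illustrative assumptions, not the exam's official schema.

```sql
-- Illustrative sketch only: a hash-distributed, clustered-columnstore table
-- partitioned by month on [Sales Date]. Boundary values are example dates.
CREATE TABLE dbo.Table1
(
    [Product Key] INT  NOT NULL,
    [Sales Date]  DATE NOT NULL
    -- other columns elided
)
WITH
(
    DISTRIBUTION = HASH([Product Key]),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION ([Sales Date] RANGE RIGHT FOR VALUES
        ('2024-01-01', '2024-02-01', '2024-03-01'))  -- one boundary per month
);
```

With RANGE RIGHT, each boundary date becomes the first day included in a new partition, which matches a once-per-month scheme.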

Comments

vrodriguesp
Highly Voted 2 years, 4 months ago
Remembering that data is split across 60 distributions, and that we need a MINIMUM of 1 million rows per distribution, we have:
A. once per month = 30 million / 60 = 500k rows per partition
B. once per year = 360 million / 60 = 6 million rows per partition
C. once per day = about 1 million / 60 = ~16k rows per partition
D. once per week = about 7.5 million / 60 = 125k rows per partition
The correct answer should be B.
upvoted 99 times
vrodriguesp
2 years, 2 months ago
actually, to be more accurate, I should have written rows per distribution. We need 1 million rows per distribution, i.e. 60 million rows per partition.
upvoted 2 times
...
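The per-option arithmetic in the comment above can be checked with a simple query (assuming, as the question states, 30 million new rows per month and Synapse's fixed 60 distributions; integer division, so the daily figure is approximate):

```sql
-- Rough rows per distribution landing in each new partition, per option,
-- assuming 30 million new rows per month spread over 60 distributions.
SELECT
    30000000 / 60      AS monthly_per_distribution,  -- 500,000   (option A)
    30000000 * 12 / 60 AS yearly_per_distribution,   -- 6,000,000 (option B)
    30000000 / 30 / 60 AS daily_per_distribution,    -- ~16,666   (option C)
    30000000 / 4 / 60  AS weekly_per_distribution;   -- 125,000   (option D)
```

Only the once-per-year option clears the often-cited 1-million-rows-per-distribution-per-partition guideline for new data.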
yogiazaad
2 years, 2 months ago
I think you left out the fact that the table already has 1 billion records. This will change your calculations.
upvoted 9 times
CCCool77
2 years, 2 months ago
You should consider the partition, not the table
upvoted 5 times
hypersam
2 months, 3 weeks ago
The CCI is applied to rows within each new partition, so it doesn't matter how many rows are in other partitions
upvoted 1 times
...
d47320d
11 months, 2 weeks ago
no, we should consider everything, including the existing table data, in order to come up with the most efficient solution
upvoted 1 times
...
...
...
hiyoww
1 year, 8 months ago
I think you mixed up the concepts of distribution and partition: we always partition by date and distribute by Product Key. I think you are overthinking it; there is no need for the calculation
upvoted 1 times
hiyoww
1 year, 8 months ago
sorry, maybe you are right
upvoted 1 times
Gcplearner8888
1 year, 8 months ago
I have contributor access, which I purchased for $47.99, but I have not yet received a downloadable PDF with questions and explanations.
upvoted 3 times
Paulkuzzio
1 year, 6 months ago
Contact their Customer Care with your complaint and they will respond to you.
upvoted 1 times
...
...
...
...
gggmaaster
1 year, 6 months ago
Following this logic: although A is below the 1M-per-distribution recommendation, it is the closest to 1M. B meets the requirement but is far larger, so loading it into memory is slower than with the 500k case. A may also result in a higher number of partitions, which means more storage space.
upvoted 3 times
...
...
anks84
Highly Voted 2 years, 7 months ago
Correct. Considering the high volume of data, fewer partitions are recommended for faster queries: "If a table contains fewer than the recommended minimum number of rows per partition (i.e. 60 million rows per month for 60 distributed partitions), consider using fewer partitions in order to increase the number of rows per partition."
upvoted 11 times
...
ShdwZephyr
Most Recent 2 months ago
Selected Answer: A
I would say it's option A. It would: efficiently load new data (in this case, 30 million new rows each month), and improve query performance for time-based queries.
upvoted 1 times
...
prem__raj
2 months, 3 weeks ago
Selected Answer: A
I think it should be by month. The table already has 1 billion rows and there are 60 distributions, so 1 billion / 60 ≈ 16.6 million rows per distribution. We get 30 million rows per month; distributed across 60 distributions, that is 30 million / 60 = 500,000 rows per distribution. So each distribution holds about 16.6 million + 500,000 ≈ 17.1 million rows. Partitioning by month gives 17.1 million / 12 ≈ 1.4 million rows per distribution and partition, which is close to the optimal 1 million rows per partition. Therefore A
upvoted 1 times
...
abhi_11kr1
4 months, 3 weeks ago
Selected Answer: A
Creating a partition once per month strikes the right balance between performance, manageability, and data loading efficiency
upvoted 1 times
...
mavdplas
5 months, 2 weeks ago
Selected Answer: A
I think loading performance is the determining factor here. With a partition per month, the data can be quickly swapped into the table with a partition switch. https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition
upvoted 1 times
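The partition-switch load pattern mentioned above can be sketched roughly as follows. The staging table name and partition number are illustrative assumptions; the pattern itself is described at the linked Microsoft page.

```sql
-- Illustrative only: load a month's data into a staging table that has the
-- same structure and partition scheme as Table1, then switch the populated
-- partition into Table1 as a metadata-only operation.
ALTER TABLE dbo.Table1_Staging
    SWITCH PARTITION 2 TO dbo.Table1 PARTITION 2;
```

Because the switch moves metadata rather than rows, monthly loads stay fast regardless of how large Table1 grows.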
...
renan_ineu
8 months ago
The table currently has 1 billion rows. With 60 distributions, this means there are approximately 16,666,666 rows per distribution. Each month, an additional 30 million rows will be added. After the first month, the table will have 1.03 billion rows, resulting in approximately 17,166,666 rows per distribution. I recommend going with Option A: partitioning once per month.
upvoted 1 times
hypersam
2 months, 3 weeks ago
wrong. New data will be saved in a new partition, not distributed across all previous partitions, so it doesn't matter how much data is in previous partitions
upvoted 1 times
prem__raj
2 months, 3 weeks ago
Data is partitioned only after it is distributed. The 30 million rows are distributed among the 60 distributions and then partitioned within each distribution.
upvoted 1 times
...
...
...
d47320d
11 months, 2 weeks ago
We have to consider all the information given about Table1, including existing data, in order to optimize query performance and data loading for this specific table. A year from now, Table1 will contain 1,360M rows, i.e. around 22.6M per distribution with one partition per year. Instead, if we partition by month, we will have around 1.8M rows per distribution and partition. Typical analytical queries filter on a specific month: isn't it more efficient to process 1.8M rows instead of 22.6M (optimize query performance)? And isn't loading data into a new empty or smaller partition faster than loading it into one with pre-existing data (optimize data loading)? The above leads us to "A. once per month".
upvoted 1 times
...
Alongi
1 year ago
Selected Answer: B
It's B for this reason: 12 months * 30 million / 60 distributions = 6 million rows per partition
upvoted 1 times
d47320d
11 months, 2 weeks ago
you are totally ignoring existing table data
upvoted 1 times
...
...
dgerok
1 year ago
Selected Answer: B
B) once per year, because each partition should contain more than 60 million rows (1 million for each of the 60 distributions)
upvoted 2 times
...
dgerok
1 year ago
Selected Answer: B
B. once per year = 360 million / 60 = 6 million rows per partition
upvoted 2 times
...
[Removed]
1 year, 1 month ago
Selected Answer: A
It is about partitions; each distribution has several partitions.
upvoted 2 times
...
mav2000
1 year, 1 month ago
Selected Answer: A
A partition is divided into distribution units.
A) once per month => 30 million / 60 = 500k rows per partition
B) once per year => 360 million / 60 = 6 million rows per partition
C) once per day => 1 million / 60 = ~16k rows per partition
D) once per week => ~7.5 million / 60 = 125k rows per partition
But since I already have 1 billion rows distributed, each distribution would hold 16.7 million rows, more than fulfilling the minimum. Since I want a reasonable number of partitions to increase query speed, I would choose once per month, because once per year would create very few partitions. Therefore the answer is A)
upvoted 4 times
SajadAhm
8 months, 2 weeks ago
You'll get a new partition whenever you add to the table's partitions. If you create a new partition every month, that new partition will have only 500k rows per distribution. I agree that the previous partitions have more than enough records, but the problem is with the new partition.
upvoted 2 times
...
...
rocky48
1 year, 2 months ago
Selected Answer: A
Given that you’ll be adding 30 million rows per month, consider partitioning once per month based on the Sales Date column. This approach balances data maintenance and query performance while avoiding excessive partitioning. Therefore, the correct answer is A. once per month
upvoted 2 times
...
rocky48
1 year, 2 months ago
To optimize query performance and data loading for your Azure Synapse Analytics dedicated SQL pool table, you should create partitions based on the Sales Date column.
Partitioning frequency: how often to create a partition depends on the data volume and your specific use case.
Minimum rows per partition: for optimal compression and performance of clustered columnstore tables, a minimum of 1 million rows per distribution and partition is recommended.
Automatic distribution: dedicated SQL pools already divide each table into 60 distributions.
Recommendation: given that you'll be adding 30 million rows per month, consider partitioning once per month based on the Sales Date column. This approach balances data maintenance and query performance while avoiding excessive partitioning. Therefore, the correct answer is A. once per month
upvoted 1 times
...
moneytime
1 year, 2 months ago
I chose D. Reason: for optimal performance and compression, the minimum to load is 60M rows per month (1M per distribution across 60 distributions). This implies: per week = 60/4 = 15M rows; per day = 15/7 ≈ 2.1M rows; per year = 12 * 60M = 720M rows; and so on in the right proportions. Considering the case at hand: adding 30M rows monthly will have a negative impact on the performance and compression of the table, because the quantity loaded is less than the required minimum of 60M for a monthly partition. To correct this, a minimum of 30M rows should be added every 2 weeks (not per month), or 10M rows per week. So the most reasonable advice is to load the table per week for optimal performance and compression.
upvoted 1 times
d47320d
11 months, 2 weeks ago
Your calculation logic is incorrect. A year from now, Table1 will contain 1,360M rows, i.e. around 22.6M per distribution with one partition per year. Instead, if we specify one partition per week, we will have around 0.4M rows per distribution and partition, which is below the 1M Microsoft recommendation. This makes your choice wrong.
upvoted 2 times
...
...
Charley92
1 year, 3 months ago
To partition Table1 based on the Sales Date column, you can create a partition for each month. Since thirty million rows will be added to Table1 each month, creating a partition once per month would be the most optimal solution for query performance and data loading. This way, the data can be easily queried and loaded, and the partitioning will not be too granular or too coarse. Therefore, the answer is A. once per month.
upvoted 1 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other
