Exam DP-203 topic 1 question 62 discussion

Actual exam question from Microsoft's DP-203
Question #: 62
Topic #: 1

You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1. Table1 contains the following:
✑ One billion rows
✑ A clustered columnstore index
✑ A hash-distributed column named Product Key
✑ A column named Sales Date that is of the date data type and cannot be null
Thirty million rows will be added to Table1 each month.
You need to partition Table1 based on the Sales Date column. The solution must optimize query performance and data loading.
How often should you create a partition?

  • A. once per month
  • B. once per year
  • C. once per day
  • D. once per week
Suggested Answer: A
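For context, a monthly-partitioned version of a table like Table1 might be declared as follows. This is only a sketch: the table and column names come from the question, but the other columns and the boundary dates are illustrative assumptions, not the exam's official schema.

```sql
-- Illustrative sketch only: a hash-distributed, clustered-columnstore table
-- partitioned by month on [Sales Date]. Boundary values are example dates.
CREATE TABLE dbo.Table1
(
    [Product Key] INT  NOT NULL,
    [Sales Date]  DATE NOT NULL
    -- other columns elided
)
WITH
(
    DISTRIBUTION = HASH([Product Key]),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION ([Sales Date] RANGE RIGHT FOR VALUES
        ('2024-01-01', '2024-02-01', '2024-03-01'))  -- one boundary per month
);
```

With RANGE RIGHT, each boundary date becomes the first day included in a new partition, which matches a once-per-month scheme.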

Comments

vrodriguesp
Highly Voted 2 years, 4 months ago
Remembering that data is split across 60 distributions, and that we need a MINIMUM of 1 million rows per distribution, we have:
A. once per month = 30 million / 60 = 500k rows per partition
B. once per year = 360 million / 60 = 6 million rows per partition
C. once per day = about 1 million / 60 = ~16k rows per partition
D. once per week = about 7.5 million / 60 = 125k rows per partition
The correct answer should be B.
upvoted 99 times
vrodriguesp
2 years, 2 months ago
actually, to be more accurate, I should have written rows per distribution. We need 1 million rows per distribution, i.e. 60 million rows per partition.
upvoted 2 times
...
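The per-option arithmetic in the comment above can be checked with a simple query (assuming, as the question states, 30 million new rows per month and Synapse's fixed 60 distributions; integer division, so the daily figure is approximate):

```sql
-- Rough rows per distribution landing in each new partition, per option,
-- assuming 30 million new rows per month spread over 60 distributions.
SELECT
    30000000 / 60      AS monthly_per_distribution,  -- 500,000   (option A)
    30000000 * 12 / 60 AS yearly_per_distribution,   -- 6,000,000 (option B)
    30000000 / 30 / 60 AS daily_per_distribution,    -- ~16,666   (option C)
    30000000 / 4 / 60  AS weekly_per_distribution;   -- 125,000   (option D)
```

Only the once-per-year option clears the often-cited 1-million-rows-per-distribution-per-partition guideline for new data.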
yogiazaad
2 years, 2 months ago
I think you left out the fact that the table already has 1 billion records. This will change your calculations.
upvoted 9 times
CCCool77
2 years, 2 months ago
You should consider the partition, not the table
upvoted 5 times
hypersam
2 months, 3 weeks ago
The CCI is applied to rows within each new partition, so it doesn't matter how many rows are in other partitions
upvoted 1 times
...
d47320d
11 months, 2 weeks ago
no, we should consider everything, including the existing table data, in order to come up with the most efficient solution
upvoted 1 times
...
...
...
hiyoww
1 year, 8 months ago
I think you mixed up the concepts of distribution and partition: we always partition by date and distribute by Product Key. I think you are overthinking it; there is no need for the calculation
upvoted 1 times
hiyoww
1 year, 8 months ago
sorry, maybe you are right
upvoted 1 times
Gcplearner8888
1 year, 8 months ago
I have contributor access, which I purchased for $47.99, but I have not yet received a downloadable PDF with questions and explanations.
upvoted 3 times
Paulkuzzio
1 year, 6 months ago
Contact their Customer Care with your complaint and they will respond to you.
upvoted 1 times
...
...
...
...
gggmaaster
1 year, 6 months ago
Following this logic: although A is below the 1M-per-distribution recommendation, it is the closest to 1M. B meets the requirement but is far larger, so loading it into memory is slower than with the 500k case. A may also result in a higher number of partitions, which means more storage space.
upvoted 3 times
...
...
anks84
Highly Voted 2 years, 7 months ago
Correct. Considering the high volume of data, fewer partitions are recommended for faster queries: "If a table contains fewer than the recommended minimum number of rows per partition (i.e. 60 million rows per month for 60 distributed partitions), consider using fewer partitions in order to increase the number of rows per partition."
upvoted 11 times
...
ShdwZephyr
Most Recent 2 months ago
Selected Answer: A
I would say it's option A. It would: efficiently load new data (in this case, 30 million new rows each month), and improve query performance for time-based queries.
upvoted 1 times
...
prem__raj
2 months, 3 weeks ago
Selected Answer: A
I think it should be by month. The table already has 1 billion rows and there are 60 distributions, so 1 billion / 60 ≈ 16.6 million rows per distribution. We get 30 million rows per month; distributed across 60 distributions, that is 30 million / 60 = 500,000 rows per distribution. So each distribution holds about 16.6 million + 500,000 ≈ 17.1 million rows. Partitioning by month gives 17.1 million / 12 ≈ 1.4 million rows per distribution and partition, which is close to the optimal 1 million rows per partition. Therefore A
upvoted 1 times
...
abhi_11kr1
4 months, 3 weeks ago
Selected Answer: A
Creating a partition once per month strikes the right balance between performance, manageability, and data loading efficiency
upvoted 1 times
...
mavdplas
5 months, 2 weeks ago
Selected Answer: A
I think loading performance is the determining factor here. With a partition per month, the data can be quickly swapped into the table with a partition switch. https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition
upvoted 1 times
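The partition-switch load pattern mentioned above can be sketched roughly as follows. The staging table name and partition number are illustrative assumptions; the pattern itself is described at the linked Microsoft page.

```sql
-- Illustrative only: load a month's data into a staging table that has the
-- same structure and partition scheme as Table1, then switch the populated
-- partition into Table1 as a metadata-only operation.
ALTER TABLE dbo.Table1_Staging
    SWITCH PARTITION 2 TO dbo.Table1 PARTITION 2;
```

Because the switch moves metadata rather than rows, monthly loads stay fast regardless of how large Table1 grows.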
...
renan_ineu
8 months ago
The table currently has 1 billion rows. With 60 distributions, this means there are approximately 16,666,666 rows per distribution. Each month, an additional 30 million rows will be added. After the first month, the table will have 1.03 billion rows, resulting in approximately 17,166,666 rows per distribution. I recommend going with Option A: partitioning once per month.
upvoted 1 times
hypersam
2 months, 3 weeks ago
wrong. New data will be saved in a new partition, not distributed across all previous partitions, so it doesn't matter how much data is in previous partitions
upvoted 1 times
prem__raj
2 months, 3 weeks ago
Data is partitioned only after it is distributed. The 30 million rows are distributed among the 60 distributions and then partitioned within each distribution.
upvoted 1 times
...
...
...
d47320d
11 months, 2 weeks ago
We have to consider all the information given about Table1, including existing data, in order to optimize query performance and data loading for this specific table. A year from now, Table1 will contain 1,360M rows, i.e. around 22.6M per distribution with one partition per year. Instead, if we partition by month, we will have around 1.8M rows per distribution and partition. Typical analytical queries filter on a specific month: isn't it more efficient to process 1.8M rows instead of 22.6M (optimize query performance)? And isn't loading data into a new empty or smaller partition faster than loading it into one with pre-existing data (optimize data loading)? The above leads us to "A. once per month".
upvoted 1 times
...
Alongi
1 year ago
Selected Answer: B
It's B for this reason: 12 months * 30 million / 60 distributions = 6 million rows per partition
upvoted 1 times
d47320d
11 months, 2 weeks ago
you are totally ignoring existing table data
upvoted 1 times
...
...
dgerok
1 year ago
Selected Answer: B
B) once per year, because each partition should contain more than 60 million rows (1 million for each of the 60 distributions)
upvoted 2 times
...
dgerok
1 year ago
Selected Answer: B
B. once per year = 360 million / 60 = 6 million rows per partition
upvoted 2 times
...
[Removed]
1 year, 1 month ago
Selected Answer: A
It is about partitions; each distribution has several partitions.
upvoted 2 times
...
mav2000
1 year, 1 month ago
Selected Answer: A
A partition is divided into distribution units.
A) once per month => 30 million / 60 = 500k rows per partition
B) once per year => 360 million / 60 = 6 million rows per partition
C) once per day => 1 million / 60 = ~16k rows per partition
D) once per week => ~7.5 million / 60 = 125k rows per partition
But since I already have 1 billion rows distributed, each distribution would hold 16.7 million rows, more than fulfilling the minimum. Since I want a reasonable number of partitions to increase query speed, I would choose once per month, because once per year would create very few partitions. Therefore the answer is A)
upvoted 4 times
SajadAhm
8 months, 2 weeks ago
You'll get a new partition whenever you add to the table's partitions. If you create a new partition every month, that new partition will have only 500k rows per distribution. I agree that the previous partitions have more than enough records, but the problem is with the new partition.
upvoted 2 times
...
...
rocky48
1 year, 2 months ago
Selected Answer: A
Given that you’ll be adding 30 million rows per month, consider partitioning once per month based on the Sales Date column. This approach balances data maintenance and query performance while avoiding excessive partitioning. Therefore, the correct answer is A. once per month
upvoted 2 times
...
rocky48
1 year, 2 months ago
To optimize query performance and data loading for your Azure Synapse Analytics dedicated SQL pool table, you should create partitions based on the Sales Date column.
Partitioning frequency: how often to create a partition depends on the data volume and your specific use case.
Minimum rows per partition: for optimal compression and performance of clustered columnstore tables, a minimum of 1 million rows per distribution and partition is recommended.
Automatic distribution: dedicated SQL pools already divide each table into 60 distributions.
Recommendation: given that you'll be adding 30 million rows per month, consider partitioning once per month based on the Sales Date column. This approach balances data maintenance and query performance while avoiding excessive partitioning. Therefore, the correct answer is A. once per month
upvoted 1 times
...
moneytime
1 year, 2 months ago
I chose D. Reason: for optimal performance and compression, the minimum to load is 60M rows per month (1M per distribution across 60 distributions). This implies: per week = 60/4 = 15M rows; per day = 15/7 ≈ 2.1M rows; per year = 12 * 60M = 720M rows; and so on in the right proportions. Considering the case at hand: adding 30M rows monthly will have a negative impact on the performance and compression of the table, because the quantity loaded is less than the required minimum of 60M for a monthly partition. To correct this, a minimum of 30M rows should be added every 2 weeks (not per month), or 10M rows per week. So the most reasonable advice is to load the table per week for optimal performance and compression.
upvoted 1 times
d47320d
11 months, 2 weeks ago
Your calculation logic is incorrect. A year from now, Table1 will contain 1,360M rows, i.e. around 22.6M per distribution with one partition per year. Instead, if we specify one partition per week, we will have around 0.4M rows per distribution and partition, which is below the 1M Microsoft recommendation. This makes your choice wrong.
upvoted 2 times
...
...
Charley92
1 year, 3 months ago
To partition Table1 based on the Sales Date column, you can create a partition for each month. Since thirty million rows will be added to Table1 each month, creating a partition once per month would be the most optimal solution for query performance and data loading. This way, the data can be easily queried and loaded, and the partitioning will not be too granular or too coarse. Therefore, the answer is A. once per month.
upvoted 1 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other
