Exam DP-203 All Questions

View all questions & answers for the DP-203 exam

Exam DP-203 topic 1 question 30 discussion

Actual exam question from Microsoft's DP-203

Question #: 30
Topic #: 1

You are designing a financial transactions table in an Azure Synapse Analytics dedicated SQL pool. The table will have a clustered columnstore index and will include the following columns:
✑ TransactionType: 40 million rows per transaction type
✑ CustomerSegment: 4 million per customer segment
✑ TransactionMonth: 65 million rows per month
AccountType: 500 million per account type

You have the following query requirements:
✑ Analysts will most commonly analyze transactions for a given month.
✑ Transactions analysis will typically summarize transactions by transaction type, customer segment, and/or account type
You need to recommend a partition strategy for the table to minimize query times.
On which column should you recommend partitioning the table?

A. CustomerSegment
B. AccountType
C. TransactionType
D. TransactionMonth

Show Suggested Answer

Suggested Answer: D 🗳️

by gf2tw at Dec. 10, 2021, 10:47 a.m.

Comments

Submit Cancel

SinSS

Highly Voted 7 months, 1 week ago

Considering row count, the answer is Transaction Month or Account Type. Partitioning with Trans. Month, it will end up with hot partition and cannot utilize parallel processing, because all data for the query will be in the same partition. To minimize the processing time, parallel processing should happen. So Transaction Month can not be the answer. My answer is Account Type. There is a similar question in Microsoft site and the explanation was given with answer as follows. Product ensures parallelism when querying data from a given month within the same region, or multiple regions. Using date and partitioning by month, all sales for a month will be in the same partition, not providing parallelism. All sales for a given region will be in the same partition, not providing parallelism....

upvoted 9 times

Rob77

1 year, 11 months ago

I think you are confusing partition with distribution. Each partition will still be distributed over 60 distributions.

upvoted 3 times

semauni

1 year, 9 months ago

The Microsoft question is also about partitioning.

upvoted 1 times

...

gf2tw

Highly Voted 3 years, 4 months ago

Agree with D, should not be confused with Distribution column for Hash-distributed tables.

upvoted 7 times

emna2022

2 years, 8 months ago

Partitioning and Distributing are different concepts In most cases data is partitioned on a date column that is closely tied to the order in which the data is loaded into the SQL pool. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition

upvoted 4 times

...

Jolyboy

Most Recent 4 weeks, 1 day ago

Selected Answer: D

To minimize query times for your financial transactions table in Azure Synapse Analytics, you should recommend partitioning the table on the TransactionMonth column (Option D). Here's why: Query Requirements: Analysts will most commonly analyze transactions for a given month. Partitioning by TransactionMonth allows queries to quickly filter and access the relevant partitions, significantly improving query performance1. Data Distribution: With 65 million rows per month, partitioning by TransactionMonth ensures that each partition contains a manageable amount of data, optimizing both storage and query efficiency1. Partitioning by TransactionMonth aligns well with your query patterns and helps in efficiently managing and querying large datasets.

upvoted 1 times

...

s1852

4 months ago

Selected Answer: D

For optimal compression and performance of clustered columnstore tables, a minimum of 1 million rows per distribution and partition is needed. Before partitions are created, dedicated SQL pool already divides each table into 60 distributed databases.

upvoted 1 times

...

EmnCours

5 months ago

Selected Answer: D

Correct Answer: D

upvoted 1 times

...

ibardonr

6 months ago

D. xD

upvoted 1 times

...

dgerok

7 months, 1 week ago

Selected Answer: B

While you choose month, there is no parallelization at all, so this is the worst possible answer. The correct answer is AccountType. This ensures the PARALELLIZATION, which is improving the performance. CustomerSegment and TransactionType are below 60 million, which is not enough for 60 nodes with 1 mln for each...

upvoted 4 times

_Ahan_

11 months, 1 week ago

Parallelization is for distributions I believe, and every partition will automatically be distributed as well. Partitioning is done to reduce the amount of data that is scanned. If I partition the data by months, then each month partition is further distributed into 60 distributions which are split across different nodes, thereby ensuring parallelism

upvoted 5 times

...

otapi

7 months, 1 week ago

Selected Answer: B

I assume, the listed figures are the number of unique values in each columns. TransactionMonth is a bad partition, as queries would typically fall into one partition. AccountType has the most unique values, those are better for partitioning.

upvoted 1 times

roopansh.gupta2

8 months, 1 week ago

Don't worry about the listed figures. They are not specific to the question, but is copied text from an example provided on MS-Docs.

upvoted 1 times

...

jppdks

1 year, 1 month ago

The best column to partition the table on would be D. TransactionMonth. Partitioning is a database design technique which is used to improves performance, manageability, simplifies maintenance and reduce the response time of the query. Given that analysts will most commonly analyze transactions for a given month, partitioning on the TransactionMonth column would allow the database to quickly isolate the relevant rows for a given month, thereby minimizing query times. This is particularly beneficial when dealing with large volumes of data, as is the case here. Moreover, partitioning on TransactionMonth would also align with the typical transaction analysis patterns, which summarize transactions by transaction type, customer segment, and/or account type. This would allow for efficient querying and analysis of the data. Therefore, the TransactionMonth column would be the most suitable choice for partitioning the table.

upvoted 2 times

...

Khadija10

1 year, 2 months ago

Selected Answer: D

the answer is transaction month "In most cases data is partitioned on a date column that is closely tied to the order in which the data is loaded into the SQL pool." https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition?context=%2Fazure%2Fsynapse-analytics%2Fcontext%2Fcontext https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool

upvoted 1 times

Bakhtiyor

1 year, 1 month ago

Hello. I have a question. What is difference between partitioning and distribution?

upvoted 1 times

...

DooperMan

1 year, 5 months ago

Selected Answer: D

i felt partitioning then going with something that is not unique, if it is like hashing or even distribution depends on the case to go with more unique.

upvoted 2 times

...

ellala

1 year, 6 months ago

Selected Answer: B

Partitioning by date will lead to hot partitions since the most common query is by date. Therefore this is not a good idea. Next best option is AccountType. We want to take advantage of parallel processing to improve efficiency

upvoted 2 times

...

mav2000

1 year, 7 months ago

Answer is D, because the analyst will be querying transactions for month, and then its mentioned that transaction analysis will be done on Transaction_type, customer_segment and account_type, meaning they won't be querying for an individual columns but all 3 at the same time, which means it's pointless to partition between these columns, so transaction month is the answer

upvoted 3 times

...

hassexat

1 year, 7 months ago

Clustered columnstore index has 60 partitions and each partition needs a minimum of 1 million rows so TransactionMonth is the only that has more than 60 millions of rows. Correct answer: D

upvoted 2 times

...

kkk5566

1 year, 7 months ago

Selected Answer: D

Partition by month is a good idra

upvoted 1 times

...

akhil5432

1 year, 8 months ago

Selected Answer: D

transaction month

upvoted 1 times

...

Shanks111

1 year, 10 months ago

B, as selecting the date column as a partition will make one partition a hot partition and won't be able to make use of parallel processing, so Account Type should be the correct answer.

upvoted 4 times

...

Load full discussion...

Exam DP-203 All Questions

View all questions & answers for the DP-203 exam

Exam DP-203 topic 1 question 30 discussion

Comments

SinSS

Rob77

semauni

gf2tw

emna2022

Jolyboy

s1852

EmnCours

ibardonr

dgerok

_Ahan_

otapi

roopansh.gupta2

jppdks

Khadija10

Bakhtiyor

DooperMan

ellala

mav2000

hassexat

kkk5566

akhil5432

Shanks111

SY0-701