exam questions

Exam DP-203 All Questions

View all questions & answers for the DP-203 exam

Exam DP-203 topic 1 question 30 discussion

Actual exam question from Microsoft's DP-203
Question #: 30
Topic #: 1
[All DP-203 Questions]

You are designing a financial transactions table in an Azure Synapse Analytics dedicated SQL pool. The table will have a clustered columnstore index and will include the following columns:
✑ TransactionType: 40 million rows per transaction type
✑ CustomerSegment: 4 million per customer segment
✑ TransactionMonth: 65 million rows per month
AccountType: 500 million per account type

You have the following query requirements:
✑ Analysts will most commonly analyze transactions for a given month.
✑ Transactions analysis will typically summarize transactions by transaction type, customer segment, and/or account type
You need to recommend a partition strategy for the table to minimize query times.
On which column should you recommend partitioning the table?

  • A. CustomerSegment
  • B. AccountType
  • C. TransactionType
  • D. TransactionMonth
Show Suggested Answer Hide Answer
Suggested Answer: D 🗳️

Comments

Chosen Answer:
This is a voting comment (?). It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
SinSS
Highly Voted 7 months, 1 week ago
Considering row count, the answer is Transaction Month or Account Type. Partitioning with Trans. Month, it will end up with hot partition and cannot utilize parallel processing, because all data for the query will be in the same partition. To minimize the processing time, parallel processing should happen. So Transaction Month can not be the answer. My answer is Account Type. There is a similar question in Microsoft site and the explanation was given with answer as follows. Product ensures parallelism when querying data from a given month within the same region, or multiple regions. Using date and partitioning by month, all sales for a month will be in the same partition, not providing parallelism. All sales for a given region will be in the same partition, not providing parallelism....
upvoted 9 times
Rob77
1 year, 11 months ago
I think you are confusing partition with distribution. Each partition will still be distributed over 60 distributions.
upvoted 3 times
semauni
1 year, 9 months ago
The Microsoft question is also about partitioning.
upvoted 1 times
...
...
...
gf2tw
Highly Voted 3 years, 4 months ago
Agree with D, should not be confused with Distribution column for Hash-distributed tables.
upvoted 7 times
emna2022
2 years, 8 months ago
Partitioning and Distributing are different concepts In most cases data is partitioned on a date column that is closely tied to the order in which the data is loaded into the SQL pool. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition
upvoted 4 times
...
...
Jolyboy
Most Recent 4 weeks, 1 day ago
Selected Answer: D
To minimize query times for your financial transactions table in Azure Synapse Analytics, you should recommend partitioning the table on the TransactionMonth column (Option D). Here's why: Query Requirements: Analysts will most commonly analyze transactions for a given month. Partitioning by TransactionMonth allows queries to quickly filter and access the relevant partitions, significantly improving query performance1. Data Distribution: With 65 million rows per month, partitioning by TransactionMonth ensures that each partition contains a manageable amount of data, optimizing both storage and query efficiency1. Partitioning by TransactionMonth aligns well with your query patterns and helps in efficiently managing and querying large datasets.
upvoted 1 times
...
s1852
4 months ago
Selected Answer: D
For optimal compression and performance of clustered columnstore tables, a minimum of 1 million rows per distribution and partition is needed. Before partitions are created, dedicated SQL pool already divides each table into 60 distributed databases.
upvoted 1 times
...
EmnCours
5 months ago
Selected Answer: D
Correct Answer: D
upvoted 1 times
...
ibardonr
6 months ago
D. xD
upvoted 1 times
...
dgerok
7 months, 1 week ago
Selected Answer: B
While you choose month, there is no parallelization at all, so this is the worst possible answer. The correct answer is AccountType. This ensures the PARALELLIZATION, which is improving the performance. CustomerSegment and TransactionType are below 60 million, which is not enough for 60 nodes with 1 mln for each...
upvoted 4 times
_Ahan_
11 months, 1 week ago
Parallelization is for distributions I believe, and every partition will automatically be distributed as well. Partitioning is done to reduce the amount of data that is scanned. If I partition the data by months, then each month partition is further distributed into 60 distributions which are split across different nodes, thereby ensuring parallelism
upvoted 5 times
...
...
otapi
7 months, 1 week ago
Selected Answer: B
I assume, the listed figures are the number of unique values in each columns. TransactionMonth is a bad partition, as queries would typically fall into one partition. AccountType has the most unique values, those are better for partitioning.
upvoted 1 times
roopansh.gupta2
8 months, 1 week ago
Don't worry about the listed figures. They are not specific to the question, but is copied text from an example provided on MS-Docs.
upvoted 1 times
...
...
jppdks
1 year, 1 month ago
The best column to partition the table on would be D. TransactionMonth. Partitioning is a database design technique which is used to improves performance, manageability, simplifies maintenance and reduce the response time of the query. Given that analysts will most commonly analyze transactions for a given month, partitioning on the TransactionMonth column would allow the database to quickly isolate the relevant rows for a given month, thereby minimizing query times. This is particularly beneficial when dealing with large volumes of data, as is the case here. Moreover, partitioning on TransactionMonth would also align with the typical transaction analysis patterns, which summarize transactions by transaction type, customer segment, and/or account type. This would allow for efficient querying and analysis of the data. Therefore, the TransactionMonth column would be the most suitable choice for partitioning the table.
upvoted 2 times
...
Khadija10
1 year, 2 months ago
Selected Answer: D
the answer is transaction month "In most cases data is partitioned on a date column that is closely tied to the order in which the data is loaded into the SQL pool." https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition?context=%2Fazure%2Fsynapse-analytics%2Fcontext%2Fcontext https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool
upvoted 1 times
Bakhtiyor
1 year, 1 month ago
Hello. I have a question. What is difference between partitioning and distribution?
upvoted 1 times
...
...
DooperMan
1 year, 5 months ago
Selected Answer: D
i felt partitioning then going with something that is not unique, if it is like hashing or even distribution depends on the case to go with more unique.
upvoted 2 times
...
ellala
1 year, 6 months ago
Selected Answer: B
Partitioning by date will lead to hot partitions since the most common query is by date. Therefore this is not a good idea. Next best option is AccountType. We want to take advantage of parallel processing to improve efficiency
upvoted 2 times
...
mav2000
1 year, 7 months ago
Answer is D, because the analyst will be querying transactions for month, and then its mentioned that transaction analysis will be done on Transaction_type, customer_segment and account_type, meaning they won't be querying for an individual columns but all 3 at the same time, which means it's pointless to partition between these columns, so transaction month is the answer
upvoted 3 times
...
hassexat
1 year, 7 months ago
Clustered columnstore index has 60 partitions and each partition needs a minimum of 1 million rows so TransactionMonth is the only that has more than 60 millions of rows. Correct answer: D
upvoted 2 times
...
kkk5566
1 year, 7 months ago
Selected Answer: D
Partition by month is a good idra
upvoted 1 times
...
akhil5432
1 year, 8 months ago
Selected Answer: D
transaction month
upvoted 1 times
...
Shanks111
1 year, 10 months ago
B, as selecting the date column as a partition will make one partition a hot partition and won't be able to make use of parallel processing, so Account Type should be the correct answer.
upvoted 4 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...
exam
Someone Bought Contributor Access for:
SY0-701
London, 1 minute ago