exam questions

Exam DP-203 All Questions

View all questions & answers for the DP-203 exam

Exam DP-203 topic 1 question 9 discussion

Actual exam question from Microsoft's DP-203
Question #: 9
Topic #: 1
[All DP-203 Questions]

HOTSPOT -
You have a data model that you plan to implement in a data warehouse in Azure Synapse Analytics as shown in the following exhibit.

All the dimension tables will be less than 2 GB after compression, and the fact table will be approximately 6 TB. The dimension tables will be relatively static with very few data inserts and updates.
Which type of table should you use for each table? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

Show Suggested Answer Hide Answer
Suggested Answer:
Box 1: Replicated -
Replicated tables are ideal for small star-schema dimension tables, because the fact table is often distributed on a column that is not compatible with the connected dimension tables. If this case applies to your schema, consider changing small dimension tables currently implemented as round-robin to replicated.

Box 2: Replicated -

Box 3: Replicated -

Box 4: Hash-distributed -
For Fact tables use hash-distribution with clustered columnstore index. Performance improves when two hash tables are joined on the same distribution column.
Reference:
https://azure.microsoft.com/en-us/updates/reduce-data-movement-and-make-your-queries-more-efficient-with-the-general-availability-of-replicated-tables/ https://azure.microsoft.com/en-us/blog/replicated-tables-now-generally-available-in-azure-sql-data-warehouse/

Comments

Chosen Answer:
This is a voting comment (?). It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
ian_viana
Highly Voted 3 years, 7 months ago
The answer is correct. The Dims are under 2gb so no point in use hash. Common distribution methods for tables: The table category often determines which option to choose for distributing the table. Table category Recommended distribution option Fact -Use hash-distribution with clustered columnstore index. Performance improves when two hash tables are joined on the same distribution column. Dimension - Use replicated for smaller tables. If tables are too large to store on each Compute node, use hash-distributed. Staging - Use round-robin for the staging table. The load with CTAS is fast. Once the data is in the staging table, use INSERT...SELECT to move the data to production tables. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview#common-distribution-methods-for-tables
upvoted 240 times
GameLift
3 years, 6 months ago
Thanks, but where in the question does it indicate about Fact table has clustered columnstore index.?
upvoted 3 times
berserksap
3 years, 5 months ago
Normally for big tables we use clustered columnstore index for optimal performance and compression. Since the table mentioned here is in TBs we can safely assume using this index is the best choice https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-index
upvoted 3 times
berserksap
3 years, 5 months ago
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-overview
upvoted 1 times
...
...
...
Tara123
1 year, 4 months ago
Can you please explain for Dimension table it is mentioned that "If tables are too large" use Hash distribution. Here Too large means how much?I am waiting for your reply🙏🙏🙏🙏
upvoted 1 times
lisa710
1 year, 4 months ago
exceeding 10 gigabytes (GB) are often considered large
upvoted 5 times
...
...
virendrapsingh
2 years, 10 months ago
This is a wonderful explanation. Worth giving a like.
upvoted 7 times
...
...
ohana
Highly Voted 3 years, 6 months ago
Took the exam today, this question came out. Ans: All the Dim tables --> Replicated Fact Tables --> Hash Distributed
upvoted 49 times
...
technoguy
Most Recent 1 month ago
Replicated Replicated Replicated Hash Dist
upvoted 1 times
...
krishna1303
3 months, 1 week ago
Dim_customers,Dim_Employees,Dim_time are replicated and Fact_DailyBooking table is hash distributed
upvoted 1 times
...
EmnCours
5 months ago
The answer is correct.
upvoted 1 times
...
RookieCloudEngineer
8 months, 4 weeks ago
this explains better https://www.youtube.com/watch?v=seiJ2xpW0h8
upvoted 1 times
...
Bakachi55
1 year, 1 month ago
Dear Community, I would like to express my heartfelt gratitude for the thoughtful mock questions that have been shared. Your generosity in providing these valuable resources has been immensely helpful. As we engage in discussions and learn together, I am reminded of the strength and camaraderie that exists within our community. To everyone who has contributed, whether by creating questions, participating in discussions, or simply offering encouragement, thank you. Your collective efforts make this community a vibrant and supportive place for learning and growth. Let us continue to share knowledge, support one another, and celebrate our shared passion for learning
upvoted 4 times
...
Alongi
1 year, 1 month ago
All Dim Tables, if less than 2 GB, will be REPLICATED. For Fact Table, if major than 10 GB, will be HASH DIST.
upvoted 3 times
...
dakku987
1 year, 4 months ago
REplicated,REplicated,REplicated and hash
upvoted 1 times
...
hassexat
1 year, 7 months ago
Replicated / Replicated / Replicated / Hash
upvoted 1 times
...
kkk5566
1 year, 8 months ago
Ans: All the Dim tables --> Replicated Fact Tables --> Hash Distributed is correct
upvoted 1 times
...
DataEngDP
1 year, 9 months ago
https://learn.microsoft.com/en-us/training/modules/design-multidimensional-schema-to-optimize-analytical-workloads/3-create-tables
upvoted 2 times
...
DataEngDP
1 year, 9 months ago
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices#batch-jobs-structure Dimension tables: Replicated distribution since data movement is absent here. Fact Table: Hash distribution to improve performance of moving data.
upvoted 1 times
...
vigilante89
2 years, 4 months ago
Dim_*: Replicated Since dimension tables are less likely to get frequent updates and are usually smaller in size, replicating them across all partitions makes logical sense. Also, Tables less than 2gb size should be replicated. Fact_*: Hash Distributed Since Fact tables are huge and have frequent insert/delete/updates going on, hash distribution is the perfect distribution candidate. Also, Tables greater than 2gb size should be hash distributed.
upvoted 3 times
...
Deeksha1234
2 years, 8 months ago
correct
upvoted 1 times
...
objecto
2 years, 11 months ago
Just a better link that explains the decisions. Also watch the video, it's cool. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/massively-parallel-processing-mpp-architecture
upvoted 2 times
...
Dothy
2 years, 11 months ago
The answer is correct.
upvoted 1 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...
exam
Someone Bought Contributor Access for:
SY0-701
London, 1 minute ago