exam questions

Exam DP-200 All Questions

View all questions & answers for the DP-200 exam

Exam DP-200 topic 1 question 26 discussion

Actual exam question from Microsoft's DP-200
Question #: 26
Topic #: 1
[All DP-200 Questions]

You plan to create a dimension table in Azure Synapse Analytics that will be less than 1 GB.
You need to create the table to meet the following requirements:
✑ Provide the fastest query time.
✑ Minimize data movement during queries.
Which type of table should you use?

  • A. hash distributed
  • B. heap
  • C. replicated
  • D. round-robin
Show Suggested Answer Hide Answer
Suggested Answer: D 🗳️
Usually common dimension tables or tables that doesn't distribute evenly are good candidates for round-robin distributed table.
Note: Dimension tables or other lookup tables in a schema can usually be stored as round-robin tables. Usually these tables connect to more than one fact tables and optimizing for one join may not be the best idea. Also usually dimension tables are smaller which can leave some distributions empty when hash distributed.
Round-robin by definition guarantees a uniform data distribution.
Reference:
https://blogs.msdn.microsoft.com/sqlcat/2015/08/11/choosing-hash-distributed-table-vs-round-robin-distributed-table-in-azure-sql-dw-service/

Comments

Chosen Answer:
This is a voting comment (?). It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
sandeep1111
Highly Voted 4 years ago
Answer is Replicated table here.
upvoted 56 times
...
oldpony
Most Recent 2 years, 10 months ago
Selected Answer: C
dimension tables are normally replicated Fact tables are normally hash due to large size of data
upvoted 1 times
...
massnonn
3 years, 5 months ago
Absolutely wrong the round robin use all distrbution, the correct answer is replicated
upvoted 2 times
...
eurekamike
3 years, 10 months ago
dimension tables are normally replicated. also given that it is less than 1GB is another reason to use replicated. staging is round-robin unless it's too big, then use hash. fact tables are hash distributed.
upvoted 3 times
...
jayeshstudies
3 years, 10 months ago
minimize data movement --> no way it can be round robin has to be replicated
upvoted 2 times
...
VinayakT
3 years, 10 months ago
Replicated should be correct answer
upvoted 3 times
...
Simon2021
3 years, 10 months ago
Answer is Replicated
upvoted 4 times
...
tej24
3 years, 10 months ago
Data movement is reduced with Replicated tables because a full copy of data is stored on each Compute node. As a result, queries that required data movement steps to complete now run faster with Replicated tables. Answer is Replicated table https://azure.microsoft.com/en-in/blog/replicated-tables-now-in-preview-for-azure-sql-data-warehouse/
upvoted 3 times
...
LordSnoek
3 years, 10 months ago
Replicated is the right choice
upvoted 2 times
...
AZ20
3 years, 10 months ago
I think answer should be replicated as size is already given 1GB ( not that big) , plus no data movement required as present on all nodes
upvoted 1 times
AZ20
3 years, 10 months ago
to support it more .. from MS guide - What is a replicated table? A replicated table has a full copy of the table accessible on each Compute node. Replicating a table removes the need to transfer data among Compute nodes before a join or aggregation. Since the table has multiple copies, replicated tables work best when the table size is less than 2 GB compressed. 2 GB is not a hard limit. If the data is static and does not change, you can replicate larger tables. *My addition* - In dimension data doesn't change frequently
upvoted 1 times
...
...
Qrm_1972
4 years ago
The correct is Replicated ✅ Small-dimension tables in a star schema with less than 2GB of storage after compression (~5x compression).
upvoted 2 times
...
gkozla
4 years ago
Replicated table: A replicated table has a full copy of the table available on every Compute node. Queries run fast on replicated tables because joins on replicated tables don't require data movement. Replication requires extra storage, though, and isn't practical for large tables.
upvoted 2 times
...
JohnCrawford
4 years ago
See this link supporting my statement. https://docs.microsoft.com/en-gb/learn/modules/design-azure-sql-data-warehouse/6-table-geometries
upvoted 3 times
...
JohnCrawford
4 years ago
The correct answer is REPLICATED. Fact tables should almost always use hash distribution, dimension tables almost always replicated and staging tables almost always round-robin. If the dimension table is large (2 GB+) and/or used in joins with the fact table on the partition key then it may make sense to also use hash distribution for that particular dimension table.
upvoted 3 times
...
SuperAlex
4 years ago
why not replicated table ? https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/design-guidance-for-replicated-tables
upvoted 2 times
...
In my opinion, the correct answer is Hash-distributed based on the link below: "A hash distributed table distributes rows based on the value in the distribution column. A hash distributed table is designed to achieve high performance for queries on large tables. There are several factors to consider when choosing a distribution column." https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview
upvoted 3 times
maciejt
3 years, 11 months ago
This is not a large table, it;s very small. Hash distributed is good for large fact tables with evenly distributed column that is used in joins.
upvoted 1 times
...
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...
exam
Someone Bought Contributor Access for:
SY0-701
London, 1 minute ago