Exam DP-200 All Questions

View all questions & answers for the DP-200 exam

Exam DP-200 topic 1 question 26 discussion

Actual exam question from Microsoft's DP-200

Question #: 26
Topic #: 1

You plan to create a dimension table in Azure Synapse Analytics that will be less than 1 GB.
You need to create the table to meet the following requirements:
✑ Provide the fastest query time.
✑ Minimize data movement during queries.
Which type of table should you use?

A. hash distributed
B. heap
C. replicated
D. round-robin

Show Suggested Answer

Suggested Answer: D 🗳️
Usually common dimension tables or tables that doesn't distribute evenly are good candidates for round-robin distributed table.
Note: Dimension tables or other lookup tables in a schema can usually be stored as round-robin tables. Usually these tables connect to more than one fact tables and optimizing for one join may not be the best idea. Also usually dimension tables are smaller which can leave some distributions empty when hash distributed.
Round-robin by definition guarantees a uniform data distribution.
Reference:
https://blogs.msdn.microsoft.com/sqlcat/2015/08/11/choosing-hash-distributed-table-vs-round-robin-distributed-table-in-azure-sql-dw-service/

by GoldenStateWarriors at April 2, 2021, 2:23 a.m.

Comments

Submit Cancel

sandeep1111

Highly Voted 4 years ago

Answer is Replicated table here.

upvoted 56 times

...

oldpony

Most Recent 2 years, 10 months ago

Selected Answer: C

dimension tables are normally replicated Fact tables are normally hash due to large size of data

upvoted 1 times

...

massnonn

3 years, 5 months ago

Absolutely wrong the round robin use all distrbution, the correct answer is replicated

upvoted 2 times

...

eurekamike

3 years, 10 months ago

dimension tables are normally replicated. also given that it is less than 1GB is another reason to use replicated. staging is round-robin unless it's too big, then use hash. fact tables are hash distributed.

upvoted 3 times

...

jayeshstudies

3 years, 10 months ago

minimize data movement --> no way it can be round robin has to be replicated

upvoted 2 times

...

VinayakT

3 years, 10 months ago

Replicated should be correct answer

upvoted 3 times

...

Simon2021

3 years, 10 months ago

Answer is Replicated

upvoted 4 times

...

tej24

3 years, 10 months ago

Data movement is reduced with Replicated tables because a full copy of data is stored on each Compute node. As a result, queries that required data movement steps to complete now run faster with Replicated tables. Answer is Replicated table https://azure.microsoft.com/en-in/blog/replicated-tables-now-in-preview-for-azure-sql-data-warehouse/

upvoted 3 times

...

LordSnoek

3 years, 10 months ago

Replicated is the right choice

upvoted 2 times

...

AZ20

3 years, 10 months ago

I think answer should be replicated as size is already given 1GB ( not that big) , plus no data movement required as present on all nodes

upvoted 1 times

AZ20

3 years, 10 months ago

to support it more .. from MS guide - What is a replicated table? A replicated table has a full copy of the table accessible on each Compute node. Replicating a table removes the need to transfer data among Compute nodes before a join or aggregation. Since the table has multiple copies, replicated tables work best when the table size is less than 2 GB compressed. 2 GB is not a hard limit. If the data is static and does not change, you can replicate larger tables. *My addition* - In dimension data doesn't change frequently

upvoted 1 times

...

Qrm_1972

4 years ago

The correct is Replicated ✅ Small-dimension tables in a star schema with less than 2GB of storage after compression (~5x compression).

upvoted 2 times

...

gkozla

4 years ago

Replicated table: A replicated table has a full copy of the table available on every Compute node. Queries run fast on replicated tables because joins on replicated tables don't require data movement. Replication requires extra storage, though, and isn't practical for large tables.

upvoted 2 times

...

JohnCrawford

4 years ago

See this link supporting my statement. https://docs.microsoft.com/en-gb/learn/modules/design-azure-sql-data-warehouse/6-table-geometries

upvoted 3 times

...

JohnCrawford

4 years ago

The correct answer is REPLICATED. Fact tables should almost always use hash distribution, dimension tables almost always replicated and staging tables almost always round-robin. If the dimension table is large (2 GB+) and/or used in joins with the fact table on the partition key then it may make sense to also use hash distribution for that particular dimension table.

upvoted 3 times

...

SuperAlex

4 years ago

why not replicated table ? https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/design-guidance-for-replicated-tables

upvoted 2 times

...

GoldenStateWarriors

4 years ago

In my opinion, the correct answer is Hash-distributed based on the link below: "A hash distributed table distributes rows based on the value in the distribution column. A hash distributed table is designed to achieve high performance for queries on large tables. There are several factors to consider when choosing a distribution column." https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview

upvoted 3 times

maciejt

3 years, 11 months ago

This is not a large table, it;s very small. Hash distributed is good for large fact tables with evenly distributed column that is used in joins.

upvoted 1 times

...

Exam DP-200 All Questions

View all questions & answers for the DP-200 exam

Exam DP-200 topic 1 question 26 discussion

Comments

sandeep1111

oldpony

massnonn

eurekamike

jayeshstudies

VinayakT

Simon2021

tej24

LordSnoek

AZ20

AZ20

Qrm_1972

gkozla

JohnCrawford

JohnCrawford

SuperAlex

GoldenStateWarriors

maciejt

SY0-701