Exam DP-203 topic 2 question 53 discussion

Actual exam question from Microsoft's DP-203

Question #: 53
Topic #: 2

DRAG DROP -
You are designing an Azure Data Lake Storage Gen2 structure for telemetry data from 25 million devices distributed across seven key geographical regions. Each minute, the devices will send a JSON payload of metrics to Azure Event Hubs.
You need to recommend a folder structure for the data. The solution must meet the following requirements:
✑ Data engineers from each region must be able to build their own pipelines for the data of their respective region only.
✑ The data must be processed at least once every 15 minutes for inclusion in Azure Synapse Analytics serverless SQL pools.
How should you recommend completing the structure? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:

Suggested Answer:

Box 1: {raw/regionID}
Box 2: {YYYY}/{MM}/{DD}/{HH}/{mm}
Box 3: {deviceID}
Reference:
https://github.com/paolosalvatori/StreamAnalyticsAzureDataLakeStore/blob/master/README.md

by gf2tw at Dec. 10, 2021, 8:55 a.m.

Comments

ItHYMeRIsh

Highly Voted 2 years, 10 months ago

The correct answer is {raw/regionID}/{YYYY}/{MM}/{DD}/{HH}/{mm}/{deviceID}.json. {raw/regionID} is the first level because raw is the container name for the raw data, and regionID follows it to make security easier to manage. Use {YYYY}/{MM}/{DD}/{HH}/{mm}/{deviceID}.json rather than {deviceID}/{YYYY}/{MM}/{DD}/{HH}/{mm}.json. The primary reason is that you want as few folders as possible near the top of the namespace, fanning out only as you go deeper into the structure. For example, if you have 1 year's worth of data and 25 million devices, using {YYYY}/{MM}/{DD}/{HH}/{mm}/ results in 2.1 million folders (1 year * 12 months * 30 days [estimate] * 24 hours * 60 minutes). If you start your folder structure with {deviceID}, you end up with 25 million folders, one for each device, before you even get to including the date in the hierarchy.
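As a rough illustration of the layout described above, here is a minimal Python sketch that builds the path for one payload. The function name `telemetry_path` and the sample region and device IDs are made up for the example; nothing here comes from the exam question or from an Azure SDK.

```python
from datetime import datetime, timezone


def telemetry_path(region_id: str, device_id: str, ts: datetime) -> str:
    """Return raw/{regionID}/{YYYY}/{MM}/{DD}/{HH}/{mm}/{deviceID}.json.

    Hypothetical helper: region first (for per-region access control),
    then time down to the minute, then one file per device.
    """
    ts = ts.astimezone(timezone.utc)  # normalise to UTC before formatting
    return f"raw/{region_id}/{ts:%Y/%m/%d/%H/%M}/{device_id}.json"


# Sample values are illustrative only.
example = telemetry_path(
    "emea", "device-0001", datetime(2021, 12, 10, 8, 55, tzinfo=timezone.utc)
)
print(example)  # raw/emea/2021/12/10/08/55/device-0001.json
```

Because the region sits directly under the container, everything a regional team needs lives under a single prefix such as raw/emea/.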

upvoted 208 times

ML_Novice

2 years, 1 month ago

ItHYMeRIsh, you're a genius, man.

upvoted 6 times

...

nmnm22

1 year, 6 months ago

That's such a cool explanation. I aspire to have the same critical thinking skills you have.

upvoted 4 times

...

Deeksha1234

2 years, 2 months ago

Agree, correct answer

upvoted 1 times

...

sdokmak

2 years, 4 months ago

I'm getting ~500k folders for 1*12*30*24*60. I get your point that the hierarchy would be a lot cleaner.
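The arithmetic can be checked with a quick Python sketch. It uses the same 30-days-per-month estimate as the comment above and counts only the leaf (minute-level) folders for one year; the variable names are just for illustration.

```python
# Estimate from the thread: 1 year, 12 months, 30 days/month, 24 hours, 60 minutes.
MONTHS, DAYS, HOURS, MINUTES = 12, 30, 24, 60
DEVICES = 25_000_000

# Time-first layout: one leaf folder per minute of the year.
minute_folders_per_year = 1 * MONTHS * DAYS * HOURS * MINUTES
print(minute_folders_per_year)  # 518400

# Device-first layout: one folder per device before any date folder exists.
device_folders = DEVICES
print(device_folders // minute_folders_per_year)  # 48
```

So the time-first layout has about 518,000 minute folders per year, roughly 48 times fewer than the 25 million top-level folders a device-first layout would start with.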

upvoted 1 times

...

...

gf2tw

Highly Voted 2 years, 10 months ago

raw/RegionId should be in the first box as raw is the name of your container. Furthermore, putting RegionId as one of the first foldernames allows easy partitioning and simpler RBAC for the Data Engineers.

upvoted 15 times

SAli12

2 years, 10 months ago

Yes I agree, raw/regionId --> timestamp --> deviceId.json

upvoted 5 times

...

Sathya_sree

Most Recent 4 weeks, 1 day ago

Answer Area. First value: regionID. Second value: raw. Third value: {YYYY}/{MM}/{DD}/{HH}/{mm}/{deviceID}

upvoted 1 times

...

auwia

1 year, 4 months ago

I'll follow best practice from Microsoft: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices#monitor-telemetry So: /raw/regionid/deviceid/YYYY/MM/DD/HH (without minutes).

upvoted 5 times

...

rocky48

1 year, 4 months ago

The correct answer is {raw/regionID}/{YYYY}/{MM}/{DD}/{HH}/{mm}/{deviceID}.json

upvoted 1 times

...

georgich87

2 years, 6 months ago

I think that link will help us to find the correct answer: https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices The given example for a directory structure is: *{Region}/{SubjectMatter(s)}/{yyyy}/{mm}/{dd}/{hh}/*

upvoted 4 times

...

wwdba

2 years, 7 months ago

{raw/regionID}/{YYYY}/{MM}/{DD}/{HH}/{mm}/{deviceID}.json

upvoted 2 times

...

staniopolis

2 years, 8 months ago

IMHO {YYYY}/{MM}/{DD}/{HH}/{regionID/raw}/{deviceID}.json (given answer) is correct. Please note that there are no minutes {mm}, because they are not supported by the Time format: https://docs.microsoft.com/en-us/azure/stream-analytics/blob-storage-azure-data-lake-gen2-output

upvoted 3 times

...

staniopolis

2 years, 8 months ago

{raw/regionID}/{YYYY}/{MM}/{DD}/{HH}/{deviceID}.json Time Format [optional]: if the time token is used in the prefix path, specify the time format in which your files are organized. Currently the only supported value is HH.

upvoted 3 times

...

Canary_2021

2 years, 9 months ago

Question 54: the correct answer of box 2 is {YYYY}/{MM}/{DD}/{HH}_{deviceType}.json One dataset per hour per deviceType. So looks like regionid and deviceid should be put after {YYYY}/{MM}/{DD}/{HH}/{mm} . {YYYY}/{MM}/{DD}/{HH}/{mm}/{raw/regionID}/{deviceID}.json

upvoted 1 times

Canary_2021

2 years, 9 months ago

I still feel {raw/RegionID} / {YYYY/MM/DD/mm} /{DeviceID} is correct. I just have some questions after comparing the answers to question 54.

upvoted 2 times

...

engrbrain

2 years, 9 months ago

The question says: "Each minute, the devices will send a JSON payload." That means the data is demarcated by region and by minute. {raw/RegionID} / {YYYY/MM/DD/mm} /{DeviceID}

upvoted 2 times

...

SabaJamal2010AtGmail

2 years, 10 months ago

/{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}_{YYYY}_{MM}_{DD}.

upvoted 2 times

...

PA7

2 years, 10 months ago

raw/regionid - > DeviceId -> YYYY/MM/dd/HH-mm

upvoted 4 times

auwia

1 year, 4 months ago

without minute info.

upvoted 2 times

...

mr_corte

2 years, 10 months ago

{raw/regionID}/{deviceID}/{YYYY}/{MM}/{DD}/{HH}{mm} imo.

upvoted 4 times

auwia

1 year, 4 months ago

without minute in my opinion

upvoted 2 times

tsmk

1 year, 3 months ago

IMO, with {mm}. Otherwise, every HH dir will have 25 million (devices) * 60 (frequency of incoming files) files.
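A back-of-the-envelope Python sketch of that file count follows. It assumes a time-first layout with one {deviceID}.json file per device per minute, and, purely for illustration, that devices are spread evenly across the seven regions; the question does not state the actual distribution.

```python
DEVICES = 25_000_000
REGIONS = 7            # from the question; an even split is an assumption
FILES_PER_HOUR = 60    # one payload per device per minute

# Without an {mm} level, every file from one hour lands in the same HH folder.
files_per_hour_all = DEVICES * FILES_PER_HOUR
files_per_hour_per_region = files_per_hour_all // REGIONS

# With an {mm} level, each minute folder holds one file per device.
files_per_minute_per_region = DEVICES // REGIONS

print(files_per_hour_all)            # 1500000000
print(files_per_hour_per_region)     # 214285714
print(files_per_minute_per_region)   # 3571428
```

Under these assumptions, adding the {mm} level cuts the number of files per leaf folder by a factor of 60.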

upvoted 1 times

...
