Exam DP-203 topic 2 question 53 discussion

Actual exam question from Microsoft's DP-203
Question #: 53
Topic #: 2

DRAG DROP -
You are designing an Azure Data Lake Storage Gen2 structure for telemetry data from 25 million devices distributed across seven key geographical regions. Each minute, the devices will send a JSON payload of metrics to Azure Event Hubs.
You need to recommend a folder structure for the data. The solution must meet the following requirements:
✑ Data engineers from each region must be able to build their own pipelines for the data of their respective region only.
✑ The data must be processed at least once every 15 minutes for inclusion in Azure Synapse Analytics serverless SQL pools.
How should you recommend completing the structure? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:

Suggested Answer:
Box 1: {raw/regionID}
Box 2: {YYYY}/{MM}/{DD}/{HH}/{mm}
Box 3: {deviceID}
Reference:
https://github.com/paolosalvatori/StreamAnalyticsAzureDataLakeStore/blob/master/README.md
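
As a minimal sketch, the suggested layout can be written as a small path builder. The function name `telemetry_path` and the sample region and device values are hypothetical; only the folder order (raw/regionID, then the timestamp down to the minute, then deviceID) comes from the suggested answer.

```python
from datetime import datetime, timezone


def telemetry_path(region_id: str, device_id: str, ts: datetime) -> str:
    """Build raw/{regionID}/{YYYY}/{MM}/{DD}/{HH}/{mm}/{deviceID}.json."""
    # datetime supports strftime codes inside an f-string format spec.
    return f"raw/{region_id}/{ts:%Y/%m/%d/%H/%M}/{device_id}.json"


# Hypothetical sample values, for illustration only.
example = telemetry_path(
    "emea", "device-000123", datetime(2024, 3, 5, 14, 7, tzinfo=timezone.utc)
)
print(example)
```

Because the minute is its own folder level, a job that runs every 15 minutes would only need to read the 15 minute folders in its window under a given region.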

Comments

ItHYMeRIsh
Highly Voted 2 years, 10 months ago
The correct answer is {raw/regionID}/{YYYY}/{MM}/{DD}/{HH}/{mm}/{deviceID}.json. {raw/regionID} is the first level because raw is the container name for the raw data, and regionID follows it for ease of managing security. Use {YYYY}/{MM}/{DD}/{HH}/{mm}/{deviceID}.json instead of {deviceID}/{YYYY}/{MM}/{DD}/{HH}/{mm}.json. The primary reason is that you want your namespace to have as few folders as possible at the top levels and to fan out only as you get deeper into the structure. For example, if you have 1 year's worth of data and 25 million devices, using {YYYY}/{MM}/{DD}/{HH}/{mm}/ results in 2.1 million folders (1 year * 12 months * 30 days [estimate] * 24 hours * 60 minutes). If you start your folder structure with {deviceID}, you end up with 25 million folders, one for each device, before you even get to including the date in the hierarchy.
upvoted 208 times
ML_Novice
2 years, 1 month ago
ItHYMeRIsh, you're a genius, man.
upvoted 6 times
...
nmnm22
1 year, 6 months ago
That's such a cool explanation. I aspire to have the same critical thinking skills you have.
upvoted 4 times
...
Deeksha1234
2 years, 2 months ago
Agree, correct answer
upvoted 1 times
...
sdokmak
2 years, 4 months ago
I'm getting ~500k folders for 1*12*30*24*60. I get your point that the hierarchy would be a lot cleaner.
upvoted 1 times
...
...
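
To check the folder counts discussed in this thread, here is a minimal arithmetic sketch. It uses the same 30-day-month estimate as the comment above and counts only the leaf minute-level folders, not the intermediate year, month, day, and hour folders. The constant names are hypothetical.

```python
# Estimate from the thread: 1 year * 12 months * 30 days * 24 hours * 60 minutes.
MINUTE_FOLDERS_PER_YEAR = 1 * 12 * 30 * 24 * 60

REGIONS = 7            # from the question
DEVICES = 25_000_000   # from the question

# Time-first layout: one leaf folder per minute under each region.
minute_folders_per_region = MINUTE_FOLDERS_PER_YEAR
minute_folders_all_regions = MINUTE_FOLDERS_PER_YEAR * REGIONS

# Device-first layout: one folder per device before any date folder exists.
device_folders = DEVICES

print(minute_folders_per_region)   # 518400
print(minute_folders_all_regions)  # 3628800
print(device_folders)              # 25000000
```

The product of the listed factors is 518,400, which matches the ~500k figure in the reply. Even multiplied across all seven regions, the time-first layout stays well below the 25 million top-level folders of a device-first layout.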
gf2tw
Highly Voted 2 years, 10 months ago
raw/RegionId should be in the first box, as raw is the name of your container. Furthermore, putting RegionId as one of the first folder names allows easy partitioning and simpler RBAC for the data engineers.
upvoted 15 times
SAli12
2 years, 10 months ago
Yes I agree, raw/regionId --> timestamp --> deviceId.json
upvoted 5 times
...
...
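
The scoping argument can be illustrated with a plain-Python sketch. It does not call any Azure API; `region_prefix` and `visible_to` are hypothetical helpers that only show that, with regionID near the top, a single path prefix covers all of one region's data regardless of date.

```python
def region_prefix(region_id: str) -> str:
    """Single prefix that covers every file of one region (hypothetical helper)."""
    return f"raw/{region_id}/"


def visible_to(region_id: str, paths: list[str]) -> list[str]:
    """Return only the paths that fall under the given region's prefix."""
    prefix = region_prefix(region_id)
    return [p for p in paths if p.startswith(prefix)]


# Hypothetical sample paths following the region-first layout.
paths = [
    "raw/emea/2024/03/05/14/07/device-1.json",
    "raw/apac/2024/03/05/14/07/device-2.json",
    "raw/emea/2024/03/05/14/08/device-1.json",
]

emea_paths = visible_to("emea", paths)
print(emea_paths)
```

With a date-first layout, the same region's data would be spread across every minute folder, so no single prefix would isolate it.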
Sathya_sree
Most Recent 4 weeks, 1 day ago
Answer Area: first value: regionID; second value: raw; third value: {YYYY}/{MM}/{DD}/{HH}/{mm}/{deviceID}
upvoted 1 times
...
auwia
1 year, 4 months ago
I'll follow best practice from Microsoft: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices#monitor-telemetry So: /raw/regionid/deviceid/YYYY/MM/DD/HH (without minutes).
upvoted 5 times
...
rocky48
1 year, 4 months ago
The correct answer is {raw/regionID}/{YYYY}/{MM}/{DD}/{HH}/{mm}/{deviceID}.json
upvoted 1 times
...
georgich87
2 years, 6 months ago
I think that link will help us to find the correct answer: https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices The given example for a directory structure is: *{Region}/{SubjectMatter(s)}/{yyyy}/{mm}/{dd}/{hh}/*
upvoted 4 times
...
wwdba
2 years, 7 months ago
{raw/regionID}/{YYYY}/{MM}/{DD}/{HH}/{mm}/{deviceID}.json
upvoted 2 times
...
staniopolis
2 years, 8 months ago
IMHO {YYYY}/{MM}/{DD}/{HH}/{regionID/raw}/{deviceID}.json (given answer) is correct. Note that there is no minutes token {mm}, because it is not supported by the Time format: https://docs.microsoft.com/en-us/azure/stream-analytics/blob-storage-azure-data-lake-gen2-output
upvoted 3 times
...
staniopolis
2 years, 8 months ago
{raw/regionID}/{YYYY}/{MM}/{DD}/{HH}/{deviceID}.json. "Time Format [optional]: if the time token is used in the prefix path, specify the time format in which your files are organized. Currently the only supported value is HH."
upvoted 3 times
...
Canary_2021
2 years, 9 months ago
In question 54, the correct answer for box 2 is {YYYY}/{MM}/{DD}/{HH}_{deviceType}.json: one dataset per hour per deviceType. So it looks like regionID and deviceID should be put after {YYYY}/{MM}/{DD}/{HH}/{mm}: {YYYY}/{MM}/{DD}/{HH}/{mm}/{raw/regionID}/{deviceID}.json
upvoted 1 times
Canary_2021
2 years, 9 months ago
I still feel {raw/RegionID} / {YYYY/MM/DD/mm} /{DeviceID} is correct. I just have some questions after comparing it with the answers to question 54.
upvoted 2 times
...
...
engrbrain
2 years, 9 months ago
The question says: "Each minute, the devices will send a JSON payload." That means the data is demarcated by region and by minute. {raw/RegionID} / {YYYY/MM/DD/mm} /{DeviceID}
upvoted 2 times
...
SabaJamal2010AtGmail
2 years, 10 months ago
/{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}_{YYYY}_{MM}_{DD}.
upvoted 2 times
...
PA7
2 years, 10 months ago
raw/regionid - > DeviceId -> YYYY/MM/dd/HH-mm
upvoted 4 times
auwia
1 year, 4 months ago
without minute info.
upvoted 2 times
...
...
mr_corte
2 years, 10 months ago
{raw/regionID}/{deviceID}/{YYYY}/{MM}/{DD}/{HH}{mm} imo.
upvoted 4 times
auwia
1 year, 4 months ago
without minute in my opinion
upvoted 2 times
tsmk
1 year, 3 months ago
IMO, with {mm}. Otherwise, every HH dir will have 25mil (device) * 60 (freq. of incoming files)
upvoted 1 times
...
...
...