Exam DP-203 topic 2 question 52 discussion

Actual exam question from Microsoft's DP-203
Question #: 52
Topic #: 2

HOTSPOT
You are building an Azure Data Factory solution to process data received from Azure Event Hubs and then ingest it into an Azure Data Lake Storage Gen2 container.
The data will be ingested every five minutes from devices into JSON files. The files have the following naming pattern:
/{deviceType}/in/{YYYY}/{MM}/{DD}/{HH}/{deviceID}_{YYYY}{MM}{DD}{HH}{mm}.json
You need to prepare the data for batch data processing so that there is one dataset per hour per deviceType. The solution must minimize read times.
How should you configure the sink for the copy activity? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

Suggested Answer:
Box 1: @trigger().startTime
startTime: A date-time value. For basic schedules, the value of the startTime property applies to the first occurrence. For complex schedules, the trigger starts no sooner than the specified startTime value.
Box 2: /{YYYY}/{MM}/{DD}/{HH}_{deviceType}.json
One dataset per hour per deviceType.

Box 3: Flatten hierarchy
FlattenHierarchy: All files from the source folder are in the first level of the target folder. The target files have autogenerated names.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
https://docs.microsoft.com/en-us/azure/data-factory/connector-file-system
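
To make the three boxes concrete: the sink side of the copy activity is usually a parameterized ADLS Gen2 dataset whose folder path and file name are built from the trigger time and the device type. The sketch below is illustrative only; the names SinkHourlyJson, AdlsGen2LinkedService, windowStart, and deviceType are assumptions, not part of the question.

{
  "name": "SinkHourlyJson",
  "properties": {
    "type": "Json",
    "linkedServiceName": {
      "referenceName": "AdlsGen2LinkedService",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "windowStart": { "type": "string" },
      "deviceType": { "type": "string" }
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "out",
        "folderPath": {
          "value": "@formatDateTime(dataset().windowStart, 'yyyy/MM/dd')",
          "type": "Expression"
        },
        "fileName": {
          "value": "@concat(formatDateTime(dataset().windowStart, 'HH'), '_', dataset().deviceType, '.json')",
          "type": "Expression"
        }
      }
    }
  }
}

With example values windowStart = 2021-06-01T14:00:00Z and deviceType = sensor, this resolves to 2021/06/01/14_sensor.json, i.e. the /{YYYY}/{MM}/{DD}/{HH}_{deviceType}.json pattern in Box 2.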

Comments

ItHYMeRIsh
Highly Voted 3 years ago
The correct copy behavior is Merge, not Flatten hierarchy. The question starts with the following folder structure: /{deviceType}/in/{YYYY}/{MM}/{DD}/{HH}/{deviceID}_{YYYY}{MM}{DD}{HH}{mm}.json. That means there are multiple deviceID JSON files per deviceType, and those need to be merged to get the target naming pattern of "one file per device type per hour". The target naming pattern is /{YYYY}/{MM}/{DD}/{HH}_{deviceType}.json. The correct copy behavior is "Merge" because the multiple files in the source folder are merged into a single file per device type per hour.
upvoted 103 times
Bro111
2 years ago
Why not /{deviceType}/out/{YYYY}/{MM}/{DD}/{HH}.json ?
upvoted 3 times
sensaint
2 years ago
It is not an option. It says /{deviceID}/out/{YYYY}/{MM}/{DD}/{HH}.json
upvoted 6 times
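As a side note on ItHYMeRIsh's Merge vs Flatten point: the choice lives in the copy activity sink as the copyBehavior property of the write settings. The fragment below is a hedged sketch only (JsonSink and AzureBlobFSWriteSettings are the standard type names for a JSON sink on ADLS Gen2; the rest of the activity is omitted). MergeFiles writes one target file per run, whereas FlattenHierarchy copies every source file into the target folder root with an autogenerated name, so the file count would not be reduced.

"sink": {
  "type": "JsonSink",
  "storeSettings": {
    "type": "AzureBlobFSWriteSettings",
    "copyBehavior": "MergeFiles"
  }
}
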
onyerleft
Highly Voted 2 years, 12 months ago
1) @trigger().outputs.windowStartTime - this output is from a tumbling window trigger, and is required to identify the correct directory at the /{HH}/ level. Using windowStartTime will give the hour with complete data. The @trigger().startTime is for a schedule trigger, which corresponds to the hour for which data has not arrived yet.
2) /{YYYY}/{MM}/{DD}/{HH}_{deviceType}.json is the naming pattern to achieve an hourly dataset for each device type.
3) Multiple files for each device type will exist on the source side, since the naming pattern starts with {deviceID}, so the files must be merged in the sink to create a single file per device type.
upvoted 90 times
Davico93
2 years, 5 months ago
But the solution must minimize read times, so I think it is @trigger().startTime.
upvoted 2 times
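To make onyerleft's point 1 concrete: @trigger().outputs.windowStartTime only exists when the pipeline is started by a tumbling window trigger, and it is normally handed to the pipeline as a parameter in the trigger definition. The sketch below is illustrative and uses hypothetical names (HourlyDeviceTrigger, ProcessHourlyDeviceFiles, windowStart); the delay value is just one example of waiting for late-arriving files.

{
  "name": "HourlyDeviceTrigger",
  "properties": {
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Hour",
      "interval": 1,
      "startTime": "2021-01-01T00:00:00Z",
      "delay": "00:10:00",
      "maxConcurrency": 1
    },
    "pipeline": {
      "pipelineReference": {
        "referenceName": "ProcessHourlyDeviceFiles",
        "type": "PipelineReference"
      },
      "parameters": {
        "windowStart": "@trigger().outputs.windowStartTime"
      }
    }
  }
}

The pipeline can then pass @pipeline().parameters.windowStart down to a sink dataset such as the one sketched under the suggested answer.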
renan_ineu
Most Recent 2 months, 3 weeks ago
1. @trigger().startTime - the window trigger requires a window, which is not necessarily the case in the question; the others are using commas. It could be @pipeline().TriggerTime, but that uses a dot.
2. /yyyy/mm/dd/hh_deviceType.json - deviceID would not aggregate all devices by type; /deviceType.json would not split by hour; hh.json would not split by device type.
3. Merge - Dynamic content requires reading the content; Flatten would not merge data into one file.
upvoted 1 times
ELJORDAN23
11 months ago
Got this question on my exam on January 17. I answered @trigger().StartTime, /{YYYY}/{MM}/{DD}/{HH}_{deviceType}.json, and Merge files. I passed :)
upvoted 10 times
j888
10 months, 2 weeks ago
Merge is a better answer. Flatten hierarchy does not reduce the number of files in the directory. https://www.linkedin.com/pulse/copy-behaviour-activity-adf-lokesh-sharma/
upvoted 2 times
blazy002
12 months ago
The files must be MERGED each hour, so use @trigger().outputs.windowStartTime => the start time of the window. The author made 2/3 errors on this Q, grrr :)
@trigger().outputs.windowStartTime: gives the start time of the current window.
@trigger().StartTime: gives the start time of each trigger within that window.
upvoted 2 times
phydev
1 year, 1 month ago
Was on my exam today (31.10.2023).
upvoted 6 times
Chemmangat
1 year, 3 months ago
It's @trigger().outputs.windowStartTime. Ref: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-system-variables, under Tumbling Window Trigger.
upvoted 1 times
kkk5566
1 year, 3 months ago
1) @trigger().outputs.windowStartTime 2) /{YYYY}/{MM}/{DD}/{HH}_{deviceType}.json 3) Merge
upvoted 1 times
pavankr
1 year, 5 months ago
What exactly do you want to "FLATTEN"??? You need to Merge files. Period.
upvoted 2 times
rocky48
1 year, 6 months ago
1) @trigger().outputs.windowStartTime 2) /{YYYY}/{MM}/{DD}/{HH}_{deviceType}.json 3) Merge
upvoted 9 times
rzeng
2 years, 1 month ago
1. windowstarttime 2. yyyy/mm/dd/hh_devicetype.json 3. Merge
upvoted 6 times
Deeksha1234
2 years, 4 months ago
1) @trigger().outputs.windowStartTime 2) /{YYYY}/{MM}/{DD}/{HH}_{deviceType}.json 3) Merge agree with onyer
upvoted 3 times
Rafafouille76
2 years, 9 months ago
Of course it is a merge; can't believe the officially provided answers are so wrong... Who wrote that?
upvoted 10 times
kamil_k
2 years, 9 months ago
I know, it's almost as bad as the Microsoft documentation about Azure... That's why we see so much confusion over so many questions.
upvoted 4 times
Jaws1990
2 years, 11 months ago
Would you have to delay the tumbling window processing by 60 minutes to pick up data that hasn't arrived for that hour yet?
upvoted 1 times
Canary_2021
2 years, 11 months ago
The batch job that runs in Data Factory should use a tumbling window trigger, so the system variable @trigger().outputs.windowStartTime should be passed in as the parameter.
upvoted 3 times
jv2120
2 years, 12 months ago
Data is generated every 5 minutes but the output is needed once per hour per device type, so it needs to merge files to achieve this.
upvoted 2 times
tony4fit
2 years, 12 months ago
The answers are correct. Flatten Hierarchy. https://vmfocus.com/2019/01/09/using-azure-data-factory-to-copy-data-between-azure-file-shares-part-1/
upvoted 2 times
Aditya0891
2 years, 6 months ago
Think logically about what flatten and merge mean and what is asked in the question.
upvoted 4 times