Exam DP-203 topic 2 question 52 discussion

Actual exam question from Microsoft's DP-203
Question #: 52
Topic #: 2

HOTSPOT
You are building an Azure Data Factory solution to process data received from Azure Event Hubs and then ingest it into an Azure Data Lake Storage Gen2 container.
The data will be ingested every five minutes from devices into JSON files. The files have the following naming pattern:
/{deviceType}/in/{YYYY}/{MM}/{DD}/{HH}/{deviceID}_{YYYY}{MM}{DD}{HH}{mm}.json
You need to prepare the data for batch data processing so that there is one dataset per hour per deviceType. The solution must minimize read times.
How should you configure the sink for the copy activity? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

Suggested Answer:
Box 1: @trigger().startTime
startTime: A date-time value. For basic schedules, the value of the startTime property applies to the first occurrence. For complex schedules, the trigger starts no sooner than the specified startTime value.
Box 2: /{YYYY}/{MM}/{DD}/{HH}_{deviceType}.json
One dataset per hour per deviceType.

Box 3: Flatten hierarchy
FlattenHierarchy: All files from the source folder are in the first level of the target folder. The target files have autogenerated names.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
https://docs.microsoft.com/en-us/azure/data-factory/connector-file-system
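
To make the three boxes concrete: the sink side of the copy activity is usually a parameterized ADLS Gen2 dataset whose folder path and file name are built from the trigger time and the device type. The sketch below is illustrative only; the names SinkHourlyJson, AdlsGen2LinkedService, windowStart, and deviceType are assumptions, not part of the question.

{
  "name": "SinkHourlyJson",
  "properties": {
    "type": "Json",
    "linkedServiceName": {
      "referenceName": "AdlsGen2LinkedService",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "windowStart": { "type": "string" },
      "deviceType": { "type": "string" }
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "out",
        "folderPath": {
          "value": "@formatDateTime(dataset().windowStart, 'yyyy/MM/dd')",
          "type": "Expression"
        },
        "fileName": {
          "value": "@concat(formatDateTime(dataset().windowStart, 'HH'), '_', dataset().deviceType, '.json')",
          "type": "Expression"
        }
      }
    }
  }
}

With example values windowStart = 2021-06-01T14:00:00Z and deviceType = sensor, this resolves to 2021/06/01/14_sensor.json, i.e. the /{YYYY}/{MM}/{DD}/{HH}_{deviceType}.json pattern in Box 2.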

Comments

ItHYMeRIsh
Highly Voted 3 years ago
The correct copy behavior is Merge, not Flatten hierarchy. The question starts with the following folder structure: /{deviceType}/in/{YYYY}/{MM}/{DD}/{HH}/{deviceID}_{YYYY}{MM}{DD}{HH}{mm}.json. That means there are multiple deviceID JSON files per deviceType, and those need to be merged to get the target naming pattern of "one file per device type per hour". The target naming pattern is /{YYYY}/{MM}/{DD}/{HH}_{deviceType}.json. The correct copy behavior is "Merge" because the multiple files in the source folder are merged into a single file per device type per hour.
upvoted 103 times
Bro111
2 years ago
Why not /{deviceType}/out/{YYYY}/{MM}/{DD}/{HH}.json ?
upvoted 3 times
sensaint
2 years ago
It is not an option. It says /{deviceID}/out/{YYYY}/{MM}/{DD}/{HH}.json
upvoted 6 times
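As a side note on ItHYMeRIsh's Merge vs Flatten point: the choice lives in the copy activity sink as the copyBehavior property of the write settings. The fragment below is a hedged sketch only (JsonSink and AzureBlobFSWriteSettings are the standard type names for a JSON sink on ADLS Gen2; the rest of the activity is omitted). MergeFiles writes one target file per run, whereas FlattenHierarchy copies every source file into the target folder root with an autogenerated name, so the file count would not be reduced.

"sink": {
  "type": "JsonSink",
  "storeSettings": {
    "type": "AzureBlobFSWriteSettings",
    "copyBehavior": "MergeFiles"
  }
}
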
onyerleft
Highly Voted 2 years, 12 months ago
1) @trigger().outputs.windowStartTime - this output is from a tumbling window trigger, and is required to identify the correct directory at the /{HH}/ level. Using windowStartTime will give the hour with complete data. The @trigger().startTime is for a schedule trigger, which corresponds to the hour for which data has not arrived yet.
2) /{YYYY}/{MM}/{DD}/{HH}_{deviceType}.json is the naming pattern to achieve an hourly dataset for each device type.
3) Multiple files for each device type will exist on the source side, since the naming pattern starts with {deviceID}, so the files must be merged in the sink to create a single file per device type.
upvoted 90 times
Davico93
2 years, 5 months ago
But the solution must minimize read times, so I think it is @trigger().startTime.
upvoted 2 times
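To make onyerleft's point 1 concrete: @trigger().outputs.windowStartTime only exists when the pipeline is started by a tumbling window trigger, and it is normally handed to the pipeline as a parameter in the trigger definition. The sketch below is illustrative and uses hypothetical names (HourlyDeviceTrigger, ProcessHourlyDeviceFiles, windowStart); the delay value is just one example of waiting for late-arriving files.

{
  "name": "HourlyDeviceTrigger",
  "properties": {
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Hour",
      "interval": 1,
      "startTime": "2021-01-01T00:00:00Z",
      "delay": "00:10:00",
      "maxConcurrency": 1
    },
    "pipeline": {
      "pipelineReference": {
        "referenceName": "ProcessHourlyDeviceFiles",
        "type": "PipelineReference"
      },
      "parameters": {
        "windowStart": "@trigger().outputs.windowStartTime"
      }
    }
  }
}

The pipeline can then pass @pipeline().parameters.windowStart down to a sink dataset such as the one sketched under the suggested answer.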
renan_ineu
Most Recent 2 months, 3 weeks ago
1. @trigger().startTime - the window trigger requires a window, which is not necessarily the case in the question; the others are using commas. It could be @pipeline().TriggerTime, but that uses a dot.
2. /yyyy/mm/dd/hh_deviceType.json - deviceID would not aggregate all devices by type; /deviceType.json would not split by hour; hh.json would not split by device type.
3. Merge - Dynamic content requires reading the content; Flatten would not merge data into one file.
upvoted 1 times
ELJORDAN23
11 months ago
Got this question on my exam on January 17. I answered @trigger().StartTime, /{YYYY}/{MM}/{DD}/{HH}_{deviceType}.json, and Merge files. I passed :)
upvoted 10 times
j888
10 months, 2 weeks ago
Merge is a better answer. Flatten hierarchy does not reduce the number of files in the directory. https://www.linkedin.com/pulse/copy-behaviour-activity-adf-lokesh-sharma/
upvoted 2 times
blazy002
12 months ago
The files must be MERGED each hour, so use @trigger().outputs.windowStartTime => the start time of the window. The author made 2/3 errors on this Q, grrr :)
@trigger().outputs.windowStartTime: gives the start time of the current window.
@trigger().StartTime: gives the start time of each trigger within that window.
upvoted 2 times
phydev
1 year, 1 month ago
Was on my exam today (31.10.2023).
upvoted 6 times
Chemmangat
1 year, 3 months ago
It's @trigger().outputs.windowStartTime. Ref: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-system-variables, under Tumbling Window Trigger.
upvoted 1 times
kkk5566
1 year, 3 months ago
1) @trigger().outputs.windowStartTime 2) /{YYYY}/{MM}/{DD}/{HH}_{deviceType}.json 3) Merge
upvoted 1 times
pavankr
1 year, 5 months ago
What exactly do you want to "FLATTEN"??? You need to Merge files. Period.
upvoted 2 times
rocky48
1 year, 6 months ago
1) @trigger().outputs.windowStartTime 2) /{YYYY}/{MM}/{DD}/{HH}_{deviceType}.json 3) Merge
upvoted 9 times
rzeng
2 years, 1 month ago
1. windowstarttime 2. yyyy/mm/dd/hh_devicetype.json 3. Merge
upvoted 6 times
Deeksha1234
2 years, 4 months ago
1) @trigger().outputs.windowStartTime 2) /{YYYY}/{MM}/{DD}/{HH}_{deviceType}.json 3) Merge agree with onyer
upvoted 3 times
Rafafouille76
2 years, 9 months ago
Of course it is a merge; can't believe the officially provided answers are so wrong... Who wrote that?
upvoted 10 times
kamil_k
2 years, 9 months ago
I know, it's almost as bad as the Microsoft documentation about Azure... That's why we see so much confusion over so many questions.
upvoted 4 times
Jaws1990
2 years, 11 months ago
Would you have to delay the tumbling window processing by 60 minutes to pick up data that hasn't arrived for that hour yet?
upvoted 1 times
Canary_2021
2 years, 11 months ago
The batch job that runs in Data Factory should use a tumbling window trigger, so the system variable @trigger().outputs.windowStartTime should be passed in as the parameter.
upvoted 3 times
jv2120
2 years, 12 months ago
Data is generated every 5 minutes but the output is needed once per hour per device type, so it needs to merge files to achieve this.
upvoted 2 times
tony4fit
2 years, 12 months ago
The answers are correct. Flatten Hierarchy. https://vmfocus.com/2019/01/09/using-azure-data-factory-to-copy-data-between-azure-file-shares-part-1/
upvoted 2 times
Aditya0891
2 years, 6 months ago
Think logically about what flatten and merge mean and what is asked in the question.
upvoted 4 times