Exam DP-100 All Questions

View all questions & answers for the DP-100 exam

Exam DP-100 topic 2 question 46 discussion

Actual exam question from Microsoft's DP-100

Question #: 46
Topic #: 2

You are working with a time series dataset in Azure Machine Learning Studio.
You need to split your dataset into training and testing subsets by using the Split Data module.
Which splitting mode should you use?

A. Recommender Split
B. Regular Expression Split
C. Relative Expression Split
D. Split Rows with the Randomized split parameter set to true

Suggested Answer: C

by jameswoo at May 14, 2020, 6:21 p.m.

Comments

jameswoo

Highly Voted 4 years, 5 months ago

I think it is C. Time series data means you should split the data by date; otherwise you may have information leakage.

upvoted 58 times

human_ai

3 years, 2 months ago

Nope, my bad... definitely C, Relative Expression Split, because it is a time series dataset.

upvoted 5 times

human_ai

3 years, 2 months ago

I think the answer is correct, since you just want to split into test and training data. You are NOT trying to split into categories or dates.

upvoted 7 times

saegeb2000

Highly Voted 4 years, 3 months ago

This should be C, Relative Expression Split: "Use this option whenever you want to apply a condition to a number column. The number can be a date/time field, a column that contains age or dollar amounts, or even a percentage. For example, you might want to divide your dataset based on the cost of the items, group people by age ranges, or separate data by a calendar date."
Source: https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/split-data

upvoted 29 times
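
As a rough illustration of the condition-based split quoted above, here is a minimal pandas sketch. It is not the Azure ML component itself: the helper `relative_expression_split`, the `sales` frame, and the 2020-01-01 cutoff are all hypothetical. It assumes the behaviour of routing rows that satisfy the condition to one output and all remaining rows to the other.

```python
import operator

import pandas as pd

# Comparison operators supported by this sketch (hypothetical helper, not an Azure ML API).
_OPS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}


def relative_expression_split(df, column, op, threshold):
    """Return (matching_rows, remaining_rows) for the condition `column <op> threshold`."""
    mask = _OPS[op](df[column], threshold)
    return df[mask], df[~mask]


# Made-up data purely for illustration.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2019-06-01", "2019-12-01", "2020-06-01", "2020-12-01"]),
    "amount": [10.0, 12.5, 11.0, 14.0],
})

# Rows on or after the cutoff become the test set; earlier rows are the training set.
test, train = relative_expression_split(sales, "date", ">=", pd.Timestamp("2020-01-01"))
```

Because the condition is on the date column, every training row precedes every test row, which is the property the time series argument in this thread relies on.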

84b1989

Most Recent 3 months, 1 week ago

Selected Answer: D

Explanation: When working with a time series dataset in Azure Machine Learning Studio, the Split Rows mode with the Randomized split parameter set to true is the most appropriate choice for splitting the dataset into training and testing subsets. Here's why:

Time Series Data Considerations: Time series data has a temporal order, and splitting it randomly ensures that the training and testing subsets are representative of the entire dataset without breaking the time sequence. This helps in maintaining the integrity of the data for model evaluation.

Split Rows Mode: The Split Rows mode allows you to specify a fraction of the dataset to be used for training and testing. For example, you can allocate 70% of the data for training and 30% for testing.

Randomized Split Parameter: Setting the Randomized split parameter to true ensures that the data is shuffled before splitting, which is crucial for time series data to avoid bias and ensure that the model generalizes well.

upvoted 2 times

Lion007

10 months ago

Selected Answer: C

WRONG. The correct answer is C.

The correct method for splitting a time series dataset should consider the sequential nature of the data. The options available in the Split Data component in Azure ML are:

1. Split Rows: This mode is used to simply divide the data into two parts. It is generally used when the sequence of the data is not a concern.
2. Regular Expression Split: This mode divides the dataset based on a pattern in a text field, for example when analyzing sentiment.
3. Relative Expression Split: This mode applies conditions to a number column, which could include date/time fields.

For time series data, where the sequence and continuity of data points are important, neither randomization (as in Split Rows with randomization) nor pattern-based splits (as in Regular Expression Split) are appropriate. Instead, the Relative Expression Split, which can handle conditions on date/time fields, is suited to time series data, allowing the dataset to be divided without disrupting the sequence. Therefore, the correct answer should be C, Relative Expression Split.

upvoted 2 times
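
To make the contrast between a randomized row split and a date-based split concrete, here is a small pandas sketch. Everything in it is an assumption for illustration (the synthetic daily `series`, the 70/30 proportion, the helper `trains_on_future`); it only mimics the two strategies outside Azure ML.

```python
import numpy as np
import pandas as pd

# Synthetic daily series, made up for this sketch.
rng = np.random.default_rng(0)
series = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=100, freq="D"),
    "value": rng.normal(size=100),
})

# Randomized 70/30 split: shuffle the rows, then cut.
shuffled = series.sample(frac=1.0, random_state=0)
rand_train, rand_test = shuffled.iloc[:70], shuffled.iloc[70:]

# Date-based 70/30 split: everything before the cutoff date is training data.
cutoff = series["date"].iloc[70]
time_train = series[series["date"] < cutoff]
time_test = series[series["date"] >= cutoff]


def trains_on_future(train, test):
    """True if the training set contains dates later than the earliest test date."""
    return bool(train["date"].max() > test["date"].min())


print("randomized split trains on future data:", trains_on_future(rand_train, rand_test))
print("date-based split trains on future data:", trains_on_future(time_train, time_test))
```

With the shuffled split, training rows are interleaved with test rows in time, so the model sees observations from after the period it is evaluated on; the date-based split avoids that by construction.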

NullVoider_0

10 months, 2 weeks ago

Selected Answer: D

When splitting a time series dataset in Azure Machine Learning Studio, you should use the "Split Rows" option with the "Randomized split" parameter set to false to ensure that the temporal order of the data is preserved. This approach is crucial for maintaining the integrity of time series data in training and testing subsets.

upvoted 3 times

dporwal04

10 months, 2 weeks ago

Selected Answer: C

Use any tool, like search, Bard, ChatGPT, or anything else, but the answer is C.

upvoted 1 times

ymj_000

11 months, 3 weeks ago

I think the answer is D because Randomized split is the preferred option when you're creating training and test datasets. See https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/split-data?view=azureml-api-2. The question is sneaky: mentioning that this is time series data makes me think the answer should be Relative Expression Split.

upvoted 2 times

PI_Team

1 year, 3 months ago

Selected Answer: C

For splitting a time series dataset in Azure Machine Learning Studio, the appropriate splitting mode to use is the "Relative Expression Split" (C). The Relative Expression Split mode allows you to split the dataset based on conditions applied to a number column. This number column can be a date/time field, age, dollar amounts, percentages, or any other numerical value. It provides flexibility in defining the splitting criteria based on these numeric conditions. In the context of a time series dataset, you can use the Relative Expression Split mode to split the dataset based on conditions related to the time component, such as dividing data by calendar date, time periods, or specific ranges of dates.

upvoted 2 times

RamundiGR

1 year, 8 months ago

It's clearly C!! You can check at https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/split-data

upvoted 1 times

RamundiGR

1 year, 8 months ago

Why doesn't the moderator bother to correct those answers?

upvoted 2 times

NachoPrendes

1 year, 9 months ago

I think D is the correct one, so that random dates from all years end up in both datasets.

upvoted 1 times

Edriv

1 year, 10 months ago

Why not B?

upvoted 1 times

KIshor1212

1 year, 10 months ago

Selected Answer: C

Calendar year: A common scenario is to divide a dataset by years. The following expression selects all rows where the values in the column Year are greater than 2010.

upvoted 1 times

PremPatrick

1 year, 11 months ago

Selected Answer: C

C should be correct

upvoted 2 times

fvil

1 year, 11 months ago

A question about the Split Data module and the differences between Regular Expression Split and Relative Expression Split appeared on the exam on 07/11/2022.

upvoted 1 times

azurelearner666

2 years, 6 months ago

Selected Answer: C

Without any shadow of a doubt ;)

In machine learning, a train/test split usually splits the data randomly, as there’s no dependence between observations. That’s not the case with time series data. Here, you’ll want to use the values at the end of the dataset for testing and everything else for training. Example: select the first 10 years for training and the last 2 years for testing.

https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/split-data (check the Relative Expression Split section)

upvoted 1 times
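
The "first 10 years for training, last 2 years for testing" example above could be sketched in pandas as follows. The 12-year monthly frame and the helper `holdout_last_years` are hypothetical, and this runs outside Azure ML; it only shows how a date cutoff yields that kind of tail holdout.

```python
import pandas as pd

# Twelve years of made-up monthly observations (2010-01 through 2021-12).
monthly = pd.DataFrame({
    "date": pd.date_range("2010-01-01", periods=144, freq="MS"),
})
monthly["value"] = range(len(monthly))


def holdout_last_years(df, date_col, years):
    """Hold out the final `years` years of `df` as the test set (hypothetical helper)."""
    cutoff = df[date_col].max() - pd.DateOffset(years=years)
    train = df[df[date_col] <= cutoff]
    test = df[df[date_col] > cutoff]
    return train, test


train, test = holdout_last_years(monthly, "date", years=2)
print(len(train), "training months;", len(test), "test months")
```

Deriving the cutoff from the latest date keeps the test window at the end of the series even when more data is appended later.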

sam844

2 years, 7 months ago

C is the correct choice. It is time series data, so it has to be split by date, which only Relative Expression Split allows.

upvoted 1 times
