Exam DP-100 topic 2 question 46 discussion

Actual exam question from Microsoft's DP-100
Question #: 46
Topic #: 2

You are working with a time series dataset in Azure Machine Learning Studio.
You need to split your dataset into training and testing subsets by using the Split Data module.
Which splitting mode should you use?

  • A. Recommender Split
  • B. Regular Expression Split
  • C. Relative Expression Split
  • D. Split Rows with the Randomized split parameter set to true
Suggested Answer: C

Comments

jameswoo
Highly Voted 4 years, 5 months ago
I think it is C. Time series data means you should split the data by date; otherwise you may have information leakage.
upvoted 58 times
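A minimal sketch of the leakage point, in plain Python rather than the Split Data module itself. The toy daily series and the helper names (`chronological_split`, `random_split`, `has_leakage`) are assumptions made up for illustration. It only shows that a date cutoff keeps every training row earlier than every test row, while a shuffled split gives no such guarantee.

```python
from datetime import date, timedelta
import random

# Hypothetical toy data: 100 consecutive daily observations.
start = date(2020, 1, 1)
rows = [(start + timedelta(days=i), float(i)) for i in range(100)]


def chronological_split(rows, cutoff):
    """Rows dated before `cutoff` go to training, the rest to testing."""
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test


def random_split(rows, fraction, seed=0):
    """Shuffle a copy of the rows, then take the first `fraction` for training."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    k = int(len(shuffled) * fraction)
    return shuffled[:k], shuffled[k:]


def has_leakage(train, test):
    """True if any training row is dated on or after the earliest test row."""
    return max(r[0] for r in train) >= min(r[0] for r in test)


cutoff = start + timedelta(days=70)
chrono_train, chrono_test = chronological_split(rows, cutoff)
rand_train, rand_test = random_split(rows, 0.7)

print("chronological split leaks:", has_leakage(chrono_train, chrono_test))
print("random split leaks:", has_leakage(rand_train, rand_test))
```

With a shuffled split, training rows will typically be dated after some test rows, which is the leakage described above.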
human_ai
3 years, 2 months ago
Nope, my bad... definitely C. Relative Expression split cause it is a time series dataset.
upvoted 5 times
...
human_ai
3 years, 2 months ago
I think the answer is correct, since you just want to split into test and training data. You are NOT trying to SPLIT into CATEGORIES or dates.
upvoted 7 times
...
...
saegeb2000
Highly Voted 4 years, 3 months ago
This should be C: Relative Expression Split: Use this option whenever you want to apply a condition to a number column. The number can be a date/time field, a column that contains age or dollar amounts, or even a percentage. For example, you might want to divide your dataset based on the cost of the items, group people by age ranges, or separate data by a calendar date. https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/split-data
upvoted 29 times
...
84b1989
Most Recent 3 months, 1 week ago
Selected Answer: D
Explanation: When working with a time series dataset in Azure Machine Learning Studio, the Split Rows mode with the Randomized split parameter set to true is the most appropriate choice for splitting the dataset into training and testing subsets. Here's why:
Time Series Data Considerations: Time series data has a temporal order, and splitting it randomly ensures that the training and testing subsets are representative of the entire dataset without breaking the time sequence. This helps in maintaining the integrity of the data for model evaluation.
Split Rows Mode: The Split Rows mode allows you to specify a fraction of the dataset to be used for training and testing. For example, you can allocate 70% of the data for training and 30% for testing.
Randomized Split Parameter: Setting the Randomized split parameter to true ensures that the data is shuffled before splitting, which is crucial for time series data to avoid bias and ensure that the model generalizes well.
upvoted 2 times
...
Lion007
10 months ago
Selected Answer: C
WRONG. The correct answer is C. The correct method for splitting a time series dataset should consider the sequential nature of the data. The options available in the Split Data component in Azure ML are:
1. Split Rows: This mode is used to simply divide the data into two parts. This mode is generally used when the sequence of data is not a concern.
2. Regular Expression Split: This mode is for dividing the dataset based on a pattern in a text field, such as when analyzing sentiment.
3. Relative Expression Split: This mode applies conditions to a number column, which could include date/time fields.
For time series data, where the sequence and continuity of data points are important, neither randomization (as in Split Rows with randomization) nor pattern-based splits (as in Regular Expression Split) are appropriate. Instead, the Relative Expression Split, which can handle conditions on date/time fields, is suited for time series data, allowing the dataset to be divided without disrupting the sequence. Therefore, the correct answer should be C. Relative Expression Split.
upvoted 2 times
...
NullVoider_0
10 months, 2 weeks ago
Selected Answer: D
When splitting a time series dataset in Azure Machine Learning Studio, you should use the "Split Rows" option with the "Randomized split" parameter set to false to ensure that the temporal order of the data is preserved. This approach is crucial for maintaining the integrity of time series data in training and testing subsets.
upvoted 3 times
...
dporwal04
10 months, 2 weeks ago
Selected Answer: C
Use any tool like search, Bard, ChatGPT, or any other tool, but the answer is C.
upvoted 1 times
...
ymj_000
11 months, 3 weeks ago
I think the answer is D because Randomized split is the preferred option when you're creating training and test datasets. See https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/split-data?view=azureml-api-2. The question is very sneaky in mentioning that this is time series data, which makes me think the answer should be Relative Expression Split.
upvoted 2 times
...
PI_Team
1 year, 3 months ago
Selected Answer: C
For splitting a time series dataset in Azure Machine Learning Studio, the appropriate splitting mode to use is the "Relative Expression Split" (C). The Relative Expression Split mode allows you to split the dataset based on conditions applied to a number column. This number column can be a date/time field, age, dollar amounts, percentages, or any other numerical value. It provides flexibility in defining the splitting criteria based on these numeric conditions. In the context of a time series dataset, you can use the Relative Expression Split mode to split the dataset based on conditions related to the time component, such as dividing data by calendar date, time periods, or specific ranges of dates.
upvoted 2 times
...
RamundiGR
1 year, 8 months ago
It's clearly C!! You can check at https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/split-data
upvoted 1 times
...
RamundiGR
1 year, 8 months ago
Why doesn't the moderator bother to correct those answers?
upvoted 2 times
...
NachoPrendes
1 year, 9 months ago
I think D is the correct one, so that random dates belonging to all years end up in both datasets.
upvoted 1 times
...
Edriv
1 year, 10 months ago
why not B?
upvoted 1 times
...
KIshor1212
1 year, 10 months ago
Selected Answer: C
Calendar year: A common scenario is to divide a dataset by years. The following expression selects all rows where the values in the column Year are greater than 2010.
upvoted 1 times
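As a rough plain-Python analogue of the condition quoted above (not the module's own expression syntax), the sketch below splits rows on `Year > 2010`. The sample rows, the `Sales` column, and the helper name `split_by_condition` are hypothetical; only the `Year` column and the 2010 threshold come from the comment.

```python
# Hypothetical rows; only the "Year" column name comes from the quoted text.
rows = [
    {"Year": 2008, "Sales": 10.0},
    {"Year": 2009, "Sales": 12.5},
    {"Year": 2010, "Sales": 11.0},
    {"Year": 2011, "Sales": 14.0},
    {"Year": 2012, "Sales": 15.5},
]


def split_by_condition(rows, predicate):
    """Return (rows that satisfy the condition, all remaining rows)."""
    matched = [r for r in rows if predicate(r)]
    rest = [r for r in rows if not predicate(r)]
    return matched, rest


# Condition from the comment: values in the column Year greater than 2010.
after_2010, up_to_2010 = split_by_condition(rows, lambda r: r["Year"] > 2010)

print([r["Year"] for r in after_2010])
print([r["Year"] for r in up_to_2010])
```

For a train/test split of a time series, the earlier rows (here, up to 2010) would normally be the training set and the later rows the test set.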
...
PremPatrick
1 year, 11 months ago
Selected Answer: C
C should be correct
upvoted 2 times
...
fvil
1 year, 11 months ago
A question about the Split Data module and the differences between Regular Expression Split and Relative Expression Split appeared on the exam on 07/11/2022.
upvoted 1 times
...
azurelearner666
2 years, 6 months ago
Selected Answer: C
Selected Answer: C Without any shadow of a doubt ;) In machine learning, train/test split splits the data randomly, as there’s no dependence from one observation to the other. That’s not the case with time series data. Here, you’ll want to use values at the rear of the dataset for testing and everything else for training. Example: Select first 10 years for training and 2 years for testing. https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/split-data (check the Relative Expression Split section)
upvoted 1 times
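The "first 10 years for training, 2 years for testing" example can be sketched in plain Python as below. The monthly toy data, the field names, and the helper `split_last_n_years` are assumptions for illustration and do not use the Azure ML component.

```python
def split_last_n_years(rows, n_test_years):
    """Hold out the most recent `n_test_years` calendar years for testing."""
    years = sorted({r["year"] for r in rows})
    if not 0 < n_test_years < len(years):
        raise ValueError("n_test_years must leave at least one training year")
    test_years = set(years[-n_test_years:])
    train = [r for r in rows if r["year"] not in test_years]
    test = [r for r in rows if r["year"] in test_years]
    return train, test


# Hypothetical monthly series covering 12 years (2010 through 2021).
rows = [
    {"year": y, "month": m, "value": float(y * 12 + m)}
    for y in range(2010, 2022)
    for m in range(1, 13)
]

train, test = split_last_n_years(rows, 2)

print(len(train), len(test))
print(sorted({r["year"] for r in test}))
```

Choosing the holdout by calendar year keeps the test period strictly after the training period, matching the "values at the rear of the dataset" idea.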
...
sam844
2 years, 7 months ago
C is the correct choice. It is time series data, so it has to be split by date, which only Relative Expression Split can do.
upvoted 1 times
...