Exam DP-100 topic 3 question 46 discussion

Actual exam question from Microsoft's DP-100
Question #: 46
Topic #: 3

You define a datastore named ml-data for an Azure Storage blob container. In the container, you have a folder named train that contains a file named data.csv.
You plan to use the file to train a model by using the Azure Machine Learning SDK.
You plan to train the model by using the Azure Machine Learning SDK to run an experiment on local compute.
You define a DataReference object by running the following code:

You need to load the training data.
Which code segment should you use?
A.

B.

C.

D.

E.

Suggested Answer: E
Example:
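import argparse, os
import pandas as pd
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str, dest='data_folder')
data_folder = parser.parse_args().data_folder
train_data = pd.read_csv(os.path.join(data_folder, 'data.csv'))  # Load the training data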
Reference:
https://www.element61.be/en/resource/azure-machine-learning-services-complete-toolbox-ai
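A minimal sketch of the training-script side of option E, assuming (as the comments below describe) that the run passes `data_ref` to the script as `--data-folder`. It uses the standard `csv` module as a stand-in for `pd.read_csv` so it runs without pandas; `parse_args` and `load_training_rows` are hypothetical helper names, not part of the exam code.

```python
import argparse
import csv
import os


def parse_args(argv=None):
    # The run passes the downloaded folder as --data-folder;
    # dest='data_folder' exposes it as args.data_folder.
    parser = argparse.ArgumentParser()
    parser.add_argument('--data-folder', type=str, dest='data_folder', required=True)
    return parser.parse_args(argv)


def load_training_rows(data_folder):
    # Option E: data.csv is assumed to sit directly inside the folder that
    # data_ref resolved to, so only the file name is appended.
    path = os.path.join(data_folder, 'data.csv')
    # Standard-library stand-in for pd.read_csv(path)
    with open(path, newline='') as f:
        return list(csv.DictReader(f))
```

In a real training script this would be called as `load_training_rows(parse_args().data_folder)`; swapping the `csv` reader back for `pd.read_csv` gives the suggested answer's one-liner.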

Comments

trickerk
Highly Voted 3 years, 2 months ago
The given answer is correct because data_folder already includes the 'train' path. Take a look at: data_ref = ml_data.path('train').as_download(path_on_compute='train_data')
upvoted 14 times
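To see why no extra 'train' is needed, here is a local simulation of the layout described above. It assumes, as the comments do, that `ml_data.path('train').as_download(path_on_compute='train_data')` places the *contents* of the datastore's train folder in `train_data` on the compute. `simulate_as_download` and `build_demo` are hypothetical stand-ins built on `shutil.copytree`, not the Azure ML SDK.

```python
import os
import shutil


def simulate_as_download(container_root, datastore_path, path_on_compute):
    # Hypothetical stand-in for DataReference.as_download: copy the contents of
    # <container>/<datastore_path> into <path_on_compute> and return that folder,
    # which is what the training script would receive as --data-folder.
    source = os.path.join(container_root, datastore_path)
    shutil.copytree(source, path_on_compute)
    return path_on_compute


def build_demo(workdir):
    # Recreate the blob container layout from the question: train/data.csv
    container = os.path.join(workdir, 'ml-data')
    os.makedirs(os.path.join(container, 'train'))
    with open(os.path.join(container, 'train', 'data.csv'), 'w') as f:
        f.write('x,y\n1,2\n')
    return simulate_as_download(container, 'train', os.path.join(workdir, 'train_data'))
```

Under this assumption the returned folder contains `data.csv` directly, with no nested `train` directory.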
rishi_ram
Highly Voted 3 years, 4 months ago
What about answer B, since the question says the data is in the 'train' folder?
upvoted 6 times
trickerk
3 years, 3 months ago
I think the script will run inside the folder, so the absolute path will be returned and the script doesn't need to include the folder's name.
upvoted 1 times
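The point about an absolute path can be checked locally: if `data_folder` arrives as an absolute path, joining it with the file name gives a path that works regardless of the current working directory. A small sketch under that assumption; `resolve` and `read_from_elsewhere` are hypothetical helpers.

```python
import os
import tempfile


def resolve(data_folder, filename='data.csv'):
    # os.path.join with an absolute first component yields an absolute path,
    # so the result does not depend on the current working directory
    return os.path.join(data_folder, filename)


def read_from_elsewhere(data_folder):
    # Change into an unrelated directory before opening the file,
    # then restore the original working directory
    original = os.getcwd()
    with tempfile.TemporaryDirectory() as other:
        os.chdir(other)
        try:
            with open(resolve(data_folder)) as f:
                return f.read()
        finally:
            os.chdir(original)
```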
NullVoider_0
Most Recent 10 months, 1 week ago
The correct code segment to load the training data is B. This is because you have defined the data reference as data_ref = ml_data.path('train').as_download(path_on_compute='train_data'), which means that the data will be downloaded to the train_data folder on the compute target. Therefore, you need to use os.path.join(data_folder, 'train', 'data.csv') to read the CSV file from the train folder in the ml-data datastore.
upvoted 1 times
fhlos
1 year, 3 months ago
B - ChatGPT: The correct code segment to load the training data in this scenario is B:
import os
import argparse
import pandas as pd
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str, dest='data_folder')
args = parser.parse_args()
data_folder = args.data_folder
data = pd.read_csv(os.path.join(data_folder, 'train', 'data.csv'))
Explanation: The code segment B properly handles the command-line argument parsing using the argparse module and retrieves the data_folder argument. It then uses os.path.join to construct the correct path to the training data file data.csv within the specified data_folder. The other code segments (A, C, D, E) either have syntax errors, incorrect path references, or incorrect argument parsing, which would lead to issues when trying to load the training data. Therefore, the correct code segment is B.
upvoted 2 times
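Options B and E differ only in the path they build. The sketch below checks both against a folder laid out the way the other comments assume the download leaves it (`data.csv` directly inside `train_data`); that layout and the helpers `candidate_paths`, `existing_options`, and `demo` are assumptions for illustration.

```python
import os
import tempfile


def candidate_paths(data_folder):
    # The two joins being debated in the comments
    return {
        'B': os.path.join(data_folder, 'train', 'data.csv'),
        'E': os.path.join(data_folder, 'data.csv'),
    }


def existing_options(data_folder):
    # Return the options whose path actually points to a file
    return sorted(k for k, p in candidate_paths(data_folder).items() if os.path.isfile(p))


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Assumed result of as_download: <root>/train_data/data.csv
        data_folder = os.path.join(root, 'train_data')
        os.makedirs(data_folder)
        open(os.path.join(data_folder, 'data.csv'), 'w').close()
        return existing_options(data_folder)
```

Under that assumption only option E resolves to a file; passing option B's path to `pd.read_csv` would raise `FileNotFoundError`.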
Lion007
9 months, 4 weeks ago
The correct answer is as given: E. We just need to join data_folder with the CSV file name data.csv:
data = pd.read_csv(os.path.join(data_folder, 'data.csv'))
Let me explain why B is WRONG. The DataReference is configured to download the data from a datastore to a local directory named 'train_data' on the compute target:
data_ref = ml_data.path('train').as_download(path_on_compute='train_data')
In this context, the data_ref object will download the contents of the 'train' folder from Azure Blob Storage to a local directory called 'train_data' on the compute target. The script_params in the Estimator object then passes this data_ref as the value of --data-folder:
script_params={'--data-folder': data_ref}
Given this setup, the training script (option E) expects the --data-folder argument to specify the path to the directory where the data.csv file is located. It then reads this CSV file into a pandas DataFrame.
upvoted 2 times
Lion007
9 months, 4 weeks ago
Option B assumes that the data.csv file is still inside the directory 'train' within the data_folder. This is WRONG since the path is already constructed in the data_ref without the need to repeat the 'train' directory, as the data_ref is already pointing to the correct location of the data.csv file.
upvoted 2 times
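The hand-off described above, from `script_params` to `args.data_folder`, can be sketched with `argparse` alone. This assumes the data reference has already been replaced by its local path (here `'train_data'`) before the script starts; `script_params_to_argv` is a hypothetical illustration of the flattening, not the Estimator's actual implementation.

```python
import argparse


def script_params_to_argv(script_params):
    # Hypothetical illustration: flatten {'--flag': value} into the
    # ['--flag', 'value'] list a training script sees on its command line.
    # The DataReference is assumed to be already replaced by its local path.
    argv = []
    for flag, value in script_params.items():
        argv.extend([flag, str(value)])
    return argv


def parse_data_folder(argv):
    parser = argparse.ArgumentParser()
    # dest='data_folder' names the attribute explicitly; argparse would derive
    # the same name from '--data-folder' by default
    parser.add_argument('--data-folder', type=str, dest='data_folder')
    return parser.parse_args(argv).data_folder
```

So the value option E joins with `'data.csv'` is the folder the data reference resolved to, not the datastore root.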
Andrea2
2 years, 4 months ago
I think answer E is correct. data_ref contains the reference to the data, which has been downloaded to the compute at the path train_data. For this reason, you can simply append data.csv to load the data.
upvoted 4 times
YipingRuan
3 years, 3 months ago
Seems different from #43 above (using input)
upvoted 1 times
MohsenSic
3 years, 3 months ago
@rishi_ram, I guess it is wrong, as we would end up with two 'train's in the path: one from data_folder and one from the literal 'train'.
upvoted 2 times
Community vote distribution
A (35%)
C (25%)
B (20%)
Other