Exam DP-100 All Questions

View all questions & answers for the DP-100 exam

Exam DP-100 topic 3 question 46 discussion

Actual exam question from Microsoft's DP-100

Question #: 46
Topic #: 3

You define a datastore named ml-data for an Azure Storage blob container. In the container, you have a folder named train that contains a file named data.csv.
You plan to use the file to train a model by using the Azure Machine Learning SDK.
You plan to train the model by using the Azure Machine Learning SDK to run an experiment on local compute.
You define a DataReference object by running the following code:

You need to load the training data.
Which code segment should you use?
A.

B.

C.

D.

E.


Suggested Answer: E
Example:
data_folder = args.data_folder
# Load the training data
train_data = pd.read_csv(os.path.join(data_folder, 'data.csv'))
Reference:
https://www.element61.be/en/resource/azure-machine-learning-services-complete-toolbox-ai
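A fuller sketch of the argument handling behind the suggested answer may help. It assumes the script receives the download location through a `--data-folder` argument, as quoted in the comments below; the helper name `resolve_training_file` is hypothetical and not part of the exam question.

```python
import argparse
import os


def resolve_training_file(argv=None):
    """Parse --data-folder and return the path to data.csv inside it.

    Hypothetical helper: it assumes data.csv sits directly in the
    folder passed on the command line, which is what option E relies on.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('--data-folder', type=str, dest='data_folder')
    args = parser.parse_args(argv)
    # No extra 'train' segment: the folder is assumed to already hold
    # the contents of the datastore's train folder.
    return os.path.join(args.data_folder, 'data.csv')
```

In a training script, the returned path would then be passed to pd.read_csv, as in the example above.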

by rishi_ram at June 1, 2021, 7:43 a.m.

Comments


trickerk

Highly Voted 3 years, 2 months ago

The given answer is correct because "data_folder" already points to the contents of the 'train' path. Take a look at: data_ref = ml_data.path('train').as_download(path_on_compute='train_data')

upvoted 14 times


rishi_ram

Highly Voted 3 years, 4 months ago

How about answer B, since the question says the data is in the 'train' folder?

upvoted 6 times

trickerk

3 years, 3 months ago

I think the script will run inside the folder, so the absolute path will be returned and there is no need to specify the folder's name.

upvoted 1 times


NullVoider_0

Most Recent 10 months, 1 week ago

The correct code segment to load the training data is B. This is because you have defined the data reference as data_ref = ml_data.path('train').as_download(path_on_compute='train_data'), which means that the data will be downloaded to the train_data folder on the compute target. Therefore, you need to use os.path.join(data_folder, 'train', 'data.csv') to read the CSV file from the train folder in the ml-data datastore.

upvoted 1 times


fhlos

1 year, 3 months ago

B - ChatGPT
The correct code segment to load the training data in this scenario is B:
import os
import argparse
import pandas as pd
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str, dest='data_folder')
args = parser.parse_args()
data_folder = args.data_folder
data = pd.read_csv(os.path.join(data_folder, 'train', 'data.csv'))
Explanation: The code segment B properly handles the command-line argument parsing using the argparse module and retrieves the data_folder argument. It then uses os.path.join to construct the correct path to the training data file data.csv within the specified data_folder. The other code segments (A, C, D, E) either have syntax errors, incorrect path references, or incorrect argument parsing, which would lead to issues when trying to load the training data. Therefore, the correct code segment is B.

upvoted 2 times

Lion007

9 months, 4 weeks ago

The correct answer is as given: E. We just need to join the data_folder with the CSV file data.csv:
data = pd.read_csv(os.path.join(data_folder, 'data.csv'))
Let me explain why B is WRONG. The DataReference is configured to download the data from a datastore to a local directory named 'train_data' on the compute target:
data_ref = ml_data.path('train').as_download(path_on_compute='train_data')
In this context, the data_ref object will download the contents of the 'train' folder from the Azure Blob Storage to a local directory called 'train_data' on the compute target. The script_params in the Estimator object then passes this data_ref as the argument for --data-folder:
script_params={'--data-folder': data_ref},
Given this setup, the training script (option E) expects the --data-folder argument to specify the path to the directory where the data.csv file is located. It then reads this CSV file into a pandas DataFrame.

upvoted 2 times

Lion007

9 months, 4 weeks ago

Option B assumes that the data.csv file is still inside the directory 'train' within the data_folder. This is WRONG since the path is already constructed in the data_ref without the need to repeat the 'train' directory, as the data_ref is already pointing to the correct location of the data.csv file.

upvoted 2 times
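The disagreement between options B and E can be checked locally without Azure. The sketch below only simulates the folder layout described in the two comments above; it assumes the download places the contents of the datastore's 'train' folder directly into 'train_data', and the function name `simulate_download` is hypothetical.

```python
import os
import tempfile


def simulate_download(root):
    """Mimic the assumed download result: data.csv lands directly
    in <root>/train_data, with no nested 'train' folder."""
    data_folder = os.path.join(root, 'train_data')
    os.makedirs(data_folder)
    with open(os.path.join(data_folder, 'data.csv'), 'w') as f:
        f.write('feature,label\n1,0\n')
    return data_folder


with tempfile.TemporaryDirectory() as root:
    data_folder = simulate_download(root)
    # Option E joins the folder with the file name only.
    option_e_exists = os.path.exists(os.path.join(data_folder, 'data.csv'))
    # Option B adds a second 'train' segment to the path.
    option_b_exists = os.path.exists(os.path.join(data_folder, 'train', 'data.csv'))

print(option_e_exists, option_b_exists)  # prints: True False
```

Under that assumed layout, only the option E path resolves to a file. If the real download nested a 'train' folder inside 'train_data', the result would flip, so the layout is the point to verify.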


Andrea2

2 years, 4 months ago

I think answer E is correct. data_ref contains the reference to the data, which has been downloaded to the compute at the path train_data. For this reason you can simply append data.csv to load the data.

upvoted 4 times


YipingRuan

3 years, 3 months ago

Seems different from #43 above (using input)

upvoted 1 times


MohsenSic

3 years, 3 months ago

@rishi_ram, I guess B is wrong because we would end up with two 'train' segments in the path, one from data_folder and one from 'train'.

upvoted 2 times

