You need to train an XGBoost model on a small dataset. Your training code requires custom dependencies. You want to minimize the startup time of your training job. How should you set up your Vertex AI custom training job?
A. Store the data in a Cloud Storage bucket, and create a custom container with your training application. In your training application, read the data from Cloud Storage and train the model.
B. Use the XGBoost prebuilt custom container. Create a Python source distribution that includes the data and installs the dependencies at runtime. In your training application, load the data into a pandas DataFrame and train the model.
C. Create a custom container that includes the data. In your training application, load the data into a pandas DataFrame and train the model.
D. Store the data in a Cloud Storage bucket, and use the XGBoost prebuilt custom container to run your training application. Create a Python source distribution that installs the dependencies at runtime. In your training application, read the data from Cloud Storage and train the model.
My answer: A
Focusing on "training code requires custom dependencies" and "minimize the startup time of your training job", the best choice is A, because using a custom container and reading the data from Cloud Storage is the faster way.
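For readers who want to see what option A looks like as a job definition, here is a minimal sketch of a worker pool spec that points at a custom container and passes the Cloud Storage path as an argument. The image URI, bucket path, `--data-uri` flag, and machine type are all hypothetical placeholders, not values from the question.

```python
def custom_container_pool(image_uri, data_uri, machine_type="n1-standard-4"):
    """Build a worker pool spec for a custom-container training job (option A)."""
    return [{
        "machine_spec": {"machine_type": machine_type},
        "replica_count": 1,
        "container_spec": {
            # Custom image with the training code and its dependencies baked in.
            "image_uri": image_uri,
            # The data stays in Cloud Storage; the training app reads it at start.
            # "--data-uri" is a hypothetical flag that the app itself would parse.
            "args": [f"--data-uri={data_uri}"],
        },
    }]


spec = custom_container_pool(
    "us-central1-docker.pkg.dev/my-project/my-repo/xgb-train:latest",  # hypothetical
    "gs://my-bucket/train.csv",  # hypothetical
)

# Assuming google-cloud-aiplatform is installed and aiplatform.init() has been
# called with a project and staging bucket, the spec could be submitted as:
#   from google.cloud import aiplatform
#   aiplatform.CustomJob(display_name="xgb-train", worker_pool_specs=spec).run()
```

Because the dependencies are installed when the image is built, nothing has to be installed when the job starts.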
I select D: While A could work, D is the optimal solution because it balances efficiency, ease of setup, and performance. It minimizes startup time by leveraging Google’s prebuilt XGBoost container and offers flexibility by installing custom dependencies at runtime. This approach avoids the overhead of building and maintaining a custom container from scratch, which is unnecessary for a small dataset with only specific custom dependency needs.
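As a rough illustration of the option D setup described in this comment, the sketch below builds a worker pool spec that runs a Python source distribution on a prebuilt image. The executor image URI is an assumed example (check the prebuilt container list for the current one), and the package path, module name, and `--data-uri` flag are hypothetical.

```python
def prebuilt_package_pool(executor_image_uri, package_uri, python_module,
                          data_uri, machine_type="n1-standard-4"):
    """Build a worker pool spec that runs a source distribution on a prebuilt image."""
    return [{
        "machine_spec": {"machine_type": machine_type},
        "replica_count": 1,
        "python_package_spec": {
            # Prebuilt XGBoost training image (assumed URI, verify before use).
            "executor_image_uri": executor_image_uri,
            # Source distribution uploaded to Cloud Storage. Extra dependencies
            # listed in its setup.py install_requires are installed by pip
            # every time the job starts.
            "package_uris": [package_uri],
            "python_module": python_module,
            "args": [f"--data-uri={data_uri}"],  # hypothetical flag
        },
    }]


spec_d = prebuilt_package_pool(
    "us-docker.pkg.dev/vertex-ai/training/xgboost-cpu.1-1:latest",  # assumed example
    "gs://my-bucket/trainer-0.1.tar.gz",  # hypothetical
    "trainer.task",  # hypothetical module name
    "gs://my-bucket/train.csv",  # hypothetical
)
```

The design trade-off is visible in the spec: there is no image to build or maintain, but the dependency install moves from build time to job startup.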
The focus is on startup time, and the dataset is small, so the container should still be of reasonable size.
Downloading data from Cloud Storage introduces a delay.
Given the focus on minimizing startup time, and based on the XGBoost prebuilt container dependencies listed at https://cloud.google.com/vertex-ai/docs/training/pre-built-containers#xgboost:
A: Keeping the data separate from a custom container is the best approach for minimizing startup time, especially for small datasets. Storing the data in Cloud Storage keeps the container image lean, leading to faster download and startup than bundling the data within the container.
B: The prebuilt container could include unnecessary components, potentially increasing the image size and slowing startup.
But the question says the dataset is small and asks to minimize startup latency, which makes C the best fit for the requirements. Nothing is said about making the container reusable.
B
The XGBoost prebuilt custom container already includes the XGBoost library and all of its dependencies.
Including the data in the Python source distribution avoids the overhead of reading it from Cloud Storage a second time.
Loading the data into a pandas DataFrame is convenient in Python; pandas is built for data analysis and manipulation.
However, the question specifically says that the training code requires custom dependencies beyond those included in the prebuilt container. Therefore, using the prebuilt container alone would not be sufficient in this case.
Regarding the use of a Python source distribution to avoid reading data from Cloud Storage multiple times, it's important to consider the trade-off between startup time and potential performance gains. While including the data in the source distribution might save some time during training, it also increases the size of the package that has to be downloaded and installed, which can lead to longer startup times. For small datasets, the overhead of reading data from Cloud Storage is typically negligible compared to the benefits of a smaller package and faster startup.
Also, creating a Python source distribution that includes the data and installs the dependencies at runtime can increase startup time, since the dependencies have to be installed every time the job runs.
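The trade-off argued here can be put as simple arithmetic: startup time is roughly image pull time plus runtime dependency install time plus data fetch time. The sketch below only encodes that sum. The numbers are made-up placeholders, not measurements, and should be replaced with timings from your own jobs.

```python
def startup_seconds(image_pull_s, dep_install_s=0.0, data_fetch_s=0.0):
    """Rough startup estimate: pull the image, install deps, fetch the data."""
    return image_pull_s + dep_install_s + data_fetch_s


def fastest(setups):
    """Return the name of the setup with the smallest estimated startup time."""
    return min(setups, key=lambda name: startup_seconds(**setups[name]))


# Placeholder values for illustration only (NOT measured).
placeholder = {
    "deps baked into image": {"image_pull_s": 40.0, "data_fetch_s": 2.0},
    "deps installed at runtime": {
        "image_pull_s": 30.0,
        "dep_install_s": 25.0,
        "data_fetch_s": 2.0,
    },
}

print(fastest(placeholder))
```

With real timings plugged in, the same comparison shows whether the runtime install or the larger image costs more in a given project.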
Community vote distribution: A (35%, most voted), C (25%), B (20%), Other