Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 201 discussion

Actual exam question from Google's Professional Machine Learning Engineer

Question #: 201
Topic #: 1


You developed a Vertex AI pipeline that trains a classification model on data stored in a large BigQuery table. The pipeline has four steps, where each step is created by a Python function that uses the KubeFlow v2 API. The components have the following names:

You launch your Vertex AI pipeline as the following:

You perform many model iterations by adjusting the code and parameters of the training step. You observe high costs associated with the development, particularly the data export and preprocessing steps. You need to reduce model development costs. What should you do?

A. Change the components' YAML filenames to export.yaml, preprocess.yaml, f"train-{dt}.yaml", f"calibrate-{dt}.yaml".
B. Add the {"kubeflow.v1.caching": True} parameter to the set of params provided to your PipelineJob.
C. Move the first step of your pipeline to a separate step, and provide a cached path to Cloud Storage as an input to the main pipeline.
D. Change the name of the pipeline to f"my-awesome-pipeline-{dt}".


Suggested Answer: A

by pikachu007 at Jan. 13, 2024, 4:21 a.m.

Comments


guilhermebutzke

Highly Voted 9 months, 1 week ago

Selected Answer: A

My answer: A. As I understand it, the question is about reusing previously processed results from the pipeline while you keep adjusting the training code. Kubeflow caches step results automatically, so there is no need to store them explicitly in a designated path. However, the original filenames include a timestamp (`{dt}`), which suggests the steps are not recognized as unchanged between runs; removing the timestamp from the export and preprocess filenames should let those steps be reused from the cache instead of rerunning. Option C could work, but it would take more effort to implement, since Kubeflow already handles caching automatically. It also only mentions moving the first step (the export) and says nothing about preprocessing, which could be one of the more expensive steps. Considering all of this, I think A is the best choice.
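To illustrate the point about timestamped names, here is a minimal sketch of a toy cache key. It assumes, purely for illustration, that the key is a hash of the component name plus its inputs; this is not the actual Vertex AI or Kubeflow cache-key computation, and `cache_key`, the table name, and the timestamp values are all hypothetical.

```python
import hashlib
import json


def cache_key(component_name: str, inputs: dict) -> str:
    """Toy cache key (assumption): a hash of the component name and its inputs."""
    payload = json.dumps(
        {"component": component_name, "inputs": inputs}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


inputs = {"table": "project.dataset.table"}  # hypothetical input

# Timestamped names: every launch produces a new name, so the key changes
# even though the inputs are identical.
key_run1 = cache_key("export-20240113-0421.yaml", inputs)
key_run2 = cache_key("export-20240113-0507.yaml", inputs)

# Stable names: same name and same inputs give the same key on every launch.
stable_run1 = cache_key("export.yaml", inputs)
stable_run2 = cache_key("export.yaml", inputs)

print(key_run1 == key_run2)        # False
print(stable_run1 == stable_run2)  # True
```

Under this toy model, only the stable names can ever produce a cache hit, which is the reasoning behind option A.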

upvoted 5 times


f084277

Most Recent 1 week ago

Selected Answer: A

A. The dynamic filenames prevent Kubeflow from caching the export and preprocess steps, which causes the costs mentioned in the question.

upvoted 1 time


Foxy2021

1 month, 1 week ago

I select C: By leveraging a Dataproc cluster, you can maintain compatibility with your existing PySpark jobs, minimize management overhead, and create a scalable proof of concept quickly and efficiently.

upvoted 1 time


10 months, 1 week ago

Selected Answer: A

I think it's A. 1) If we want to reuse the same results several times, we shouldn't rename them, so we need to remove {dt} from the first two component names. 2) We already have enable_caching = True, so why would we need kubeflow.v1.caching? 3) I'm not sure, but maybe it does matter

upvoted 2 times

BlehMaks

10 months, 1 week ago

3) I'm not sure, but maybe it does matter that the KubeFlow v2 API and kubeflow.v1.caching refer to different versions (v1 and v2).

upvoted 1 time


pikachu007

10 months, 1 week ago

Selected Answer: B

Enables caching: Setting this parameter instructs Vertex AI Pipelines to cache the outputs of pipeline steps that have successfully completed. This means that if a step's inputs haven't changed, its execution can be skipped and the cached output reused instead.

Targets costly steps: The prompt highlights that the data export and preprocessing steps are particularly expensive. Caching these steps can significantly reduce costs during model iterations.
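The skip-if-unchanged behavior described here can be sketched with a small in-memory simulation. This is only a toy model under stated assumptions, not the Vertex AI Pipelines implementation: `run_step`, `run_pipeline`, the step bodies, and the `learning_rate` parameter are hypothetical, and the step names are borrowed from the filenames in option A.

```python
import hashlib
import json

cache = {}     # maps a step's cache key to its stored output
executed = []  # records which steps actually ran (cache misses)


def run_step(name, inputs, fn):
    """Run fn(inputs) only if this (name, inputs) pair has not been seen before."""
    key = hashlib.sha256(
        json.dumps([name, inputs], sort_keys=True).encode("utf-8")
    ).hexdigest()
    if key not in cache:
        executed.append(name)
        cache[key] = fn(inputs)
    return cache[key]


def run_pipeline(learning_rate):
    """Four-step toy pipeline; only the train step depends on learning_rate."""
    exported = run_step("export", {"table": "my_table"}, lambda i: "rows")
    features = run_step("preprocess", {"data": exported}, lambda i: "features")
    model = run_step(
        "train",
        {"features": features, "lr": learning_rate},
        lambda i: f"model(lr={i['lr']})",
    )
    return run_step("calibrate", {"model": model}, lambda i: f"calibrated {i['model']}")


run_pipeline(0.1)   # first run: all four steps execute
run_pipeline(0.01)  # second run: only train and calibrate execute again
print(executed)
# ['export', 'preprocess', 'train', 'calibrate', 'train', 'calibrate']
```

In this sketch, changing only the training parameter leaves the export and preprocess steps untouched on the second run, which is where the question says most of the cost sits.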

upvoted 2 times

