Exam Professional Machine Learning Engineer topic 1 question 70 discussion

Actual exam question from Google's Professional Machine Learning Engineer

Question #: 70
Topic #: 1

You lead a data science team at a large international corporation. Most of the models your team trains are large-scale models using high-level TensorFlow APIs on AI Platform with GPUs. Your team usually takes a few weeks or months to iterate on a new version of a model. You were recently asked to review your team’s spending. How should you reduce your Google Cloud compute costs without impacting the model’s performance?

A. Use AI Platform to run distributed training jobs with checkpoints.
B. Use AI Platform to run distributed training jobs without checkpoints.
C. Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs with checkpoints.
D. Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs without checkpoints.

Suggested Answer: C 🗳️

by YangG at Dec. 9, 2022, 3:52 a.m.

Comments

seifou

Highly Voted 2 years, 6 months ago

Selected Answer: C

https://cloud.google.com/blog/products/ai-machine-learning/reduce-the-costs-of-ml-workflows-with-preemptible-vms-and-gpus?hl=en

upvoted 11 times

sashimii14

Most Recent 7 months, 4 weeks ago

Selected Answer: C

C for me

upvoted 1 times

PhilipKoku

1 year ago

Selected Answer: C

C) Preemptible VMs with checkpoints

upvoted 1 times

MultiCloudIronMan

1 year, 3 months ago

Selected Answer: C

Preemptible VMs are cheaper, and checkpoints will enable termination if the result is acceptable.

upvoted 3 times

libo1985

1 year, 9 months ago

I guess distributed training is not cheap. So C.

upvoted 1 times

joaquinmenendez

1 year, 9 months ago

C is the best approach because it allows you to reduce your compute costs without impacting the model's performance. Preemptible VMs are much cheaper than standard VMs, but they can be terminated at any time. By using checkpoints, you can ensure that your training job can be resumed if a preemptible VM is terminated. Also, even if training takes days, the checkpoints will prevent losing progress if preemptible VMs go down.

upvoted 4 times
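The checkpoint-and-resume approach described in the comment above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not TensorFlow's API: the JSON checkpoint file, the placeholder training step, and the `PreemptedError` used to simulate a VM being reclaimed are all hypothetical. In a real TensorFlow job, `tf.train.Checkpoint` and `tf.train.CheckpointManager` would play the role of `save_checkpoint` and `load_checkpoint`.

```python
import json
import os
import tempfile


class PreemptedError(Exception):
    """Hypothetical stand-in for a preemptible VM being reclaimed mid-job."""


def load_checkpoint(path):
    # Resume from the last saved state, or start from scratch if none exists.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "weight": 0.0}


def save_checkpoint(path, state):
    # Write to a temporary file, then rename, so an interruption during the
    # write cannot leave a half-written checkpoint behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def train(path, total_steps, checkpoint_every, preempt_at=None):
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if preempt_at is not None and state["step"] == preempt_at:
            # Simulate the VM disappearing; unsaved progress is lost.
            raise PreemptedError(f"VM reclaimed at step {state['step']}")
        state["weight"] += 0.1  # placeholder for one real training step
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    try:
        train(ckpt, total_steps=100, checkpoint_every=10, preempt_at=37)
    except PreemptedError as err:
        print(err)  # prints "VM reclaimed at step 37"
    # The restarted job resumes from step 30, so only 7 steps are redone.
    print(load_checkpoint(ckpt)["step"])  # prints 30
    print(train(ckpt, total_steps=100, checkpoint_every=10)["step"])  # prints 100
```

The checkpoint interval is the cost knob here: checkpointing more often means less recomputation after a preemption, at the price of more I/O during training.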

Liting

1 year, 12 months ago

Selected Answer: C

To optimize cost, you should use Kubeflow.

upvoted 2 times

M25

2 years, 1 month ago

Selected Answer: C

Went with C

upvoted 1 times

CloudKida

2 years, 1 month ago

Selected Answer: C

https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/overview AI Explanations helps you understand your model's outputs for classification and regression tasks. Whenever you request a prediction on AI Platform, AI Explanations tells you how much each feature in the data contributed to the predicted result. You can then use this information to verify that the model is behaving as expected, recognize bias in your models, and get ideas for ways to improve your model and your training data.

upvoted 1 times

_learner_

2 years, 1 month ago

Selected Answer: A

Preemptible VMs are valid for only 24 hours. Since the question says training takes months to complete, the answer is A.

upvoted 2 times

tavva_prudhvi

2 years, 3 months ago

Additionally, AI Platform's autoscaling feature can automatically adjust the number of resources used based on the workload, further optimizing costs.

upvoted 1 times

tavva_prudhvi

2 years, 3 months ago

I think it's A. By using distributed training jobs with checkpoints, you can train your models on multiple GPUs simultaneously, which reduces the training time. Checkpoints allow you to save the progress of your training jobs regularly, so if the training job gets interrupted or fails, you can restart it from the last checkpoint instead of starting from scratch. This saves time and resources, which reduces costs. Additionally, AI Platform's autoscaling feature can automatically adjust the number of resources used based on the workload, further optimizing costs.

upvoted 1 times

John_Pongthorn

2 years, 5 months ago

Is C out of date? AI Platform is now Vertex AI, so this is a simple scenario that its infrastructure would accommodate.

upvoted 1 times

ares81

2 years, 5 months ago

Selected Answer: A

It's A.

upvoted 2 times

hiromi

2 years, 6 months ago

Selected Answer: C

It seems to be C: https://www.kubeflow.org/docs/distributions/gke/pipelines/preemptible/ and https://cloud.google.com/optimization/docs/guide/checkpointing

upvoted 4 times

ares81

2 years, 6 months ago

"A Preemptible VM (PVM) is a Google Compute Engine (GCE) virtual machine (VM) instance that can be purchased for a steep discount as long as the customer accepts that the instance will terminate after 24 hours." This excludes C and D. Checkpoints are needed for long processing, so A.

upvoted 3 times
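The comment above hinges on the 24-hour cap, and whether that cap rules out long training runs is a matter of arithmetic. The sketch below takes the 24-hour limit from the quoted definition and adds hypothetical assumptions of its own: checkpoints are written every `checkpoint_interval` hours, a replacement VM resumes from the latest checkpoint, every VM runs the full 24 hours (earlier preemptions are ignored), and restart overhead is zero. `simulate` is an illustrative helper, not a Google Cloud API.

```python
import math


def simulate(job_hours, max_vm_hours=24.0, checkpoint_interval=None):
    """Return (vms_used, hours_recomputed), or None if the job never finishes.

    Worst-case model: every VM runs exactly max_vm_hours before it is
    terminated, and any work done since the last checkpoint is lost.
    Restart overhead and earlier preemptions are ignored.
    """
    if checkpoint_interval is None:
        # Without checkpoints each replacement VM starts from zero, so a job
        # longer than one VM lifetime can never complete.
        return (1, 0.0) if job_hours <= max_vm_hours else None
    if not 0 < checkpoint_interval <= max_vm_hours:
        raise ValueError("checkpoint_interval must be in (0, max_vm_hours]")

    # Progress that is safely checkpointed before each VM is terminated.
    saved_per_vm = math.floor(max_vm_hours / checkpoint_interval) * checkpoint_interval
    done, vms, recomputed = 0.0, 0, 0.0
    while True:
        vms += 1
        if job_hours - done <= max_vm_hours:
            return vms, recomputed  # the job finishes on this VM
        done += saved_per_vm
        recomputed += max_vm_hours - saved_per_vm


# A two-week (336-hour) job with hourly checkpoints needs 14 VMs.
print(simulate(336, checkpoint_interval=1.0))  # prints (14, 0.0)
print(simulate(336))  # prints None: no checkpoints, no completion
```

Under these assumptions, the cap limits how much work a single VM can lose, not the total length of a checkpointed job; without checkpoints (option D), any job longer than 24 hours never completes.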

neochaotic

2 years, 6 months ago

Selected Answer: C

C: Reduce cost with preemptible instances, and add checkpoints to snapshot intermediate results.

upvoted 3 times
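The suggestion above can be taken one step further. When a preemptible node is reclaimed, the training process typically receives a SIGTERM followed by a short grace period before it is killed; the exact notice and how it reaches a container depend on the platform and configuration, so treat that as an assumption to verify against the GKE and Kubeflow documentation. A training loop can use that window to write one last checkpoint. In this standard-library sketch, `PreemptionGuard`, `run_training`, and the `save` callback are hypothetical names.

```python
import signal


class PreemptionGuard:
    """Sets a flag when SIGTERM arrives so the training loop can stop cleanly."""

    def __init__(self):
        self.requested = False
        # Must be called from the main thread; signal.signal raises otherwise.
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame):
        # Keep the handler minimal: record the request and let the loop react.
        self.requested = True


def run_training(total_steps, save, guard):
    """Hypothetical loop; `save(step)` stands in for writing a checkpoint."""
    step = 0
    while step < total_steps:
        step += 1  # placeholder for one real training step
        if guard.requested:
            save(step)  # final checkpoint inside the grace period
            return step, "preempted"
    save(step)
    return step, "finished"


if __name__ == "__main__":
    saved = []
    print(run_training(5, saved.append, PreemptionGuard()))  # prints (5, 'finished')
```

Setting a flag instead of saving inside the handler keeps the checkpoint write at a step boundary, where the model state is consistent.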

LearnSodas

2 years, 6 months ago

Selected Answer: A

Saving checkpoints avoids rerunning from scratch.

upvoted 2 times
