Exam Professional Machine Learning Engineer topic 1 question 287 discussion

Actual exam question from Google's Professional Machine Learning Engineer

Question #: 287
Topic #: 1

You are tasked with building an MLOps pipeline to retrain tree-based models in production. The pipeline will include components related to data ingestion, data processing, model training, model evaluation, and model deployment. Your organization primarily uses PySpark-based workloads for data preprocessing. You want to minimize infrastructure management effort. How should you set up the pipeline?

A. Set up a TensorFlow Extended (TFX) pipeline on Vertex AI Pipelines to orchestrate the MLOps pipeline. Write a custom component for the PySpark-based workloads on Dataproc.
B. Set up Vertex AI Pipelines to orchestrate the MLOps pipeline. Use the predefined Dataproc component for the PySpark-based workloads.
C. Set up Kubeflow Pipelines on Google Kubernetes Engine to orchestrate the MLOps pipeline. Write a custom component for the PySpark-based workloads on Dataproc.
D. Set up Cloud Composer to orchestrate the MLOps pipeline. Use Dataproc workflow templates for the PySpark-based workloads in Cloud Composer.

Suggested Answer: B

by carolctech at Oct. 26, 2024, 6:47 p.m.

Comments

Pau1234

4 months, 2 weeks ago

Selected Answer: B

The requirement is to "minimize infrastructure management effort", hence B.

upvoted 1 time

Omi_04040

4 months, 2 weeks ago

Selected Answer: B

A: Rejected because it requires writing a custom component for the PySpark-based workloads. C: Rejected because Kubeflow Pipelines on GKE is not a managed service, and the question says to 'minimize infrastructure management effort'.

upvoted 1 time

Omi_04040

4 months, 2 weeks ago

D: Rejected because using Cloud Composer for orchestration adds overhead. Hence B.

upvoted 1 time

AB_C

5 months ago

Selected Answer: B

This is the most suitable approach

upvoted 2 times

carolctech

6 months ago

Selected Answer: B

B) Best option because of its ease of use, its integration with the existing PySpark infrastructure (via Dataproc), and its minimal infrastructure management overhead. Vertex AI Pipelines is fully managed, which minimizes infrastructure management effort, and it is natively integrated with Dataproc for PySpark (while Composer is not). Dataproc’s predefined component for PySpark workloads reduces both effort and the likelihood of errors. It is also suitable for tree-based models (the other options are too, but with more effort).

upvoted 2 times
