
Exam Professional Machine Learning Engineer topic 1 question 279 discussion

Actual exam question from Google's Professional Machine Learning Engineer
Question #: 279
Topic #: 1

You are creating an ML pipeline for data processing, model training, and model deployment that uses different Google Cloud services. You have developed code for each individual task, and you expect a high frequency of new files. You now need to create an orchestration layer on top of these tasks. You only want this orchestration pipeline to run if new files are present in your dataset in a Cloud Storage bucket. You also want to minimize the compute node costs. What should you do?

  • A. Create a pipeline in Vertex AI Pipelines. Configure the first step to compare the contents of the bucket to the last time the pipeline was run. Use the scheduler API to run the pipeline periodically.
  • B. Create a Cloud Function that uses a Cloud Storage trigger and deploys a Cloud Composer directed acyclic graph (DAG).
  • C. Create a pipeline in Vertex AI Pipelines. Create a Cloud Function that uses a Cloud Storage trigger and deploys the pipeline.
  • D. Deploy a Cloud Composer directed acyclic graph (DAG) with a GCSObjectUpdateSensor class that detects when a new file is added to the Cloud Storage bucket.
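Option A's first step ("compare the contents of the bucket to the last time the pipeline was run") can be sketched as a diff against a saved manifest. This is a minimal stand-in, assuming a local JSON manifest file; real code would list objects with the `google-cloud-storage` client instead of receiving them as a list:

```python
import json
from pathlib import Path

MANIFEST = Path("last_run_manifest.json")  # hypothetical state file from the previous run

def find_new_files(current_objects):
    """Return object names not seen in the previous run, then update the manifest."""
    seen = set(json.loads(MANIFEST.read_text())) if MANIFEST.exists() else set()
    new = sorted(set(current_objects) - seen)
    MANIFEST.write_text(json.dumps(sorted(set(current_objects) | seen)))
    return new
```

Note the cost drawback this sketch makes visible: with the Scheduler API, this comparison step runs on every tick whether or not anything changed, so compute is spent even on empty polls.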
Suggested Answer: C
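Option C amounts to a small event-driven handler: a Cloud Function fires on each Cloud Storage object-finalize event and submits a Vertex AI pipeline run. The sketch below is illustrative; the project, bucket, prefix, and template path are hypothetical placeholders, and the event payload fields follow the Cloud Storage event shape (`name` = object path):

```python
DATASET_PREFIX = "dataset/"  # hypothetical folder watched inside the bucket

def is_dataset_file(object_name: str) -> bool:
    """Only files under the watched prefix should trigger a pipeline run."""
    return object_name.startswith(DATASET_PREFIX) and not object_name.endswith("/")

def trigger_pipeline(event: dict) -> None:
    """Entry point for a Cloud Storage-triggered Cloud Function (sketch)."""
    if not is_dataset_file(event.get("name", "")):
        return  # ignore objects outside the dataset folder
    # Imported lazily so the helper above stays testable without GCP libraries.
    from google.cloud import aiplatform
    aiplatform.init(project="my-project", location="us-central1")  # hypothetical IDs
    job = aiplatform.PipelineJob(
        display_name="data-to-deploy",
        template_path="gs://my-bucket/pipeline.json",  # compiled KFP pipeline spec
        parameter_values={"new_file": event["name"]},
    )
    job.submit()  # non-blocking; Vertex AI Pipelines provisions compute per run
```

Because both the function and the pipeline are serverless, nothing bills while no files arrive, which is the cost argument most commenters below make for C.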

Comments

juliorevk
4 days ago
Probably C. While D would also work, the question specifically says to minimize compute costs: Cloud Composer incurs always-on environment costs, whereas C is fully serverless.
upvoted 1 times
Foxy2021
1 month, 1 week ago
My answer is D: While C (Cloud Function + Vertex AI Pipelines) is a viable approach for triggering ML pipelines, D (Cloud Composer DAG with GCSObjectUpdateSensor) is the more appropriate and scalable solution when your orchestration spans multiple Google Cloud services and you want to minimize costs by only triggering the pipeline when new files appear.
upvoted 1 times
tardigradum
3 months, 1 week ago
Selected Answer: D
The key here is "that uses different Google Cloud services". Taking this into account, Cloud Composer is the correct answer (for instance, Vertex AI pipelines is not integrated with classic Dataproc or Cloud Composer DAGs). Moreover, GCSObjectUpdateSensor is more efficient than a Cloud Function.
upvoted 1 times
fitri001
5 months, 4 weeks ago
Selected Answer: C
Option C appears to be the best choice for balancing the requirements of efficient orchestration, cost minimization, and ensuring the pipeline only runs when new files are present. By using a Cloud Function triggered by Cloud Storage events to deploy a Vertex AI Pipeline, you can leverage the event-driven model of Cloud Functions to minimize unnecessary runs and associated costs, while still using the powerful orchestration capabilities of Vertex AI Pipelines.
upvoted 4 times
fitri001
5 months, 4 weeks ago
Why not D? Pros: Cloud Composer provides a powerful orchestration framework that can handle complex dependencies and workflows, and GCSObjectUpdateSensor can efficiently detect new files in the bucket and trigger the pipeline. Cons: Cloud Composer can be relatively costly due to the continuous operation of its environment, and there is overhead in maintaining Cloud Composer for a potentially simple file-triggered task.
upvoted 1 times
tardigradum
3 months, 1 week ago
I think we should use Cloud Composer here because of "that uses different Google Cloud services". Vertex AI is less integrated with the rest of services than Cloud Composer, which was designed exactly for that.
upvoted 1 times
Kili1
6 months ago
Selected Answer: D
"Different Google Cloud services" and GCSObjectUpdateSensor: This sensor class specifically checks for updates to Cloud Storage objects. This ensures the DAG only triggers when there's a new file in the bucket, minimizing unnecessary executions.
upvoted 1 times
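On the cost point raised for and against D: an Airflow sensor in the default "poke" mode occupies a worker slot for as long as it waits, on top of the Composer environment running continuously. A minimal stand-in for that poke loop (pure Python, not actual Airflow code) makes the behavior concrete:

```python
import time

def poke_until(check, poke_interval, timeout, clock=time.monotonic, sleep=time.sleep):
    """Stand-in for a poke-mode sensor loop: call `check()` every
    `poke_interval` seconds until it returns True or `timeout` elapses.
    While this loop runs, the worker executing it is occupied."""
    deadline = clock() + timeout
    while clock() < deadline:
        if check():
            return True
        sleep(poke_interval)
    return False
```

Airflow's "reschedule" sensor mode releases the slot between pokes, but the Composer environment itself still bills 24/7, which is why several commenters prefer the serverless option C.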
CHARLIE2108
8 months ago
Why not D?
upvoted 1 times
Yan_X
8 months, 2 weeks ago
Selected Answer: C
C: a Cloud Function triggered by a Cloud Storage event, which then runs the Vertex AI pipeline.
upvoted 3 times
JG123
8 months, 2 weeks ago
It's C. Vertex AI Pipelines is the recommended way to run ML pipelines!
upvoted 1 times
guilhermebutzke
9 months ago
Selected Answer: B
My answer: B. A Cloud Function that uses a Cloud Storage trigger ("run if new files are present in your dataset in a Cloud Storage bucket"), plus a Cloud Composer directed acyclic graph (DAG) for the "orchestration layer on top of these tasks" that "uses different Google Cloud services".
upvoted 1 times
tavva_prudhvi
9 months ago
Cloud Composer already provides a way to orchestrate tasks, and creating a Cloud Function to deploy a DAG is not a common practice. The Cloud Function with a Cloud Storage trigger would be redundant since the GCSObjectUpdateSensor within the DAG itself can handle the file detection.
upvoted 5 times
Community vote distribution: A (35%) · C (25%) · B (20%) · Other