Exam Professional Machine Learning Engineer topic 1 question 233 discussion

Actual exam question from Google's Professional Machine Learning Engineer

Question #: 233
Topic #: 1


You are building a custom image classification model and plan to use Vertex AI Pipelines to implement the end-to-end training. Your dataset consists of images that need to be preprocessed before they can be used to train the model. The preprocessing steps include resizing the images, converting them to grayscale, and extracting features. You have already implemented some Python functions for the preprocessing tasks. Which components should you use in your pipeline?

A. DataprocSparkBatchOp and CustomTrainingJobOp
B. DataflowPythonJobOp, WaitGcpResourcesOp, and CustomTrainingJobOp
C. dsl.ParallelFor, dsl.component, and CustomTrainingJobOp
D. ImageDatasetImportDataOp, dsl.component, and AutoMLImageTrainingJobRunOp


Suggested Answer: B

by pikachu007 at Jan. 13, 2024, 8:04 a.m.

Comments


guilhermebutzke

Highly Voted 1 year, 2 months ago

Selected Answer: B

My answer: B. Of the options, DataflowPythonJobOp can parallelize the preprocessing tasks, which makes it suitable for resizing images, converting them to grayscale, and extracting features. dsl.ParallelFor could also parallelize tasks, but it might not be the most straightforward option for image preprocessing. DataflowPythonJobOp is generally followed by WaitGcpResourcesOp, which waits for the launched Dataflow job to finish. https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/fe7d3e4b8edc137d90ec061789b879b7cc8d3854/notebooks/community/ml_ops/stage3/get_started_with_dataflow_flex_template_component.ipynb
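
For illustration, the three preprocessing steps named in the question (resize, grayscale, feature extraction) might look like the per-image functions below. This is only a minimal sketch in pure Python: it assumes an image is a nested list of RGB tuples, and every function name here is hypothetical, not taken from the question or from any Google library. Real code would more likely use an imaging library, and in the Dataflow approach a function like `preprocess` would be applied to each image inside the Apache Beam job that DataflowPythonJobOp launches.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def resize_nearest(image: Sequence[Sequence], new_h: int, new_w: int) -> List[List]:
    """Resize a 2D grid of pixels with nearest-neighbour sampling."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def to_grayscale(image: Sequence[Sequence[Pixel]]) -> List[List[int]]:
    """Convert RGB pixels to 8-bit luminance (ITU-R BT.601 weights)."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in image
    ]


def extract_features(image: Sequence[Sequence[int]], bins: int = 4) -> List[float]:
    """Use a normalised intensity histogram as a toy feature vector."""
    counts = [0] * bins
    for row in image:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def preprocess(image: Sequence[Sequence[Pixel]],
               size: Tuple[int, int] = (2, 2),
               bins: int = 4) -> List[float]:
    """Chain the three steps in the order the question lists them."""
    resized = resize_nearest(image, size[0], size[1])
    gray = to_grayscale(resized)
    return extract_features(gray, bins)


if __name__ == "__main__":
    # A 4x4 test image: top half black, bottom half white.
    black, white = (0, 0, 0), (255, 255, 255)
    sample = [[black] * 4, [black] * 4, [white] * 4, [white] * 4]
    print(preprocess(sample))
```

Because each image is processed independently, functions of this shape are easy to run in parallel, which is the property the Dataflow-based option relies on.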

upvoted 5 times

...

Dirtie_Sinkie

Most Recent 7 months, 1 week ago

Selected Answer: B

B is definitely right, no doubt

upvoted 2 times

...

pinimichele01

1 year ago

Selected Answer: B

https://cloud.google.com/vertex-ai/docs/pipelines/dataflow-component#dataflowpythonjobop

upvoted 1 times

...

b1a8fae

1 year, 3 months ago

Selected Answer: B

I go with B. Custom training is surely required. Discarding A because Spark is not mentioned anywhere in the problem description. C involves Kubeflow, which seems a bit overkill imo. The DataflowPythonJobOp operator lets you create a Vertex AI Pipelines component that prepares data, which seems like the appropriate course of action to me. https://cloud.google.com/vertex-ai/docs/pipelines/dataflow-component#dataflowpythonjobop

upvoted 2 times

...

pikachu007

1 year, 3 months ago

Selected Answer: B

A. DataprocSparkBatchOp: While capable of data processing, it's less well suited to image-specific tasks like resizing and grayscale conversion than DataflowPythonJobOp.
C. dsl.ParallelFor, dsl.component: While they offer flexibility, they require more manual orchestration and are potentially less efficient for image preprocessing than DataflowPythonJobOp.
D. ImageDatasetImportDataOp, AutoMLImageTrainingJobRunOp: These components are designed for AutoML image training and are not directly compatible with custom preprocessing and training tasks.

upvoted 1 times

...
