Exam Professional Machine Learning Engineer topic 1 question 202 discussion

Actual exam question from Google's Professional Machine Learning Engineer
Question #: 202
Topic #: 1

You work for a startup that has multiple data science workloads. Your compute infrastructure is currently on-premises, and the data science workloads are native to PySpark. Your team plans to migrate their data science workloads to Google Cloud. You need to build a proof of concept to migrate one data science job to Google Cloud. You want to propose a migration process that requires minimal cost and effort. What should you do first?

  • A. Create a n2-standard-4 VM instance and install Java, Scala, and Apache Spark dependencies on it.
  • B. Create a Google Kubernetes Engine cluster with a basic node pool configuration, install Java, Scala, and Apache Spark dependencies on it.
  • C. Create a Standard (1 master, 3 workers) Dataproc cluster, and run a Vertex AI Workbench notebook instance on it.
  • D. Create a Vertex AI Workbench notebook with instance type n2-standard-4.
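For context on what is being migrated, a "data science workload native to PySpark" might look like the hypothetical minimal sketch below. All names are illustrative; the core logic is kept in plain Python so it runs without a cluster, and the Spark entry point is isolated in its own function (it requires pyspark only when actually called):

```python
# Hypothetical minimal PySpark-native job of the kind described in the question.
# The transformation logic is plain Python; run_on_spark() wraps it in an RDD
# pipeline and requires pyspark only when invoked.
from collections import Counter

def count_words(lines):
    """Core transformation: word counts across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)

def run_on_spark(input_path: str) -> dict:
    """Same logic as an RDD pipeline (needs pyspark, e.g. on a Dataproc cluster)."""
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("poc-wordcount").getOrCreate()
    rdd = spark.sparkContext.textFile(input_path)
    pairs = rdd.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
    return dict(pairs.collect())

print(count_words(["spark on dataproc", "spark poc"]))
# prints {'spark': 2, 'on': 1, 'dataproc': 1, 'poc': 1}
```

Keeping the Spark session behind a function boundary is what makes the "minimal effort" comparison meaningful: the same file can be run locally, in a notebook, or submitted to a cluster.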
Suggested Answer: C
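For reference, under the suggested answer's Dataproc path the existing PySpark job can be submitted as-is once the cluster exists. Below is a minimal sketch of building the request body for Dataproc's `jobs.submit` REST method; the cluster name, bucket, and script path are hypothetical placeholders, and the `gcloud dataproc jobs submit pyspark` command wraps the same call:

```python
# Sketch only: builds the JSON body for Dataproc's jobs.submit REST method.
# "poc-cluster" and the gs:// paths are hypothetical placeholders.
import json

def build_submit_body(cluster_name: str, main_py_uri: str, args=None) -> dict:
    """Assemble a PySpark job submission body for an existing Dataproc cluster."""
    return {
        "job": {
            "placement": {"clusterName": cluster_name},
            "pysparkJob": {
                "mainPythonFileUri": main_py_uri,
                "args": list(args or []),
            },
        }
    }

body = build_submit_body(
    cluster_name="poc-cluster",                    # hypothetical cluster
    main_py_uri="gs://my-poc-bucket/jobs/job.py",  # hypothetical GCS path
    args=["--date", "2024-01-01"],
)
print(json.dumps(body, indent=2))
```

No Spark, Java, or Scala installation is involved on the client side, which is the crux of the "minimal effort" argument for C over A and B.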

Comments

lunalongo
2 months, 2 weeks ago
Selected Answer: C
C is the right answer because it ensures:
  • Cost-effectiveness: Dataproc is managed and you pay only for the compute time used, which is cost-effective for a POC. A Standard cluster is enough for the task.
  • Ease of use: Dataproc simplifies the process of setting up and managing a Spark cluster.
  • Minimal effort: creating a Dataproc cluster plus a Vertex AI Workbench instance is a straightforward process through the console or command-line tools, minimizing setup time and effort compared to manually configuring VMs or Kubernetes clusters.
A and B include manual installation steps; D creates a notebook environment, but that alone is not enough to run a PySpark job.
upvoted 1 time
DaleR
2 months, 3 weeks ago
D. Just ran a pilot on Workbench
upvoted 1 time
f084277
3 months ago
Selected Answer: D
D. "minimal cost and effort". There's only one answer.
upvoted 1 time
baimus
5 months, 1 week ago
Selected Answer: C
C and D are both workable; as people point out, you can technically have Spark preinstalled with D. But this is a proof of concept for the real design, and the concept is not proved by using a notebook, since that is not best practice. Therefore C makes more sense, and it is still low effort because the cluster is managed.
upvoted 1 time
AK2020
6 months, 2 weeks ago
Selected Answer: C
C is the answer
upvoted 1 time
TanTran04
7 months, 2 weeks ago
Selected Answer: C
I'm going with option C. Please take a look at the Dataproc documentation (ref: https://cloud.google.com/dataproc/docs). Option D doesn't provide a solution for managing and scaling the Spark environment, which is necessary for running PySpark workloads.
upvoted 2 times
fitri001
10 months ago
Selected Answer: D
  • Vertex AI Workbench notebook: provides a pre-configured environment with popular data science libraries like PySpark already installed, letting you focus on migrating your PySpark code with minimal changes.
  • n2-standard-4 instance type: a general-purpose machine type suitable for various data science tasks, offering a good balance between cost and performance for initial exploration.
upvoted 1 time
Jason_Cloud_at
5 months, 3 weeks ago
Option D doesn't provide PySpark out of the box; you have to install it manually. With C, Dataproc is a managed Spark and Hadoop service that supports running PySpark jobs right away.
upvoted 1 time
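The disagreement above comes down to whether PySpark is importable out of the box in a given Workbench image. A minimal, environment-agnostic sketch of how one could check this from inside any notebook (it does not assume either answer is right; `local[*]` is Spark's standard local-mode master string):

```python
# Sketch: probe whether pyspark is importable in the current environment,
# and show the local-mode config a notebook-based POC would typically use.
import importlib.util

def pyspark_available() -> bool:
    """True if the pyspark package can be imported here, without importing it."""
    return importlib.util.find_spec("pyspark") is not None

def local_session_builder_args() -> dict:
    """Hypothetical local-mode SparkSession settings (no cluster required)."""
    return {"master": "local[*]", "appName": "poc-notebook"}

print(pyspark_available(), local_session_builder_args())
```

If the probe returns False, option D implies a manual `pip install pyspark` step before the job can run at all, which is the point being argued here.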
pinimichele01
10 months ago
Why not C? See https://cloud.google.com/architecture/hadoop/migrating-apache-spark-jobs-to-cloud-dataproc#overview
upvoted 1 time
fitri001
10 months ago
  • A. Create an n2-standard-4 VM instance: requires manually installing Java, Scala, and Spark dependencies, which is time-consuming and error-prone. It also involves managing the VM instance lifecycle, increasing complexity.
  • B. Create a Google Kubernetes Engine cluster: setting up and managing a Kubernetes cluster for a single job is overkill for a proof of concept. It adds unnecessary complexity and cost.
  • C. Create a Standard Dataproc cluster: while Dataproc is a managed Spark environment on Google Cloud, setting up a full cluster (master and workers) might be more resource-intensive than needed for a single job, especially for a proof of concept.
upvoted 1 time
gscharly
10 months, 1 week ago
Selected Answer: D
went with D: https://cloud.google.com/vertex-ai/docs/workbench/instances/create-dataproc-enabled
upvoted 2 times
pinimichele01
10 months ago
https://cloud.google.com/architecture/hadoop/migrating-apache-spark-jobs-to-cloud-dataproc#overview
upvoted 1 time
pinimichele01
10 months, 2 weeks ago
Selected Answer: C
When you want to move your Apache Spark workloads from an on-premises environment to Google Cloud, we recommend using Dataproc to run Apache Spark/Apache Hadoop clusters. https://cloud.google.com/architecture/hadoop/migrating-apache-spark-jobs-to-cloud-dataproc#overview
upvoted 1 time
Yan_X
11 months, 2 weeks ago
Selected Answer: D
D. You can use the notebook's pre-installed libraries and tools, including PySpark.
upvoted 2 times
Carlose2108
11 months, 3 weeks ago
Selected Answer: D
My bad, I meant Option D.
upvoted 1 time
Carlose2108
11 months, 3 weeks ago
Selected Answer: C
I went with C for a proof of concept requiring minimal cost and effort. Furthermore, Vertex AI Workbench notebooks come pre-configured with PySpark.
upvoted 2 times
guilhermebutzke
1 year ago
Selected Answer: C
My answer: C
  • C. This option leverages Google Cloud's Dataproc service, which is designed for running Apache Spark and other big data processing frameworks. By creating a Standard Dataproc cluster, you can easily scale resources as needed for your workload.
  • A. n2-standard-4 VM: requires manual setup and ongoing maintenance, increasing cost and effort.
  • B. GKE cluster: while offering containerization benefits, it necessitates managing containers and Spark configurations, adding complexity.
  • D. With Vertex AI Workbench, your team can develop, train, and deploy machine learning models using popular frameworks like TensorFlow, PyTorch, and scikit-learn. However, while Vertex AI Workbench supports PySpark, it may not be the optimal choice for migrating existing PySpark workloads, as it is primarily focused on machine learning tasks.
upvoted 3 times
Carlose2108
11 months, 3 weeks ago
You're right, but I have a doubt about option D given this part of the question: "You need to build a proof of concept to migrate one data science job to Google Cloud."
upvoted 2 times
ddogg
1 year ago
Selected Answer: C
Agree with BlehMaks: https://cloud.google.com/architecture/hadoop/migrating-apache-spark-jobs-to-cloud-dataproc#overview. A Dataproc cluster seems more suitable.
upvoted 1 time
shadz10
1 year, 1 month ago
Selected Answer: D
https://cloud.google.com/vertex-ai-notebooks?hl=en: "Data lake and Spark in one place. Whether you use TensorFlow, PyTorch, or Spark, you can run any engine from Vertex AI Workbench." D is correct.
upvoted 1 time
BlehMaks
1 year, 1 month ago
Selected Answer: C
https://cloud.google.com/architecture/hadoop/migrating-apache-spark-jobs-to-cloud-dataproc#overview
upvoted 1 time
pikachu007
1 year, 1 month ago
Selected Answer: D
  • Minimal setup: Vertex AI Workbench notebooks come pre-configured with PySpark and other data science tools, eliminating the need for manual installation and setup.
  • Cost-effectiveness: Vertex AI Workbench offers managed notebooks with pay-as-you-go pricing, making it a cost-efficient option for proof-of-concept testing.
  • Ease of use: data scientists can run PySpark code directly in the notebook without managing infrastructure, streamlining the migration process.
  • Scalability: Vertex AI Workbench can easily scale to handle larger workloads or multiple users if the proof of concept is successful.
upvoted 1 time
Community vote distribution: A (35%), C (25%), B (20%), Other