Exam Professional Data Engineer topic 1 question 313 discussion

Actual exam question from Google's Professional Data Engineer

Question #: 313
Topic #: 1

You want to migrate an Apache Spark 3 batch job from on-premises to Google Cloud. You need to minimally change the job so that the job reads from Cloud Storage and writes the result to BigQuery. Your job is optimized for Spark, where each executor has 8 vCPU and 16 GB memory, and you want to be able to choose similar settings. You want to minimize installation and management effort to run your job. What should you do?

A. Execute the job as part of a deployment in a new Google Kubernetes Engine cluster.
B. Execute the job from a new Compute Engine VM.
C. Execute the job in a new Dataproc cluster.
D. Execute as a Dataproc Serverless job.
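
Whichever option is chosen, the "minimal change" to the job itself mostly amounts to pointing the input at a gs:// path and swapping the output sink for BigQuery. A rough PySpark sketch of that change follows. It assumes Parquet input and that the spark-bigquery connector is available on the classpath; the function names and all bucket, path, and table arguments are hypothetical.

```python
# Sketch of the minimal job change: read input from Cloud Storage and write
# the result to BigQuery. All bucket, path, and table names are hypothetical.

def bigquery_write_options(temp_bucket: str) -> dict[str, str]:
    """Options for the spark-bigquery connector's indirect write path."""
    # temporaryGcsBucket names the bucket the connector stages data in
    # before loading it into BigQuery.
    return {"temporaryGcsBucket": temp_bucket}


def run_job(input_uri: str, output_table: str, temp_bucket: str) -> None:
    """Read from Cloud Storage, apply the existing logic, write to BigQuery."""
    # Imported lazily so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("gcs-to-bigquery").getOrCreate()

    # Assumption: the input is Parquet stored under a gs:// URI.
    df = spark.read.parquet(input_uri)

    # The job's existing transformations would stay unchanged here.

    (df.write.format("bigquery")
        .options(**bigquery_write_options(temp_bucket))
        .mode("overwrite")
        .save(output_table))

    spark.stop()
```

A call such as `run_job("gs://example-bucket/input/", "example_dataset.result", "example-temp-bucket")` would then run the same way on any of the four options, which is why the choice between them comes down to management effort and executor sizing.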

Suggested Answer: D

by mcdaley at Dec. 7, 2024, 2:45 p.m.

Comments

chicity_de

Highly Voted 1 month, 3 weeks ago

Selected Answer: D

The priority is to "minimize installation and management effort", which Dataproc Serverless satisfies. Furthermore, with Dataproc Serverless you can still specify resource settings for your job, such as the number of vCPUs and the memory per executor (https://cloud.google.com/dataproc-serverless/docs/concepts/properties).
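
As a rough sketch of what that could look like, the snippet below assembles a `gcloud dataproc batches submit spark` command that carries the executor sizing as the standard Spark properties `spark.executor.cores` and `spark.executor.memory`. The helper names, region, jar path, and main class are hypothetical placeholders, and the value ranges Dataproc Serverless accepts for these properties should be checked against the properties page linked above.

```python
# Sketch: build a Dataproc Serverless batch submission that keeps the
# on-premises executor sizing (8 vCPU, 16 GB) via standard Spark properties.
# The region, jar URI, and main class below are hypothetical placeholders.

def executor_properties(cores: int, memory_gb: int) -> dict[str, str]:
    """Return Spark properties describing the size of each executor."""
    return {
        "spark.executor.cores": str(cores),
        "spark.executor.memory": f"{memory_gb}g",
    }


def build_submit_command(region: str, jar_uri: str, main_class: str,
                         properties: dict[str, str]) -> list[str]:
    """Assemble the gcloud argument list; nothing is executed here."""
    props = ",".join(f"{key}={value}" for key, value in properties.items())
    return [
        "gcloud", "dataproc", "batches", "submit", "spark",
        f"--region={region}",
        f"--jars={jar_uri}",
        f"--class={main_class}",
        f"--properties={props}",
    ]


command = build_submit_command(
    region="us-central1",                    # placeholder region
    jar_uri="gs://example-bucket/job.jar",   # placeholder jar location
    main_class="com.example.BatchJob",       # placeholder entry point
    properties=executor_properties(cores=8, memory_gb=16),
)
print(" ".join(command))
```

Note that this sets the executor shape rather than a machine type, which is the distinction the answers for C below are pointing at.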

upvoted 6 times

plum21

Most Recent 1 day, 11 hours ago

Selected Answer: C

It's not possible to specify a machine type using Dataproc Serverless

upvoted 1 time

marlon.andrei

3 weeks ago

Selected Answer: C

I chose C because of this requirement: "where each executor has 8 vCPU and 16 GB memory, and you want to be able to choose similar settings".

upvoted 1 time

Pime13

1 month ago

Selected Answer: D

Dataproc Serverless allows you to run Spark jobs without needing to manage the underlying infrastructure. It automatically handles resource provisioning and scaling, which simplifies the process and reduces management overhead

upvoted 1 time

mcdaley

2 months ago

Selected Answer: C

Dataproc supports Spark 3, ensuring compatibility with your existing job. It also allows you to customize the cluster configuration, including the number of executors and the vCPUs and memory per executor, to match your on-premises setup (8 vCPUs and 16 GB of memory per executor).

upvoted 1 time
