Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 4 discussion

Actual exam question from Google's Professional Machine Learning Engineer

Question #: 4
Topic #: 1


You want to rebuild your ML pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over 12 hours to run. To speed up development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting the speed and processing requirements?

A. Use Data Fusion's GUI to build the transformation pipelines, and then write the data into BigQuery.
B. Convert your PySpark into SparkSQL queries to transform the data, and then run your pipeline on Dataproc to write the data into BigQuery.
C. Ingest your data into Cloud SQL, convert your PySpark commands into SQL queries to transform the data, and then use federated queries from BigQuery for machine learning.
D. Ingest your data into BigQuery using BigQuery Load, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table.

Suggested Answer: D
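Option D amounts to two BigQuery jobs. A load job reads the raw files from Cloud Storage, and a query job runs SQL that replaces the PySpark transformations and writes a new table. Below is a minimal sketch using the `google-cloud-bigquery` Python client. The project, dataset, bucket, CSV file format, and the aggregation itself are hypothetical placeholders, not taken from the question.

```python
"""Sketch of option D: load raw files from Cloud Storage into BigQuery,
then transform them with a SQL query that writes a new table."""

# Hypothetical names: replace with your own project, dataset, bucket, and columns.
PROJECT = "my-project"
RAW_TABLE = f"{PROJECT}.ml_dataset.raw_events"
FEATURE_TABLE = f"{PROJECT}.ml_dataset.features"
SOURCE_URI = "gs://my-bucket/raw/*.csv"


def build_transform_sql(source_table: str, dest_table: str) -> str:
    """Return a CREATE TABLE AS SELECT statement standing in for the
    PySpark transformations (the aggregation itself is illustrative)."""
    return (
        f"CREATE OR REPLACE TABLE `{dest_table}` AS\n"
        "SELECT\n"
        "  user_id,\n"
        "  COUNT(*) AS n_events,\n"
        "  AVG(amount) AS avg_amount\n"
        f"FROM `{source_table}`\n"
        "WHERE amount IS NOT NULL\n"
        "GROUP BY user_id"
    )


def run_pipeline() -> None:
    """Run both steps against BigQuery (needs the library and credentials)."""
    # Imported here so the SQL builder above works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=PROJECT)

    # Step 1: batch-load the raw CSV files from Cloud Storage into BigQuery.
    load_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    client.load_table_from_uri(SOURCE_URI, RAW_TABLE, job_config=load_config).result()

    # Step 2: run the transformation as SQL; BigQuery executes it without a cluster.
    client.query(build_transform_sql(RAW_TABLE, FEATURE_TABLE)).result()
```

First configure credentials and replace the placeholder names. Then calling `run_pipeline()` runs both steps. `CREATE OR REPLACE TABLE ... AS SELECT` keeps the transformation entirely in SQL, so there is no cluster to size or manage. That is the serverless property the question asks for.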

by ralf_cc at July 8, 2021, 12:35 p.m.

Comments


nunzio144

Highly Voted 2 months ago

It should be D. Data Fusion doesn't use SQL syntax.

upvoted 22 times

q4exam

3 years, 2 months ago

Agree, BQ is the only serverless option that supports SQL.

upvoted 4 times


A4M

2 years, 10 months ago

It needs to be D, as it's the most suitable answer given the requirements in the question. Data Fusion is more of a no-code data transformation tool.

upvoted 1 times


Celia20210714

Highly Voted 3 years, 4 months ago

ANS: A
https://cloud.google.com/data-fusion#section-1
- Data Fusion: "A serverless approach leveraging the scalability and reliability of Google services like Dataproc means Data Fusion offers the best of data integration capabilities with a lower total cost of ownership."
- BigQuery is serverless and supports SQL.
- Dataproc is not serverless; you have to manage clusters.
- Cloud SQL is not serverless; you have to manage instances.

upvoted 12 times

TornikePirveli

3 months, 1 week ago

By your logic it should be D, because BQ is fully serverless and supports SQL

upvoted 1 times


q4exam

3 years, 2 months ago

Data Fusion is not serverless; it creates a Dataproc cluster to execute the job. I think the answer is C.

upvoted 1 times

mousseUwU

3 years, 1 month ago

Data Fusion is serverless: https://cloud.google.com/data-fusion#all-features

upvoted 3 times

tavva_prudhvi

1 year, 8 months ago

I think you're focusing only on the sentence "A serverless approach leveraging the scalability and reliability of Google services like Dataproc means Data Fusion offers the best of data integration capabilities with a lower total cost of ownership." That sentence says Data Fusion leverages a serverless approach, but it does not explicitly state that Data Fusion itself is serverless. It says Data Fusion offers strong data integration capabilities by building on the scalability and reliability of Google services like Dataproc. So while Data Fusion may not be fully serverless, it is designed to take advantage of serverless capabilities through its integration with Google services.

upvoted 2 times


mousseUwU

3 years, 1 month ago

Agree, A is correct

upvoted 2 times


joqu

Most Recent 4 days, 4 hours ago

Selected Answer: D

People giving other answers are too hung up on the fact that the pipeline currently runs in PySpark. The data is in GCS, and you want a quick serverless solution that uses SQL syntax. BigQuery is the only good option that "meets the speed and processing requirements".

upvoted 1 times


LeumaS_NoswaY

2 months ago

B. You need Cloud Dataproc to transform the data from PySpark to Spark SQL

upvoted 1 times


TornikePirveli

3 months, 1 week ago

Serverless, SQL syntax -> BigQuery, simple as that

upvoted 2 times


jsalvasoler

3 months, 2 weeks ago

I am very curious. Why are the solutions (when I click Reveal Solution) generally WRONG?

upvoted 2 times


tadeupan

4 months, 1 week ago

Option D, because the question needs a serverless solution with SQL syntax, and BigQuery offers this. Dataproc is not serverless, so B is incorrect; D is the correct option.

upvoted 2 times


Yorko

4 months, 2 weeks ago

Selected Answer: D

There's an updated version of this question in the official Google Cloud certified PMLE study guide. Option D is marked as correct

upvoted 2 times

TornikePirveli

3 months, 1 week ago

Can you link the updated version? On Amazon it's still the 1st edition, and it marks B as correct.

upvoted 1 times


PhilipKoku

5 months, 2 weeks ago

Selected Answer: D

The best approach is option D: Ingest data into BigQuery and use SQL queries for transformations. This leverages BigQuery’s serverless capabilities, efficient processing, and seamless integration with other Google Cloud services.
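Option D requires converting the PySpark commands into SQL. That is largely a one-to-one mapping: `filter` becomes `WHERE`, `withColumn` becomes a computed expression in `SELECT`, and `groupBy(...).agg(...)` becomes `GROUP BY` with aggregate functions. The sketch below uses Python's built-in `sqlite3` as a local stand-in for BigQuery, so it runs without a Google Cloud project. The table, the columns, and the PySpark snippet in the comment are hypothetical. BigQuery's SQL dialect also differs from SQLite's in details.

```python
import sqlite3

# Hypothetical PySpark original being translated:
#   df.filter(F.col("amount") > 0) \
#     .withColumn("amount_usd", F.col("amount") / 100) \
#     .groupBy("user_id") \
#     .agg(F.count("*").alias("n_events"), F.sum("amount_usd").alias("total_usd"))

TRANSFORM_SQL = """
CREATE TABLE features AS
SELECT
  user_id,
  COUNT(*) AS n_events,                -- agg: count
  SUM(amount / 100.0) AS total_usd     -- withColumn + agg: sum
FROM raw_events
WHERE amount > 0                       -- filter
GROUP BY user_id                       -- groupBy
"""


def run_transform(rows):
    """Load rows into an in-memory table, run the SQL, return results sorted by user_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_events (user_id TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO raw_events VALUES (?, ?)", rows)
    conn.execute(TRANSFORM_SQL)
    result = conn.execute(
        "SELECT user_id, n_events, total_usd FROM features ORDER BY user_id"
    ).fetchall()
    conn.close()
    return result


print(run_transform([("a", 250), ("a", 150), ("b", 1000), ("b", -5)]))
# → [('a', 2, 4.0), ('b', 1, 10.0)]
```

The negative row for `b` is dropped by the `WHERE` clause, just as `filter` would drop it in PySpark. In BigQuery the same statement would read from and write to dataset-qualified tables instead of the in-memory SQLite database.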

upvoted 2 times


fragkris

11 months, 3 weeks ago

Selected Answer: D

D - BigQuery is the only serverless and SQL-syntax option.

upvoted 1 times


Sum_Sum

1 year ago

Selected Answer: D

D, as BQ is serverless and supports SQL; none of the other options meet both criteria.

upvoted 2 times


12112

1 year, 4 months ago

Selected Answer: D

I'll go with D.

upvoted 1 times


M25

1 year, 6 months ago

Selected Answer: D

Went with D

upvoted 3 times


asava

1 year, 8 months ago

Selected Answer: B

BQ is the serverless solution

upvoted 3 times

TornikePirveli

3 months, 1 week ago

But Dataproc is not serverless, so the answer should be D.

upvoted 1 times


mellowed

1 year, 10 months ago

Correct option is D

upvoted 1 times


ssaporylo

1 year, 10 months ago

Vote D

upvoted 1 times


ares81

1 year, 10 months ago

Selected Answer: A

It should be A.

upvoted 1 times
