Exam Professional Machine Learning Engineer topic 1 question 289 discussion

Actual exam question from Google's Professional Machine Learning Engineer

Question #: 289
Topic #: 1

You developed a BigQuery ML linear regressor model by using a training dataset stored in a BigQuery table. New data is added to the table every minute. You are using Cloud Scheduler and Vertex AI Pipelines to automate hourly model training, and use the model for direct inference. The feature preprocessing logic includes quantile bucketization and MinMax scaling on data received in the last hour. You want to minimize storage and computational overhead. What should you do?

A. Preprocess and stage the data in BigQuery prior to feeding it to the model during training and inference.
B. Use the TRANSFORM clause in the CREATE MODEL statement in the SQL query to calculate the required statistics.
C. Create a component in the Vertex AI Pipelines directed acyclic graph (DAG) to calculate the required statistics, and pass the statistics on to subsequent components.
D. Create SQL queries to calculate and store the required statistics in separate BigQuery tables that are referenced in the CREATE MODEL statement.

Suggested Answer: B

by carolctech at Oct. 26, 2024, 6:15 p.m.

Comments

Wuthuong1234

4 months, 2 weeks ago

Selected Answer: B

B is the right solution. Keep in mind that the question asks you to "minimize storage and computational overhead". With A and D you end up storing more data, while C adds computational overhead. All four options would work, but B best matches the requirements in the question.

upvoted 1 times

Ankit267

6 months, 2 weeks ago

Selected Answer: B

BQ is sufficient

upvoted 1 times

AB_C

7 months, 2 weeks ago

Selected Answer: A

While the TRANSFORM clause can perform preprocessing, it's applied during model creation, not for inference. You'll need to recalculate statistics for each inference request, increasing computational overhead.

upvoted 1 times

Omi_04040

7 months ago

This is wrong. From the BigQuery ML documentation: "This tutorial introduces data analysts to BigQuery ML. BigQuery ML enables users to create and execute machine learning models in BigQuery using SQL queries. This tutorial introduces feature engineering by using the TRANSFORM clause. Using the TRANSFORM clause, you can specify all preprocessing during model creation. The preprocessing is automatically applied during the prediction and evaluation phases of machine learning." https://cloud.google.com/bigquery/docs/bigqueryml-transform
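
To make the point about inference concrete, here is a minimal sketch of what a prediction query might look like for a model created with a TRANSFORM clause. The model, table, and column names are hypothetical placeholders, and the Python wrapper only assembles the SQL string. The query passes raw columns to ML.PREDICT and contains no scaling or bucketization calls, since the model is expected to apply its stored preprocessing itself.

```python
# Hypothetical names: my_dataset.my_model, my_dataset.new_rows, feature_a and
# feature_b are placeholders for illustration, not taken from the question.
def build_predict_sql(model: str, table: str) -> str:
    """Return an ML.PREDICT query that passes raw, unpreprocessed columns."""
    return f"""
SELECT *
FROM ML.PREDICT(
  MODEL `{model}`,
  (SELECT feature_a, feature_b FROM `{table}`)
)
""".strip()


predict_sql = build_predict_sql("my_dataset.my_model", "my_dataset.new_rows")
print(predict_sql)
```

Nothing in the inference query recomputes statistics, which is the behaviour the quoted documentation describes.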

upvoted 1 times

shubhachandra

7 months, 2 weeks ago

Selected Answer: B

The TRANSFORM clause in BigQuery ML allows you to directly define feature preprocessing logic (such as quantile bucketization and MinMax scaling) within the SQL query itself. This approach minimizes storage and computational overhead because:

- No additional storage: Statistics for preprocessing are calculated during model training and saved with the model, then reused at inference, without needing to store preprocessed data or statistics separately.
- Integrated workflow: The preprocessing logic is tightly coupled with the model creation process, ensuring consistency between training and inference without external dependencies.
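
As a rough illustration of option B, the sketch below assembles a CREATE MODEL statement with a TRANSFORM clause. ML.QUANTILE_BUCKETIZE and ML.MIN_MAX_SCALER are BigQuery ML preprocessing functions; the model, table, and column names, the bucket count, and the one-hour filter on a column called event_time are all assumptions made for the example, not details from the question.

```python
# Hypothetical names: the model, table, and columns (feature_a, feature_b,
# label, event_time) are placeholders chosen for illustration only.
def build_create_model_sql(model: str, table: str, num_buckets: int = 10) -> str:
    """Return a CREATE MODEL statement that embeds preprocessing in TRANSFORM."""
    return f"""
CREATE OR REPLACE MODEL `{model}`
TRANSFORM(
  ML.QUANTILE_BUCKETIZE(feature_a, {num_buckets}) OVER () AS feature_a_bucket,
  ML.MIN_MAX_SCALER(feature_b) OVER () AS feature_b_scaled,
  label
)
OPTIONS(model_type = 'LINEAR_REG', input_label_cols = ['label'])
AS
SELECT feature_a, feature_b, label
FROM `{table}`
WHERE event_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
""".strip()


sql = build_create_model_sql("my_dataset.my_model", "my_dataset.training_table")
print(sql)
```

An hourly pipeline step could submit this single statement, so no staging table or separate statistics table is needed.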

upvoted 2 times

lunalongo

7 months, 3 weeks ago

Selected Answer: B

B is the best option because: 1) TRANSFORM saves storage and computation by performing feature preprocessing directly within the CREATE MODEL statement. 2) This method integrates preprocessing with model training, streamlining the entire process.

upvoted 1 times

f084277

8 months ago

Selected Answer: C

The docs say BigQuery is not suitable for full-pass transformations such as MinMax scaling.
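
For readers unfamiliar with the term, a "full-pass" transformation is one whose parameters need a scan of the whole training set before any single row can be transformed. The plain-Python sketch below illustrates that idea for MinMax scaling and quantile bucketization. It is not BigQuery's implementation, and the function names, bucket numbering, and sample values are made up for illustration.

```python
from bisect import bisect_right
from statistics import quantiles


def fit_stats(values, num_buckets=4):
    """Full pass over the training values: record min, max, and quantile cuts."""
    return {
        "min": min(values),
        "max": max(values),
        # quantiles() returns num_buckets - 1 boundaries between buckets.
        "cuts": quantiles(values, n=num_buckets),
    }


def min_max_scale(x, stats):
    """Scale one value using the stored training min and max (no clamping)."""
    span = stats["max"] - stats["min"]
    return 0.0 if span == 0 else (x - stats["min"]) / span


def bucketize(x, stats):
    """Return a 1-based bucket index from the stored quantile boundaries."""
    return bisect_right(stats["cuts"], x) + 1


training_values = [10, 20, 30, 40, 50]
stats = fit_stats(training_values)   # done once, at training time
scaled = min_max_scale(20, stats)    # reused per row, at inference time
bucket = bucketize(35, stats)
print(stats, scaled, bucket)
```

The design point is that only the small `stats` dictionary has to persist between training and inference, not a preprocessed copy of the data.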

upvoted 2 times

carolctech

8 months, 3 weeks ago

Selected Answer: A

A) Preprocessing and staging the data in BigQuery before training and inference is the most efficient approach because it: 1) uses BigQuery’s optimized processing by preprocessing data before training; 2) avoids redundant calculations by directly using the preprocessed data (already bucketized and scaled) for training and inference; 3) reduces storage by keeping only preprocessed data, not raw data and statistics separately.

upvoted 1 times
