Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 263 discussion

Actual exam question from Google's Professional Machine Learning Engineer

Question #: 263
Topic #: 1

[All Professional Machine Learning Engineer Questions]

You are developing a custom TensorFlow classification model based on tabular data. Your raw data is stored in BigQuery. contains hundreds of millions of rows, and includes both categorical and numerical features. You need to use a MaxMin scaler on some numerical features, and apply a one-hot encoding to some categorical features such as SKU names. Your model will be trained over multiple epochs. You want to minimize the effort and cost of your solution. What should you do?

A. 1. Write a SQL query to create a separate lookup table to scale the numerical features.
2. Deploy a TensorFlow-based model from Hugging Face to BigQuery to encode the text features.
3. Feed the resulting BigQuery view into Vertex AI Training.
B. 1. Use BigQuery to scale the numerical features.
2. Feed the features into Vertex AI Training.
3. Allow TensorFlow to perform the one-hot text encoding.
C. 1. Use TFX components with Dataflow to encode the text features and scale the numerical features.
2. Export results to Cloud Storage as TFRecords.
3. Feed the data into Vertex AI Training.
D. 1. Write a SQL query to create a separate lookup table to scale the numerical features.
2. Perform the one-hot text encoding in BigQuery.
3. Feed the resulting BigQuery view into Vertex AI Training.

Show Suggested Answer

Suggested Answer: C 🗳️

by pikachu007 at Jan. 13, 2024, 4:02 p.m.

Comments

Submit Cancel

b2aaace

Highly Voted 1 year ago

Selected Answer: C

"Full-pass stateful transformations aren't suitable for implementation in BigQuery. If you use BigQuery for full-pass transformations, you need auxiliary tables to store quantities needed by stateful transformations, such as means and variances to scale numerical features. Further, implementation of full-pass transformations using SQL on BigQuery creates increased complexity in the SQL scripts, and creates intricate dependency between training and the scoring SQL scripts." https://www.tensorflow.org/tfx/guide/tft_bestpractices#where_to_do_preprocessing

upvoted 7 times

Ankit267

4 months ago

only requirement is minmax scaling, not mean or variance. D, why need extra component like Cloud Storage, Vertex AI when it could be done easily in BQ where already the raw data is stored

upvoted 1 times

...

Prakzz

9 months, 4 weeks ago

Isn't Dataflow includes a lot of effort as the question asking to minimize the effort here?

upvoted 1 times

...

thescientist

Most Recent 3 months, 3 weeks ago

Selected Answer: B

With multiple epochs, the data is passed through the model multiple times. If you pre-encoded the categorical features (as in options C and D), you would be storing and repeatedly reading a much larger dataset (due to the one-hot encoding). This significantly increases storage costs and I/O overhead. By performing the one-hot encoding within TensorFlow during training (as in option B), the encoding happens on-the-fly for each batch of data during each epoch.

upvoted 1 times

...

Ankit267

4 months ago

Selected Answer: D

B & D as top 2 choices, C is including Dataflow unnecessarily. D as "minimize the effort and cost of your solution", still some room for B but I selected D

upvoted 1 times

...

pipefaxaf

5 months, 3 weeks ago

Selected Answer: D

Option D minimizes effort and cost by using BigQuery to handle both the scaling and one-hot encoding. BigQuery is efficient for these types of preprocessing tasks, especially when dealing with large datasets. By preparing the data in BigQuery, you avoid the need to export data to other services or use additional resources for preprocessing, such as Dataflow. This approach provides a streamlined workflow by creating a preprocessed view in BigQuery, which can then be directly fed into Vertex AI Training without extra transformation steps. This helps optimize cost and simplicity while handling large tabular data effectively.

upvoted 3 times

DaleR

5 months ago

Agree with pipefaxal. Minimize effort is key here.

upvoted 1 times

...

YangG

6 months, 2 weeks ago

Selected Answer: C

multiple epochs --> need to persist data after preprocessing

upvoted 2 times

...

wences

7 months, 2 weeks ago

Selected Answer: D

Option D since it says minimize effort and cost following that adding something rather than BQ will increase complexity.

upvoted 2 times

...

AzureDP900

9 months, 3 weeks ago

Option C uses TFX (TensorFlow Extended) components with Dataflow, which is a great way to perform complex data preprocessing tasks like one-hot encoding and scaling. This approach allows you to process your data in a scalable and efficient manner, using Cloud Storage as the output location. By exporting the results as TFRecords, you can easily feed this preprocessed data into Vertex AI Training for model development.

upvoted 1 times

...

dija123

9 months, 4 weeks ago

Selected Answer: C

agree with TFX components with Dataflow

upvoted 1 times

...

bobjr

10 months, 3 weeks ago

Selected Answer: D

GPT says D, Gemini says B, Perplexity says C.... I say D : stay in one tool, BQ, which is cheap and natively scalable. B has a risk of out of memory error.

upvoted 4 times

...

fitri001

1 year ago

Selected Answer: B

BigQuery for Preprocessing: BigQuery is a serverless data warehouse optimized for large datasets.expand_more It can handle scaling numerical features using built-in functions like SCALE or QUANTILE_SCALE, reducing the need for complex custom logic or separate lookup tables. TensorFlow for One-Hot Encoding: TensorFlow excels at in-memory processing. One-hot encoding of categorical features, especially text features like SKU names, can be efficiently performed within your TensorFlow model during training. This avoids unnecessary data movement or transformations in BigQuery. Vertex AI Training: By feeding the preprocessed data (scaled numerical features) directly into Vertex AI Training, you leverage its managed infrastructure for training your custom TensorFlow model.

upvoted 1 times

fitri001

1 year ago

Option A: Creates unnecessary complexity and data movement. BigQuery is better suited for scaling numerical features, and TensorFlow is efficient for one-hot encoding. Option C: TFX is a powerful framework for complex pipelines, but for a simpler scenario like this, it might be an overkill. Additionally, exporting data as TFRecords adds an extra step, potentially increasing cost and complexity. Option D: One-hot encoding in BigQuery might be cumbersome for textual features like SKU names. pen_spark exclamation It can be computationally expensive and result in data explosion. TensorFlow handles this efficiently within the model.

upvoted 1 times

...

cruise93

1 year ago

Selected Answer: C

Agree with b1a8fae

upvoted 1 times

...

gscharly

1 year ago

Selected Answer: C

agree with daidai75

upvoted 2 times

pinimichele01

1 year ago

Option B is not suitable for the big volume of data processing????? BQ is not suitable for big volume??.. for me is B

upvoted 1 times

...

guilhermebutzke

1 year, 2 months ago

Selected Answer: B

My Answer: B 1. Use BigQuery to scale the numerical features.: Simpler and cheaper then use TFX components with Dataflow to scale the numerical features 2. Feed the features into Vertex AI Training. 3. Allow TensorFlow to perform the one-hot text encoding: TensorFlow handles the one-hot text encoding better than BQ.

upvoted 4 times

...

daidai75

1 year, 3 months ago

Selected Answer: C

key messages: "contains hundreds of millions of rows, and includes both categorical and numerical features. You need to use a MaxMin scaler on some numerical features, and apply a one-hot encoding to some categorical features such as SKU names". Option B is not suitable for the big volume of data processing. Option C is better.

upvoted 2 times

...

b1a8fae

1 year, 3 months ago

Selected Answer: C

Inclined to choose C over B. By using TFX components with Dataflow, you can perform feature engineering on large-scale tabular data in a distributed and efficient way. You can use the Transform component to apply the MaxMin scaler and the one-hot encoding to the numerical and categorical features, respectively. You can also use the ExampleGen component to read data from BigQuery and the Trainer component to train your TensorFlow model.

upvoted 2 times

...

pikachu007

1 year, 3 months ago

Selected Answer: B

Option A: Involves creating a separate lookup table and deploying a Hugging Face model in BigQuery, increasing complexity and cost. Option C: While TFX offers robust preprocessing capabilities, it adds overhead for this use case and requires knowledge of Dataflow. Option D: Performing one-hot encoding in BigQuery can be less efficient than TensorFlow's optimized implementation.

upvoted 3 times

...

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 263 discussion

Comments

b2aaace

Ankit267

Prakzz

thescientist

Ankit267

pipefaxaf

DaleR

YangG

wences

AzureDP900

dija123

bobjr

fitri001

fitri001

cruise93

gscharly

pinimichele01

guilhermebutzke

daidai75

b1a8fae

pikachu007

SY0-701