Exam Professional Machine Learning Engineer topic 1 question 131 discussion

Actual exam question from Google's Professional Machine Learning Engineer
Question #: 131
Topic #: 1

You are an ML engineer at a mobile gaming company. A data scientist on your team recently trained a TensorFlow model, and you are responsible for deploying this model into a mobile application. You discover that the inference latency of the current model doesn’t meet production requirements. You need to reduce the inference time by 50%, and you are willing to accept a small decrease in model accuracy in order to reach the latency requirement. Without training a new model, which model optimization technique for reducing latency should you try first?

  • A. Weight pruning
  • B. Dynamic range quantization
  • C. Model distillation
  • D. Dimensionality reduction
Suggested Answer: B 🗳️

Comments

TNT87
Highly Voted 7 months, 3 weeks ago
B. Dynamic range quantization. Dynamic range quantization is a post-training optimization technique that can significantly reduce model size and inference time while keeping the accuracy loss small. It represents the model's weights with fewer bits (8-bit integers instead of 32-bit floats), which shrinks the memory needed to store the model and the time required for inference, and it does not require retraining.
upvoted 6 times
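For context, a minimal sketch of post-training dynamic range quantization with the TensorFlow Lite converter, assuming TF 2.x; the SavedModel path and output filename are illustrative placeholders, not from the question:

```python
import tensorflow as tf

# Load the already-trained model (path is a hypothetical placeholder).
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")

# Enabling the default optimizations without a representative dataset applies
# dynamic range quantization: weights are stored as 8-bit integers
# post-training, so no retraining is needed.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()

with open("model_quant.tflite", "wb") as f:
    f.write(tflite_quant_model)
```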
...
julliet
Most Recent 5 months, 1 week ago
Selected Answer: B
B. Options A, C, and D would all require retraining the model.
upvoted 3 times
...
M25
5 months, 3 weeks ago
Selected Answer: B
Plus: “Magnitude-based weight pruning gradually zeroes out model weights during the training process to achieve model sparsity. Sparse models are easier to compress, and we can skip the zeroes during inference for latency improvements.” https://www.tensorflow.org/model_optimization/guide/pruning, where “during the training process” disqualifies Option A.
upvoted 1 times
M25
5 months, 3 weeks ago
Knowledge distillation (https://en.wikipedia.org/wiki/Knowledge_distillation) is the process of transferring knowledge from a large model to a smaller one. As smaller models are less expensive to evaluate, they can be deployed on less powerful hardware (such as a mobile device). Dimensionality reduction (https://en.wikipedia.org/wiki/Dimensionality_reduction) is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data. "Without training a new model" disqualifies both Options C and D.
upvoted 1 times
...
...
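To illustrate M25's point about pruning above: in the TensorFlow Model Optimization toolkit, magnitude-based pruning wraps the model and only introduces sparsity as training steps run, so a fit() pass (i.e. retraining or fine-tuning) is unavoidable. A rough sketch, assuming TF 2.x and the tensorflow_model_optimization package, with a toy model and random data standing in for the real trained model:

```python
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Toy model and random data, stand-ins for the data scientist's trained model.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
x_train = np.random.rand(64, 4).astype("float32")
y_train = np.random.randint(0, 3, size=(64,))

# prune_low_magnitude wraps the layers; sparsity is only applied during
# training steps, which is why pruning cannot satisfy the
# "without training a new model" constraint.
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(model)
pruned_model.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])
pruned_model.fit(x_train, y_train, epochs=1,
                 callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
```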
ares81
10 months ago
Selected Answer: B
'Without training a new model' --> B
upvoted 3 times
...
hiromi
10 months, 1 week ago
Selected Answer: B
B - https://www.tensorflow.org/lite/performance/post_training_quantization#dynamic_range_quantization
upvoted 4 times
...
mil_spyro
10 months, 2 weeks ago
Selected Answer: B
The requirement is "without training a new model", hence dynamic range quantization. https://www.tensorflow.org/lite/performance/post_training_quant
upvoted 3 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other