You have deployed a model on Vertex AI for real-time inference. During an online prediction request, you get an “Out of Memory” error. What should you do?
A. Use batch prediction mode instead of online mode.
B. Send the request again with a smaller batch of instances.
C. Use base64 to encode your data before using it for prediction.
D. Apply for a quota increase for the number of prediction requests.
By reducing the batch size of instances sent for prediction, you decrease the memory footprint of each request, potentially alleviating the out-of-memory issue. However, be mindful that excessively reducing the batch size might impact the efficiency of your prediction process.
B. Send the request again with a smaller batch of instances.
If you are getting an "Out of Memory" error during an online prediction request, the amount of data you are sending in each request is too large and exceeds the memory available on the prediction node. To resolve this, try sending the request again with a smaller batch of instances; this reduces the amount of data handled per request and helps avoid the out-of-memory error. If the problem persists, you can also redeploy the model on a larger machine type or with more serving nodes to give the prediction service more resources.
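As a minimal sketch of that idea (assuming the Vertex AI Python SDK and placeholder project/endpoint values), you can split a large request into smaller chunks and call the endpoint once per chunk:

```python
from google.cloud import aiplatform

# Hypothetical project, region, and endpoint ID used purely for illustration.
aiplatform.init(project="my-project", location="us-central1")
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)

def predict_in_batches(instances, batch_size=16):
    """Send instances in small chunks so each online request stays within node memory."""
    predictions = []
    for i in range(0, len(instances), batch_size):
        response = endpoint.predict(instances=instances[i:i + batch_size])
        predictions.extend(response.predictions)
    return predictions
```

The batch_size here is an arbitrary starting point; in practice you would tune it down until individual requests stop triggering the out-of-memory error.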
This question is about prediction, not training, and specifically about _online_ prediction (i.e. real-time serving).
All the answers are about batch workloads apart from C.
Okay, option D is also about online serving, but the error indicates a problem with individual prediction requests, which will not be fixed by raising the quota for predictions per second.
@BenMS this feels like a trick question... it makes one zone in on the word "batch". The troubleshooting guide (https://cloud.google.com/ai-platform/training/docs/troubleshooting) states that when an error occurs with an online prediction request, you usually get an HTTP status code back from the service. These are some commonly encountered codes and their meaning in the context of online prediction:
429 - Out of Memory
The processing node ran out of memory while running your model. There is no way to increase the memory allocated to prediction nodes at this time. You can try these things to get your model to run:
Reduce your model size by:
1. Using less precise variables.
2. Quantizing your continuous data.
3. Reducing the size of other input features (using smaller vocab sizes, for example).
Or send the request again with a smaller batch of instances.
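To illustrate the model-size reduction route (items 1 and 2 above), here is a rough sketch using TensorFlow's post-training float16 quantization. The file paths are placeholders, and the TFLite output is only meant to show the idea of shrinking weight size, not a drop-in artifact for a Vertex AI serving container:

```python
import tensorflow as tf

# Placeholder path to an exported SavedModel (assumption for this sketch).
converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # enable post-training optimization
converter.target_spec.supported_types = [tf.float16]   # store weights as float16 ("less precise variables")
tflite_model = converter.convert()

with open("model_fp16.tflite", "wb") as f:
    f.write(tflite_model)  # roughly halves the weight footprint versus float32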