You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine
(GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?
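(The answer choices and discussion comments from the original page are not preserved here. As a hedged illustration only: one common technique for cutting per-request overhead at a few thousand queries per second, without changing the pods or the load balancer, is server-side micro-batching, where individual prediction requests are grouped so the model is invoked once per batch rather than once per request. The Python sketch below is not tied to any specific answer choice; every name in it (predict_batch, MAX_BATCH, MAX_WAIT_MS) is hypothetical, and predict_batch stands in for whatever model call the serving pods actually make.

import queue
import threading
import time

MAX_BATCH = 32     # largest batch handed to the model in one call
MAX_WAIT_MS = 5    # max time a request may wait for batch-mates

_requests = queue.Queue()

def predict_batch(inputs):
    # Stand-in for the real model call, e.g. a TensorFlow SavedModel
    # signature invoked once per batch instead of once per request.
    return [sum(x) for x in inputs]

def _batch_loop():
    # Background worker: collect requests into a batch, bounded by
    # both MAX_BATCH and a small time budget, then run the model once.
    while True:
        first = _requests.get()            # block until a request arrives
        batch = [first]
        deadline = time.monotonic() + MAX_WAIT_MS / 1000
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(_requests.get(timeout=remaining))
            except queue.Empty:
                break
        outputs = predict_batch([features for features, _ in batch])
        for (_, reply), out in zip(batch, outputs):
            reply.put(out)                 # hand each caller its result

threading.Thread(target=_batch_loop, daemon=True).start()

def predict(features):
    # Called by each request handler; blocks until its result is ready.
    reply = queue.Queue(maxsize=1)
    _requests.put((features, reply))
    return reply.get()

if __name__ == "__main__":
    print(predict([1.0, 2.0, 3.0]))  # -> 6.0

Bounding the wait with MAX_WAIT_MS keeps tail latency predictable: a request never waits more than a few milliseconds for batch-mates, while busy periods naturally produce full batches and amortize the per-call model overhead.)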