You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?
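For context, here is a minimal sketch of the request path the question describes, assuming the CPU-only pods run TensorFlow Serving behind the load balancer (a common Kubeflow serving setup; the model name `my_model` and the `LOAD_BALANCER_IP` placeholder are assumptions, not given in the question):

```python
import requests

# Hypothetical endpoint: the load balancer's external address plus the
# TensorFlow Serving REST predict path. Model name "my_model" is an
# assumption for illustration only.
ENDPOINT = "http://LOAD_BALANCER_IP:8501/v1/models/my_model:predict"

def predict(instances):
    """Send one batch of instances through the load balancer to the
    serving pods and return the model's predictions."""
    response = requests.post(ENDPOINT, json={"instances": instances})
    response.raise_for_status()
    return response.json()["predictions"]

if __name__ == "__main__":
    # A single toy request; at a few thousand QPS, many such calls would
    # be distributed across the CPU-only pods by the load balancer.
    print(predict([[1.0, 2.0, 3.0]]))
```

At this request rate, per-request overhead inside each pod dominates latency, which is why the question asks what can be tuned in the serving layer itself rather than in the cluster topology.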