How TorchServe can scale in a Kubernetes environment using KEDA

I almost burned a €7K GPU (an NVIDIA A100 PCIe) to understand how TorchServe could keep up with a growing stream of on-demand inference requests at scale.
Serve AI models using TorchServe in Kubernetes at scale

In a typical MLOps practice, among other things, we need to serve our AI models to users by exposing inference APIs. I tried a production-ready framework (TorchServe), installed it on Azure Kubernetes Service, and pushed it to its limits.
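To give a concrete idea of the setup before diving in, here is a minimal sketch of a KEDA ScaledObject that scales a TorchServe deployment on request rate. It assumes a Deployment named torchserve, a Prometheus instance scraping TorchServe's ts_inference_requests_total metric, and an in-cluster Prometheus address; all names and thresholds are illustrative, not the exact configuration used in this article.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: torchserve-scaler
spec:
  scaleTargetRef:
    name: torchserve            # hypothetical Deployment running TorchServe
  minReplicaCount: 1            # keep one replica warm for incoming requests
  maxReplicaCount: 5            # cap replicas to protect the GPU budget
  triggers:
    - type: prometheus
      metadata:
        # assumed in-cluster Prometheus endpoint
        serverAddress: http://prometheus-server.monitoring.svc:9090
        # inference requests per second across all TorchServe pods
        query: sum(rate(ts_inference_requests_total[2m]))
        # add a replica for roughly every 10 req/s (illustrative threshold)
        threshold: "10"
```

The idea is that KEDA watches the metric and drives the replica count of the TorchServe Deployment up or down, instead of relying on CPU-based autoscaling alone.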