#### Motivation

The Triton runtime can be used with model-mesh to serve PyTorch TorchScript models, but it does not support arbitrary PyTorch models, i.e. eager mode. KServe "classic" has an integration with TorchServe, and it would be good to have one with model-mesh too, so that these kinds of models can be used in distributed multi-model serving contexts.

#### Modifications

- Add adapter logic to implement the model-mesh management SPI using the TorchServe gRPC management API
- Build and include the new adapter binary in the Docker image
- Add a mock server and basic unit tests

Implementation notes:

- Model size (memory usage) is not returned from the `LoadModel` RPC but is instead reported separately by the `ModelSize` RPC, so that the model is available for use slightly sooner
- TorchServe's `DescribeModel` RPC is used to determine the model's memory usage. If that isn't successful, the adapter falls back to using a multiple of the model's size on disk, similar to other runtimes (see the first sketch below)
- The adapter writes the config file for TorchServe to consume (see the second sketch below)

TorchServe does not yet support the KServe V2 gRPC prediction API (only REST), which means that API can't currently be used with model-mesh. The native TorchServe gRPC inference interface can be used instead for the time being.

A smaller PR will be opened against the main modelmesh-serving controller repo to enable use of TorchServe; it will include the ServingRuntime specification.

#### Result

TorchServe can be used seamlessly with ModelMesh Serving to serve PyTorch models, including eager mode.

Resolves #4
Contributes to kserve/modelmesh-serving#63

Signed-off-by: Nick Hill <[email protected]>
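The size-estimation fallback described in the implementation notes could look roughly like the following Go sketch. It is a minimal illustration only: the callback stands in for a call to TorchServe's `DescribeModel` management RPC, and the names `modelSizeBytes` and `diskSizeMultiplier` (and the multiplier's value) are hypothetical, not the adapter's actual identifiers.

```go
package main

import (
	"context"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// diskSizeMultiplier scales on-disk size to approximate in-memory footprint
// when the runtime can't report memory usage (illustrative value only; the
// real adapter's multiplier may differ).
const diskSizeMultiplier = 2

// modelSizeBytes estimates a loaded model's memory usage. describeMem stands
// in for a call to TorchServe's DescribeModel management RPC and is an
// assumed callback, not the adapter's actual interface.
func modelSizeBytes(ctx context.Context, modelPath string,
	describeMem func(context.Context) (int64, error)) (int64, error) {

	if mem, err := describeMem(ctx); err == nil && mem > 0 {
		return mem, nil // preferred: memory usage reported by the runtime
	}
	// Fallback: a multiple of the model's total size on disk.
	var total int64
	err := filepath.WalkDir(modelPath, func(_ string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() {
			info, err := d.Info()
			if err != nil {
				return err
			}
			total += info.Size()
		}
		return nil
	})
	if err != nil {
		return 0, fmt.Errorf("sizing model on disk: %w", err)
	}
	return total * diskSizeMultiplier, nil
}

func main() {
	// Stub DescribeModel call that "fails", forcing the disk-size fallback.
	failing := func(context.Context) (int64, error) { return 0, fmt.Errorf("unavailable") }
	size, err := modelSizeBytes(context.Background(), ".", failing)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("estimated model size (bytes):", size)
}
```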
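The config-file step could similarly be sketched as below, assuming standard TorchServe `config.properties` keys (`model_store`, `grpc_inference_port`, `grpc_management_port`). The function name, paths, and port values are illustrative assumptions, not the adapter's actual defaults.

```go
package main

import (
	"fmt"
	"os"
)

// writeTorchServeConfig renders a minimal config.properties for TorchServe to
// consume on startup. The keys are standard TorchServe settings; the values
// here are placeholders.
func writeTorchServeConfig(path, modelStore string) error {
	cfg := fmt.Sprintf(
		"model_store=%s\n"+
			"grpc_inference_port=7070\n"+
			"grpc_management_port=7071\n"+
			"enable_metrics_api=false\n",
		modelStore)
	return os.WriteFile(path, []byte(cfg), 0o644)
}

func main() {
	if err := writeTorchServeConfig("/tmp/config.properties", "/models"); err != nil {
		fmt.Fprintln(os.Stderr, "failed to write config:", err)
		os.Exit(1)
	}
}
```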