Triton server orchestration for production deployment #18

For a realistic production deployment, the Triton server(s) could be organized in several ways:

A. One server per model
* Requires a central map from each model name to the IP address of the server that hosts it
* Does this imply one model per GPU?
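
As a rough illustration of the central map that option A would need, here is a minimal sketch. The model names, addresses, and the `server_for` helper are all hypothetical placeholders, not part of any existing infrastructure.

```python
# Hypothetical registry: model name -> "host:port" of the Triton server hosting it.
# All names and addresses below are placeholders for illustration.
MODEL_SERVERS = {
    "model_a": "10.0.0.1:8001",
    "model_b": "10.0.0.2:8001",
}


def server_for(model_name, registry=MODEL_SERVERS):
    """Return the address of the server that hosts model_name."""
    try:
        return registry[model_name]
    except KeyError:
        # Fail with a descriptive message instead of a bare KeyError.
        raise LookupError(
            f"no Triton server registered for model {model_name!r}"
        ) from None
```

A client would call `server_for("model_a")` before opening its connection. In practice the map would presumably live in a shared service or configuration file rather than in client code.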

B. Single server for all models (and all GPUs)
* Load-balancing already works well
* Need to verify that a single server can serve multiple models efficiently

C. Some hybrid of A and B

D. Other?

In addition, it's likely that each Tier-1/Tier-2 site, at minimum, would eventually have its own GPU servers (to reduce latency). The IP addresses of each site's server(s) could be tracked in e.g. `site-local-config.xml` or another appropriate part of the production infrastructure.
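
To make the per-site lookup concrete, here is a minimal sketch of how a client might read server addresses from a site configuration file. The `<triton-server>` element, its `address` and `port` attributes, and the site name are assumptions invented for this example; no such entry is defined in `site-local-config.xml` today.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration fragment: the <triton-server> element and its
# attributes are invented for illustration and are not an existing schema.
EXAMPLE_CONFIG = """
<site-local-config>
  <site name="T2_EXAMPLE">
    <triton-server address="192.0.2.10" port="8001"/>
    <triton-server address="192.0.2.11" port="8001"/>
  </site>
</site-local-config>
"""


def triton_servers(xml_text):
    """Return "address:port" strings for every <triton-server> entry."""
    root = ET.fromstring(xml_text)
    return [
        f"{el.get('address')}:{el.get('port')}"
        for el in root.iter("triton-server")
    ]
```

Listing several servers per site would let the client pick one or fail over, whichever of options A to C is chosen.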

Triton 2.X supports HTTPS/SSL, which could potentially be used for client-server authentication to keep production traffic secure.
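
As one possible shape for this, the sketch below collects SSL settings for the gRPC client in the `tritonclient` Python package. It assumes that `tritonclient.grpc.InferenceServerClient` accepts the `ssl`, `root_certificates`, `private_key`, and `certificate_chain` keyword arguments. The helper names and certificate paths are hypothetical, and the server would need matching SSL settings enabled.

```python
def grpc_ssl_options(root_certificates, private_key=None, certificate_chain=None):
    """Build keyword arguments for an SSL-enabled Triton gRPC client.

    Passing both private_key and certificate_chain requests mutual
    (client-server) authentication; otherwise only the server is verified.
    """
    options = {"ssl": True, "root_certificates": root_certificates}
    if private_key and certificate_chain:
        options["private_key"] = private_key
        options["certificate_chain"] = certificate_chain
    return options


def make_client(url, **ssl_options):
    """Create the client; assumes the tritonclient package is installed."""
    # Imported lazily so the options helper works without tritonclient.
    import tritonclient.grpc as grpcclient

    return grpcclient.InferenceServerClient(url=url, **ssl_options)
```

For example, `make_client("triton.example.org:8001", **grpc_ssl_options("ca.pem", "client.key", "client.pem"))` would request mutual authentication, where the host name and file paths are placeholders.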

attn: @violatingcp @holzman @mapsacosta
