Triton server orchestration for production deployment #18

@kpedro88

The Triton server(s) could be organized in several different ways for a realistic production deployment.

A. One server per model

  • Requires some central map of IP:model name
  • Does this imply one model per GPU?
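As a rough illustration of the central map that option A would need, here is a minimal sketch. The model names, addresses, and the `server_for` helper are all hypothetical placeholders, not part of any existing configuration.

```python
# Hypothetical registry for option A: each model name maps to the address
# of the one server that hosts it. All names and addresses are placeholders.
MODEL_SERVERS = {
    "model_a": "10.0.0.1:8001",
    "model_b": "10.0.0.2:8001",
}


def server_for(model_name, registry=MODEL_SERVERS):
    """Return the 'host:port' address of the server hosting model_name."""
    try:
        return registry[model_name]
    except KeyError:
        # Fail loudly so a missing registry entry is caught at lookup time.
        raise LookupError(f"no server registered for model {model_name!r}") from None
```

A client would call `server_for("model_a")` before opening a connection. A hybrid (option C) could reuse the same map with several models pointing at one address.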

B. Single server for all models (and all GPUs)

  • Load-balancing already works well
  • Need to ensure serving multiple models can be done efficiently

C. Some hybrid of A and B

D. Other?

In addition, it's likely that at least each Tier1/Tier2 site would eventually have its own GPU servers (to reduce latency). The IP addresses of each site's server(s) could be tracked in e.g. site-local-config.xml or another appropriate part of the production infrastructure.
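To make the per-site tracking idea concrete, here is a sketch of reading server addresses from an XML file. The `<triton-server>` element and its attributes are invented for illustration only; they are an assumption and not part of the actual site-local-config.xml schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <triton-server> element and its attributes are
# invented for illustration and are not part of any existing schema.
EXAMPLE_CONFIG = """
<site-local-config>
  <site name="T2_Example">
    <triton-server address="192.0.2.10" port="8001"/>
    <triton-server address="192.0.2.11" port="8001"/>
  </site>
</site-local-config>
"""


def site_servers(xml_text):
    """Return a list of 'address:port' strings for every server listed."""
    root = ET.fromstring(xml_text.strip())
    return [
        f"{node.get('address')}:{node.get('port')}"
        for node in root.iter("triton-server")
    ]
```

Jobs running at a site could then pick any entry from `site_servers(...)` and fall back to a remote server if the list is empty.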

Triton 2.X supports HTTPS/SSL, which could potentially be used for client-server authentication to maintain security in production.
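On the client side, authentication over TLS might start from something like the sketch below, which uses only the Python standard library. The `make_client_tls_context` helper and its file arguments are hypothetical; how such settings are passed to a Triton client depends on the client library and version and is not shown here.

```python
import ssl


def make_client_tls_context(ca_file=None, cert_file=None, key_file=None):
    """Build a TLS context that verifies the server and, optionally,
    presents a client certificate (mutual TLS)."""
    # create_default_context enables certificate and hostname verification.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file is not None:
        # Hypothetical client credentials, e.g. issued per site.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context
```

With no arguments the context trusts the system CA bundle and only authenticates the server; supplying `cert_file` and `key_file` would let the server authenticate the client as well.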

attn: @violatingcp @holzman @mapsacosta
