
feat: implement --served-model-name CLI option (#158)

Merged
BugenZhao merged 1 commit into Inferact:main from ericcurtin:feat/served-model-name on May 6, 2026

Conversation

@ericcurtin (Contributor) commented on May 5, 2026

Summary

  • Add `--served-model-name` CLI argument (zero or more values) to expose model aliases via the OpenAI API, matching vllm's behavior (see the CLI sketch after this list)
  • `GET /v1/models` now returns one entry per served name; all endpoints accept any served name in the `model` request field and echo back the first (primary) name in responses
  • Removed `served_model_name` from `UnsupportedArgs` now that it is fully implemented
  • `AppState::new` asserts (not just debug-asserts) that the served names list is non-empty, so misconfiguration fails fast in release builds too
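
For context, a minimal sketch of what such an argument can look like with clap's derive API; the `ServeArgs` struct, its field names, and the `served_names` helper are illustrative assumptions, not the actual code in src/cmd/src/cli.rs:

```rust
use clap::Parser;

/// Illustrative CLI definition (names are assumptions, not the PR's code).
#[derive(Parser)]
struct ServeArgs {
    /// Backend model path or ID, e.g. "Qwen/Qwen3-0.6B".
    model: String,

    /// Zero or more aliases to expose via the OpenAI API.
    /// The first value is the primary name echoed in responses.
    #[arg(long = "served-model-name", num_args = 0..)]
    served_model_name: Vec<String>,
}

impl ServeArgs {
    /// Resolve the served names, falling back to the backend model
    /// path when no aliases were given (the default behavior below).
    fn served_names(&self) -> Vec<String> {
        if self.served_model_name.is_empty() {
            vec![self.model.clone()]
        } else {
            self.served_model_name.clone()
        }
    }
}
```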

Behavior

When no `--served-model-name` is given, the backend model path is used as the single served name (no change in default behavior).

```
vllm-rs serve Qwen/Qwen3-0.6B --served-model-name qwen3 my-alias
```

  • `GET /v1/models` → returns `qwen3` and `my-alias`
  • `POST /v1/chat/completions` with `"model": "my-alias"` → accepted; the response contains `"model": "qwen3"` (see the validation sketch below)
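
A hedged sketch of the acceptance/echo semantics above; the `AppState` shape here is an assumption (the real struct lives in src/server/src/state.rs and may differ), but it shows the invariant the hard assert protects:

```rust
/// Illustrative server state; field and method names are assumptions.
struct AppState {
    /// Non-empty list; index 0 is the primary name echoed in responses.
    served_model_names: Vec<String>,
}

impl AppState {
    fn new(served_model_names: Vec<String>) -> Self {
        // A hard assert! rather than debug_assert!, so an empty list
        // also fails fast in release builds.
        assert!(
            !served_model_names.is_empty(),
            "served model names must be non-empty"
        );
        Self { served_model_names }
    }

    /// Accept a request only if `model` matches any served name.
    fn validate_model(&self, requested: &str) -> Result<(), String> {
        if self.served_model_names.iter().any(|n| n == requested) {
            Ok(())
        } else {
            Err(format!("model '{requested}' is not served"))
        }
    }

    /// The primary (first) name, echoed back in the `model` field
    /// of responses regardless of which alias the request used.
    fn primary_model_name(&self) -> &str {
        &self.served_model_names[0]
    }
}
```

With the example invocation above, `validate_model("my-alias")` passes and `primary_model_name()` returns `"qwen3"`.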

Review notes

`EngineCoreClientConfig::model_name` intentionally keeps the backend model path (`config.model`) rather than the first served alias: it is used for the engine protocol handshake, not for API labeling.
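
A sketch of that separation, assuming `EngineCoreClientConfig` has the `model_name` field described here (its other fields are omitted and the construction site is hypothetical):

```rust
/// Trimmed to the one field under discussion; the real config has more.
struct EngineCoreClientConfig {
    /// Identifies the model in the engine protocol handshake.
    model_name: String,
}

fn build_engine_client_config(backend_model_path: &str) -> EngineCoreClientConfig {
    EngineCoreClientConfig {
        // Deliberately the backend path (`config.model`), never the first
        // served alias: this name labels the engine handshake, while the
        // served names only label the OpenAI-facing API.
        model_name: backend_model_path.to_owned(),
    }
}
```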

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces support for multiple served model names in the API, allowing the server to respond to several identifiers while designating the first as the primary ID for responses. Changes include adding the `--served-model-name` CLI argument, updating the server configuration and state to handle a list of names, and modifying request-validation logic across the HTTP and gRPC routes. Additionally, the `/v1/models` endpoint now returns all configured names. Feedback was provided regarding the use of `debug_assert!` for validating the non-empty invariant of served model names, suggesting a more robust check for production builds.

Comment thread on src/server/src/state.rs (outdated)
@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 13e0d4e359

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread on src/server/src/lib.rs
Comment thread on src/server/src/state.rs (outdated)
@ericcurtin force-pushed the feat/served-model-name branch from 13e0d4e to e290867 on May 5, 2026 14:18
@ericcurtin (Contributor, Author) commented:

@BugenZhao PTAL

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e29086775a


Comment thread on src/cmd/src/cli.rs
@BugenZhao requested review from BugenZhao and njhill on May 5, 2026 16:11
@BugenZhao (Member) left a comment

LGTM. Thanks!

Commit message:

```
Add --served-model-name support, matching vllm's behavior:

- Accept zero or more alias names via --served-model-name
- GET /v1/models returns one entry per served name
- POST completions/chat endpoints accept any served name in the
  model field and echo back the first (primary) name in responses
- gRPC and /inference/v1/generate validate against all served names
- Falls back to the backend model path when no names are specified
- Removed served_model_name from UnsupportedArgs now that it is
  fully implemented
```
@ericcurtin force-pushed the feat/served-model-name branch from e290867 to 26449b1 on May 6, 2026 08:17
@BugenZhao enabled auto-merge (squash) on May 6, 2026 13:28
@BugenZhao merged commit b7bc63d into Inferact:main on May 6, 2026
3 checks passed
@ericcurtin deleted the feat/served-model-name branch on May 8, 2026 10:38