feat: vLLM-Omni backend (native omni: text/audio/vision) + chat UI #2527
Draft
ramkrishna2910 wants to merge 2 commits into
Adds the vllm-omni recipe using the descriptor backend model: a folder (vllm_omni/) with the descriptor + WrappedServer subclass, one CMake LEMON_BACKENDS line, a backend_versions.json pin, a server_models.json entry, and a bundled single-GPU deploy config.

The server launches the bundle's vllm-omni-server with 'serve <model> --omni --deploy-config <yaml>', resolves the per-model deploy config via the model's extra 'deploy_config' key + get_resource_path, and forwards OpenAI-compatible chat requests. Native voice / vision ride through the chat body (audio output returns as a second choice).

Validated end-to-end on gfx1151: lemond builds + registers the recipe, the model lists, loading through lemonade launches the subprocess, and all four modalities work (text, native voice out, audio in, vision in) with the audio choice passed through lemonade's handler intact.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
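The launch step described in this commit message can be sketched as follows. This is only an illustrative TypeScript sketch: the backend in this PR is a C++ WrappedServer subclass, and `buildOmniServeArgs`, `OmniModelEntry`, and the `resolveResource` callback (standing in for get_resource_path) are hypothetical names. The `--served-model-name` and `--max-model-len` flags come from the PR description; the checkpoint, path prefix, and length values below are placeholders.

```typescript
// Hypothetical shape of a model entry. Only the `deploy_config` extra key is
// taken from the commit message; the other fields are assumed for illustration.
interface OmniModelEntry {
  checkpoint: string;        // value passed as <model> to `serve`
  servedModelName: string;
  maxModelLen: number;
  extra: { deploy_config?: string };
}

// Stand-in for get_resource_path: maps a bundled resource name to a full path.
type ResourceResolver = (relativePath: string) => string;

// Builds the argument list for `vllm-omni-server`, failing early when the
// model entry does not name a deploy config.
function buildOmniServeArgs(
  model: OmniModelEntry,
  resolveResource: ResourceResolver,
): string[] {
  const deployConfig = model.extra.deploy_config;
  if (!deployConfig) {
    throw new Error("model entry is missing extra.deploy_config");
  }
  return [
    "serve", model.checkpoint,
    "--omni",
    "--deploy-config", resolveResource(deployConfig),
    "--served-model-name", model.servedModelName,
    "--max-model-len", String(model.maxModelLen),
  ];
}

// Placeholder values; the real deploy_config value and paths may differ.
const exampleArgs = buildOmniServeArgs(
  {
    checkpoint: "example/checkpoint",
    servedModelName: "Qwen2.5-Omni-3B-vLLM-Omni",
    maxModelLen: 4096,
    extra: { deploy_config: "omni_deploy/qwen2_5_omni_1gpu.yaml" },
  },
  (relativePath) => `/opt/example-bundle/resources/${relativePath}`,
);
console.log(exampleArgs.join(" "));
```

Failing fast on a missing deploy config keeps the error at load time rather than inside the subprocess, which matters here because the upstream defaults assume multiple GPUs.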
Surface vLLM-Omni's native audio output in the chat UI:

- ChatWindow derives isAudioOutput from a 'chat-speech' label (mirrors the 'chat-transcription' audio-input flag) and passes it to LLMChatPanel.
- LLMChatPanel gains a voice toggle + voice picker (Chelsie/Ethan), off by default (speech gen is slow). When on, handleAudioChat sends a non-streaming request with modalities:[text,audio] + voice, and captures message.audio from the response (handles vLLM-Omni's two-choice shape and OpenAI's single-choice) into an audio artifact, rendered via the existing MessageAudio player.
- Qwen2.5-Omni model labels updated to vision/chat-transcription/chat-speech/omni so vision input, audio input, and native voice output all light up.

Reuses existing primitives (MessageAudio, buildFinalContent, input_audio/image_url content types); no player/rendering changes. tsc --noEmit clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
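A minimal sketch of the request/response handling this commit describes, assuming the names below. `buildAudioChatRequest` and `extractReply` are hypothetical helpers, not the PR's `handleAudioChat`; placing `voice` at the top level of the body and reading base64 audio from `message.audio.data` are assumptions based on the commit text and the OpenAI-style response shape.

```typescript
type Voice = "Chelsie" | "Ethan";

interface ChatMessageIn { role: "system" | "user" | "assistant"; content: string }
interface ChatAudio { data: string; transcript?: string }   // data: base64 audio (assumed)
interface ChatMessageOut { content?: string | null; audio?: ChatAudio | null }
interface ChatChoice { index: number; message: ChatMessageOut }
interface ChatResponse { choices: ChatChoice[] }

// Non-streaming request that asks for both text and audio output.
function buildAudioChatRequest(model: string, messages: ChatMessageIn[], voice: Voice) {
  return {
    model,
    messages,
    stream: false,
    modalities: ["text", "audio"],
    voice, // assumed to be a top-level field
  };
}

interface ExtractedReply { text: string; audioBase64: string | null }

// Scans every choice so one code path covers both shapes:
// two choices (text in one, audio in another) or a single choice with both.
function extractReply(response: ChatResponse): ExtractedReply {
  let text = "";
  let audioBase64: string | null = null;
  for (const choice of response.choices) {
    const { content, audio } = choice.message;
    if (text === "" && typeof content === "string" && content.length > 0) {
      text = content;
    }
    if (audioBase64 === null && audio && audio.data) {
      audioBase64 = audio.data;
    }
  }
  return { text, audioBase64 };
}

// Two-choice shape: text first, audio as a second choice.
const twoChoiceExample: ChatResponse = {
  choices: [
    { index: 0, message: { content: "Hello there." } },
    { index: 1, message: { content: null, audio: { data: "UklGRg==" } } },
  ],
};
console.log(extractReply(twoChoiceExample));
```

Scanning all choices, rather than indexing a fixed position, means the caller does not need to know which server produced the response.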
What
Adds vLLM-Omni as a first-class backend so omni / any-to-any multimodal models (Qwen2.5-Omni today; Cosmos3 later) run with ROCm acceleration on AMD gfx1151 (Strix Halo) — a single model doing text + audio + vision in, text + native voice out.
vLLM-Omni is a pure-Python layer on the same base vLLM+PyTorch+Triton, shipped as a separate `vllm-omni*` release artifact, so it gets its own recipe + pin (see vllm-rocm#23).

Backend (54788cc9): uses the new self-describing descriptor model

Per `docs/dev/adding-a-backend.md`, this is one folder + a few appends, with no router/CLI/doc edits:

- `src/cpp/{include,server}/backends/vllm_omni/`: descriptor + `VLLMOmniServer` (mirrors `vllm`). Launches `vllm-omni-server serve <model> --omni --deploy-config <yaml> --served-model-name --max-model-len`, resolves the per-model deploy config via the model's `deploy_config` extra key + `get_resource_path`, forwards chat. Native voice/vision ride through the chat body.
- `CMakeLists.txt`: `LEMON_BACKENDS` += `"vllm-omni|vllm_omni"`
- `backend_versions.json`: pin `vllm-omni` `0.23.0rc1-rocm7.14.0`
- `server_models.json`: `Qwen2.5-Omni-3B-vLLM-Omni`
- `resources/omni_deploy/qwen2_5_omni_1gpu.yaml`: single-GPU stage colocation (upstream defaults are multi-GPU)

Chat UI (a423eab3)

- `ChatWindow` derives `isAudioOutput` from a `chat-speech` label (mirrors `chat-transcription`).
- `LLMChatPanel`: voice toggle + picker (Chelsie/Ethan, off by default because speech gen is slow); `handleAudioChat` sends a non-streaming `modalities:[text,audio]` + `voice` request and captures `message.audio` (handles vLLM-Omni's two-choice shape and OpenAI's single-choice), rendered via the existing `MessageAudio` player. No new rendering code.
- The Qwen2.5-Omni model labels `vision`/`chat-transcription`/`chat-speech`/`omni` light up vision input, audio input, and native voice output via existing capability flags.

Validation

- `lemond` builds, registers the recipe, lists the model, loads it (launches the subprocess), and all four modalities pass: text, native voice out (24 kHz WAV), audio in, vision in, with the audio choice passed through lemonade's handler intact.
- `tsc --noEmit` clean. Reuses proven audio-player/artifact path. Not yet run in the app (needs a webview), so reviewers should eyeball rendering/UX.

Notes for reviewers

- `max_tokens` tuning in the deploy YAML): cosmetic.

🤖 Generated with Claude Code