bongwoo-bak commented Sep 25, 2025

Summary

This PR introduces a new SGLang connector that supports prefill/decode (P/D) disaggregation for the LLM-D routing sidecar. It enables concurrent prefill and decode operations through SGLang’s bootstrap mechanism.

Changes

  • Added connector_sglang.go implementing P/D disaggregation
  • Integrated bootstrap configuration (host, port, room)
  • Updated cmd/llm-d-routing-sidecar/main.go and internal/proxy/proxy.go
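As an illustration of the bootstrap configuration (host, port, room), the sketch below shows one way a sidecar could attach these three values to an outgoing JSON request body. It is not taken from connector_sglang.go: the `bootstrapInfo` type, the `newRoom` and `withBootstrap` helpers, and the JSON field names `bootstrap_host`, `bootstrap_port`, and `bootstrap_room` are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/rand"
)

// bootstrapInfo is a hypothetical container for the three bootstrap
// values named in this PR: host, port, and room.
type bootstrapInfo struct {
	Host string
	Port int
	Room int64
}

// newRoom returns a random non-negative room ID. Using a random ID per
// request is an assumption; the real connector may choose rooms differently.
func newRoom() int64 {
	return rand.Int63()
}

// withBootstrap decodes a JSON request body, adds the bootstrap fields,
// and re-encodes it. The field names are assumed, not confirmed by the PR.
func withBootstrap(body []byte, b bootstrapInfo) ([]byte, error) {
	var req map[string]any
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, fmt.Errorf("decode request body: %w", err)
	}
	req["bootstrap_host"] = b.Host
	req["bootstrap_port"] = b.Port
	req["bootstrap_room"] = b.Room
	return json.Marshal(req)
}
```

Under these assumptions, the same room value would be attached to both the prefill and the decode request so that the two workers can pair up, for example `withBootstrap(body, bootstrapInfo{Host: prefillHost, Port: 8668, Room: newRoom()})`.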

Features

  • Room-based communication for coordinating prefill/decode
  • Configurable bootstrap port via the SGLANG_BOOTSTRAP_PORT environment variable (default 8668)
  • Prefill requests are sent asynchronously; decode requests are sent synchronously, and the result is processed once the decode response arrives
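The features above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in connector_sglang.go: the function names `bootstrapPortFrom`, `bootstrapPort`, and `runPD` are hypothetical, as is the choice to log and otherwise ignore a failed prefill. Only the environment variable name and its default of 8668 come from this PR.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"strconv"
)

// defaultBootstrapPort is the default stated in this PR description.
const defaultBootstrapPort = 8668

// bootstrapPortFrom reads SGLANG_BOOTSTRAP_PORT through the given lookup
// function and falls back to the default when it is unset or invalid.
func bootstrapPortFrom(getenv func(string) string) int {
	if v := getenv("SGLANG_BOOTSTRAP_PORT"); v != "" {
		if p, err := strconv.Atoi(v); err == nil && p > 0 && p < 65536 {
			return p
		}
	}
	return defaultBootstrapPort
}

// bootstrapPort resolves the port from the process environment.
func bootstrapPort() int {
	return bootstrapPortFrom(os.Getenv)
}

// runPD (hypothetical) fires the prefill request in a goroutine and then
// blocks on the decode request, returning the decode response body.
func runPD(client *http.Client, prefillURL, decodeURL string, body []byte) ([]byte, error) {
	// Asynchronous prefill: the caller does not wait for this response.
	go func() {
		resp, err := client.Post(prefillURL, "application/json", bytes.NewReader(body))
		if err != nil {
			log.Printf("prefill request failed: %v", err)
			return
		}
		defer resp.Body.Close()
		// Drain the body so the underlying connection can be reused.
		_, _ = io.Copy(io.Discard, resp.Body)
	}()

	// Synchronous decode: block until the decode worker responds.
	resp, err := client.Post(decodeURL, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("decode request: %w", err)
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}
```

Passing the lookup function into `bootstrapPortFrom` keeps the port logic testable without touching the process environment. A real connector would probably also need to handle streaming responses and request cancellation, which this sketch leaves out.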

Test

  • Tested with SGLang prefill/decode services
  • Confirmed asynchronous prefill & synchronous decode execution
  • Successfully tested in a Kubernetes cluster with AMD MI250 GPUs
  • Verified integration with Gateway and EPP
