
v0.4.0

danehans released this 23 Jun 04:32
Tag: v0.4.0 (commit 2b5b337)

Overview

We are thrilled to announce the v0.4.0 release, our biggest update yet! This version brings powerful new Endpoint Picker (EPP) scheduler capabilities, performance improvements, and an initial set of Gateway conformance tests.

Major Highlights

  • Modular Endpoint Picker (EPP) Scheduler: A kube-scheduler-style plugin API lets you build custom routing logic,
    filter and score backends, or swap in new picker strategies without touching core code.

  • Prefix-Cache-Aware Routing: Lower tail latency by routing each request to a model server that is likely to already
    hold the request's prompt prefix in its KV cache, improving response times under load.

  • Richer Metrics: Gain deeper insights with new metrics including:

    • NTPOT (Normalized Time Per Output Token)
    • Scheduler latency
    • Per-pod queue depth
    • Build and version info
  • Optional vLLM Simulator Backend: Spin up a lightweight simulator for local development and testing, with no real
    model servers required.

  • Initial Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool,
    InferenceModel, HTTPRoute, and more.
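The filter-and-score flow and prefix-cache-aware routing described above can be illustrated with a toy Go sketch. Everything in it is hypothetical and simplified for illustration: the `Filter`/`Scorer` interfaces, `QueueFilter`, `PrefixScorer`, `Pick`, and the fixed 4-character block size are not the EPP's actual plugin API, and a real prefix-cache scorer would hash token blocks rather than characters.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Pod is a hypothetical, simplified view of a model-server endpoint.
type Pod struct {
	Name         string
	QueueDepth   int
	CachedBlocks map[[32]byte]bool // prefix-block hashes assumed to be in this pod's KV cache
}

// Filter and Scorer are hypothetical plugin interfaces mirroring the
// kube-scheduler filter/score split; they are NOT the EPP's real API.
type Filter interface{ Filter(prompt string, pods []Pod) []Pod }
type Scorer interface{ Score(prompt string, pod Pod) float64 }

// QueueFilter drops pods whose queue is deeper than MaxQueue.
type QueueFilter struct{ MaxQueue int }

func (f QueueFilter) Filter(_ string, pods []Pod) []Pod {
	var out []Pod
	for _, p := range pods {
		if p.QueueDepth <= f.MaxQueue {
			out = append(out, p)
		}
	}
	return out
}

const blockSize = 4 // characters per block (a simplification; real systems use token blocks)

// prefixHashes returns one chained hash per full block, so block i's hash
// depends on every block before it; a match therefore implies a prefix match.
func prefixHashes(prompt string) [][32]byte {
	var hashes [][32]byte
	var prev [32]byte
	for i := 0; i+blockSize <= len(prompt); i += blockSize {
		h := sha256.Sum256(append(prev[:], prompt[i:i+blockSize]...))
		hashes = append(hashes, h)
		prev = h
	}
	return hashes
}

// PrefixScorer scores a pod by the fraction of leading prompt blocks it has cached.
type PrefixScorer struct{}

func (PrefixScorer) Score(prompt string, pod Pod) float64 {
	hashes := prefixHashes(prompt)
	if len(hashes) == 0 {
		return 0
	}
	matched := 0
	for _, h := range hashes {
		if !pod.CachedBlocks[h] {
			break // stop at the first uncached block
		}
		matched++
	}
	return float64(matched) / float64(len(hashes))
}

// Pick applies every filter, then returns the surviving pod with the highest summed score.
func Pick(prompt string, pods []Pod, filters []Filter, scorers []Scorer) (Pod, bool) {
	for _, f := range filters {
		pods = f.Filter(prompt, pods)
	}
	best, bestScore, found := Pod{}, -1.0, false
	for _, p := range pods {
		total := 0.0
		for _, s := range scorers {
			total += s.Score(prompt, p)
		}
		if total > bestScore {
			best, bestScore, found = p, total, true
		}
	}
	return best, found
}

// cacheFor builds a CachedBlocks set as if the pod had already served prompt.
func cacheFor(prompt string) map[[32]byte]bool {
	m := map[[32]byte]bool{}
	for _, h := range prefixHashes(prompt) {
		m[h] = true
	}
	return m
}

func main() {
	shared := "You are a helpful assistant. "
	pods := []Pod{
		{Name: "pod-a", QueueDepth: 2, CachedBlocks: cacheFor("Translate this text: hola")},
		{Name: "pod-b", QueueDepth: 3, CachedBlocks: cacheFor(shared + "Summarize this.")},
		{Name: "pod-c", QueueDepth: 50, CachedBlocks: cacheFor(shared + "Summarize that.")},
	}
	p, ok := Pick(shared+"Summarize the report.", pods,
		[]Filter{QueueFilter{MaxQueue: 10}}, []Scorer{PrefixScorer{}})
	fmt.Println(p.Name, ok) // prints: pod-b true
}
```

In this toy run, pod-c is filtered out for its deep queue, and pod-b beats pod-a because it shares the longest cached prefix with the incoming prompt. Keeping filters and scorers as separate, composable plugins is what lets a new strategy be added without changing `Pick` itself.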

What's Changed

New Contributors

Full Changelog: v0.3.0...v0.4.0