v0.1.0

@OlivierDehaene released this 13 Oct 13:46
  • No compilation step
  • Dynamic shapes
  • Small Docker images and fast boot times. Get ready for true serverless!
  • Token-based dynamic batching
  • Optimized transformer code for inference using Flash Attention, Candle, and cuBLASLt
  • Safetensors weight loading
  • Production-ready: distributed tracing with OpenTelemetry and Prometheus metrics
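Token-based dynamic batching caps each batch by its total token count rather than by a fixed number of requests, so many short inputs or a few long ones can share a batch. The following is a minimal, hypothetical sketch of that idea using simple greedy first-in-first-out packing. It is not this release's actual scheduler, and `Request`, `next_batch`, and `max_batch_tokens` are names invented for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request type: an id plus its tokenized input length.
    request_id: int
    num_tokens: int


def next_batch(queue: deque, max_batch_tokens: int) -> list:
    """Greedily pop requests from the front of `queue` (FIFO order) until
    adding the next one would push the batch over `max_batch_tokens`.

    A single request larger than the budget is still admitted on its own,
    so an oversized input cannot stall the queue forever.
    """
    batch = []
    total = 0
    while queue:
        candidate = queue[0]
        # Stop once the budget would be exceeded, unless the batch is empty.
        if batch and total + candidate.num_tokens > max_batch_tokens:
            break
        batch.append(queue.popleft())
        total += candidate.num_tokens
    return batch


# Usage: five queued requests with different token counts, 512-token budget.
queue = deque(Request(i, n) for i, n in enumerate([120, 300, 50, 600, 40]))
batches = []
while queue:
    batches.append([r.request_id for r in next_batch(queue, max_batch_tokens=512)])
print(batches)
```

Here the first three requests (470 tokens) share one batch, the 600-token request runs alone because it exceeds the budget by itself, and the last request starts a new batch. A real server would also consider waiting time and padding cost; this sketch ignores both.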