- Principal AI Researcher at Together AI, creator and lead of TGL, the company's proprietary inference engine.
- I co-lead the SGLang project with Lianmin Zheng and Ying Sheng, driving releases, optimization, and roadmap. I have led major releases and blog posts, including Llama 3, DeepSeek V3, Large Scale EP, and GB200 NVL72.
- Co-author of the FlashInfer paper (MLSys 2025 Best Paper) and committer to FlashInfer. Previously, I was Lead Software Engineer at Baseten (co-authored the DeepSeek V3 and Qwen 3 launches) and led CTR GPU inference and vector retrieval system development at Meituan.
- Interviewed by The New York Times (Article 1, Article 2); featured speaker at AMD AI DevDay 2025 and PyTorch Conference 2025.
- Contact: [email protected] | Telegram | LinkedIn | Homepage
- Best reached through SGLang Slack. We're always looking for open-source enthusiasts and contributors to grow the community.
Pinned
- sgl-project/sglang (Public). SGLang is a fast serving framework for large language models and vision language models.
- flashinfer-ai/flashinfer (Public). FlashInfer: Kernel Library for LLM Serving