@humansand

## Miles

- MXFP8 & NVFP4
  - [x] https://github.com/radixark/miles/pull/614
  - [x] https://github.com/radixark/miles/issues/567
  - [ ] https://github.com/radixark/miles/pull/919 (`--fp8-param-gather` for mxfp8)
- MXFP8
  - [x] https://github.com/radixark/miles/pull/512
  - [x] https://github.com/radixark/miles/pull/963
- NVFP4
  - [x] ~https://github.com/radixark/miles/pull/546~ (Implement nvfp4)
  - [x] https://github.com/radixark/miles/pull/907
  - [x] https://github.com/radixark/miles/pull/1054
  - [ ] Use FlashInfer nvfp4 quantizer
  - [ ] Avoid QDQ during weight sync and directly use TE data

## SGLang

- MXFP8 & NVFP4
  - [x] https://github.com/sgl-project/sglang/pull/20214 (`flashinfer_trtllm_routed` moe backend)
- MXFP8
  - [x] https://github.com/sgl-project/sglang/pull/17449
  - [x] https://github.com/sgl-project/sglang/pull/18742
  - [x] ~https://github.com/sgl-project/sglang/pull/17294~ (Expand deep_gemm entrypoint to support more FP8 recipes)
  - [ ] https://github.com/sgl-project/sglang/pull/26342
  - [x] https://github.com/sgl-project/sglang/pull/19537
  - [x] https://github.com/sgl-project/sglang/pull/21280
  - [x] https://github.com/sgl-project/sglang/pull/21576
  - [x] https://github.com/sgl-project/sglang/pull/22484
  - [x] https://github.com/sgl-project/sglang/pull/26287
  - [x] https://github.com/sgl-project/sglang/pull/28459
- NVFP4
  - [x] ~https://github.com/sgl-project/sglang/pull/18012~ ([RL] Add an nvfp4 online input scale mode)
  - [x] https://github.com/sgl-project/sglang/pull/18085
  - [x] https://github.com/sgl-project/sglang/pull/22204
  - [x] https://github.com/sgl-project/sglang/pull/22918

## TransformerEngine

- MXFP8 & NVFP4
  - [x] https://github.com/NVIDIA/TransformerEngine/pull/2644 (`NVTE_BACKWARD_OVERRIDE=high_precision|dequantized`)
  - [x] https://github.com/NVIDIA/TransformerEngine/pull/2865
- NVFP4
  - [x] https://github.com/NVIDIA/TransformerEngine/pull/2931
  - [x] https://github.com/NVIDIA/TransformerEngine/pull/2972
  - [x] https://github.com/NVIDIA/cudnn-frontend/pull/251
  - [ ] https://github.com/NVIDIA/TransformerEngine/pull/3042

## FlashInfer

- MXFP8 & NVFP4
  - [x] https://github.com/flashinfer-ai/flashinfer/pull/3387
- MXFP8
  - [x] https://github.com/flashinfer-ai/flashinfer/pull/2581 (`cutlass_fused_moe` mxfp8)
- NVFP4
  - [x] https://github.com/flashinfer-ai/flashinfer/pull/3027
  - [x] https://github.com/flashinfer-ai/flashinfer/pull/3264
  - [x] https://github.com/flashinfer-ai/flashinfer/pull/3448
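The Miles task "Avoid QDQ during weight sync and directly use TE data" refers to a quantize-dequantize (QDQ) round trip. The pure-Python sketch below shows what one such round trip does to values under MXFP8 and NVFP4 block scaling. It assumes the commonly described layouts: MXFP8 (E4M3 variant) stores E4M3 elements in 32-value blocks that share a power-of-two (E8M0) scale, and NVFP4 stores E2M1 elements in 16-value blocks, each with an E4M3 scale, plus a per-tensor FP32 scale. The helper names (`round_to_e4m3`, `round_to_e2m1`, `mxfp8_qdq`, `nvfp4_qdq`) are hypothetical, and the scale-selection rules are assumptions. This is not the quantizer used by TransformerEngine, FlashInfer, SGLang, or Miles. Real kernels differ in rounding, tie-breaking, and memory layout.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite FP8 E4M3 magnitude
FP4_E2M1_MAX = 6.0    # largest FP4 E2M1 magnitude
# Non-negative magnitudes representable in FP4 E2M1.
FP4_E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def round_to_e4m3(x: float) -> float:
    """Round to the nearest FP8 E4M3 value, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    mag = min(abs(x), FP8_E4M3_MAX)
    # Binade exponent of mag; clamping at the minimum normal exponent (-6)
    # gives subnormals the same 2**-9 spacing as the lowest normal binade.
    exp = max(math.frexp(mag)[1] - 1, -6)
    step = 2.0 ** (exp - 3)  # 3 mantissa bits: 8 steps per binade
    return math.copysign(min(round(mag / step) * step, FP8_E4M3_MAX), x)


def round_to_e2m1(x: float) -> float:
    """Round to the nearest FP4 E2M1 value, saturating at +/-6.

    Ties go to the smaller magnitude here, which is a simplification.
    """
    mag = min(abs(x), FP4_E2M1_MAX)
    nearest = min(FP4_E2M1_GRID, key=lambda v: abs(v - mag))
    return math.copysign(nearest, x)


def mxfp8_qdq(block: list[float]) -> list[float]:
    """Hypothetical helper: QDQ one MXFP8 block (MX uses 32 values per block)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Assumed scale rule: smallest power of two that keeps amax within
    # E4M3's range. E8M0 can only encode powers of two.
    scale = 2.0 ** math.ceil(math.log2(amax / FP8_E4M3_MAX))
    return [round_to_e4m3(v / scale) * scale for v in block]


def nvfp4_qdq(tensor: list[float], block_size: int = 16) -> list[float]:
    """Hypothetical helper: QDQ with NVFP4-style two-level scaling."""
    amax = max(abs(v) for v in tensor)
    if amax == 0.0:
        return [0.0] * len(tensor)
    # Per-tensor FP32 scale chosen so the largest block scale encodes as 448.
    global_scale = amax / (FP4_E2M1_MAX * FP8_E4M3_MAX)
    out: list[float] = []
    for i in range(0, len(tensor), block_size):
        block = tensor[i:i + block_size]
        block_amax = max(abs(v) for v in block)
        # Per-block scale, stored as an E4M3 multiple of the global scale.
        encoded = round_to_e4m3(block_amax / FP4_E2M1_MAX / global_scale)
        block_scale = encoded * global_scale
        if block_scale == 0.0:
            out.extend(0.0 for _ in block)
            continue
        out.extend(round_to_e2m1(v / block_scale) * block_scale for v in block)
    return out
```

For example, `nvfp4_qdq([6.0, 2.4] + [0.0] * 14)` returns values close to 6.0 and 2.0 for its first two entries, because 2.4 is not representable in E2M1 at that block scale. The sketch only models the numerics of a round trip, not how weight sync is wired between Miles, TransformerEngine, and SGLang.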