Make FlashInfer cache dtype hardware-aware (fix bf16 on SM<80)#56
Open
os-gabe wants to merge 1 commit into
Conversation
FlashInferKVCacheManager unconditionally mapped fp32 -> bf16, which fails at runtime on SM<80 (Turing/Volta, e.g. Titan RTX), where bf16 kernels aren't available. AggregatorStream also passed tokens.dtype, a dtype that autocast-exempt ops such as LayerNorm leak as fp32, so the bug fires even when demo.py selects fp16.

- flashinfer_cache: hardware-aware fp32/None fallback (bf16 only on SM>=80).
- aggregator/stream: prefer the aggregator's parameter dtype before falling through to the cache's default.

SM>=80 behavior is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
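For reference, a minimal sketch of the two changes described above, assuming a torch-based setup; the helper names `_cache_dtype_for_device` and `_stream_dtype` are hypothetical stand-ins for the PR's actual code, not its real API:

```python
import torch

def _cache_dtype_for_device(requested: torch.dtype,
                            device: torch.device) -> torch.dtype | None:
    """Map fp32 -> bf16 only where bf16 kernels exist (SM >= 80)."""
    if requested is not torch.float32:
        return requested            # fp16 etc. pass through unchanged
    major, _minor = torch.cuda.get_device_capability(device)
    if major >= 8:                  # Ampere and newer: keep old mapping
        return torch.bfloat16
    return None                     # Turing/Volta: keep the cache default

def _stream_dtype(aggregator: torch.nn.Module,
                  cache_default: torch.dtype) -> torch.dtype:
    """Prefer the aggregator's parameter dtype over tokens.dtype,
    which autocast-exempt ops like LayerNorm can leak as fp32."""
    param = next(aggregator.parameters(), None)
    return param.dtype if param is not None else cache_default
```

Under this sketch, a Titan RTX (SM 7.5) gets `None` and keeps fp32, while an A100 (SM 8.0) still gets bf16, matching the "SM>=80 behavior is unchanged" claim.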