[WIP] Use FlashInfer RoPE #2016
Conversation
It's worth noting that FlashInfer uses fp32 internally for sin/cos; we found there is a non-trivial output difference if we use fp16 sin/cos.
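The precision concern above is easy to reproduce outside of FlashInfer. The sketch below (illustrative only, not FlashInfer or sglang code; table sizes and the `base=10000` frequency are typical RoPE defaults, not taken from this PR) builds the sin/cos tables once in fp32 and once in fp16, and measures the worst-case divergence. Because position indices grow large, fp16 cannot represent the rotation angles accurately, so the error is far from negligible.

```python
import numpy as np

# Illustrative check: compare RoPE sin/cos tables computed in fp16 vs fp32.
# head_dim, max_pos, and base are common defaults, assumed for illustration.
def rope_tables(dtype, head_dim=128, max_pos=4096, base=10000.0):
    # Per-pair inverse frequencies, always derived in fp32.
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2, dtype=np.float32) / head_dim))
    t = np.arange(max_pos, dtype=np.float32)
    # Cast the angles to the target dtype BEFORE taking sin/cos,
    # mimicking a kernel that stores/computes the table in that dtype.
    freqs = np.outer(t, inv_freq).astype(dtype)
    return np.sin(freqs), np.cos(freqs)

sin16, cos16 = rope_tables(np.float16)
sin32, cos32 = rope_tables(np.float32)
err = max(np.abs(sin16.astype(np.float32) - sin32).max(),
          np.abs(cos16.astype(np.float32) - cos32).max())
print(f"max abs sin/cos error with fp16 tables: {err:.4f}")
```

At positions in the thousands, fp16 rounds the angle itself by up to a full radian, so the sin/cos error is order 1, which is why keeping the tables in fp32 matters.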
@james-p-xu How is it going? Has it already run successfully after removing this dependency (`sglang/python/sglang/srt/models/llama.py`, line 25 in 60769be), using the latest FlashInfer nightly (https://github.com/flashinfer-ai/flashinfer-nightly/releases)?
I will first merge this PR into the
Looks like there is another correctness issue with FlashInfer.
Also related to #2620.
Hi James @james-p-xu, we've decided to adopt #2964 instead, and @ByronHsu will help rewrite the CUDA kernel. I'll close this PR for now. Thanks for your contribution!
Motivation
NOTE: `flashinfer.apply_rope_pos_ids` does not exist in the prebuilt wheel and must be built from source. Is this an issue?

We want to verify the correctness of FlashInfer's RoPE against vLLM's RoPE, in preparation for replacing vLLM's `get_rope` with FlashInfer's.

cc: @ByronHsu
Modifications
Added a standalone Python script for comparison.
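The comparison script itself is not shown in this conversation. As a hedged sketch of what such a correctness check looks like (this is not the PR's script; neither `flashinfer` nor vLLM is imported, and the NeoX-style rotate-half layout is an assumption), one can implement a plain fp32 reference RoPE and validate a candidate implementation against it with `np.allclose`:

```python
import numpy as np

# Reference NeoX-style (rotate-half) RoPE in fp32. A candidate kernel's
# output would be compared against this with np.allclose.
def ref_rope(q, pos, base=10000.0):
    # q: (seq, head_dim); pos: (seq,) integer positions
    d = q.shape[-1]
    inv_freq = 1.0 / (base ** (np.arange(0, d, 2) / d))
    angles = np.outer(pos, inv_freq)              # (seq, d/2)
    sin, cos = np.sin(angles), np.cos(angles)
    q1, q2 = q[:, : d // 2], q[:, d // 2 :]
    # Each (q1_i, q2_i) pair is rotated by its position-dependent angle.
    return np.concatenate([q1 * cos - q2 * sin,
                           q2 * cos + q1 * sin], axis=-1).astype(np.float32)

rng = np.random.default_rng(0)
q = rng.standard_normal((16, 64)).astype(np.float32)
out = ref_rope(q, np.arange(16))

# Cheap sanity checks: position 0 is the identity rotation,
# and rotation preserves each position's vector norm.
assert np.allclose(out[0], q[0])
assert np.allclose(np.linalg.norm(out, axis=-1),
                   np.linalg.norm(q, axis=-1), atol=1e-4)
```

In the actual script, `out` would instead come from the implementation under test (e.g. FlashInfer's kernel), and mismatches beyond a tolerance would flag a correctness issue like the one discussed above.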
Checklist