[WIP] Use FlashInfer RoPE #2016
Conversation
It's worth noting that FlashInfer uses fp32 internally for sin/cos; we found there is a non-trivial output difference if we use fp16 sin/cos.
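The precision concern above is easy to reproduce outside of FlashInfer. The sketch below (illustrative only, not FlashInfer or sglang code; table sizes and the `base=10000` frequency are typical RoPE defaults, not taken from this PR) builds the sin/cos tables once in fp32 and once in fp16, and measures the worst-case divergence. Because position indices grow large, fp16 cannot represent the rotation angles accurately, so the error is far from negligible.

```python
import numpy as np

# Illustrative check: compare RoPE sin/cos tables computed in fp16 vs fp32.
# head_dim, max_pos, and base are common defaults, assumed for illustration.
def rope_tables(dtype, head_dim=128, max_pos=4096, base=10000.0):
    # Per-pair inverse frequencies, always derived in fp32.
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2, dtype=np.float32) / head_dim))
    t = np.arange(max_pos, dtype=np.float32)
    # Cast the angles to the target dtype BEFORE taking sin/cos,
    # mimicking a kernel that stores/computes the table in that dtype.
    freqs = np.outer(t, inv_freq).astype(dtype)
    return np.sin(freqs), np.cos(freqs)

sin16, cos16 = rope_tables(np.float16)
sin32, cos32 = rope_tables(np.float32)
err = max(np.abs(sin16.astype(np.float32) - sin32).max(),
          np.abs(cos16.astype(np.float32) - cos32).max())
print(f"max abs sin/cos error with fp16 tables: {err:.4f}")
```

At positions in the thousands, fp16 rounds the angle itself by up to a full radian, so the sin/cos error is order 1, which is why keeping the tables in fp32 matters.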
@james-p-xu How is it going? Has it already run successfully after removing this dependency (`sglang/python/sglang/srt/models/llama.py`, line 25 in 60769be), using the latest FlashInfer nightly (https://github.com/flashinfer-ai/flashinfer-nightly/releases)?
I will first merge this PR into the
Looks like there is another correctness issue with FlashInfer.
Also related to #2620.
Hi James @james-p-xu, we've decided to adopt #2964 instead, and @ByronHsu will help rewrite the CUDA kernel. I'll close this PR for now. Thanks for your contribution!
Motivation
NOTE: `flashinfer.apply_rope_pos_ids` does not exist in the prebuilt wheel and must be built from source. Is this an issue?

We want to verify the correctness of FlashInfer's RoPE against vLLM's RoPE, in preparation for replacing vLLM's `get_rope` with FlashInfer's.

cc: @ByronHsu
Modifications
Added a standalone Python script for comparison.
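The comparison script itself is not shown in this conversation. As a hedged sketch of what such a correctness check looks like (this is not the PR's script; neither `flashinfer` nor vLLM is imported, and the NeoX-style rotate-half layout is an assumption), one can implement a plain fp32 reference RoPE and validate a candidate implementation against it with `np.allclose`:

```python
import numpy as np

# Reference NeoX-style (rotate-half) RoPE in fp32. A candidate kernel's
# output would be compared against this with np.allclose.
def ref_rope(q, pos, base=10000.0):
    # q: (seq, head_dim); pos: (seq,) integer positions
    d = q.shape[-1]
    inv_freq = 1.0 / (base ** (np.arange(0, d, 2) / d))
    angles = np.outer(pos, inv_freq)              # (seq, d/2)
    sin, cos = np.sin(angles), np.cos(angles)
    q1, q2 = q[:, : d // 2], q[:, d // 2 :]
    # Each (q1_i, q2_i) pair is rotated by its position-dependent angle.
    return np.concatenate([q1 * cos - q2 * sin,
                           q2 * cos + q1 * sin], axis=-1).astype(np.float32)

rng = np.random.default_rng(0)
q = rng.standard_normal((16, 64)).astype(np.float32)
out = ref_rope(q, np.arange(16))

# Cheap sanity checks: position 0 is the identity rotation,
# and rotation preserves each position's vector norm.
assert np.allclose(out[0], q[0])
assert np.allclose(np.linalg.norm(out, axis=-1),
                   np.linalg.norm(q, axis=-1), atol=1e-4)
```

In the actual script, `out` would instead come from the implementation under test (e.g. FlashInfer's kernel), and mismatches beyond a tolerance would flag a correctness issue like the one discussed above.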
Checklist