[Draft] Resolving integration differences after XGrammar launch refactoring #2145
Thanks for the contribution. Can you add a test case similar to this? https://github.com/sgl-project/sglang/blob/main/test/srt/test_json_constrained.py
You can inherit the existing test class and test all existing test cases with xgrammar backend. You can add a class like this in that file.
class TestJSONConstrainedXGrammarBackend(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.model = DEFAULT_MODEL_NAME_FOR_TEST
        cls.base_url = DEFAULT_URL_FOR_TEST
        cls.json_schema = json.dumps(
            {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "pattern": "^[\\w]+$"},
                    "population": {"type": "integer"},
                },
                "required": ["name", "population"],
            }
        )
        cls.process = popen_launch_server(
            cls.model,
            cls.base_url,
            timeout=300,
            other_args=["--max-running-requests", "10", "--grammar-backend", "xgrammar"],
        )
Also, can you fix the lint error https://sgl-project.github.io/references/contributor_guide.html?
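The inheritance suggestion above can be sketched in isolation. This is a minimal illustration, not the actual SGLang test file: `FakeServer`, `TestJSONConstrainedBase`, the `grammar_backend` class attribute, and the `"outlines"` default are all hypothetical stand-ins, and no real server is launched. The point is only that a subclass which overrides one class attribute re-runs every inherited `test_*` method against the other backend.

```python
import unittest


class FakeServer:
    """Hypothetical stand-in for a launched server process."""

    def __init__(self, extra_args):
        self.extra_args = list(extra_args)


class TestJSONConstrainedBase(unittest.TestCase):
    # Hypothetical hook: subclasses override only the backend name.
    grammar_backend = "outlines"

    @classmethod
    def setUpClass(cls):
        # A real test would launch the server here with these arguments.
        cls.server = FakeServer(["--grammar-backend", cls.grammar_backend])

    def test_backend_flag_is_passed(self):
        self.assertIn(self.grammar_backend, self.server.extra_args)


class TestJSONConstrainedXGrammar(TestJSONConstrainedBase):
    # Inherits every test_* method; only the backend differs.
    grammar_backend = "xgrammar"


def run_suite():
    """Run both test classes and return the unittest result object."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case in (TestJSONConstrainedBase, TestJSONConstrainedXGrammar):
        suite.addTests(loader.loadTestsFromTestCase(case))
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With this layout, adding a test to the base class automatically covers both backends.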
Thank you for all your work on SGLang! Yes, I can do both once I get the implementation working as expected.
Thanks for your contribution! Just a few minor improvements.
)
self.grammar_cache = None
return

tokenizer_info = TokenizerInfo.from_huggingface(tokenizer)
Assign vocab_size and stop_token_ids when constructing tokenizer_info. Taking stop_token_ids from the chat_template is optional, but it will make this more robust.
Agreed. I had an issue initially where the vocab_size provided by the tokenizer differed from the vocab_size param provided when initializing the grammar backend. In my most recent commit I use vocab_size when creating tokenizer_info. My blocker now relates to GrammarMatcher accepting the stop token but still trying to find the next token mask. I believe this could be related to the backend not having the correct stop tokens.
What is the best way to access the chat template's stop_token_ids information when initializing the grammar backend?
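To make the vocab_size concern concrete, here is a small pure-Python sketch of a packed token bitmask. It is an illustration only and does not call XGrammar: the helper names (`allocate_bitmask`, `ban_token`, `apply_bitmask_inplace`) are hypothetical, and the layout (one bit per token, 32 tokens per word, bit set meaning "allowed") is an assumption chosen for the example. It shows why a mask sized from a smaller tokenizer vocabulary can be too narrow for logits sized from the model's vocab_size.

```python
import math

BITS_PER_WORD = 32
FULL_WORD = 0xFFFFFFFF  # all 32 tokens in a word allowed


def allocate_bitmask(vocab_size):
    """Return a list of 32-bit words with every token initially allowed."""
    return [FULL_WORD] * math.ceil(vocab_size / BITS_PER_WORD)


def ban_token(bitmask, token_id):
    """Clear the bit for token_id so it is no longer allowed."""
    word, bit = divmod(token_id, BITS_PER_WORD)
    bitmask[word] &= FULL_WORD ^ (1 << bit)


def apply_bitmask_inplace(logits, bitmask):
    """Set logits of banned tokens to -inf; the mask must cover all logits."""
    if len(bitmask) * BITS_PER_WORD < len(logits):
        raise ValueError("bitmask is too narrow for the logits")
    for token_id in range(len(logits)):
        word, bit = divmod(token_id, BITS_PER_WORD)
        if not (bitmask[word] >> bit) & 1:
            logits[token_id] = float("-inf")
```

In this toy setup, a mask allocated for 32 tokens has one word, so applying it to 40 logits raises an error, while a mask allocated for 40 tokens has two words and works. The numbers are picked to cross a word boundary; they are not taken from any real model.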
Consolidated imports
Fixed arg order for allocate_token_bitmask
Added logic to move vocab_mask to logits dev prior to apply_token_bitmask_inplace
init xgrammar using vocab_size rather than tokenizer vocab size
rename grammar_cache to grammar_compiler
@Ubospica Would you take a peek at my logic for moving [...]? My methodology here is that [...]. Is this logic correct?
Yes, that is correct. Simply moving it to GPU is okay.
fix #2166
Closing as work migrated to #2176
Motivation
After XGrammar released their 0.10 update, there was a good bit of refactoring which seems to have broken the initial integration with SGLang. This PR works to resolve the issues and make XGrammar a working backend for SGLang.
Modifications
All the changes are going to be within xgrammar_backend.py. So far, we've:
- Used GrammarCompiler instead of CachedGrammarCompiler
- Used apply_token_bitmask_inplace and allocate_token_bitmask directly from matcher
- Removed the vocab_size parameter from GrammarMatcher creation as it's no longer needed by XGrammar
- Used compile_json_schema rather than compile_json_schema_grammar as the method was renamed
- Used clear_cache instead of clear as the method was renamed

I'm chatting with MLC offline in their Discord to work through a potential issue between C++ and Python within XGrammar.
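For the renamed methods listed above, one way to tolerate both the old and new names while an integration is in flux is a small lookup helper. This is a sketch under stated assumptions, not what the PR does (the PR switches to the new names outright): `resolve_method`, `compile_schema`, `OldCompiler`, and `NewCompiler` are hypothetical, and the two classes are stand-ins that only mimic the method names mentioned in this description.

```python
def resolve_method(obj, *candidate_names):
    """Return the first callable attribute of obj among candidate_names."""
    for name in candidate_names:
        method = getattr(obj, name, None)
        if callable(method):
            return method
    raise AttributeError(
        f"{type(obj).__name__} has none of: {', '.join(candidate_names)}"
    )


class OldCompiler:
    """Stand-in exposing the pre-rename method names (not the real class)."""

    def compile_json_schema_grammar(self, schema):
        return ("old", schema)

    def clear(self):
        return "cleared-old"


class NewCompiler:
    """Stand-in exposing the post-rename method names (not the real class)."""

    def compile_json_schema(self, schema):
        return ("new", schema)

    def clear_cache(self):
        return "cleared-new"


def compile_schema(compiler, schema):
    # Prefer the new name, fall back to the old one.
    compile_fn = resolve_method(
        compiler, "compile_json_schema", "compile_json_schema_grammar"
    )
    return compile_fn(schema)
```

Calling `compile_schema` with either stand-in dispatches to whichever method name that object provides, and an object with neither name fails loudly with an `AttributeError`.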
Checklist