
[Draft] Resolving integration differences after XGrammar launch refactoring #2145

Closed
wants to merge 3 commits

Conversation

Contributor

@gittb gittb commented Nov 23, 2024

Motivation

After XGrammar released their 0.10 update, there was a good bit of refactoring which seems to have broken the initial integration with SGLang. This PR resolves those issues and makes XGrammar a working backend for SGLang.

Modifications

All the changes are going to be within xgrammar_backend.py.

So far, we've (see the sketch after this list):

  • Fixed the import issues by importing GrammarCompiler instead of CachedGrammarCompiler
    • Removed the try/except logic that was put in when attempting to handle this
  • Imported the functions apply_token_bitmask_inplace and allocate_token_bitmask directly from matcher
  • Removed the vocab_size parameter from GrammarMatcher creation, as it's no longer needed by XGrammar
  • Switched to compile_json_schema rather than compile_json_schema_grammar, as the method was renamed
  • Switched to clear_cache instead of clear, as the method was renamed
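
For reference, a rough sketch of the renamed API surface after these changes. This is illustrative only: `tokenizer` and `json_schema` are placeholders, not the actual code in xgrammar_backend.py.

# Illustrative sketch of the post-refactor XGrammar calls described above;
# the real xgrammar_backend.py wires these into SGLang's grammar backend classes.
from xgrammar import GrammarCompiler, GrammarMatcher, TokenizerInfo
from xgrammar.matcher import allocate_token_bitmask, apply_token_bitmask_inplace

tokenizer_info = TokenizerInfo.from_huggingface(tokenizer)
compiler = GrammarCompiler(tokenizer_info)             # was CachedGrammarCompiler

compiled = compiler.compile_json_schema(json_schema)   # was compile_json_schema_grammar
matcher = GrammarMatcher(compiled)                     # no vocab_size argument anymore

compiler.clear_cache()                                 # was clear()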

I'm chatting with MLC offline in their Discord to work through a potential issue between C++ and Python within XGrammar.

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@gittb gittb requested a review from hnyls2002 as a code owner November 23, 2024 21:56
@gittb gittb marked this pull request as draft November 23, 2024 21:56
Contributor

@merrymercy merrymercy left a comment

Thanks for the contribution. Can you add a test case similar to this? https://github.com/sgl-project/sglang/blob/main/test/srt/test_json_constrained.py

You can inherit the existing test class and run all existing test cases with the xgrammar backend. You can add a class like this in that file.

import json
import unittest

# Helpers from SGLang's test utilities (sglang/test/test_utils.py).
from sglang.test.test_utils import (
    DEFAULT_MODEL_NAME_FOR_TEST,
    DEFAULT_URL_FOR_TEST,
    popen_launch_server,
)


class TestJSONConstrainedXGrammarBackend(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.model = DEFAULT_MODEL_NAME_FOR_TEST
        cls.base_url = DEFAULT_URL_FOR_TEST
        cls.json_schema = json.dumps(
            {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "pattern": "^[\\w]+$"},
                    "population": {"type": "integer"},
                },
                "required": ["name", "population"],
            }
        )
        cls.process = popen_launch_server(
            cls.model,
            cls.base_url,
            timeout=300,
            other_args=["--max-running-requests", "10", "--grammar-backend", "xgrammar"],
        )
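
(For reference, SGLang's test files typically end with unittest.main(), so a class like this would run with e.g. `python3 test/srt/test_json_constrained.py`; the exact invocation depends on how the repo's CI runs its tests.)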

Also, can you fix the lint error? See https://sgl-project.github.io/references/contributor_guide.html for details.

@gittb
Contributor Author

gittb commented Nov 24, 2024

Thank you for all your work on SGLang!

Yes, I can do both once I get the implementation working as expected.

Contributor

@Ubospica Ubospica left a comment

Thanks for your contribution! Just a few minor improvements.

python/sglang/srt/constrained/xgrammar_backend.py
)
self.grammar_cache = None
return

tokenizer_info = TokenizerInfo.from_huggingface(tokenizer)
Contributor

Assign vocab_size and stop_token_ids (from the chat_template; optional, but it will make this more robust) when constructing tokenizer_info.
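
A minimal sketch of that suggestion; the keyword names are assumptions based on this thread, not confirmed against that XGrammar version's signature:

# Sketch: pass the backend's vocab_size and the chat template's stop tokens
# when building TokenizerInfo (keyword names assumed from this discussion).
tokenizer_info = TokenizerInfo.from_huggingface(
    tokenizer,
    vocab_size=vocab_size,          # padded model vocab size, not len(tokenizer)
    stop_token_ids=stop_token_ids,  # e.g. taken from the chat template / EOS
)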

Contributor Author

Agreed. I had an issue initially where the vocab_size provided by the tokenizer differed from the vocab_size param provided when initializing the grammar backend. In my most recent commit I use vocab_size when creating tokenizer_info. My blocker now is that GrammarMatcher accepts the stop token but still tries to find the next token mask. I believe this could be related to the backend not having the correct stop tokens.

What is the best way to access the chat template's stop_token_ids when initializing the grammar backend?

  • Consolidated imports
  • Fixed arg order for allocate_token_bitmask
  • Added logic to move vocab_mask to the logits device prior to apply_token_bitmask_inplace
  • Init XGrammar using vocab_size rather than the tokenizer vocab size
  • Renamed grammar_cache to grammar_compiler
@gittb
Contributor Author

gittb commented Nov 24, 2024

@Ubospica Would you take a peek at my logic for moving vocab_mask to the logits device if they are not already on the same device?

My reasoning here is that vocab_mask must be allocated on the CPU, since it has to be there to run fill_next_token_bitmask, but when applying the token bitmask, the mask and logits must be on the same device. I assume logits will be on a CUDA device, so I simply move vocab_mask to the logits device.

Is this logic correct?

@Ubospica
Contributor

> My reasoning here is that vocab_mask must be allocated on the CPU, since it has to be there to run fill_next_token_bitmask, but when applying the token bitmask, the mask and logits must be on the same device. I assume logits will be on a CUDA device, so I simply move vocab_mask to the logits device.

Yes, that is correct. Simply moving it to GPU is okay.
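
A minimal sketch of the confirmed flow, where `matcher`, `logits`, `batch_size`, and `index` are assumed names from the surrounding discussion:

from xgrammar.matcher import allocate_token_bitmask, apply_token_bitmask_inplace

# Allocate the bitmask on CPU; fill_next_token_bitmask writes into it there.
vocab_mask = allocate_token_bitmask(batch_size, tokenizer_info.vocab_size)
matcher.fill_next_token_bitmask(vocab_mask, index)

# logits typically live on a CUDA device; the mask must be on the same device
# before apply_token_bitmask_inplace masks out disallowed tokens' logits.
if vocab_mask.device != logits.device:
    vocab_mask = vocab_mask.to(logits.device)
apply_token_bitmask_inplace(logits, vocab_mask)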

@zhyncs
Member

zhyncs commented Nov 25, 2024

fix #2166

@gittb
Contributor Author

gittb commented Nov 25, 2024

Closing as work migrated to #2176

@gittb gittb closed this Nov 25, 2024