
[Draft] Resolving integration differences after XGrammar launch refactoring #2145

Closed
wants to merge 3 commits

Conversation

Contributor

@gittb gittb commented Nov 23, 2024

Motivation

After XGrammar released their 0.10 update, there was a good bit of refactoring which seems to have broken the initial integration with SGLang. This PR resolves those issues and makes XGrammar a working backend for SGLang.

Modifications

All the changes are going to be within xgrammar_backend.py.

So far, we've (see the sketch after this list):

  • Fixed the import issues by importing GrammarCompiler instead of CachedGrammarCompiler
    • Removed the try/except logic that was put in when attempting to handle this
  • Imported the functions apply_token_bitmask_inplace and allocate_token_bitmask directly from matcher
  • Removed the vocab_size parameter from GrammarMatcher creation, as it's no longer needed by XGrammar
  • Switched to compile_json_schema rather than compile_json_schema_grammar, as the method was renamed
  • Switched to clear_cache instead of clear, as the method was renamed
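
For reference, a rough sketch of the renamed API surface after these changes. This is illustrative only: `tokenizer` and `json_schema` are placeholders, not the actual code in xgrammar_backend.py.

# Illustrative sketch of the post-refactor XGrammar calls described above;
# the real xgrammar_backend.py wires these into SGLang's grammar backend classes.
from xgrammar import GrammarCompiler, GrammarMatcher, TokenizerInfo
from xgrammar.matcher import allocate_token_bitmask, apply_token_bitmask_inplace

tokenizer_info = TokenizerInfo.from_huggingface(tokenizer)
compiler = GrammarCompiler(tokenizer_info)             # was CachedGrammarCompiler

compiled = compiler.compile_json_schema(json_schema)   # was compile_json_schema_grammar
matcher = GrammarMatcher(compiled)                     # no vocab_size argument anymore

compiler.clear_cache()                                 # was clear()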

I'm chatting with MLC offline in their Discord to work through a potential issue between C++ and Python within XGrammar.

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@gittb gittb requested a review from hnyls2002 as a code owner November 23, 2024 21:56
@gittb gittb marked this pull request as draft November 23, 2024 21:56
Contributor

@merrymercy merrymercy left a comment

Thanks for the contribution. Can you add a test case similar to this? https://github.com/sgl-project/sglang/blob/main/test/srt/test_json_constrained.py

You can inherit the existing test class and run all existing test cases with the xgrammar backend. You can add a class like this in that file.

import json
import unittest

# Helpers from SGLang's test utilities (sglang/test/test_utils.py).
from sglang.test.test_utils import (
    DEFAULT_MODEL_NAME_FOR_TEST,
    DEFAULT_URL_FOR_TEST,
    popen_launch_server,
)


class TestJSONConstrainedXGrammarBackend(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.model = DEFAULT_MODEL_NAME_FOR_TEST
        cls.base_url = DEFAULT_URL_FOR_TEST
        cls.json_schema = json.dumps(
            {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "pattern": "^[\\w]+$"},
                    "population": {"type": "integer"},
                },
                "required": ["name", "population"],
            }
        )
        cls.process = popen_launch_server(
            cls.model,
            cls.base_url,
            timeout=300,
            other_args=["--max-running-requests", "10", "--grammar-backend", "xgrammar"],
        )
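
(For reference, SGLang's test files typically end with unittest.main(), so a class like this would run with e.g. `python3 test/srt/test_json_constrained.py`; the exact invocation depends on how the repo's CI runs its tests.)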

Also, can you fix the lint error? See https://sgl-project.github.io/references/contributor_guide.html for details.

@gittb
Contributor Author

gittb commented Nov 24, 2024

Thank you for all your work on SGLang!

Yes, I can do both once I get the implementation working as expected.

Contributor

@Ubospica Ubospica left a comment

Thanks for your contribution! Just a few minor improvements.

python/sglang/srt/constrained/xgrammar_backend.py
)
self.grammar_cache = None
return

tokenizer_info = TokenizerInfo.from_huggingface(tokenizer)
Contributor

Assign vocab_size and stop_token_ids (from the chat_template; optional, but it will make this more robust) when constructing tokenizer_info.
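
A minimal sketch of that suggestion; the keyword names are assumptions based on this thread, not confirmed against that XGrammar version's signature:

# Sketch: pass the backend's vocab_size and the chat template's stop tokens
# when building TokenizerInfo (keyword names assumed from this discussion).
tokenizer_info = TokenizerInfo.from_huggingface(
    tokenizer,
    vocab_size=vocab_size,          # padded model vocab size, not len(tokenizer)
    stop_token_ids=stop_token_ids,  # e.g. taken from the chat template / EOS
)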

Contributor Author

Agreed. I had an issue initially where the vocab_size provided by the tokenizer differed from the vocab_size param provided when initializing the grammar backend. In my most recent commit I use vocab_size when creating tokenizer_info. My blocker now is that GrammarMatcher accepts the stop token but still tries to find the next token mask. I believe this could be related to the backend not having the correct stop tokens.

What is the best way to access the chat template's stop_token_ids when initializing the grammar backend?

  • Consolidated imports
  • Fixed arg order for allocate_token_bitmask
  • Added logic to move vocab_mask to the logits device prior to apply_token_bitmask_inplace
  • Init XGrammar using vocab_size rather than the tokenizer vocab size
  • Renamed grammar_cache to grammar_compiler
@gittb
Contributor Author

gittb commented Nov 24, 2024

@Ubospica Would you take a peek at my logic for moving vocab_mask to the logits device if they are not already on the same device?

My reasoning here is that vocab_mask must be allocated on the CPU, since it has to be there to run fill_next_token_bitmask, but when applying the token bitmask, the mask and logits must be on the same device. I assume logits will be on a CUDA device, so I simply move vocab_mask to the logits device.

Is this logic correct?

@Ubospica
Contributor

> My reasoning here is that vocab_mask must be allocated on the CPU, since it has to be there to run fill_next_token_bitmask, but when applying the token bitmask, the mask and logits must be on the same device. I assume logits will be on a CUDA device, so I simply move vocab_mask to the logits device.

Yes, that is correct. Simply moving it to GPU is okay.
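
A minimal sketch of the confirmed flow, where `matcher`, `logits`, `batch_size`, and `index` are assumed names from the surrounding discussion:

from xgrammar.matcher import allocate_token_bitmask, apply_token_bitmask_inplace

# Allocate the bitmask on CPU; fill_next_token_bitmask writes into it there.
vocab_mask = allocate_token_bitmask(batch_size, tokenizer_info.vocab_size)
matcher.fill_next_token_bitmask(vocab_mask, index)

# logits typically live on a CUDA device; the mask must be on the same device
# before apply_token_bitmask_inplace masks out disallowed tokens' logits.
if vocab_mask.device != logits.device:
    vocab_mask = vocab_mask.to(logits.device)
apply_token_bitmask_inplace(logits, vocab_mask)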

@zhyncs
Member

zhyncs commented Nov 25, 2024

fix #2166

@gittb
Contributor Author

gittb commented Nov 25, 2024

Closing as work migrated to #2176

@gittb gittb closed this Nov 25, 2024