[Feature] Guided Decoding add LLguidance backend #5124
base: develop
Thanks for your contribution!
    and self.fd_config.structured_outputs_config.guided_decoding_backend is not None
    and self.fd_config.structured_outputs_config.guided_decoding_backend == "guidance"
)
if not ErnieArchitectures.contains_ernie_arch(architectures) or is_guidance_backend:
Can the vocabularies of all ERNIE (Yiyan) models now be loaded with AutoTokenizer? As I recall, that used to cause problems.
The 4.5 22B model works, but the 0.3B model crashes.
Without going through this FastTokenizer path it cannot be used at all, which is awkward.
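The condition under discussion can be sketched as a small pure function. This is only an illustration: `is_ernie_arch` is a hypothetical stand-in for `ErnieArchitectures.contains_ernie_arch`, the architecture names are examples, only the two visible clauses of `is_guidance_backend` are modeled, and it assumes the `if` branch is the one that selects the Hugging Face tokenizer.

```python
def is_ernie_arch(architectures):
    """Hypothetical stand-in for ErnieArchitectures.contains_ernie_arch."""
    return any(name.startswith("Ernie") for name in architectures)


def use_hf_tokenizer(architectures, guided_decoding_backend):
    """Return True when the Hugging Face tokenizer path would be taken.

    Assumption: the `if` branch in the diff selects the HF tokenizer.
    """
    # Mirrors the two visible clauses; the None check is implied by the
    # equality test but is kept to match the diff.
    is_guidance_backend = (
        guided_decoding_backend is not None
        and guided_decoding_backend == "guidance"
    )
    return not is_ernie_arch(architectures) or is_guidance_backend


# Illustrative architecture names, not taken from the PR.
print(use_hf_tokenizer(["Ernie4_5_ForCausalLM"], "xgrammar"))
print(use_hf_tokenizer(["Ernie4_5_ForCausalLM"], "guidance"))
```

Under these assumptions an ERNIE model stays on its own tokenizer path unless the guidance backend is selected, which is the case the 0.3B report above is concerned with.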
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #5124 +/- ##
==========================================
Coverage ? 57.73%
==========================================
Files ? 318
Lines ? 38426
Branches ? 5745
==========================================
Hits ? 22186
Misses ? 14483
Partials ? 1757
Pull Request Overview
This PR adds support for llguidance as a new backend for constrained decoding (structured generation) in FastDeploy, enabling grammar-based constraints during token generation. This provides an alternative to the existing XGrammar backend.
- Added llguidance backend implementation with processor, backend, and checker classes
- Integrated llguidance into the configuration system with validation
- Added comprehensive unit tests with mocking support for environments without llguidance
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 69 comments.
| File | Description |
|---|---|
| fastdeploy/model_executor/guided_decoding/guidance_backend.py | Core implementation of LLGuidance backend, processor, and checker classes |
| fastdeploy/lazy_loader.py | New utility for lazy-loading modules to avoid pulling in heavy dependencies |
| fastdeploy/model_executor/guided_decoding/__init__.py | Factory integration for llguidance backend and checker |
| fastdeploy/model_executor/guided_decoding/base_guided_decoding.py | Added conditional logic to use HF tokenizer for guidance backend |
| fastdeploy/config.py | Configuration validation and import check for llguidance backend |
| fastdeploy/envs.py | Added environment variables for llguidance configuration |
| requirements_guided_decoding.txt | Added llguidance, torch dependencies |
| tests/model_executor/guided_decoding/test_guidance_*.py | Comprehensive unit tests with mocking support |
| docs/**/parameters.md | Updated parameter documentation to include guidance backend |
| docs/**/structured_outputs.md | Added llguidance backend to feature documentation |
| fastdeploy/model_executor/guided_decoding/xgrammar_backend.py | Removed max_rollback_tokens parameter |
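
The lazy-loading idea listed for `fastdeploy/lazy_loader.py` can be illustrated with a minimal sketch. The class name `LazyModule` and its attributes are hypothetical and are not taken from the PR. The point is only that the real import is deferred until the first attribute access, so a heavy optional dependency is not pulled in at import time.

```python
import importlib


class LazyModule:
    """Hypothetical lazy proxy: imports the target module on first use."""

    def __init__(self, name):
        self._name = name
        self._module = None  # nothing is imported yet

    def __getattr__(self, attr):
        # Called only for attributes not found on the proxy itself,
        # i.e. anything that should come from the real module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# Demonstrated with a stdlib module so the sketch runs anywhere.
lazy_json = LazyModule("json")
print(lazy_json.dumps({"backend": "guidance"}))
```

A proxy like this turns a missing optional package into an error at first use rather than at import time, which is why a separate import check in the configuration step is still useful.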
Comments suppressed due to low confidence (8)
tests/model_executor/guided_decoding/test_guidance_checker.py:1
- Comments in lines 22 and 36 are in Chinese (Simplified), but the codebase uses English for comments. According to the custom guidelines, Chinese should only be used for repository members. Since this is a public code contribution, these comments should be translated to English for consistency with the rest of the codebase.
"""
tests/model_executor/guided_decoding/test_guidance_checker.py:1
- Numerous docstrings and comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments and documentation should be in English for consistency with the rest of the codebase. These should be translated.
"""
tests/model_executor/guided_decoding/test_guidance_backend.py:1
- Multiple inline comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments should be in English for consistency. These should be translated.
"""
fastdeploy/model_executor/guided_decoding/base_guided_decoding.py:152
- Overridden method signature does not match call, where it is passed too many arguments. Overriding method LLGuidanceBackend._create_processor matches the call.
def _create_processor(self):
try:
    self.ll_tokenizer = llguidance_hf.from_tokenizer(self.hf_tokenizer, self.vocab_size)
except Exception as e:
    import traceback
Copilot AI (Nov 21, 2025):
The traceback module is already imported at the top of the file (line 19). This duplicate import inside the exception handler is redundant and should be removed.
    import traceback
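
A minimal sketch of the pattern this comment asks for, with `traceback` imported once at module level and reused inside the handler. The function `load_tokenizer` and its `factory` argument are hypothetical and exist only to make the example self-contained. They are not the PR's code.

```python
import traceback  # imported once, at the top of the module


def load_tokenizer(factory):
    """Call a tokenizer factory; return (tokenizer, error_text)."""
    try:
        return factory(), None
    except Exception:
        # No local import is needed here: the module-level name is in scope.
        return None, traceback.format_exc()


def failing_factory():
    raise ValueError("bad vocab")


tokenizer, error_text = load_tokenizer(failing_factory)
print(tokenizer)
print(error_text.strip().splitlines()[-1])
```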
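from unittest.mock import MagicMock, patch

# --- Mocking Setup ---
# 优先模拟这些懒加载的模块,以便在未安装这些库的环境中进行测试。 [English: Mock these lazily loaded modules first so the tests can run in environments where the libraries are not installed.]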
Copilot AI (Nov 21, 2025):
Multiple docstrings and comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments and documentation should be in English. These should be translated for consistency with the rest of the codebase.
sys.modules["llguidance.hf"] = mock_llguidance_hf
sys.modules["llguidance.torch"] = mock_llguidance_torch

# 模拟设置完成后,再导入需要测试的模块 [English: Import the modules under test only after the mock setup is complete.]
Copilot AI (Nov 21, 2025):
Multiple docstrings and comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments and documentation should be in English. These should be translated for consistency with the rest of the codebase.
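
The ordering constraint in the test setup (register the mocks first, import the code under test afterwards) can be sketched as follows. This is an illustration rather than the PR's test code. It uses `patch.dict` so that `sys.modules` is restored afterwards, and the vocabulary size passed to `from_tokenizer` is an arbitrary example value.

```python
import importlib
import sys
from unittest.mock import MagicMock, patch

# Stand-ins for the optional package and its submodule.
mock_llguidance_hf = MagicMock()
mock_llguidance = MagicMock(hf=mock_llguidance_hf)

fake_modules = {
    "llguidance": mock_llguidance,
    "llguidance.hf": mock_llguidance_hf,
}

with patch.dict(sys.modules, fake_modules):
    # Any import that happens inside this block resolves to the mocks,
    # so code depending on llguidance can be imported without installing it.
    hf = importlib.import_module("llguidance.hf")
    hf.from_tokenizer("tokenizer", 32000)

# The mock recorded the call made through the imported name.
mock_llguidance_hf.from_tokenizer.assert_called_once_with("tokenizer", 32000)
```

Assigning into `sys.modules` directly, as the test file does, has the same effect but leaves the mocks registered for the rest of the test session.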
    batch_size=self.batch_size,
)

# 正常token [English: normal token]
Copilot AI (Nov 21, 2025):
Multiple inline comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments should be in English for consistency. These should be translated.
self.assertTrue(result)
mock_matcher.consume_tokens.assert_called_with([0])

# EOS token
Copilot AI (Nov 21, 2025):
Multiple inline comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments should be in English for consistency. These should be translated.
self.fd_config.structured_outputs_config.reasoning_parser = None

def test_initialization(self, mock_from_tokenizer, mock_matcher):
    # 测试后端初始化 [English: test backend initialization]
Copilot AI (Nov 21, 2025):
Multiple inline comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments should be in English for consistency. These should be translated.
Motivation
This PR adds support for llguidance as a new backend for constrained decoding (structured generation).
Modifications
Dependency Integration: Added llguidance as a dependency in the requirements .txt file.
Backend Implementation: Implemented the wrapper/adapter for llguidance to interface with the inference engine.
Config Update: Added configuration options to select llguidance as the constrained decoding provider.
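
A rough sketch of what a backend check at configuration time could look like, assuming a mapping from backend name to the package it needs. The mapping, the function name `check_guided_decoding_backend`, and the error messages are all hypothetical. Only the backend value `"guidance"` and the package name `llguidance` come from this PR.

```python
import importlib.util

# Assumed mapping of backend name -> required package (illustrative only).
SUPPORTED_BACKENDS = {"xgrammar": "xgrammar", "guidance": "llguidance"}


def check_guided_decoding_backend(name, supported=SUPPORTED_BACKENDS):
    """Validate a backend name and check that its package is importable."""
    if name is None:
        return  # guided decoding not requested
    if name not in supported:
        raise ValueError(
            f"Unsupported guided decoding backend {name!r}; "
            f"expected one of {sorted(supported)}"
        )
    package = supported[name]
    # find_spec looks the package up without importing it.
    if importlib.util.find_spec(package) is None:
        raise ImportError(
            f"Backend {name!r} requires the {package!r} package, "
            "which is not installed"
        )


# Demonstrated with a stdlib module so the sketch runs anywhere.
check_guided_decoding_backend("demo", {"demo": "json"})
```

Failing at configuration time gives a clear message up front, instead of an import error surfacing later when the first structured request arrives.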
Usage or Command
structured_outputs.md
Accuracy Tests
Validity Test: Verified that the generated output strictly adheres to the provided JSON Schema and Regex patterns.
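
For illustration, a validity check of this kind might look like the sketch below. It is not the test used in the PR. The helper names are hypothetical, and `has_required_fields` only checks top-level keys and their Python types, so it is a simplified stand-in for full JSON Schema validation.

```python
import json
import re


def matches_regex(text, pattern):
    """True if the whole output matches the regex constraint."""
    return re.fullmatch(pattern, text) is not None


def has_required_fields(text, required):
    """True if text is a JSON object whose required keys have the given types.

    Simplified stand-in for schema validation: top-level keys only.
    """
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    return all(isinstance(obj.get(key), typ) for key, typ in required.items())


# Example outputs, made up for this sketch.
print(matches_regex("2025-11-21", r"\d{4}-\d{2}-\d{2}"))
print(has_required_fields('{"name": "demo", "age": 3}', {"name": str, "age": int}))
```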
Checklist
- Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- If the PR targets the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.