Skip to content

Conversation

@ST-XX
Copy link
Collaborator

@ST-XX ST-XX commented Nov 19, 2025

Motivation

This PR adds support for llguidance as a new backend for constrained decoding (structured generation).

Modifications

Dependency Integration: Added llguidance integration .txt
Backend Implementation: Implemented the wrapper/adapter for llguidance to interface with the inference engine.
Config Update: Added configuration options to select llguidance as the constrained decoding provider

Usage or Command

structured_outputs.md

Accuracy Tests

Validity Test: Verified that the generated output strictly adheres to the provided JSON Schema and Regex patterns.

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot
Copy link

paddle-bot bot commented Nov 19, 2025

Thanks for your contribution!

@ST-XX ST-XX requested a review from kevincheng2 November 20, 2025 00:04
@ST-XX ST-XX requested a review from Jiang-Jia-Jun November 20, 2025 07:17
and self.fd_config.structured_outputs_config.guided_decoding_backend is not None
and self.fd_config.structured_outputs_config.guided_decoding_backend == "guidance"
)
if not ErnieArchitectures.contains_ernie_arch(architectures) or is_guidance_backend:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

现在一言所有模型的词表都可以使用 AutoTokenizer 加载吗? 之前好像都会有问题

Copy link
Collaborator Author

@ST-XX ST-XX Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4.5的 22B 可以,0.3B 会挂掉。
不走这个 FastTokenizer 逻辑就会无法使用,尬住

@ST-XX ST-XX requested a review from kevincheng2 November 20, 2025 09:07
@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 76.58537% with 48 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@f1e36ff). Learn more about missing BASE report.

Files with missing lines Patch % Lines
...model_executor/guided_decoding/guidance_backend.py 84.90% 17 Missing and 7 partials ⚠️
fastdeploy/config.py 0.00% 11 Missing ⚠️
...tdeploy/model_executor/guided_decoding/__init__.py 0.00% 6 Missing ⚠️
fastdeploy/lazy_loader.py 81.48% 5 Missing ⚠️
...l_executor/guided_decoding/base_guided_decoding.py 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5124   +/-   ##
==========================================
  Coverage           ?   57.73%           
==========================================
  Files              ?      318           
  Lines              ?    38426           
  Branches           ?     5745           
==========================================
  Hits               ?    22186           
  Misses             ?    14483           
  Partials           ?     1757           
Flag Coverage Δ
diff 57.73% <76.58%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for llguidance as a new backend for constrained decoding (structured generation) in FastDeploy, enabling grammar-based constraints during token generation. This provides an alternative to the existing XGrammar backend.

  • Added llguidance backend implementation with processor, backend, and checker classes
  • Integrated llguidance into the configuration system with validation
  • Added comprehensive unit tests with mocking support for environments without llguidance

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 69 comments.

Show a summary per file
File Description
fastdeploy/model_executor/guided_decoding/guidance_backend.py Core implementation of LLGuidance backend, processor, and checker classes
fastdeploy/lazy_loader.py New utility for lazy-loading modules to avoid pulling in heavy dependencies
fastdeploy/model_executor/guided_decoding/init.py Factory integration for llguidance backend and checker
fastdeploy/model_executor/guided_decoding/base_guided_decoding.py Added conditional logic to use HF tokenizer for guidance backend
fastdeploy/config.py Configuration validation and import check for llguidance backend
fastdeploy/envs.py Added environment variables for llguidance configuration
requirements_guided_decoding.txt Added llguidance, torch dependencies
tests/model_executor/guided_decoding/test_guidance_*.py Comprehensive unit tests with mocking support
docs/**/parameters.md Updated parameter documentation to include guidance backend
docs/**/structured_outputs.md Added llguidance backend to feature documentation
fastdeploy/model_executor/guided_decoding/xgrammar_backend.py Removed max_rollback_tokens parameter
Comments suppressed due to low confidence (8)

tests/model_executor/guided_decoding/test_guidance_checker.py:1

  • Comments in lines 22 and 36 are in Chinese (Simplified), but the codebase uses English for comments. According to the custom guidelines, Chinese should only be used for repository members. Since this is a public code contribution, these comments should be translated to English for consistency with the rest of the codebase.
"""

tests/model_executor/guided_decoding/test_guidance_checker.py:1

  • Comments in lines 22 and 36 are in Chinese (Simplified), but the codebase uses English for comments. According to the custom guidelines, Chinese should only be used for repository members. Since this is a public code contribution, these comments should be translated to English for consistency with the rest of the codebase.
"""

tests/model_executor/guided_decoding/test_guidance_checker.py:1

  • Numerous docstrings and comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments and documentation should be in English for consistency with the rest of the codebase. These should be translated.
"""

tests/model_executor/guided_decoding/test_guidance_checker.py:1

  • Numerous docstrings and comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments and documentation should be in English for consistency with the rest of the codebase. These should be translated.
"""

tests/model_executor/guided_decoding/test_guidance_checker.py:1

  • Numerous docstrings and comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments and documentation should be in English for consistency with the rest of the codebase. These should be translated.
"""

tests/model_executor/guided_decoding/test_guidance_checker.py:1

  • Numerous docstrings and comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments and documentation should be in English for consistency with the rest of the codebase. These should be translated.
"""

tests/model_executor/guided_decoding/test_guidance_backend.py:1

  • Multiple inline comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments should be in English for consistency. These should be translated.
"""

fastdeploy/model_executor/guided_decoding/base_guided_decoding.py:152

    def _create_processor(self):

try:
self.ll_tokenizer = llguidance_hf.from_tokenizer(self.hf_tokenizer, self.vocab_size)
except Exception as e:
import traceback
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The traceback module is already imported at the top of the file (line 19). This duplicate import inside the exception handler is redundant and should be removed.

Suggested change
import traceback

Copilot uses AI. Check for mistakes.
from unittest.mock import MagicMock, patch

# --- Mocking Setup ---
# 优先模拟这些懒加载的模块,以便在未安装这些库的环境中进行测试。
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple docstrings and comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments and documentation should be in English. These should be translated for consistency with the rest of the codebase.

Copilot uses AI. Check for mistakes.
sys.modules["llguidance.hf"] = mock_llguidance_hf
sys.modules["llguidance.torch"] = mock_llguidance_torch

# 模拟设置完成后,再导入需要测试的模块
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple docstrings and comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments and documentation should be in English. These should be translated for consistency with the rest of the codebase.

Copilot uses AI. Check for mistakes.
batch_size=self.batch_size,
)

# 正常token
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple inline comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments should be in English for consistency. These should be translated.

Copilot uses AI. Check for mistakes.
self.assertTrue(result)
mock_matcher.consume_tokens.assert_called_with([0])

# EOS token
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple inline comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments should be in English for consistency. These should be translated.

Copilot uses AI. Check for mistakes.
self.fd_config.structured_outputs_config.reasoning_parser = None

def test_initialization(self, mock_from_tokenizer, mock_matcher):
# 测试后端初始化
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple inline comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments should be in English for consistency. These should be translated.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants