I've encountered an issue when trying to use transformers-cfg for constrained generation with the recently released Gemma-3 models.
Adding a GrammarConstrainedLogitsProcessor from transformers-cfg to logits_processor causes model.generate() to fail with: AssertionError: impossible for tokenizer vocab to be less than model vocab
The error occurs regardless of whether the model is loaded using AutoModelForCausalLM or the specific Gemma3ForCausalLM class (same for the tokenizer class). The example below uses Gemma3ForCausalLM:
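The full script is not reproduced here; a minimal sketch of the setup described above (the checkpoint id, prompt, and grammar below are placeholders I am assuming, not the exact ones from the original report) looks roughly like this:

```python
# Minimal reproduction sketch (assumed checkpoint: "google/gemma-3-1b-it").
import torch
from transformers import AutoTokenizer, Gemma3ForCausalLM
from transformers_cfg.grammar_utils import IncrementalGrammarConstraint
from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "google/gemma-3-1b-it"  # assumed; the report does not name the exact checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = Gemma3ForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)

# Post-load check as in the log above: both sides report 262144
print("Model vocab:", model.config.vocab_size, "Tokenizer vocab:", len(tokenizer))

# Placeholder GBNF grammar; any grammar triggers the same code path
grammar_str = 'root ::= "yes" | "no"'
grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer)
grammar_processor = GrammarConstrainedLogitsProcessor(grammar)

input_ids = tokenizer("Is the sky blue? Answer:", return_tensors="pt").input_ids.to(device)
outputs = model.generate(
    input_ids,
    max_new_tokens=10,
    logits_processor=[grammar_processor],  # fails inside transformers_cfg mask_logits
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Running this produces the output and traceback below: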
------------------------------------------------------------
ENVIRONMENT:
- Python: 3.12.3
- transformers: 4.51.1
- torch: 2.4.0
- accelerate: 1.0.1
- bitsandbytes: 0.45.5
- transformers-cfg: 0.2.7
------------------------------------------------------------
Using device: cuda
Loading tokenizer...
Loading model (Gemma3ForCausalLM)...
(Post-load check) Model vocab: 262144, Tokenizer vocab: 262144
Creating grammar processor...
`generation_config` default values have been modified to match model-specific defaults: {'do_sample': True, 'cache_implementation': 'hybrid', 'top_k': 64, 'top_p': 0.95, 'bos_token_id': 2}. If this is not desired, please set these values explicitly.
ERROR Type: AssertionError
ERROR Message: impossible for tokenizer vocab to be less than model vocab
-----------------------------------
Traceback (most recent call last):
File "C:\Users\georg\AppData\Local\Temp\ipykernel_45580\3883546750.py", line 95, in <module>
outputs = model.generate(
^^^^^^^^^^^^^^^
File "c:\Users\georg\anaconda3\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\georg\anaconda3\Lib\site-packages\transformers\generation\utils.py", line 2463, in generate
result = self._sample(
^^^^^^^^^^^^^
File "c:\Users\georg\anaconda3\Lib\site-packages\transformers\generation\utils.py", line 3448, in _sample
next_token_scores = logits_processor(input_ids, next_token_logits)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\georg\anaconda3\Lib\site-packages\transformers\generation\logits_process.py", line 88, in __call__
scores = processor(input_ids, scores)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\georg\anaconda3\Lib\site-packages\transformers_cfg\generation\logits_process.py", line 164, in __call__
return self.process_logits(input_ids, scores)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\georg\anaconda3\Lib\site-packages\transformers_cfg\generation\logits_process.py", line 157, in process_logits
masked_scores = self.mask_logits(scores, device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\georg\anaconda3\Lib\site-packages\transformers_cfg\generation\logits_process.py", line 77, in mask_logits
acceptance_vocab_size < masked_logits_vocab_size
AssertionError: impossible for tokenizer vocab to be less than model vocab
Hello @GeorgeDeac
It does seem to be caused by a mismatch between the model's embedding size and the tokenizer's vocab. Could you try PR #126? I still need to double-check it before merging, but it looks good to me.
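To see whether the embedding size and the tokenizer's vocab actually diverge, a check along these lines can help (a sketch; the helper function is mine, not part of transformers-cfg):

```python
# Diagnostic sketch: compare the model's embedding rows, its config vocab size,
# and the tokenizer's vocab size with and without added special tokens.
def report_vocab_sizes(model, tokenizer):
    print("input embedding rows   :", model.get_input_embeddings().weight.shape[0])
    print("model.config.vocab_size:", model.config.vocab_size)
    print("tokenizer.vocab_size   :", tokenizer.vocab_size)  # base vocab only
    print("len(tokenizer)         :", len(tokenizer))        # includes added tokens

# report_vocab_sizes(model, tokenizer)  # with the objects from the script above
```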
Hey @GeorgeDeac, thanks a lot for the detailed setup information and the reproducibility script — really appreciate it!
It turns out the issue was caused by an inconsistency in the tokenizer implementation for Gemma-3. I’ve added a fix here: #128
Thanks for contributing PR #126; I will review it this weekend, @urroxyz.