how does vllm handle wrong tokens in speculative decoding? #4284
-
The core problem is that the number of accepted tokens differs across sequences in the same batch.
Thanks.
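For concreteness, here is a minimal sketch of that mismatch, assuming the engine keeps the batch rectangular by padding rejected draft slots with a -1 sentinel (the reply below says it strips these out before appending); all names here are hypothetical:

```python
import torch

k = 3  # draft tokens proposed per sequence
# accepted_counts[i] = how many of the k draft tokens sequence i accepted
accepted_counts = torch.tensor([3, 1, 0])
draft_tokens = torch.tensor([
    [11, 12, 13],  # seq 0: all 3 accepted
    [21, 22, 23],  # seq 1: only the first accepted
    [31, 32, 33],  # seq 2: none accepted
])

positions = torch.arange(k).unsqueeze(0)         # (1, k)
mask = positions < accepted_counts.unsqueeze(1)  # (batch, k), True where accepted
padded = torch.where(mask, draft_tokens, torch.full_like(draft_tokens, -1))
print(padded)
# tensor([[11, 12, 13],
#         [21, -1, -1],
#         [-1, -1, -1]])
```

(This sketch omits the corrected/bonus token the target model contributes; the point is only that per-sequence acceptance lengths are ragged while the batch output stays rectangular.)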
-
Before the engine appends token ids to sequences, it removes the -1 tokens. The logic is here:
vllm/vllm/engine/output_processor/multi_step.py, line 68 at commit 34128a6
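A paraphrase of that filtering step, as a minimal sketch rather than the exact code at the referenced line (`INVALID_TOKEN_ID` and the helper name are assumptions):

```python
from typing import List

INVALID_TOKEN_ID = -1  # sentinel left in rejected draft slots

def append_valid_tokens(seq_token_ids: List[int],
                        sampled_token_ids: List[int]) -> None:
    """Drop the -1 padding from one step's sampler output, then append
    only the tokens that were actually accepted for this sequence."""
    valid = [t for t in sampled_token_ids if t != INVALID_TOKEN_ID]
    seq_token_ids.extend(valid)

seq = [101, 7]
append_valid_tokens(seq, [21, -1, -1])
print(seq)  # [101, 7, 21]
```

The padding keeps every sequence's sampler output the same length, so a single strip pass at append time is enough to recover the ragged per-sequence results.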
-
These lines suggest that only the last token id is appended to input_ids. For example, for the last sequence, I don't understand how the target model computes the key and value for tokens 1, 2, 3.
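One way the keys and values for tokens 1, 2, 3 can already exist, per general speculative-decoding mechanics rather than anything confirmed in this thread: the verification forward pass feeds all draft tokens through the target model at once, writing their keys and values into the KV cache as a side effect, so the next step only needs the newest token as input. A toy single-head sketch, with every name hypothetical:

```python
import torch

torch.manual_seed(0)
d = 8
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
cache_k = torch.empty(0, d)  # grows as positions are processed
cache_v = torch.empty(0, d)

def attend(x: torch.Tensor) -> torch.Tensor:
    """Process new positions: append their K/V to the cache, then attend
    over everything cached so far (single head; the causal mask within
    x is omitted for brevity)."""
    global cache_k, cache_v
    q = x @ Wq
    cache_k = torch.cat([cache_k, x @ Wk])
    cache_v = torch.cat([cache_v, x @ Wv])
    scores = (q @ cache_k.T) / d ** 0.5
    return scores.softmax(dim=-1) @ cache_v

prompt = torch.randn(5, d)
attend(prompt)               # prefill: cache now holds 5 positions
draft = torch.randn(3, d)    # draft tokens "1, 2, 3"
attend(draft)                # verification pass caches K/V for all three at once
new_tok = torch.randn(1, d)  # next step: only the newest token is fed in
attend(new_tok)              # it still attends to the cached K/V of 1, 2, 3
assert cache_k.shape[0] == 9
```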