Speed is even slower than the baseline. #2

Open, opened on Jun 26, 2026

Due to hardware limitations, I used Llama-3.1-1B as the draft model and Llama-3.1-13B as the target model. Why is the resulting generation speed slower than the baseline?
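For reference, here is a rough way to reason about when a draft/target pair can end up slower than plain decoding. It is only a sketch and assumes the setup follows standard speculative sampling, where the draft model proposes `gamma` tokens and the target model verifies them in one pass. The names `alpha`, `gamma`, and `c`, and the numbers in the usage lines, are hypothetical and not measured from this repository.

```python
def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    """Estimate the wall-clock speedup of speculative decoding over plain decoding.

    Assumptions (illustrative, not taken from this project):
      alpha: probability that a drafted token is accepted, treated as i.i.d.
      gamma: number of tokens the draft model proposes per verification step
      c:     cost of one draft forward pass relative to one target forward pass
    A return value below 1.0 means speculative decoding is slower than the baseline.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 1 or c < 0.0:
        raise ValueError("gamma must be >= 1 and c must be >= 0")

    # Expected number of tokens produced per verification step.
    if alpha == 1.0:
        expected_tokens = float(gamma + 1)
    else:
        expected_tokens = (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)

    # Each step costs gamma draft passes plus one target pass,
    # measured in units of one target forward pass.
    cost_per_step = gamma * c + 1.0
    return expected_tokens / cost_per_step


if __name__ == "__main__":
    # Hypothetical settings: a low acceptance rate with a relatively
    # expensive draft model, versus a high acceptance rate with a cheap one.
    print(expected_speedup(alpha=0.5, gamma=5, c=0.3))
    print(expected_speedup(alpha=0.8, gamma=4, c=0.1))
```

Under this model, a low acceptance rate or a draft model that is not much cheaper than the target pushes the estimate below 1.0. Measuring the actual acceptance rate and the per-token latency of both models would show whether that is what is happening here.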
