[shortfin_apps.llm] Batcher-level generation tests #923

Open
renxida opened this issue Feb 6, 2025 · 0 comments
We need integration tests that verify batching does not change generation results. Previously, we've had issues like the one fixed by #873, where improper masking caused:

  • token corruption when a batch contains more than one request
  • token corruption when a batch is not full

Our current shortfin LLM server integration tests can't reliably expose these bugs in a reproducible way: our concurrency tests go through the HTTP endpoint, so the timing of incoming requests determines how they are batched, and the problematic batch compositions occur only intermittently.
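A batcher-level test could bypass the HTTP layer and construct batches directly, checking that per-request outputs are identical whether a request runs alone, shares a batch, or sits in a padded batch. The sketch below is hypothetical: `toy_generate` is a stand-in for driving shortfin's actual batcher (its name and signature are invented here), but the assertion structure mirrors the two failure modes listed above.

```python
# Hypothetical sketch of a batcher-level determinism test. A real test would
# drive shortfin's LLM batcher with controlled batch compositions; here a toy
# per-request "model" stands in so the property under test -- batching must
# not change generation results -- is concrete and runnable.

def toy_generate(prompts, pad_to=None):
    """'Generate' an output per prompt, optionally padding the batch.

    A correct implementation must ignore padding slots: a request's output
    must not depend on the other requests in its batch or on batch size.
    """
    batch = list(prompts)
    if pad_to is not None:
        batch += [""] * (pad_to - len(batch))  # pad batch with empty slots
    # Per-request computation: output depends only on the request itself.
    outputs = [p[::-1] for p in batch]
    return outputs[: len(prompts)]  # drop padding slots before returning

def test_batching_is_transparent():
    prompts = ["hello", "world", "shortfin"]
    # Reference: each request processed alone (batch size 1).
    reference = [toy_generate([p])[0] for p in prompts]
    # Failure mode 1: a batch with more than one request.
    assert toy_generate(prompts) == reference
    # Failure mode 2: a batch that is not full (padded to a larger size).
    assert toy_generate(prompts, pad_to=8) == reference

test_batching_is_transparent()
```

Because the batch composition is fixed in the test rather than determined by request timing, a masking bug like the one in #873 would fail deterministically on every run.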
