[shortfin_apps.llm] Batcher-level generation tests #923

Open
renxida opened this issue Feb 6, 2025 · 0 comments
We need integration tests that verify batching does not change generation results. Previously, we've had issues like the one fixed by #873, where improper masking caused:

  • token corruption when a batch contains more than one request
  • token corruption when a batch is not full

Our current shortfin LLM server integration tests can't reliably expose these bugs in a reproducible way: our concurrency tests go through the HTTP endpoint, so the timing of incoming requests determines how they are batched, and the problematic batch compositions occur only intermittently.
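A batcher-level test could bypass the HTTP layer and construct batches directly, checking that per-request outputs are identical whether a request runs alone, shares a batch, or sits in a padded batch. The sketch below is hypothetical: `toy_generate` is a stand-in for driving shortfin's actual batcher (its name and signature are invented here), but the assertion structure mirrors the two failure modes listed above.

```python
# Hypothetical sketch of a batcher-level determinism test. A real test would
# drive shortfin's LLM batcher with controlled batch compositions; here a toy
# per-request "model" stands in so the property under test -- batching must
# not change generation results -- is concrete and runnable.

def toy_generate(prompts, pad_to=None):
    """'Generate' an output per prompt, optionally padding the batch.

    A correct implementation must ignore padding slots: a request's output
    must not depend on the other requests in its batch or on batch size.
    """
    batch = list(prompts)
    if pad_to is not None:
        batch += [""] * (pad_to - len(batch))  # pad batch with empty slots
    # Per-request computation: output depends only on the request itself.
    outputs = [p[::-1] for p in batch]
    return outputs[: len(prompts)]  # drop padding slots before returning

def test_batching_is_transparent():
    prompts = ["hello", "world", "shortfin"]
    # Reference: each request processed alone (batch size 1).
    reference = [toy_generate([p])[0] for p in prompts]
    # Failure mode 1: a batch with more than one request.
    assert toy_generate(prompts) == reference
    # Failure mode 2: a batch that is not full (padded to a larger size).
    assert toy_generate(prompts, pad_to=8) == reference

test_batching_is_transparent()
```

Because the batch composition is fixed in the test rather than determined by request timing, a masking bug like the one in #873 would fail deterministically on every run.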
