[Shortfin LLM Server] Server continues generating after client cancels request #1111

Open · stbaione opened this issue Mar 18, 2025 · 0 comments
Labels: bug (Something isn't working)

Description

When a request is submitted to our server, the connection remains open until generation is complete. However, if the client kills the request, the server does not interrupt generation; it continues to finish the request internally. This wastes throughput and compute: once a request is cancelled, there is no point in finishing the generation.

We should gracefully kill the GenerateItemProcess of a given request if that request's client closes the connection before generation is complete; a rough sketch of the shape of the fix follows.
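As a minimal sketch of the idea, not shortfin's actual plumbing: assuming a FastAPI-style endpoint, the handler can run generation as a background task, poll Starlette's request.is_disconnected(), and cancel the task when the client goes away. run_generation below is a hypothetical stand-in for the per-request GenerateItemProcess.

import asyncio

from fastapi import FastAPI, Request, Response
from fastapi.responses import JSONResponse

app = FastAPI()

async def run_generation(payload: dict) -> dict:
    # Hypothetical stand-in for the per-request GenerateItemProcess:
    # pretend each decode step takes 100 ms and emits one fake token.
    max_tokens = payload.get("sampling_params", {}).get("max_completion_tokens", 50)
    tokens = []
    for i in range(max_tokens):
        await asyncio.sleep(0.1)
        tokens.append(f"tok{i}")
    return {"text": " ".join(tokens)}

@app.post("/generate")
async def generate(request: Request):
    gen_task = asyncio.create_task(run_generation(await request.json()))
    # Watch for a client disconnect while generation runs in the background.
    while not gen_task.done():
        if await request.is_disconnected():
            # CancelledError is raised inside run_generation at its next
            # await, so no further decode steps are issued.
            gen_task.cancel()
            return Response(status_code=204)
        await asyncio.sleep(0.1)
    return JSONResponse(content=gen_task.result())

Polling is the simplest shape; an alternative is racing the generation task against a disconnect-watcher task with asyncio.wait and cancelling whichever loses.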

Repro

  1. Set SHORTFIN_APPS_LOG_LEVEL=DEBUG to view the log output as the server generates.
  2. Follow the llama_serving guide to spin up the shortfin server.
  3. Send a request, leaving yourself enough time to kill it before it completes:
curl http://localhost:8000/generate \
    -H "Content-Type: application/json" \
    -d '{
        "text": "<|begin_of_text|>Generate 50 random characters.<|eot_id|>",
        "sampling_params": {"max_completion_tokens": 50}
    }'
  4. Ctrl+C the curl request to close the connection from the client side (a scriptable equivalent is sketched after the log output below).
  5. Observe that the server continues to generate tokens, e.g.:
[2025-03-18 15:14:56.558] [info] [batcher.py:254] INVOKE ProgramFunction(decode_bs4$async: 0rrrrrrr_r): 
  0: [4, 1]
  1: [4]
  2: [4]
  3: [4, 5]
  4: [512, 2097152]
DEBUG:shortfin_apps.llm.components.batcher:Prefill bs=4, bsl=160
INFO:shortfin_apps.llm.components.batcher:INVOKE ProgramFunction(decode_bs4$async: 0rrrrrrr_r): 
  0: [4, 1]
  1: [4]
  2: [4]
  3: [4, 5]
  4: [512, 2097152]
[2025-03-18 15:14:56.559] [info] [batcher.py:254] INVOKE ProgramFunction(decode_bs4$async: 0rrrrrrr_r): 
  0: [4, 1]
  1: [4]
  2: [4]
  3: [4, 5]
  4: [512, 2097152]
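
For a scriptable version of steps 3 and 4, a hypothetical client like the following reproduces the disconnect by abandoning the request after a short read timeout (server URL and payload as in the curl example above):

import requests

payload = {
    "text": "<|begin_of_text|>Generate 50 random characters.<|eot_id|>",
    "sampling_params": {"max_completion_tokens": 50},
}

try:
    # (connect timeout, read timeout): the 1 s read timeout drops the
    # connection mid-generation, just like Ctrl+C on curl.
    requests.post("http://localhost:8000/generate", json=payload, timeout=(5, 1))
except requests.exceptions.ReadTimeout:
    print("Client gave up; the server logs keep emitting INVOKE lines.")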
stbaione added the bug (Something isn't working) label on Mar 18, 2025