Is there any easy way to just get the generated tokens instead of the full output? #204
arnavgarg1 started this conversation in General
Same as the discussion title - I am wondering if I can just get the generated tokens in the response, as opposed to the input tokens + generated tokens.

I couldn't see anything around this here: https://github.com/bentoml/OpenLLM/blob/main/src/openllm/client/runtimes/base.py. Probably just needs a new `return_response` strategy?

Replies: 1 comment

- This is a work in progress, along the same lines as SSE.
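Until such an option exists in the client, one possible stopgap is to strip the echoed prompt from the full output on the caller's side. Below is a minimal sketch in plain Python; the `strip_prompt` helper and the example strings are illustrative and not part of the OpenLLM API, and it assumes the backend echoes the prompt verbatim at the start of the output, which is not guaranteed for every model or prompt template.

```python
def strip_prompt(prompt: str, full_output: str) -> str:
    """Return only the newly generated text, assuming the backend echoes
    the prompt verbatim at the start of the output."""
    if full_output.startswith(prompt):
        return full_output[len(prompt):].lstrip()
    # Fall back to the full output if the prompt was not echoed verbatim
    # (e.g. the backend already strips it, or a chat template rewrote it).
    return full_output


# Illustrative usage with made-up strings:
prompt = "What does OpenLLM do?"
full_output = "What does OpenLLM do? OpenLLM serves open-source LLMs behind a REST API."
print(strip_prompt(prompt, full_output))
# -> "OpenLLM serves open-source LLMs behind a REST API."
```

This only trims the response text after the fact; a proper fix inside the client (such as the suggested `return_response` strategy) would avoid sending the prompt tokens back at all.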