Is there any easy way to just get the generated tokens instead of the full output? #204
arnavgarg1 started this conversation in General
Same as the discussion title - I am wondering if I can just get the generated tokens in the response, as opposed to the input tokens + generated tokens.

I couldn't see anything around this here: https://github.com/bentoml/OpenLLM/blob/main/src/openllm/client/runtimes/base.py. Probably just needs a new `return_response` strategy?

Replies: 1 comment

- This is a work in progress, along the same lines as SSE.
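Until such an option exists in the client, one possible stopgap is to strip the echoed prompt from the full output on the caller's side. Below is a minimal sketch in plain Python; the `strip_prompt` helper and the example strings are illustrative and not part of the OpenLLM API, and it assumes the backend echoes the prompt verbatim at the start of the output, which is not guaranteed for every model or prompt template.

```python
def strip_prompt(prompt: str, full_output: str) -> str:
    """Return only the newly generated text, assuming the backend echoes
    the prompt verbatim at the start of the output."""
    if full_output.startswith(prompt):
        return full_output[len(prompt):].lstrip()
    # Fall back to the full output if the prompt was not echoed verbatim
    # (e.g. the backend already strips it, or a chat template rewrote it).
    return full_output


# Illustrative usage with made-up strings:
prompt = "What does OpenLLM do?"
full_output = "What does OpenLLM do? OpenLLM serves open-source LLMs behind a REST API."
print(strip_prompt(prompt, full_output))
# -> "OpenLLM serves open-source LLMs behind a REST API."
```

This only trims the response text after the fact; a proper fix inside the client (such as the suggested `return_response` strategy) would avoid sending the prompt tokens back at all.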