feat: enable streaming usage metrics for OpenAI-compatible providers #4200
base: main
Conversation
This pull request has merge conflicts that must be resolved before it can be merged. @skamenan7 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Force-pushed from 0b6843b to 4325345
I think #4127 should probably supersede this, right?
Yes, Charlie. Even with such an overhaul of the telemetry (thanks @iamemilio), the OpenAI provider will not send usage data unless we explicitly ask for it with stream_options={"include_usage": True}, since automatic instrumentation libraries usually do not modify your API payloads to request extra data.
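For illustration, a minimal sketch of what that opt-in looks like against the OpenAI API using the standard openai Python client (the model name is just an example):

```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # example model
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    # Without this, streamed chunks never carry token usage.
    stream_options={"include_usage": True},
)

usage = None
for chunk in stream:
    # The usage-bearing chunk arrives last, with an empty `choices` list.
    if chunk.usage is not None:
        usage = chunk.usage

print(usage)  # prompt_tokens / completion_tokens / total_tokens
```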
Force-pushed from 4325345 to c81784c
@skamenan7 Sorry to say, my changes are going to make a bit of work for you. I would really suggest working within the bounds of the changes I made and experimenting with automatic instrumentation from OpenTelemetry, because tokens are something it actively captures. That said, you are correct that tokens are not included in the payloads from streaming data unless you set it in the arguments. Please do experiment with my PR and see what has changed; the old telemetry system you are using is going to be removed soon. If llama stack wanted to enable token metrics for all the services it routes inference streaming requests to, that is a clever solution, but we also need to make sure we are respecting the client's preferences and not returning the token-metrics chunk if they did not enable it. I'm happy to help if you need it!
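A rough sketch of the behavior being asked for here: request usage upstream for telemetry, but only forward the usage-only chunk if the caller asked for it. Names like record_token_metrics and provider.stream are placeholders, not llama-stack APIs.

```python
async def stream_with_usage_metrics(provider, params: dict):
    # Did the caller themselves ask for the usage chunk?
    client_wants_usage = bool((params.get("stream_options") or {}).get("include_usage"))

    # Always request usage upstream so telemetry can record token metrics.
    upstream_params = {**params, "stream_options": {"include_usage": True}}

    async for chunk in provider.stream(upstream_params):
        usage_only = getattr(chunk, "usage", None) is not None and not chunk.choices
        if usage_only:
            record_token_metrics(chunk.usage)  # placeholder telemetry hook
            if not client_wants_usage:
                continue  # respect the client's preference: don't forward the extra chunk
        yield chunk
```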
mattf left a comment
@iamemilio this change is focused and addresses a known issue with streaming metric generation. Can you help by having it go in and then aligning it with the new telemetry architecture as part of #4127?
@skamenan7 Bedrock needs updating as well.
I think we are doing this backwards, @mattf. @skamenan7 and I are going to pair on this to position this change as a follow-up PR to #4127. I cannot keep increasing the complexity of what is already an egregiously large pull request; otherwise it will be too difficult to review and test. Having to handle this would be a major time sink and setback for me.
Force-pushed from c81784c to 553d2e5
Thanks @mattf for that catch. I updated Bedrock as well.
Yes, Emilio and I are meeting soon to discuss, but I made the updates so as not to forget.
Inject stream_options for telemetry, add completion streaming metrics, fix params mutation, remove duplicate provider logic. Add unit tests.
Force-pushed from 553d2e5 to 37d588d
What does this PR do?
Injects stream_options={"include_usage": True} for OpenAI-compatible providers when streaming and telemetry is active. This allows token usage metrics to be collected and emitted for streaming responses.

Changes include:
- Inject stream_options in OpenAIMixin (completion & chat) when tracing is enabled
- InferenceRouter updates

Closes #3981
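A rough sketch of the injection described above (the method and the telemetry check are illustrative names, not the exact OpenAIMixin code):

```python
def _maybe_request_usage(self, params: dict, stream: bool) -> dict:
    """Opt in to usage chunks for streaming requests when telemetry is active."""
    if not stream or not self._telemetry_active():  # _telemetry_active is a placeholder
        return params
    # Copy first so the caller's params dict is never mutated.
    updated = dict(params)
    stream_options = dict(updated.get("stream_options") or {})
    stream_options.setdefault("include_usage", True)
    updated["stream_options"] = stream_options
    return updated
```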
Test Plan
Added unit tests in tests/unit/providers/utils/inference/test_openai_mixin.py verifying the stream_options injection behavior.

Ran tests locally:
PYTHONPATH=src pytest tests/unit/providers/utils/inference/test_openai_mixin.py -v
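For a sense of what such tests assert, here is a self-contained example exercising a free-function variant of the injection helper sketched above (not the actual test code from this PR):

```python
def maybe_request_usage(params: dict, stream: bool, telemetry_active: bool) -> dict:
    """Free-function variant of the illustrative injection helper."""
    if not stream or not telemetry_active:
        return params
    updated = dict(params)
    stream_options = dict(updated.get("stream_options") or {})
    stream_options.setdefault("include_usage", True)
    updated["stream_options"] = stream_options
    return updated


def test_injects_include_usage_when_streaming_with_telemetry():
    params = {"model": "test-model", "stream": True}
    result = maybe_request_usage(params, stream=True, telemetry_active=True)
    assert result["stream_options"] == {"include_usage": True}
    assert "stream_options" not in params  # original params were not mutated


def test_params_untouched_when_not_streaming():
    params = {"model": "test-model"}
    assert maybe_request_usage(params, stream=False, telemetry_active=True) is params
```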