Conversation

@skamenan7 (Contributor) commented Nov 19, 2025

What does this PR do?

Injects stream_options={"include_usage": True} for OpenAI-compatible providers when a request is streaming and telemetry is active. This allows token usage metrics to be collected and emitted for streaming responses.

Changes include:

  • Injecting stream_options in OpenAIMixin (completion & chat) when tracing is enabled (see the sketch after this list)
  • Adding metric emission logic for completion streaming in InferenceRouter
  • Removing duplicate logic from WatsonX and Runpod providers
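
A minimal sketch of the injection, assuming a hypothetical helper name and tracing flag (the actual OpenAIMixin hook and tracing check in the PR may differ):

```python
# Hypothetical sketch; the helper name, tracing check, and call site are
# illustrative, not the PR's actual code.
def _maybe_inject_usage_options(params: dict, tracing_enabled: bool) -> dict:
    """Ask OpenAI-compatible servers to append a usage chunk to the stream."""
    if not (params.get("stream") and tracing_enabled):
        return params
    # Copy instead of mutating the caller's dict (the PR also fixes a
    # params-mutation bug).
    params = dict(params)
    stream_options = dict(params.get("stream_options") or {})
    # Respect an explicit caller choice; only fill in the default.
    stream_options.setdefault("include_usage", True)
    params["stream_options"] = stream_options
    return params
```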

Closes #3981

Test Plan

Added unit tests in tests/unit/providers/utils/inference/test_openai_mixin.py verifying the stream_options injection behavior.

Ran tests locally:
PYTHONPATH=src pytest tests/unit/providers/utils/inference/test_openai_mixin.py -v
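
For illustration, a hedged sketch of what such tests can look like, reusing the hypothetical _maybe_inject_usage_options helper sketched above (the real tests in the file above assert against the actual OpenAIMixin methods and may differ):

```python
# Illustrative pytest-style tests against the hypothetical helper above.
def test_injects_include_usage_when_streaming_and_tracing():
    params = {"stream": True}
    out = _maybe_inject_usage_options(params, tracing_enabled=True)
    assert out["stream_options"] == {"include_usage": True}
    assert "stream_options" not in params  # caller's dict was not mutated


def test_noop_when_tracing_is_disabled():
    params = {"stream": True}
    assert _maybe_inject_usage_options(params, tracing_enabled=False) is params
```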

meta-cla bot added the CLA Signed label on Nov 19, 2025
mergify bot commented Nov 19, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @skamenan7 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label on Nov 19, 2025
skamenan7 force-pushed the feat/3981-enable-streaming-usage-metrics-all branch from 0b6843b to 4325345 on November 19, 2025 22:11
mergify bot removed the needs-rebase label on Nov 19, 2025
@cdoern (Collaborator) commented Nov 19, 2025

I think #4127 should probably supersede this, right?

mergify bot commented Nov 19, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @skamenan7 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label on Nov 19, 2025
@skamenan7 (Contributor, Author) commented Nov 20, 2025

Yes, Charlie. Even with such an overhaul of the telemetry (thanks @iamemilio), the OpenAI provider will not send usage data unless we explicitly ask for it with stream_options={"include_usage": True}; automatic instrumentation libraries usually do not modify your API payloads to ask for extra data.
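
A minimal reproduction with the openai Python client (the model name is just an example) showing where the usage data arrives when you opt in:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat model works for this demo
    messages=[{"role": "user", "content": "hi"}],
    stream=True,
    stream_options={"include_usage": True},  # without this, no chunk carries usage
)
for chunk in stream:
    # With include_usage, the final chunk has an empty choices list
    # and carries the token counts.
    if chunk.usage is not None:
        print(chunk.usage.prompt_tokens, chunk.usage.completion_tokens)
```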

skamenan7 force-pushed the feat/3981-enable-streaming-usage-metrics-all branch from 4325345 to c81784c on November 20, 2025 14:13
mergify bot removed the needs-rebase label on Nov 20, 2025
@iamemilio (Contributor) commented Nov 20, 2025

@skamenan7 Sorry to say, my changes are going to make a bit of work for you. I would really suggest working within the bounds of the changes I made and experimenting with automatic instrumentation from OpenTelemetry, because tokens are something it actively captures. That said, you are correct that tokens are not included in streaming payloads unless you set it in the arguments.

Please do experiment with my PR and see what has changed; the old telemetry system you are using is going to be removed soon. If Llama Stack wants to enable token metrics for all the services it routes streaming inference requests to, that is a clever solution, but we also need to make sure we are respecting the client's preferences and not returning the token-metrics chunk if they did not enable it. I'm happy to help if you need it!
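
A hedged sketch of that filtering idea, with a hypothetical record_token_metrics hook standing in for whatever the router's telemetry emitter ends up being:

```python
# Hypothetical: consume the injected usage-only chunk for metrics, but only
# forward it if the client asked for it themselves.
def record_token_metrics(usage) -> None:
    ...  # stand-in for the router's actual metric emission


async def forward_stream(upstream, client_requested_usage: bool):
    async for chunk in upstream:
        usage_only = not chunk.choices and chunk.usage is not None
        if usage_only:
            record_token_metrics(chunk.usage)
            if not client_requested_usage:
                continue  # drop the chunk the client never opted into
        yield chunk
```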

@mattf (Collaborator) left a comment

@iamemilio this change is focused and addresses a known issue with streaming metric generation. Can you help by letting it go in and then aligning it with the new telemetry architecture as part of #4127?

@skamenan7 Bedrock needs updating as well.

@iamemilio (Contributor) commented

I think we are doing this backwards, @mattf. @skamenan7 and I are going to pair on this to position this change as a follow-up PR to #4127.

I cannot keep increasing the complexity of what is already an egregiously large pull request; otherwise it will be too difficult to review and test. Having to handle this would be a major time sink and setback for me.

skamenan7 force-pushed the feat/3981-enable-streaming-usage-metrics-all branch from c81784c to 553d2e5 on November 20, 2025 18:27
@skamenan7 (Contributor, Author) commented

Thanks @mattf for that catch. I updated Bedrock as well.

@skamenan7 (Contributor, Author) commented

Yes, Emilio and I are meeting soon to discuss, but I made the updates so as not to forget.

@skamenan7 (Contributor, Author) commented

cc: @leseb @rhuss

skamenan7 force-pushed the feat/3981-enable-streaming-usage-metrics-all branch from 553d2e5 to 37d588d on November 20, 2025 20:11, with the commit message: "Inject stream_options for telemetry, add completion streaming metrics, fix params mutation, remove duplicate provider logic. Add unit tests."

Labels: CLA Signed (managed by the Meta Open Source bot)

Merging this pull request may close: Enable streaming usage metrics in OpenAIMixin for all OpenAI-compatible providers (#3981)