fix: fix usage reporting with CustomWrapper #14961
Conversation
Refs: llamastack/llama-stack#3571

Llama Stack unconditionally expects usage information when using the Responses API with streaming and telemetry enabled. For full details see llamastack/llama-stack#3571. Debugging that issue revealed that LiteLLM does not honour a request for usage when streaming with the Vertex API. This PR adds that reporting using the same function as is used elsewhere.

Signed-off-by: Michael Dawson <[email protected]>
The linting failures don't seem related to any files that I changed and also seem to fail on prior PRs.
Looking through recent history I don't see very many PRs that have passed the Mock tests. That, along with not seeing how the failing test could be related to any of the changes in this PR, makes me think it's not related to the PR.
```python
    and hasattr(self, "chunks")
):
    # Calculate usage from accumulated chunks
    usage = calculate_total_usage(chunks=self.chunks)
```
this would do it every time the model response object is created.
i can see us doing a usage calculation for gemini on streaming already @mhdawson
`usage = VertexGeminiConfig._calculate_usage(`
is there a minimal script you can share for me to reproduce the issue? Curious what's happening
@krrishdholakia thanks for following up. My recreate unfortunately is with llama stack and the responses API, where the usage field was not being populated. Does the test being added in this PR potentially show the issue? I think it confirms that usage is not populated when it is not requested and the custom wrapper is in use.
Since I don't know the code base well, I asked Claude to explain the issue. This is what it said:
_calculate_usage was being called during streaming - specifically in ModelResponseIterator.chunk_parser() at vertex_and_google_ai_studio_gemini.py:2130, where it calculates usage for each individual chunk.
The problem was that CustomStreamWrapper wasn't aggregating this usage information from the chunks.
Here's the flow:
- Per-chunk calculation (already happening): ModelResponseIterator.chunk_parser() calls _calculate_usage() on each streaming chunk and sets model_response.usage (line 2142)
- Missing aggregation (the bug): CustomStreamWrapper collects these chunks in self.chunks but wasn't extracting/aggregating the usage data when stream_options was enabled
- The fix:
- Passes stream_options to CustomStreamWrapper so it knows to enable usage tracking (lines 1749, 1763 in vertex file)
- Added code in CustomStreamWrapper.model_response_creator() (streaming_handler.py:665-672) that calls calculate_total_usage(chunks=self.chunks) to aggregate usage from all chunks
So _calculate_usage was running, but its results were being discarded. The fix enabled CustomStreamWrapper to collect and report the aggregated usage when send_stream_usage=True.
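To make the aggregation step concrete, here is a minimal, self-contained sketch of the idea, assuming per-chunk usage objects that carry prompt/completion token counts. It is not LiteLLM's actual calculate_total_usage or CustomStreamWrapper code, the names below are simplified assumptions, and the real helper may combine the counts differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChunkUsage:
    # Per-chunk usage as a provider's chunk parser might attach it
    # (i.e. the kind of values _calculate_usage sets on each streamed chunk).
    prompt_tokens: int
    completion_tokens: int


@dataclass
class Chunk:
    content: str
    usage: Optional[ChunkUsage] = None


def aggregate_stream_usage(chunks: List[Chunk]) -> Optional[ChunkUsage]:
    """Fold per-chunk usage into one total for the final response.

    Simplifying assumption: prompt tokens are the same on every chunk that
    reports them, while completion tokens accumulate across chunks.
    """
    prompt_tokens = 0
    completion_tokens = 0
    saw_usage = False
    for chunk in chunks:
        if chunk.usage is None:
            continue
        saw_usage = True
        prompt_tokens = chunk.usage.prompt_tokens
        completion_tokens += chunk.usage.completion_tokens
    return ChunkUsage(prompt_tokens, completion_tokens) if saw_usage else None


if __name__ == "__main__":
    chunks = [
        Chunk("Hel", ChunkUsage(prompt_tokens=12, completion_tokens=1)),
        Chunk("lo", ChunkUsage(prompt_tokens=12, completion_tokens=1)),
        Chunk("!", ChunkUsage(prompt_tokens=12, completion_tokens=1)),
    ]
    print(aggregate_stream_usage(chunks))  # ChunkUsage(prompt_tokens=12, completion_tokens=3)
```

The point is that the per-chunk numbers already exist; the fix only needs to sum them once, at the moment the wrapper builds the final response.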
@krrishdholakia not sure if there is anything I need to do so this gets untagged as waiting on a response.
Refs: llamastack/llama-stack#3571
Llama Stack unconditionally expects usage information when using the Responses API with streaming and telemetry enabled. For full details see llamastack/llama-stack#3571.
Debugging that issue revealed that LiteLLM does not honour a request for usage when streaming with the Vertex API. This PR adds that reporting using the same function as is used elsewhere.
NOTE: Some of the changes are due to running `make format`. The files I updated did not previously meet the formatting requirements, so this added changes beyond the lines I added/changed.
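For context, this is the kind of call that exercises the fixed path: a minimal sketch assuming Vertex AI credentials are already configured in the environment, with an illustrative model name.

```python
import litellm

# Request usage on the stream via stream_options; with this fix the Vertex
# streaming path should report aggregated usage once the stream finishes.
response = litellm.completion(
    model="vertex_ai/gemini-1.5-pro",  # illustrative model name
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
    stream_options={"include_usage": True},
)

final_usage = None
for chunk in response:
    # Most chunks carry only content deltas; usage appears once it is aggregated.
    usage = getattr(chunk, "usage", None)
    if usage is not None:
        final_usage = usage

print(final_usage)  # expected to be populated when usage is requested
```

When usage is not requested via stream_options it should remain unset, which is what the added test checks.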
Title
Fix usage reporting with CustomWrapper
Relevant issues
Refs: llamastack/llama-stack#3571
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- Tests added in the `tests/litellm/` directory (adding at least 1 test is a hard requirement - see details)
- Unit tests run with `make test-unit`
Type
🐛 Bug Fix
Changes