Incorrect usage calculation for gemini models in stream mode #1736

Open
2 tasks done
zahariash opened this issue May 15, 2025 · 2 comments · May be fixed by #1752
@zahariash
Initial Checks

Description

Usage for Gemini models in stream mode is calculated incorrectly: request_tokens appears to be multiplied by the number of streamed chunks, which inflates total_tokens as well.

Example Code

import asyncio
from pydantic_ai import Agent


async def main():
    agent = Agent(
        "google-gla:gemini-2.0-flash",
    )

    prompt = """return only "word_1 word_2 word_3 word_4 word_5 word_6 word_7 word_8 word_9 word_10" """

    results = await agent.run(prompt)
    print(f"Run usage:\n {results.usage()}")

    async with agent.run_stream(prompt) as results:
        chunks = len([chunk async for chunk in results.stream_text(debounce_by=None)])
        print(f"Stream run usage ({chunks} chunks):\n {results.usage()}")


asyncio.run(main())

# Run usage:
#  Usage(requests=1, request_tokens=36, response_tokens=32, total_tokens=68, details=None)
# Stream run usage (4 chunks):
#  Usage(requests=1, request_tokens=147, response_tokens=32, total_tokens=179, details=None)
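For context, here is a minimal sketch (not pydantic-ai's actual implementation; the `Usage` class and token numbers below are hypothetical) of how this kind of inflation can happen. If each streamed chunk's usage metadata repeats the full cumulative prompt token count, naively summing per-chunk usages counts the prompt once per chunk:

```python
from dataclasses import dataclass


@dataclass
class Usage:
    request_tokens: int = 0
    response_tokens: int = 0

    def __add__(self, other: "Usage") -> "Usage":
        # Naive accumulation: request_tokens from every chunk are added.
        return Usage(
            self.request_tokens + other.request_tokens,
            self.response_tokens + other.response_tokens,
        )


# Hypothetical: each of 4 chunks reports the SAME prompt count (36)
# plus its own share of response tokens.
chunks = [Usage(36, 8), Usage(36, 8), Usage(36, 8), Usage(36, 8)]

naive = Usage()
for chunk in chunks:
    naive += chunk
print(naive)  # request_tokens=144 -- prompt counted once per chunk

# One possible fix: take request_tokens from a single chunk only,
# while still summing the per-chunk response tokens.
correct = Usage(chunks[-1].request_tokens, sum(c.response_tokens for c in chunks))
print(correct)  # request_tokens=36
```

With 4 chunks the naive sum reports roughly 4x the real request_tokens, matching the shape of the numbers in the reproduction above (147 vs 36).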

Python, Pydantic AI & LLM client version

python 3.13.2
pydantic-ai 0.2.4
@amiyapatanaik

Take a look at #1577.
This is a known issue that needs to be resolved ASAP.

@DouweM
Contributor

DouweM commented May 19, 2025

#1752 looks good, I intend to merge that soon.

@DouweM DouweM self-assigned this May 19, 2025