Additional functionality related to thinking, for Google and Anthropic LLMs. #3175
Conversation
logger.info(f"Client disconnected")
await task.cancel()

@transcript.event_handler("on_transcript_update")
I wasn't sure whether it made more sense to piggyback on the existing "on_transcript_update" event—which can now produce two kinds of messages, TranscriptionMessage and ThoughtTranscriptionMessage—or to introduce a new event like "on_thought_transcript_update" that treats thoughts conceptually as a side transcript and not the main transcript.
Thoughts? (Joke intended)
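Either way, for anyone skimming: here's a minimal sketch of what a handler on the existing event could look like if it has to deal with both message kinds. It leans on the current `TranscriptProcessor` event-handler pattern (`on_transcript_update` delivering a frame with a `messages` list); `ThoughtTranscriptionMessage` is the new type from this PR, and since its import path isn't shown here the sketch checks the class name instead.

```python
from pipecat.processors.transcript_processor import TranscriptProcessor

transcript = TranscriptProcessor()

@transcript.event_handler("on_transcript_update")
async def handle_transcript_update(processor, frame):
    # frame.messages may now mix regular TranscriptionMessage entries with
    # the new ThoughtTranscriptionMessage entries introduced in this PR.
    for message in frame.messages:
        if type(message).__name__ == "ThoughtTranscriptionMessage":
            # Thoughts: useful for logging/prompt debugging, not for TTS.
            print(f"[thought] {message.content}")
        else:
            print(f"[{message.role}] {message.content}")
```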
if part.thought:
    # Gemini emits fully-formed thoughts rather
    # than chunks so bracket each thought in
    # start/end
Note that there may be multiple fully-formed thoughts before the assistant response, so by doing this (bracketing each thought in start/end) we may actually end up with a transcript (as generated by the TranscriptProcessor) looking like:
user: ...
thought: ...
thought: ...
thought: ...
assistant: ...
In my opinion, that's totally fine, and might even be helpful for debugging. But it does deviate from the pattern of one entry in the transcript per "speaker".
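To spell that out, here's a tiny self-contained sketch of the bracketing. The frame class names below are placeholders for whatever this PR actually introduces; the one-triple-per-thought shape is the only point being illustrated.

```python
from dataclasses import dataclass

# Placeholder frame types; the PR's real thought frames presumably play this role.
@dataclass
class ThoughtStartFrame: ...

@dataclass
class ThoughtTextFrame:
    text: str

@dataclass
class ThoughtEndFrame: ...

def bracket_thoughts(parts):
    """Emit a complete start/text/end triple per fully-formed Gemini thought,
    which is why the transcript above ends up with one 'thought' entry each."""
    for part in parts:
        if getattr(part, "thought", False):
            yield ThoughtStartFrame()
            yield ThoughtTextFrame(text=part.text)
            yield ThoughtEndFrame()
```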
Force-pushed f6d7794 to 7837d4d
A thought. I'm considering punting on this task to a follow-up PR to keep this one from ballooning too much. In my testing, Gemini 3 Pro doesn't seem to be particularly well-suited for real-time conversations due to high processing times, so it's probably OK if proper support is a fast follow. UPDATE: ended up doing the task.
Force-pushed d70c330 to 576ed8b
Marking as ready for review. Mostly just CHANGELOG and some more testing work left, but I don't foresee major changes. Some additional stuff can be punted to a follow-up PR, if need be.
Force-pushed 3293724 to 03925be
Force-pushed 79bd0a6 to 62f48c8
From convo with @aconchillo:
UPDATE: done.
Cont'd
UPDATE: done
Cont'd:
UPDATE: done
…thropic LLMs. Thinking, sometimes called "extended thinking" or "reasoning", is an LLM process where the model takes some additional time before giving an answer. It's useful for complex tasks that may require some level of planning and structured, step-by-step reasoning. The model can output its thoughts (or thought summaries, depending on the model) in addition to the answer. The thoughts are usually pretty granular and not really suitable for being spoken out loud in a conversation, but can be useful for logging or prompt debugging. Here's what's added:
1. New typed input parameters for Google and Anthropic LLMs that control the models' thinking behavior (like how much thinking to do, and whether to output thoughts or thought summaries).
2. New frames for representing thoughts output by LLMs.
3. A generic mechanism for associating extra LLM-specific data with a function call in context, used specifically to support Google's function-call-related "thought signatures", which are necessary to ensure thinking continuity between function calls in a chain (where the model thinks, makes a function call, thinks some more, etc.)
4. A generic mechanism for recording LLM thoughts to context, used specifically to support Anthropic, whose thought signatures are expected to appear alongside the text of the thoughts within assistant context messages.
5. An expansion of `TranscriptProcessor` to process LLM thoughts in addition to user and assistant utterances.
…ersion where `thinking_level`—required for controlling Gemini 3 Pro thinking—is introduced
…er than associating a loose `thought_metadata` with each thought, use a `signature`. Thought signatures are the only "thought metadata" we use today.
…emini as much (Gemini found the original prompt a bit ambiguous, it seems)
…ext messages in the Gemini adapter
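To make the commits above a bit more concrete: the new typed thinking parameters would presumably be set when constructing the LLM services. Below is a minimal sketch assuming the existing `GoogleLLMService` / `AnthropicLLMService` classes and their `InputParams` pattern. The thinking-related field names (`thinking_level`, `include_thoughts`, a token budget) are guesses based on the commit messages and the providers' own APIs, not necessarily the names this PR lands on, and import paths may differ by Pipecat version.

```python
import os

from pipecat.services.anthropic.llm import AnthropicLLMService
from pipecat.services.google.llm import GoogleLLMService

# Hypothetical thinking fields: `thinking_level` is mentioned in the commits
# as required for Gemini 3 Pro; the others mirror the providers' own options.
google_llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-3-pro-preview",  # model id at time of writing; adjust as needed
    params=GoogleLLMService.InputParams(
        thinking_level="high",   # how much thinking to do (Gemini 3)
        include_thoughts=True,   # whether to emit thought summaries
    ),
)

anthropic_llm = AnthropicLLMService(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    model="claude-sonnet-4-20250514",
    params=AnthropicLLMService.InputParams(
        thinking_budget_tokens=2048,  # extended-thinking token budget (hypothetical name)
    ),
)
```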
Force-pushed 62f48c8 to 44aa117
…tantTranscriptProcessor` instead
…o control whether to handle thoughts in addition to assistant utterances. Defaults to `False`.
… `FunctionInProgressFrame.append_extra_context_messages`.
…`LLMMessagesAppendFrame`
Noticing some unexpected behavior that we should now account for:
UPDATE: done
Force-pushed d1708f7 to aa0529f
…arg for controlling which LLM to use. This change is preparation for adding these examples to our suite of evals.
| "content": "Say hello briefly.", | ||
| } | ||
| ) | ||
| # Here are some example example prompts conducive to demonstrating |
example example
| "content": "Say hello briefly.", | ||
| } | ||
| ) | ||
| # Here are some example example prompts conducive to demonstrating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
example example
LGTM! The only thing is all that logic needed for Gemini, but that's something that we can always update in the future if we find a better way.
Summary
Adds additional functionality related to "thinking", for Google and Anthropic LLMs.
Thinking, sometimes called "extended thinking" or "reasoning", is an LLM process where the model takes some additional time before giving an answer. It's useful for complex tasks that may require some level of planning and structured, step-by-step reasoning. The model can output its thoughts (or thought summaries, depending on the model) in addition to the answer. The thoughts are usually pretty granular and not really suitable for being spoken out loud in a conversation, but can be useful for logging or prompt debugging.
See the CHANGELOG entry for a description of what's in this PR.
Remaining work
SystemFrames and thus being delivered to the assistant context aggregator before the preceding thought frames. Also confirmed that @aconchillo's work in this PR resolves the issue 👍

Notes to reviewers
I'd recommend first running
uv run python examples/foundational/49-thinking-functions.py -t daily -d --llm google # or anthropic
and watching the console output to get a sense of the new functionality.
I'm fairly confident that we're appropriately passing the thought signatures to Gemini (when using Gemini 2.5 series models) because without them thoughts stop after the first function call, but with them thoughts are appropriately interleaved between function calls.
Interleaved thinking/function calling is also working with Claude.
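For reviewers who want a mental model of what those Gemini thought signatures are doing, here's a minimal sketch outside of Pipecat using the public google-genai SDK. Everything in it (the toy `get_time` tool, the model ids, fields like `thought`, `thought_signature`, and `thinking_config`) reflects the provider API rather than this PR's code, so treat it as an illustration of the mechanism, not of the implementation.

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads the Gemini API key from the environment

# A trivial tool so the model has something to call mid-thought.
get_time = types.FunctionDeclaration(
    name="get_time",
    description="Get the current time.",
    parameters=types.Schema(type=types.Type.OBJECT, properties={}),
)
config = types.GenerateContentConfig(
    tools=[types.Tool(function_declarations=[get_time])],
    thinking_config=types.ThinkingConfig(include_thoughts=True),
)

history = [
    types.Content(role="user", parts=[types.Part.from_text(text="What time is it right now?")])
]
response = client.models.generate_content(
    model="gemini-2.5-flash", contents=history, config=config
)

# Thought summaries plus (if the model chose to call the tool) a function-call
# part that carries a thought signature.
for part in response.candidates[0].content.parts:
    if part.thought:
        print("thought summary:", part.text)
    if part.function_call and part.thought_signature:
        print("function call carries a", len(part.thought_signature), "byte thought signature")

# Echoing the model's turn back verbatim (signature included) is what keeps
# thinking going across the function-call chain; dropping the signature is
# what makes thoughts stop after the first call.
history.append(response.candidates[0].content)
history.append(
    types.Content(
        role="user",
        parts=[types.Part.from_function_response(name="get_time", response={"time": "12:00"})],
    )
)
followup = client.models.generate_content(
    model="gemini-2.5-flash", contents=history, config=config
)
print(followup.text)
```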