Additional functionality related to thinking, for Google and Anthropic LLMs. #3175
Conversation
logger.info(f"Client disconnected")
await task.cancel()

@transcript.event_handler("on_transcript_update")
I wasn't sure whether it made more sense to piggyback on the existing "on_transcript_update" event—which can now produce two kinds of messages, TranscriptionMessage and ThoughtTranscriptionMessage—or to introduce a new event like "on_thought_transcript_update" that treats thoughts conceptually as a side transcript and not the main transcript.
Thoughts? (Joke intended)
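Either way, for anyone skimming: here's a minimal sketch of what a handler on the existing event could look like if it has to deal with both message kinds. It leans on the current `TranscriptProcessor` event-handler pattern (`on_transcript_update` delivering a frame with a `messages` list); `ThoughtTranscriptionMessage` is the new type from this PR, and since its import path isn't shown here the sketch checks the class name instead.

```python
from pipecat.processors.transcript_processor import TranscriptProcessor

transcript = TranscriptProcessor()

@transcript.event_handler("on_transcript_update")
async def handle_transcript_update(processor, frame):
    # frame.messages may now mix regular TranscriptionMessage entries with
    # the new ThoughtTranscriptionMessage entries introduced in this PR.
    for message in frame.messages:
        if type(message).__name__ == "ThoughtTranscriptionMessage":
            # Thoughts: useful for logging/prompt debugging, not for TTS.
            print(f"[thought] {message.content}")
        else:
            print(f"[{message.role}] {message.content}")
```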
if part.thought:
    # Gemini emits fully-formed thoughts rather
    # than chunks so bracket each thought in
    # start/end
Note that there may be multiple fully-formed thoughts before the assistant response, so by doing this (bracketing each thought in start/end) we may actually end up with a transcript (as generated by the TranscriptProcessor) looking like:
user: ...
thought: ...
thought: ...
thought: ...
assistant: ...
In my opinion, that's totally fine, and might even be helpful for debugging. But it does deviate from the pattern of one entry in the transcript per "speaker".
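To spell that out, here's a tiny self-contained sketch of the bracketing. The frame class names below are placeholders for whatever this PR actually introduces; the one-triple-per-thought shape is the only point being illustrated.

```python
from dataclasses import dataclass

# Placeholder frame types; the PR's real thought frames presumably play this role.
@dataclass
class ThoughtStartFrame: ...

@dataclass
class ThoughtTextFrame:
    text: str

@dataclass
class ThoughtEndFrame: ...

def bracket_thoughts(parts):
    """Emit a complete start/text/end triple per fully-formed Gemini thought,
    which is why the transcript above ends up with one 'thought' entry each."""
    for part in parts:
        if getattr(part, "thought", False):
            yield ThoughtStartFrame()
            yield ThoughtTextFrame(text=part.text)
            yield ThoughtEndFrame()
```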
Force-pushed f6d7794 to 7837d4d
A thought. I'm considering punting on this task to a follow-up PR to keep this one from ballooning too much. In my testing, Gemini 3 Pro doesn't seem to be particularly well-suited for real-time conversations due to high processing times, so it's probably OK if proper support is a fast follow. UPDATE: ended up doing the task.
Force-pushed d70c330 to 576ed8b
Marking as ready for review. Mostly just CHANGELOG and some more testing work left, but I don't foresee major changes. Some additional stuff can be punted to a follow-up PR, if need be.
Force-pushed 3293724 to 03925be
Force-pushed 79bd0a6 to 62f48c8
From convo with @aconchillo:
UPDATE: done.
Cont'd
UPDATE: done
Cont'd:
UPDATE: done
…thropic LLMs. Thinking, sometimes called "extended thinking" or "reasoning", is an LLM process where the model takes some additional time before giving an answer. It's useful for complex tasks that may require some level of planning and structured, step-by-step reasoning. The model can output its thoughts (or thought summaries, depending on the model) in addition to the answer. The thoughts are usually pretty granular and not really suitable for being spoken out loud in a conversation, but can be useful for logging or prompt debugging. Here's what's added:
1. New typed input parameters for Google and Anthropic LLMs that control the models' thinking behavior (like how much thinking to do, and whether to output thoughts or thought summaries).
2. New frames for representing thoughts output by LLMs.
3. A generic mechanism for associating extra LLM-specific data with a function call in context, used specifically to support Google's function-call-related "thought signatures", which are necessary to ensure thinking continuity between function calls in a chain (where the model thinks, makes a function call, thinks some more, etc.)
4. A generic mechanism for recording LLM thoughts to context, used specifically to support Anthropic, whose thought signatures are expected to appear alongside the text of the thoughts within assistant context messages.
5. An expansion of `TranscriptProcessor` to process LLM thoughts in addition to user and assistant utterances.
…ersion where `thinking_level`—required for controlling Gemini 3 Pro thinking—is introduced
…er than associating a loose `thought_metadata` with each thought, use a `signature`. Thought signatures are the only "thought metadata" we use today.
…emini as much (Gemini found the original prompt a bit ambiguous, it seems)
…ext messages in the Gemini adapter
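To make the commits above a bit more concrete: the new typed thinking parameters would presumably be set when constructing the LLM services. Below is a minimal sketch assuming the existing `GoogleLLMService` / `AnthropicLLMService` classes and their `InputParams` pattern. The thinking-related field names (`thinking_level`, `include_thoughts`, a token budget) are guesses based on the commit messages and the providers' own APIs, not necessarily the names this PR lands on, and import paths may differ by Pipecat version.

```python
import os

from pipecat.services.anthropic.llm import AnthropicLLMService
from pipecat.services.google.llm import GoogleLLMService

# Hypothetical thinking fields: `thinking_level` is mentioned in the commits
# as required for Gemini 3 Pro; the others mirror the providers' own options.
google_llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-3-pro-preview",  # model id at time of writing; adjust as needed
    params=GoogleLLMService.InputParams(
        thinking_level="high",   # how much thinking to do (Gemini 3)
        include_thoughts=True,   # whether to emit thought summaries
    ),
)

anthropic_llm = AnthropicLLMService(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    model="claude-sonnet-4-20250514",
    params=AnthropicLLMService.InputParams(
        thinking_budget_tokens=2048,  # extended-thinking token budget (hypothetical name)
    ),
)
```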
Force-pushed 62f48c8 to 44aa117
…tantTranscriptProcessor` instead
…o control whether to handle thoughts in addition to assistant utterances. Defaults to `False`.
… `FunctionInProgressFrame.append_extra_context_messages`.
…`LLMMessagesAppendFrame`
Noticing some unexpected behavior that we should now account for:
UPDATE: done
Force-pushed d1708f7 to aa0529f
…arg for controlling which LLM to use. This change is preparation for adding these examples to our suite of evals.
| "content": "Say hello briefly.", | ||
| } | ||
| ) | ||
| # Here are some example example prompts conducive to demonstrating |
example example
| "content": "Say hello briefly.", | ||
| } | ||
| ) | ||
| # Here are some example example prompts conducive to demonstrating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
example example
LGTM! The only thing is all that logic needed for Gemini, but that's something that we can always update in the future if we find a better way.
Summary
Adds additional functionality related to "thinking", for Google and Anthropic LLMs.
Thinking, sometimes called "extended thinking" or "reasoning", is an LLM process where the model takes some additional time before giving an answer. It's useful for complex tasks that may require some level of planning and structured, step-by-step reasoning. The model can output its thoughts (or thought summaries, depending on the model) in addition to the answer. The thoughts are usually pretty granular and not really suitable for being spoken out loud in a conversation, but can be useful for logging or prompt debugging.
See the CHANGELOG entry for a description of what's in this PR.
Remaining work
SystemFrames and thus being delivered to the assistant context aggregator before the preceding thought frames. Also confirmed that @aconchillo's work in this PR resolves the issue 👍

Notes to reviewers
I'd recommend first running
uv run python examples/foundational/49-thinking-functions.py -t daily -d --llm google # or anthropic
and watching the console output to get a sense of the new functionality.
I'm fairly confident that we're appropriately passing the thought signatures to Gemini (when using Gemini 2.5 series models) because without them thoughts stop after the first function call, but with them thoughts are appropriately interleaved between function calls.
Interleaved thinking/function calling is also working with Claude.
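For reviewers who want a mental model of what those Gemini thought signatures are doing, here's a minimal sketch outside of Pipecat using the public google-genai SDK. Everything in it (the toy `get_time` tool, the model ids, fields like `thought`, `thought_signature`, and `thinking_config`) reflects the provider API rather than this PR's code, so treat it as an illustration of the mechanism, not of the implementation.

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads the Gemini API key from the environment

# A trivial tool so the model has something to call mid-thought.
get_time = types.FunctionDeclaration(
    name="get_time",
    description="Get the current time.",
    parameters=types.Schema(type=types.Type.OBJECT, properties={}),
)
config = types.GenerateContentConfig(
    tools=[types.Tool(function_declarations=[get_time])],
    thinking_config=types.ThinkingConfig(include_thoughts=True),
)

history = [
    types.Content(role="user", parts=[types.Part.from_text(text="What time is it right now?")])
]
response = client.models.generate_content(
    model="gemini-2.5-flash", contents=history, config=config
)

# Thought summaries plus (if the model chose to call the tool) a function-call
# part that carries a thought signature.
for part in response.candidates[0].content.parts:
    if part.thought:
        print("thought summary:", part.text)
    if part.function_call and part.thought_signature:
        print("function call carries a", len(part.thought_signature), "byte thought signature")

# Echoing the model's turn back verbatim (signature included) is what keeps
# thinking going across the function-call chain; dropping the signature is
# what makes thoughts stop after the first call.
history.append(response.candidates[0].content)
history.append(
    types.Content(
        role="user",
        parts=[types.Part.from_function_response(name="get_time", response={"time": "12:00"})],
    )
)
followup = client.models.generate_content(
    model="gemini-2.5-flash", contents=history, config=config
)
print(followup.text)
```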