
Conversation

@kompfner
Contributor

@kompfner kompfner commented Dec 3, 2025

Summary

Adds additional "thinking"-related functionality for Google and Anthropic LLMs.

Thinking, sometimes called "extended thinking" or "reasoning", is an LLM process where the model takes some additional time before giving an answer. It's useful for complex tasks that may require some level of planning and structured, step-by-step reasoning. The model can output its thoughts (or thought summaries, depending on the model) in addition to the answer. The thoughts are usually pretty granular and not really suitable for being spoken out loud in a conversation, but can be useful for logging or prompt debugging.
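
As a concrete (illustrative) reference, these are roughly the request shapes the two providers use to enable thinking. Field names follow the public Anthropic Messages API and Gemini REST API, not this PR's Python types, and the model name and budgets are example values:

```python
def anthropic_thinking_body(budget_tokens: int = 2048) -> dict:
    """Anthropic: opt into extended thinking with a token budget.
    Note max_tokens must exceed budget_tokens."""
    return {
        "model": "claude-sonnet-4-5",
        "max_tokens": 4096,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": "Plan a 3-step rollout."}],
    }


def gemini_thinking_config(thinking_budget: int = 1024) -> dict:
    """Gemini: request thought summaries alongside the answer."""
    return {
        "generationConfig": {
            "thinkingConfig": {
                "includeThoughts": True,
                "thinkingBudget": thinking_budget,
            }
        }
    }
```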

See the CHANGELOG entry for a description of what's in this PR.

Remaining work

  • Add support for Gemini 3 Pro, which can return thought signatures in content parts other than function calls.
  • Investigate Anthropic "interleaved" (between-function-call) thoughts not getting written into context (or getting written into a too-late assistant message in the context)
    • Confirmed that this was due to function call frames being SystemFrames and thus being delivered to the assistant context aggregator before the preceding thought frames. Also confirmed that @aconchillo's work in this PR resolves the issue 👍
  • Fix commit history
  • Address any remaining code TODOs
  • Remove any lingering debug logging
  • Add CHANGELOG entry
  • Sanity-check different models
    • My testing has been focused on the default Pipecat Google and Anthropic models, Gemini 2.5 Flash and Claude Sonnet 4.5, respectively
    • Claude 3.7 specifically provides full thoughts instead of thought summaries, so that'll be interesting to test
  • Test thinking in a scenario with simultaneous function calls (Gemini will associate the thought signature with the first in a group)
    • Works! But notably this is pretty tricky to test—the model's behavior is nondeterministic, and it seems to prefer making function calls sequentially when it's asked to think and explain its thoughts
  • Support changing LLM thinking configuration "on the fly" during the course of a conversation, which is important for anyone looking to build something like a "think harder" toggle
    • There is a restriction (or maybe a best practice) around when such a change can take place—it should only be at the start of a conceptual turn (i.e. not in the middle of a chain of tool calls, even though the LLM is invoked with each tool call result).
    • To keep this PR from getting too involved or taking too long, support for this could come in a follow-up PR.
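
As a sketch of that turn-boundary restriction (a hypothetical helper, not code from this PR), a gate could buffer requested config changes and apply them only once the conceptual turn completes:

```python
class ThinkingConfigGate:
    """Buffer thinking-config changes until the start of the next
    conceptual turn, i.e. never mid-way through a chain of tool calls.
    Hypothetical helper illustrating the restriction described above."""

    def __init__(self, config: dict):
        self._active = config
        self._pending = None
        self._in_tool_chain = False

    def request_change(self, config: dict) -> None:
        # e.g. the user flipped a "think harder" toggle mid-turn
        self._pending = config

    def on_tool_call_started(self) -> None:
        self._in_tool_chain = True

    def on_turn_completed(self) -> None:
        # Turn boundary: safe to swap in the pending configuration.
        self._in_tool_chain = False
        if self._pending is not None:
            self._active, self._pending = self._pending, None

    def config_for_next_inference(self) -> dict:
        return self._active
```

A change requested mid-chain only takes effect after `on_turn_completed()`, matching the "start of a conceptual turn" rule.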

Notes to reviewers

I'd recommend first running

uv run python examples/foundational/49-thinking-functions.py -t daily -d --llm google # or anthropic

and watching the console output to get a sense of the new functionality.

I'm fairly confident that we're correctly passing the thought signatures back to Gemini (when using Gemini 2.5 series models): without them, thoughts stop after the first function call; with them, thoughts are appropriately interleaved between function calls.

Interleaved thinking/function calling is also working with Claude.

@codecov

codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 54.16667% with 110 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/pipecat/adapters/services/gemini_adapter.py 16.90% 59 Missing ⚠️
...t/processors/aggregators/llm_response_universal.py 45.94% 20 Missing ⚠️
src/pipecat/services/google/llm.py 28.00% 18 Missing ⚠️
src/pipecat/services/anthropic/llm.py 33.33% 10 Missing ⚠️
src/pipecat/frames/frames.py 94.44% 2 Missing ⚠️
src/pipecat/adapters/services/anthropic_adapter.py 80.00% 1 Missing ⚠️
Files with missing lines Coverage Δ
src/pipecat/processors/transcript_processor.py 95.04% <100.00%> (+1.93%) ⬆️
src/pipecat/services/llm_service.py 38.27% <100.00%> (+0.29%) ⬆️
src/pipecat/adapters/services/anthropic_adapter.py 67.12% <80.00%> (+0.22%) ⬆️
src/pipecat/frames/frames.py 87.00% <94.44%> (+0.62%) ⬆️
src/pipecat/services/anthropic/llm.py 31.64% <33.33%> (-0.03%) ⬇️
src/pipecat/services/google/llm.py 32.00% <28.00%> (-0.21%) ⬇️
...t/processors/aggregators/llm_response_universal.py 64.07% <45.94%> (-1.79%) ⬇️
src/pipecat/adapters/services/gemini_adapter.py 52.91% <16.90%> (-17.79%) ⬇️

... and 11 files with indirect coverage changes


logger.info("Client disconnected")
await task.cancel()

@transcript.event_handler("on_transcript_update")
Contributor Author

@kompfner kompfner Dec 3, 2025

I wasn't sure whether it made more sense to piggyback on the existing "on_transcript_update" event—which can now produce two kinds of messages, TranscriptionMessage and ThoughtTranscriptionMessage—or to introduce a new event like "on_thought_transcript_update" that treats thoughts conceptually as a side transcript and not the main transcript.

Thoughts? (Joke intended)

if part.thought:
# Gemini emits fully-formed thoughts rather than chunks,
# so bracket each thought in start/end
Contributor Author

Note that there may be multiple fully-formed thoughts before the assistant response, so by doing this (bracketing each thought in start/end) we may actually end up with a transcript (as generated by the TranscriptProcessor) looking like:

user: ...
thought: ...
thought: ...
thought: ...
assistant: ...

In my opinion, that's totally fine, and might even be helpful for debugging. But it does deviate from the pattern of one entry in the transcript per "speaker".
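
A minimal sketch of the bracketing described above, with hypothetical event names standing in for the actual frames:

```python
def bracket_thoughts(parts: list) -> list:
    """Gemini returns each thought as one fully-formed text part, so wrap
    every thought in its own start/end pair. Event names are illustrative,
    not the PR's frame types."""
    events = []
    for part in parts:
        if part.get("thought"):
            events.append(("thought_started",))
            events.append(("thought_text", part["text"]))
            events.append(("thought_ended",))
        else:
            events.append(("assistant_text", part["text"]))
    return events
```

With several thought parts before the answer, this yields several bracketed thought entries followed by one assistant entry, i.e. exactly the multi-`thought:` transcript shape shown above.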

@kompfner kompfner force-pushed the pk/thinking-exploration branch from f6d7794 to 7837d4d on December 4, 2025 at 14:28
@kompfner
Contributor Author

kompfner commented Dec 4, 2025

A thought. I'm considering punting on this task

  • Add support for Gemini 3 Pro, which can return thought signatures in content parts other than function calls.

to a follow-up PR to keep this one from ballooning too much. In my testing, Gemini 3 Pro doesn't seem to be particularly well-suited for real-time conversations due to high processing times, so it's probably OK if proper support is a fast follow.

UPDATE: ended up doing the task.

@kompfner kompfner force-pushed the pk/thinking-exploration branch 3 times, most recently from d70c330 to 576ed8b on December 4, 2025 at 21:02
@kompfner
Contributor Author

kompfner commented Dec 4, 2025

Marking as ready for review. Mostly just CHANGELOG and some more testing work left, but I don't foresee major changes. Some additional stuff can be punted to a follow-up PR, if need be.

@kompfner kompfner marked this pull request as ready for review December 4, 2025 22:11
@kompfner kompfner force-pushed the pk/thinking-exploration branch 2 times, most recently from 3293724 to 03925be on December 5, 2025 at 15:00
@kompfner kompfner changed the title from "Thinking exploration" to "Additional functionality related to thinking, for Google and Anthropic LLMs." on December 5, 2025
@kompfner kompfner force-pushed the pk/thinking-exploration branch from 79bd0a6 to 62f48c8 on December 5, 2025 at 17:03
@kompfner
Contributor Author

kompfner commented Dec 5, 2025

From convo with @aconchillo:

  • Add role: Literal["assistant"] = "assistant" to ThoughtTranscriptionMessage
  • Get rid of TranscriptProcessor.thought(), moving logic into assistant
  • Add flag so that we can do TranscriptProcessor(process_thoughts=True) (default False)

UPDATE: done.
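
The first item could look roughly like this (illustrative field set; only the fixed role is taken from the notes above):

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class ThoughtTranscriptionMessage:
    """A transcript entry for an LLM thought. The role is pinned to
    "assistant" so consumers can treat thoughts as assistant-side entries
    while still distinguishing them by message type."""

    content: str
    role: Literal["assistant"] = "assistant"
    timestamp: Optional[str] = None
```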

@kompfner
Contributor Author

kompfner commented Dec 5, 2025

Cont'd

  • Make it so the caller (the Google LLM service) fully constructs FunctionInProgressFrame.append_context_message with something like { "type": "fn_call_thought_signature", "signature": ... }, rather than the assistant aggregator wrapping its contents in a "tool_call_extra" and constructing its own LLM-specific message

UPDATE: done

@kompfner
Contributor Author

kompfner commented Dec 5, 2025

Cont'd:

  • Get rid of LLMThoughtSignatureFrame
  • Use LLMMessagesAppendFrame instead, with LLMSpecificMessageFrame containing something like { "type": "non_fn_call_thought_signature", "signature": ... }

UPDATE: done
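
A sketch of the message shapes described in the last two comments (anything beyond the type and signature fields is illustrative):

```python
def fn_call_thought_signature_message(signature: str) -> dict:
    """Extra context message the LLM service attaches to a function call,
    fully constructed by the caller rather than the aggregator."""
    return {"type": "fn_call_thought_signature", "signature": signature}


def non_fn_call_thought_signature_message(signature: str) -> dict:
    """LLM-specific message for a thought signature not tied to a function
    call, suitable for appending to context via a messages-append frame."""
    return {"type": "non_fn_call_thought_signature", "signature": signature}


def extract_signatures(messages: list) -> list:
    """When rebuilding provider input, pull the signature payloads back out
    of the context, skipping ordinary chat messages."""
    return [
        m["signature"]
        for m in messages
        if m.get("type", "").endswith("thought_signature")
    ]
```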

…thropic LLMs.


Here's what's added:

1. New typed input parameters for Google and Anthropic LLMs that control the models' thinking behavior (like how much thinking to do, and whether to output thoughts or thought summaries).
2. New frames for representing thoughts output by LLMs.
3. A generic mechanism for associating extra LLM-specific data with a function call in context, used specifically to support Google's function-call-related "thought signatures", which are necessary to ensure thinking continuity between function calls in a chain (where the model thinks, makes a function call, thinks some more, etc.)
4. A generic mechanism for recording LLM thoughts to context, used specifically to support Anthropic, whose thought signatures are expected to appear alongside the text of the thoughts within assistant context messages.
5. An expansion of `TranscriptProcessor` to process LLM thoughts in addition to user and assistant utterances.
…ersion where `thinking_level`—required for controlling Gemini 3 Pro thinking—is introduced
…er than associating a loose `thought_metadata` with each thought, use a `signature`. Thought signatures are the only "thought metadata" we use today.
…emini as much (Gemini found the original prompt a bit ambiguous, it seems)
@kompfner kompfner force-pushed the pk/thinking-exploration branch from 62f48c8 to 44aa117 on December 8, 2025 at 14:33
…o control whether to handle thoughts in addition to assistant utterances. Defaults to `False`.
… `FunctionInProgressFrame.append_extra_context_messages`.
@kompfner
Contributor Author

kompfner commented Dec 8, 2025

Noticing some unexpected behavior now that we should account for:

  • I'm seeing some non-function thought signatures with Gemini 2.5, which the Google docs suggest aren't even a thing; they only seem to appear when function calls are involved in the conversation.
  • Non-function thought signatures aren't necessarily appearing at the end of every assistant response, so we need to figure out how to associate them only with the relevant responses. A tad frustratingly, the not-every-response behavior happens only for non-Gemini-3 models (Gemini 2.5 Flash), where I wasn't even expecting these signatures.

UPDATE: done

@kompfner kompfner force-pushed the pk/thinking-exploration branch from d1708f7 to aa0529f on December 8, 2025 at 16:47
"content": "Say hello briefly.",
}
)
# Here are some example example prompts conducive to demonstrating
Contributor

example example

@aconchillo
Contributor

LGTM! The only thing is all that logic needed for Gemini, but that's something that we can always update in the future if we find a better way.

@kompfner kompfner merged commit 1e98094 into main Dec 11, 2025
10 checks passed
@kompfner kompfner deleted the pk/thinking-exploration branch December 11, 2025 22:15