Additional functionality related to thinking, for Google and Anthropic LLMs. #3175

Merged
Commits (21, all by kompfner):

- `217f03b` Add additional functionality related to "thinking", for Google and An…
- `0cdf0c4` Bump Google GenAI library version to at least 1.51.0, as that's the v…
- `c8c6f42` Add support for Gemini 3 Pro non-function-call-related thought signat…
- `15f5583` Simplify, at the expense of a bit of not-yet-needed flexibility: rath…
- `747bd4f` Tweak the prompt of the thinking + functions example to not confuse G…
- `4ea51ff` Slight refactor of handling thought-signature-containing special cont…
- `49f1f7d` Added CHANGELOG entry describing new thinking-related functionality
- `44aa117` Minor docstring update for accuracy
- `ef703e9` Get rid of `ThoughtTranscriptProcessor`, moving its logic into `Assis…
- `8ccc2cb` Add unit tests for `ThoughtTranscriptProcessor`
- `61674d7` Add `process_thought` constructor argument to `TranscriptProcessor` t…
- `17203ba` Change `FunctionInProgressFrame.llm_specific_extra` to a more generic…
- `7e92597` Remove `LLMThoughtSignatureFrame` in favor of using the more generic …
- `aa0529f` Update comments for accuracy
- `1249ee3` Better handle Gemini non-function thought signatures
- `229ff79` Better handle Gemini non-function thought signatures
- `c5ff5cc` Update CHANGELOG
- `0e88ad6` Add `ThoughtTranscriptionMessage.role`, which is always `"assistant"`
- `28248e9` Split up thinking examples so that there isn't an `llm` command-line …
- `1297929` Add thinking examples to eval suite
- `ccdd6cd` Fix a couple of typos in comments
CHANGELOG (new entry):

- Added additional functionality related to "thinking", for Google and Anthropic
  LLMs.

  1. New typed parameters for Google and Anthropic LLMs that control the
     models' thinking behavior (like how much thinking to do, and whether to
     output thoughts or thought summaries):
     - `AnthropicLLMService.ThinkingConfig`
     - `GoogleLLMService.ThinkingConfig`
  2. New frames for representing thoughts output by LLMs:
     - `LLMThoughtStartFrame`
     - `LLMThoughtTextFrame`
     - `LLMThoughtEndFrame`
  3. A mechanism for appending arbitrary context messages after a function call
     message, used specifically to support Google's function-call-related
     "thought signatures", which are necessary to ensure thinking continuity
     between function calls in a chain (where the model thinks, makes a function
     call, thinks some more, etc.). See:
     - the `append_extra_context_messages` field in `FunctionInProgressFrame` and
       helper types
     - `GoogleLLMService` leveraging the new mechanism to add a Google-specific
       `"fn_thought_signature"` message
     - `LLMAssistantAggregator` handling of `append_extra_context_messages`
     - `GeminiLLMAdapter` handling of `"fn_thought_signature"` messages
  4. A generic mechanism for recording LLM thoughts to context, used
     specifically to support Anthropic, whose thought signatures are expected to
     appear alongside the text of the thoughts within assistant context
     messages. See:
     - `LLMThoughtEndFrame.signature`
     - `LLMAssistantAggregator` handling of the above field
     - `AnthropicLLMAdapter` handling of `"thought"` context messages
  5. Google-specific logic for inserting non-function-call-related thought
     signatures into the context, to help maintain thinking continuity in a
     chain of LLM calls. See:
     - `GoogleLLMService` sending `LLMMessagesAppendFrame`s to add LLM-specific
       `"non_fn_thought_signature"` messages to context
     - `GeminiLLMAdapter` handling of `"non_fn_thought_signature"` messages
  6. An expansion of `TranscriptProcessor` to process LLM thoughts in addition
     to user and assistant utterances. See:
     - `TranscriptProcessor(process_thoughts=True)` (defaults to `False`)
     - `ThoughtTranscriptionMessage`, which is now also emitted with the
       `"on_transcript_update"` event
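The thought frames in point 2 form a start/text/end lifecycle, with the Anthropic-style signature from point 4 arriving on the end frame. As a rough, framework-free sketch of how a downstream consumer might assemble complete thoughts from such a stream (the dataclasses below are illustrative stand-ins, not the real Pipecat frame classes):

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-ins mirroring LLMThoughtStartFrame /
# LLMThoughtTextFrame / LLMThoughtEndFrame; fields are assumptions.
@dataclass
class ThoughtStart:
    pass

@dataclass
class ThoughtText:
    text: str

@dataclass
class ThoughtEnd:
    signature: Optional[str] = None  # Anthropic-style thought signature

def assemble_thoughts(frames):
    """Collect (thought_text, signature) pairs from a frame stream."""
    thoughts = []
    buffer = None
    for frame in frames:
        if isinstance(frame, ThoughtStart):
            buffer = []  # a new thought begins
        elif isinstance(frame, ThoughtText) and buffer is not None:
            buffer.append(frame.text)  # accumulate streamed thought text
        elif isinstance(frame, ThoughtEnd) and buffer is not None:
            thoughts.append(("".join(buffer), frame.signature))
            buffer = None  # thought complete
    return thoughts

frames = [
    ThoughtStart(),
    ThoughtText("Let me "),
    ThoughtText("reason..."),
    ThoughtEnd(signature="sig123"),
]
print(assemble_thoughts(frames))  # [('Let me reason...', 'sig123')]
```

This is just a sketch of the frame protocol's shape; in Pipecat the equivalent accumulation happens inside processors such as `LLMAssistantAggregator` and `TranscriptProcessor`.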
New example file (Anthropic extended thinking with thought transcripts):

```python
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import os

from dotenv import load_dotenv
from loguru import logger

from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame, ThoughtTranscriptionMessage, TranscriptionMessage
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.transcript_processor import TranscriptProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.anthropic.llm import AnthropicLLMService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams

load_dotenv(override=True)

# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
    "daily": lambda: DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
    ),
    "twilio": lambda: FastAPIWebsocketParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
    ),
    "webrtc": lambda: TransportParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams()),
    ),
}


async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info("Starting bot")

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )

    llm = AnthropicLLMService(
        api_key=os.getenv("ANTHROPIC_API_KEY"),
        params=AnthropicLLMService.InputParams(
            thinking=AnthropicLLMService.ThinkingConfig(type="enabled", budget_tokens=2048)
        ),
    )

    transcript = TranscriptProcessor(process_thoughts=True)

    messages = [
        {
            "role": "system",
            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
        },
    ]

    context = LLMContext(messages)
    context_aggregator = LLMContextAggregatorPair(context)

    pipeline = Pipeline(
        [
            transport.input(),  # Transport user input
            stt,
            transcript.user(),  # User transcripts
            context_aggregator.user(),  # User responses
            llm,  # LLM
            tts,  # TTS
            transport.output(),  # Transport bot output
            transcript.assistant(),  # Assistant transcripts (including thoughts)
            context_aggregator.assistant(),  # Assistant spoken responses
        ]
    )

    task = PipelineTask(
        pipeline,
        params=PipelineParams(
            enable_metrics=True,
            enable_usage_metrics=True,
        ),
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
    )

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info("Client connected")
        # Kick off the conversation.
        messages.append(
            {
                "role": "user",
                "content": "Say hello briefly.",
            }
        )
        # Here are some example prompts conducive to demonstrating
        # thinking (picked from Google and Anthropic docs).
        # messages.append(
        #     {
        #         "role": "user",
        #         "content": "Analogize photosynthesis and growing up. Keep your answer concise.",
        #         # "content": "Compare and contrast electric cars and hybrid cars."
        #         # "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"
        #     }
        # )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        logger.info("Client disconnected")
        await task.cancel()

    # Register event handler for transcript updates
    @transcript.event_handler("on_transcript_update")
    async def on_transcript_update(processor, frame):
        for msg in frame.messages:
            if isinstance(msg, (ThoughtTranscriptionMessage, TranscriptionMessage)):
                timestamp = f"[{msg.timestamp}] " if msg.timestamp else ""
                role = "THOUGHT" if isinstance(msg, ThoughtTranscriptionMessage) else msg.role
                logger.info(f"Transcript: {timestamp}{role}: {msg.content}")

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

    await runner.run(task)


async def bot(runner_args: RunnerArguments):
    """Main bot entry point compatible with Pipecat Cloud."""
    transport = await create_transport(runner_args, transport_params)
    await run_bot(transport, runner_args)


if __name__ == "__main__":
    from pipecat.runner.run import main

    main()
```
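The `append_extra_context_messages` mechanism described in the CHANGELOG entry above can be illustrated with a minimal, framework-free sketch: after a function call is recorded in the context, extra provider-specific messages (such as a Google-style `"fn_thought_signature"` entry) are inserted right after it, so the signature travels with that call in subsequent requests. The function below and the exact message shapes are assumptions for illustration, not the actual `LLMAssistantAggregator` implementation.

```python
def append_after_function_call(context, call_id, extra_messages):
    """Insert extra_messages immediately after the context message whose
    tool_call_id matches call_id; append at the end if no match is found.
    Returns a new list; the input context is not mutated."""
    for i, msg in enumerate(context):
        if msg.get("tool_call_id") == call_id:
            return context[: i + 1] + list(extra_messages) + context[i + 1 :]
    return context + list(extra_messages)

context = [
    {"role": "user", "content": "What's the weather?"},
    {"role": "assistant", "tool_call_id": "call_1", "content": None},
    {"role": "tool", "content": "Sunny"},
]
# Hypothetical Google-style signature message riding along with the call.
extra = [{"role": "assistant", "content": {"fn_thought_signature": "abc"}}]
new_context = append_after_function_call(context, "call_1", extra)
# The signature message now sits directly after the function-call message,
# preserving thinking continuity across a chain of calls.
```

Where exactly the extra messages land relative to the tool result is a design choice of the real aggregator; the point of the mechanism is simply that providers can attach context entries to a specific function-call message rather than to the end of the conversation.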