
feat: support for agent v1 #520

Open: wants to merge 5 commits into main


naomi-lgbt
Contributor

naomi-lgbt commented Apr 28, 2025

Proposed changes

Types of changes

What types of changes does your code introduce to the community Python SDK?
Put an x in the boxes that apply.

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update or tests (if none of the other choices apply)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING doc
  • I have linted all of my code using repo standards
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Summary by CodeRabbit

  • New Features
    • Added a new example script demonstrating how to use the voice agent without microphone input, including audio streaming and event handling.
  • Improvements
    • Enhanced agent configuration options with more detailed settings for listening, thinking, speaking, language preferences, and greetings.
    • Expanded support for custom endpoints, provider models, and voice/language configurations in agent setup.
  • Bug Fixes
    • Updated event and option names for clarity and consistency across the agent interface.
  • Documentation
    • Added detailed instructions for building and installing the package locally.
  • Chores
    • Updated .gitignore to exclude generated audio and chat log files.
    • Added the requests library to example dependencies.

Contributor

coderabbitai bot commented Apr 28, 2025

Walkthrough

This update introduces a comprehensive refactor and enhancement of the agent-related configuration and client code within the Deepgram SDK. Key changes include the renaming and restructuring of configuration classes (e.g., SettingsConfigurationOptions to SettingsOptions, UpdateInstructionsOptions to UpdatePromptOptions), the introduction of new provider abstractions (ListenProvider, SpeakProvider, ThinkProvider), and the addition of an Endpoint class for custom HTTP endpoints. Enum members and event names are updated to match the new terminology. Example scripts are revised to use the new configuration schema, and a new example demonstrates agent usage without a microphone. Documentation and ignore rules are also updated.

Changes

| Files/Paths | Change Summary |
|---|---|
| .github/CONTRIBUTING.md | Added a "Building Locally" section with step-by-step instructions for building and installing the package using pipenv, including running an example script. |
| .gitignore | Added ignore patterns for `chatlog.txt` and `output_*.wav` files generated by examples. |
| examples/requirements-examples.txt | Added the requests package to the example requirements. |
| deepgram/__init__.py, deepgram/client.py, deepgram/clients/__init__.py, deepgram/clients/agent/__init__.py, deepgram/clients/agent/v1/__init__.py, deepgram/clients/agent/v1/websocket/__init__.py | Refactored imports to replace SettingsConfigurationOptions/UpdateInstructionsOptions with SettingsOptions/UpdatePromptOptions, removed Provider/Context, and added the new provider classes and Endpoint. |
| deepgram/clients/agent/enums.py | Renamed AgentWebSocketEvents members: SettingsConfiguration → Settings, UpdateInstructions → UpdatePrompt. |
| deepgram/clients/agent/v1/websocket/options.py | Major refactor: replaced/renamed configuration dataclasses (SettingsConfigurationOptions → SettingsOptions, etc.), introduced new provider classes (ListenProvider, SpeakProvider, ThinkProvider), added Endpoint and CartesiaVoice, restructured the Listen, Speak, Think, Agent, and Language classes, and updated deserialization logic. |
| deepgram/clients/agent/v1/websocket/async_client.py, deepgram/clients/agent/v1/websocket/client.py | Updated all type annotations, method signatures, and settings handling to use SettingsOptions instead of SettingsConfigurationOptions, changed the endpoint to "v1/agent/converse", updated nested attribute access for provider settings, and revised logging/error messages to the new terminology. |
| deepgram/clients/agent/client.py | Updated import aliases to match the new provider and options class names. |
| examples/agent/simple/main.py, examples/agent/async_simple/main.py | Refactored example scripts to use SettingsOptions and updated the configuration structure to match the new schema (e.g., nested provider, instructions renamed to prompt, added greeting, language, etc.). |
| examples/agent/no_mic/main.py | Added a new example script demonstrating agent usage without microphone input, including detailed agent configuration, event handling, audio file streaming, and chat logging. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant UserScript
    participant DeepgramClient
    participant AgentWebSocketClient
    participant AgentProvider

    UserScript->>DeepgramClient: Initialize with API key and options
    UserScript->>AgentWebSocketClient: Connect with SettingsOptions
    AgentWebSocketClient->>AgentProvider: Send Settings (with nested provider config)
    AgentProvider-->>AgentWebSocketClient: Acknowledge settings
    UserScript->>AgentWebSocketClient: Register event handlers
    UserScript->>AgentWebSocketClient: Stream audio data (if applicable)
    AgentWebSocketClient->>AgentProvider: Transmit audio/data/events
    AgentProvider-->>AgentWebSocketClient: Emit events (audio, text, state)
    AgentWebSocketClient-->>UserScript: Invoke registered handlers with event data
    UserScript->>AgentWebSocketClient: Close connection
```

Possibly related PRs

  • deepgram/deepgram-python-sdk#497: Introduced and organized the initial Agent API, including the classes and structures that are being renamed and refactored in this PR.
  • deepgram/deepgram-python-sdk#502: Updated agent websocket options and related classes for the nova-3 model, modifying the same configuration classes and structures as this PR.
  • deepgram/deepgram-python-sdk#468: Introduced a shared WebSocket client abstraction and refactored agent websocket clients, overlapping with the client structure changes in this update.

Suggested reviewers

  • naomi-lgbt
  • SandraRodgers

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37fa347 and e9246c2.

📒 Files selected for processing (1)
  • examples/agent/no_mic/main.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/agent/no_mic/main.py

Comment on lines +56 to +65
```bash
# Install deps
pipenv install
# Build package
pipenv run python3 -m build
# Install package from local build
pipenv install ./dist/deepgram_sdk-0.0.0.tar.gz
# Try an example!
DEEPGRAM_API_KEY=<key> pipenv run python3 examples/agent/async_simple/main.py
```
Contributor Author

I added this in this PR because I don't have my command history on my new laptop and had to do quite a bit of digging to figure out how to make this work.

Contributor

Works for me. I use Pipenv and always need to check my notes on how to run it.

Contributor

coderabbitai bot left a comment

Actionable comments posted: 5

🔭 Outside diff range comments (1)
deepgram/clients/agent/v1/websocket/client.py (1)

181-189: 🛠️ Refactor suggestion

start() rejects None for options, breaking previous behaviour

options is declared as Optional[SettingsOptions], but the subsequent isinstance / elif chain ends with an unconditional else: raise DeepgramError("Invalid options type").
Passing None – which used to be valid for callers that wanted to send the Settings message later – will now raise an exception and make the call site fail at runtime.

If None is no longer acceptable, the type hint should be changed to SettingsOptions (non-optional) and the docstring updated.
If None should stay legal, add an early guard and skip the type-based conversion instead of raising.

```diff
-        options: Optional[SettingsOptions] = None,
+        # `None` means: “I will send the Settings message manually later”.
+        options: Optional[SettingsOptions] = None,
...
-        else:
-            raise DeepgramError("Invalid options type")
+        elif options is None:
+            self._logger.notice("No SettingsOptions supplied – caller will send later")
+            self._settings = None
+        else:
+            raise DeepgramError(
+                f"Invalid options type {type(options).__name__}; "
+                "expected SettingsOptions | dict | str | None"
+            )
```

Also applies to: 199-204
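For the "None stays legal" option, a minimal sketch of the early guard could look like the following. `SettingsOptions` and `DeepgramError` here are trimmed stand-ins rather than the SDK classes, `normalize_settings` is a hypothetical helper name, and the string branch assumes a JSON-encoded payload.

```python
import json
from dataclasses import dataclass
from typing import Optional, Union


class DeepgramError(Exception):
    """Stand-in for the SDK's DeepgramError (assumption)."""


@dataclass
class SettingsOptions:
    """Hypothetical, trimmed-down stand-in for the SDK's SettingsOptions."""
    language: str = "en"
    greeting: Optional[str] = None


def normalize_settings(
    options: Union[SettingsOptions, dict, str, None],
) -> Optional[SettingsOptions]:
    # Early guard: None means the caller will send the Settings message later.
    if options is None:
        return None
    if isinstance(options, SettingsOptions):
        return options
    if isinstance(options, dict):
        return SettingsOptions(**options)
    if isinstance(options, str):
        # Assumption: a string is a JSON-encoded settings payload.
        return SettingsOptions(**json.loads(options))
    raise DeepgramError(
        f"Invalid options type {type(options).__name__}; "
        "expected SettingsOptions | dict | str | None"
    )
```

Only genuinely unsupported types reach the final `raise`, so callers that defer the Settings message keep working.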

🧹 Nitpick comments (6)
deepgram/clients/__init__.py (1)

367-382: Provider model refactored with granular provider types

The agent configuration has been refactored from a general-purpose Provider class to specific provider types (ListenProvider, SpeakProvider, ThinkProvider) and a new Endpoint abstraction.

This change provides better separation of concerns and more explicit configuration options for each capability of the agent. The granular provider model allows for more targeted configuration and clearer API boundaries.

deepgram/clients/agent/v1/websocket/client.py (2)

229-231: Safety check can dereference None when model is missing

self._settings.agent.listen.provider is guaranteed by defaults, but model
may still be None if the caller intentionally leaves it unset (e.g. to let
the server decide).
The current condition:

```python
if self._settings.agent.listen.provider.keyterms is not None and \
   self._settings.agent.listen.provider.model is not None and \
   not self._settings.agent.listen.provider.model.startswith("nova-3"):
```

will raise AttributeError if provider itself is None (possible if a
malformed payload sneaks through from_dict). A defensive guard eliminates the
risk.

```diff
-if self._settings.agent.listen.provider.keyterms is not None and \
-   self._settings.agent.listen.provider.model is not None and \
-   not self._settings.agent.listen.provider.model.startswith("nova-3"):
+listen_provider = getattr(self._settings.agent.listen, "provider", None)
+if (
+    listen_provider
+    and listen_provider.keyterms
+    and listen_provider.model
+    and not listen_provider.model.startswith("nova-3")
+):
```
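The same guard can be written as a standalone predicate that never dereferences a missing hop. This is a sketch only: `keyterms_on_non_nova3` is a hypothetical name, `types.SimpleNamespace` stands in for the SDK's option objects, and, like the suggested diff, it tests truthiness, so an empty keyterms list is treated the same as None.

```python
from types import SimpleNamespace
from typing import Any


def keyterms_on_non_nova3(settings: Any) -> bool:
    """True when keyterms are set on a listen model that is not nova-3.

    Each hop uses getattr(..., None); getattr on None with a default
    returns the default, so a missing agent, listen, or provider yields
    False instead of raising AttributeError.
    """
    agent = getattr(settings, "agent", None)
    listen = getattr(agent, "listen", None)
    provider = getattr(listen, "provider", None)
    keyterms = getattr(provider, "keyterms", None)
    model = getattr(provider, "model", None)
    return bool(keyterms and model and not model.startswith("nova-3"))


def make_settings(provider: Any) -> SimpleNamespace:
    # Builds settings.agent.listen.provider for the examples below.
    return SimpleNamespace(agent=SimpleNamespace(listen=SimpleNamespace(provider=provider)))
```

Usage: `keyterms_on_non_nova3(make_settings(None))` returns False rather than raising, which is the malformed-payload case described above.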

191-197: Consider redacting sensitive data in INFO logs

self._logger.info("settings: %s", options) prints the entire Settings object,
which may contain organisation-specific prompts, model identifiers or future
secret fields. Moving this to DEBUG, or implementing a redaction helper (e.g.
masking long strings), protects users who run the SDK with verbose logging in
production.
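A minimal sketch of such a redaction helper, assuming the settings have first been converted to plain dicts and lists; `redact` and its 32-character threshold are hypothetical choices, not SDK API:

```python
import logging
from typing import Any

MAX_LEN = 32  # hypothetical threshold for "long" strings


def redact(value: Any, max_len: int = MAX_LEN) -> Any:
    """Recursively mask long strings inside dicts and lists.

    Short strings and non-string scalars are returned unchanged; long
    strings keep an 8-character prefix plus their original length.
    """
    if isinstance(value, str):
        if len(value) > max_len:
            return f"{value[:8]}...<{len(value)} chars>"
        return value
    if isinstance(value, dict):
        return {k: redact(v, max_len) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v, max_len) for v in value]
    return value


logger = logging.getLogger(__name__)
settings = {"agent": {"think": {"prompt": "You are a helpful AI assistant. " * 3}}, "language": "en"}
# Logged at DEBUG, and only after redaction.
logger.debug("settings: %s", redact(settings))
```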

deepgram/clients/agent/v1/websocket/options.py (3)

215-223: SpeakProvider.__getitem__ assumes lists for voice / language, conflicting with schema

voice is a singular CartesiaVoice instance, and language /
language_code are strings, but the getter transforms them into lists. This
breaks symmetry (SpeakProvider.from_dict(...).voice returns list, not
CartesiaVoice) and surprises downstream consumers.

```diff
-        if "voice" in _dict:
-            _dict["voice"] = [
-                CartesiaVoice.from_dict(voice) for voice in _dict["voice"]
-            ]
-        if "language" in _dict:
-            _dict["language"] = [str(language) for language in _dict["language"]]
+        if "voice" in _dict and isinstance(_dict["voice"], dict):
+            _dict["voice"] = CartesiaVoice.from_dict(_dict["voice"])
```

(The same pattern occurs in Think, Listen, and Speak classes and could be
refactored via a helper.)
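One way to factor out that shared pattern is a small helper that converts a singular field only when it is still a plain dict. The sketch assumes a trimmed `CartesiaVoice` stand-in with only `mode` and `id` fields, and `convert_single` is a hypothetical name:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class CartesiaVoice:
    """Trimmed stand-in for the SDK class (fields are an assumption)."""
    mode: Optional[str] = None
    id: Optional[str] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "CartesiaVoice":
        return cls(**data)


def convert_single(d: Dict[str, Any], key: str, cls: Type[T]) -> None:
    """Replace d[key] with cls.from_dict(d[key]) when it is a plain dict.

    Singular fields stay singular, and values that are already converted
    (or None, or plain strings like a language code) are left untouched.
    """
    if isinstance(d.get(key), dict):
        d[key] = cls.from_dict(d[key])  # type: ignore[attr-defined]
```

Because the helper is a no-op on anything that is not a dict, calling it twice, or on a string field such as `language`, is harmless.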


78-81: Endpoint.method is Optional[str] but default is "POST"

Either drop Optional (preferred) or make the default None. Staying
consistent avoids accidental None assignment later.
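A sketch of the preferred option, using a plain standard-library dataclass; the `url` field is an assumption added for illustration and the real class carries more fields:

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    """Hypothetical stand-in; only `method` and its "POST" default come from the review."""
    url: str = ""  # assumption, for illustration only
    # Non-Optional with a concrete default: the annotation and the default
    # now agree, and a type checker will flag `endpoint.method = None`.
    method: str = "POST"
```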


364-370: Language dataclass shadows the built-in type

Although legal, using the attribute name type invites confusion with
Python’s built-in. Consider renaming to code or lang for clarity.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e48e415 and 7ad48ea.

📒 Files selected for processing (14)
  • .github/CONTRIBUTING.md (1 hunks)
  • deepgram/__init__.py (1 hunks)
  • deepgram/client.py (1 hunks)
  • deepgram/clients/__init__.py (1 hunks)
  • deepgram/clients/agent/__init__.py (1 hunks)
  • deepgram/clients/agent/client.py (2 hunks)
  • deepgram/clients/agent/enums.py (1 hunks)
  • deepgram/clients/agent/v1/__init__.py (1 hunks)
  • deepgram/clients/agent/v1/websocket/__init__.py (1 hunks)
  • deepgram/clients/agent/v1/websocket/async_client.py (7 hunks)
  • deepgram/clients/agent/v1/websocket/client.py (7 hunks)
  • deepgram/clients/agent/v1/websocket/options.py (7 hunks)
  • examples/agent/async_simple/main.py (2 hunks)
  • examples/agent/simple/main.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
deepgram/clients/agent/v1/websocket/__init__.py (2)
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#426
File: deepgram/clients/listen/v1/websocket/__init__.py:8-8
Timestamp: 2024-07-01T19:21:39.778Z
Learning: Unused imports in `deepgram/clients/listen/v1/websocket/__init__.py` are retained to maintain backward compatibility and should not be flagged for removal in reviews.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#426
File: deepgram/clients/listen/v1/websocket/__init__.py:8-8
Timestamp: 2024-10-09T02:19:46.086Z
Learning: Unused imports in `deepgram/clients/listen/v1/websocket/__init__.py` are retained to maintain backward compatibility and should not be flagged for removal in reviews.
🧬 Code Graph Analysis (7)
deepgram/client.py (1)
deepgram/clients/agent/v1/websocket/options.py (13)
  • SettingsOptions (373-404)
  • UpdatePromptOptions (411-417)
  • Listen (257-270)
  • ListenProvider (145-160)
  • Speak (274-294)
  • SpeakProvider (177-223)
  • Header (21-27)
  • Function (94-127)
  • Think (227-253)
  • ThinkProvider (164-173)
  • Agent (298-318)
  • Audio (346-360)
  • Endpoint (73-90)
deepgram/clients/agent/v1/websocket/__init__.py (1)
deepgram/clients/agent/v1/websocket/options.py (13)
  • SettingsOptions (373-404)
  • UpdatePromptOptions (411-417)
  • Listen (257-270)
  • ListenProvider (145-160)
  • Speak (274-294)
  • SpeakProvider (177-223)
  • Header (21-27)
  • Function (94-127)
  • Think (227-253)
  • ThinkProvider (164-173)
  • Agent (298-318)
  • Audio (346-360)
  • Endpoint (73-90)
deepgram/__init__.py (1)
deepgram/clients/agent/v1/websocket/options.py (22)
  • SettingsOptions (373-404)
  • UpdatePromptOptions (411-417)
  • UpdateSpeakOptions (424-430)
  • InjectAgentMessageOptions (437-443)
  • FunctionCallResponse (450-457)
  • AgentKeepAlive (464-469)
  • Listen (257-270)
  • ListenProvider (145-160)
  • Speak (274-294)
  • SpeakProvider (177-223)
  • Header (21-27)
  • Item (31-37)
  • Properties (41-52)
  • Parameters (56-69)
  • Function (94-127)
  • Think (227-253)
  • ThinkProvider (164-173)
  • Agent (298-318)
  • Input (322-328)
  • Output (332-342)
  • Audio (346-360)
  • Endpoint (73-90)
deepgram/clients/agent/v1/websocket/async_client.py (3)
deepgram/clients/agent/v1/websocket/options.py (3)
  • SettingsOptions (373-404)
  • UpdatePromptOptions (411-417)
  • check (391-404)
deepgram/utils/verboselogs/__init__.py (1)
  • notice (150-153)
deepgram/clients/common/v1/websocket_response.py (1)
  • ErrorResponse (41-51)
deepgram/clients/agent/v1/__init__.py (1)
deepgram/clients/agent/v1/websocket/options.py (13)
  • SettingsOptions (373-404)
  • UpdatePromptOptions (411-417)
  • Listen (257-270)
  • ListenProvider (145-160)
  • Speak (274-294)
  • SpeakProvider (177-223)
  • Header (21-27)
  • Function (94-127)
  • Think (227-253)
  • ThinkProvider (164-173)
  • Agent (298-318)
  • Audio (346-360)
  • Endpoint (73-90)
deepgram/clients/agent/v1/websocket/client.py (3)
deepgram/clients/agent/v1/websocket/options.py (3)
  • SettingsOptions (373-404)
  • UpdatePromptOptions (411-417)
  • check (391-404)
deepgram/utils/verboselogs/__init__.py (1)
  • notice (150-153)
deepgram/clients/common/v1/websocket_response.py (1)
  • ErrorResponse (41-51)
deepgram/clients/agent/v1/websocket/options.py (2)
deepgram/clients/common/v1/shared_response.py (1)
  • BaseResponse (16-44)
deepgram/clients/agent/enums.py (1)
  • AgentWebSocketEvents (10-37)
🪛 GitHub Actions: Check - static
deepgram/clients/agent/v1/websocket/options.py

[error] 136-136: mypy: Incompatible types in assignment (expression has type "None", variable has type "str") [assignment]


[error] 139-139: mypy: Incompatible types in assignment (expression has type "None", variable has type "str") [assignment]

🔇 Additional comments (24)
.github/CONTRIBUTING.md (1)

51-65: Great addition of building documentation!

The new "Building Locally" section provides clear step-by-step instructions for contributors to build and test the package. This is particularly valuable given the extensive refactoring of agent-related classes and options introduced in this PR.

deepgram/clients/agent/enums.py (1)

32-33: Renamed WebSocket events for consistency with option classes

These enum value renamings align with the broader restructuring of agent configuration classes in this PR:

  • SettingsConfiguration → Settings (matches SettingsOptions)
  • UpdateInstructions → UpdatePrompt (matches UpdatePromptOptions)

This is a breaking change as noted in the PR description, but improves API consistency.

deepgram/client.py (1)

350-352: Updated imports to match agent v1 provider architecture

These import changes reflect the significant restructuring of the agent configuration model:

  1. Renamed options classes:

    • SettingsConfigurationOptions → SettingsOptions
    • UpdateInstructionsOptions → UpdatePromptOptions
  2. Replaced generic Provider and Context with specialized provider classes:

    • ListenProvider, SpeakProvider, ThinkProvider
    • Added Endpoint abstraction

The new structure enables more granular configuration of agent capabilities through specialized provider classes.

Also applies to: 358-360, 367-368, 372-373

deepgram/clients/agent/v1/websocket/__init__.py (1)

29-30: Updated imports to expose new agent v1 classes

These import changes are consistent with the broader refactoring in this PR and expose the restructured agent configuration classes to users of this namespace:

  1. Renamed option classes:

    • SettingsConfigurationOptions → SettingsOptions
    • UpdateInstructionsOptions → UpdatePromptOptions
  2. Added specialized provider classes:

    • ListenProvider, SpeakProvider, ThinkProvider
    • Endpoint abstraction

These changes maintain consistency with the new agent v1 architecture throughout the codebase.

Also applies to: 37-39, 46-47, 51-52

deepgram/clients/agent/__init__.py (3)

34-36: Class renames align with restructured agent configuration model

The renaming from SettingsConfigurationOptions to SettingsOptions and UpdateInstructionsOptions to UpdatePromptOptions represents part of a broader restructuring of the agent configuration API.


42-45: Provider-specific classes replace generic Provider

The introduction of specialized provider classes (ListenProvider, SpeakProvider, ThinkProvider) replaces the previous generic Provider class, allowing for more granular configuration options specific to each agent capability.

Also applies to: 51-52


56-57: Added Endpoint abstraction supports custom endpoint configuration

The new Endpoint class provides a formal way to configure custom endpoints for agent interactions.

deepgram/clients/agent/client.py (2)

33-36: Updates to import aliases maintain API consistency

The import aliases from .v1 have been updated to reflect the renamed classes and new provider types, ensuring consistency with the broader API restructuring.

Also applies to: 41-44, 50-51, 55-56


82-84: Export aliases updated to match restructured imports

The export aliases at the bottom of the file have been consistently updated to match the restructured imports from the .v1 module.

Also applies to: 90-92, 99-100, 104-104

examples/agent/async_simple/main.py (2)

14-15: Updated import to use new SettingsOptions class

The import statement has been updated to use the new SettingsOptions class, which replaces SettingsConfigurationOptions.


113-121: Example updated to use new agent configuration structure

The example demonstrates the new provider-based configuration structure:

  1. Uses nested provider properties to access model configuration
  2. Renames instructions to prompt to match the new API terminology
  3. Adds new configurations like greeting, listen.provider.keyterms, and top-level language

These changes showcase how to correctly use the new agent configuration model.

deepgram/__init__.py (3)

336-338: Updated top-level exports for agent configuration options

The exports have been updated to use the new class names SettingsOptions and UpdatePromptOptions, ensuring consistency with the new API structure.


344-347: Added provider-specific classes and endpoint abstraction to exports

The specialized provider classes (ListenProvider, SpeakProvider, ThinkProvider) and the Endpoint abstraction have been added to the exports, making them available through the top-level package.

Also applies to: 353-354, 358-359


334-359: Consider adding migration documentation for this breaking change

While the code changes are consistent and complete, the introduction of provider-based nesting and renamed fields constitutes a breaking change that will require client code updates.

Consider adding migration guides or documentation that explains how to update from the previous API to the new structure, especially highlighting the path changes (e.g., think.model → think.provider.model) and renamed fields (e.g., instructions → prompt).
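As a starting point for such a guide, the two renames named above could be illustrated with a hypothetical dict-level migration helper. It operates on plain dicts shaped like the `think` section of the example scripts, not on the SDK's dataclasses, and covers only these two paths:

```python
import copy
from typing import Any, Dict


def migrate_think(old: Dict[str, Any]) -> Dict[str, Any]:
    """Move pre-v1 `think` keys to their new locations (hypothetical helper).

    Covers only the two renames named in the review:
      think.model        -> think.provider.model
      think.instructions -> think.prompt
    The input dict is not modified.
    """
    think = copy.deepcopy(old)
    if "model" in think:
        # Create the nested provider dict only when there is a model to move.
        think.setdefault("provider", {})["model"] = think.pop("model")
    if "instructions" in think:
        think["prompt"] = think.pop("instructions")
    return think
```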

examples/agent/simple/main.py (1)

11-11: Import updated to use new SettingsOptions class

The agent configuration class has been renamed from SettingsConfigurationOptions to SettingsOptions.

deepgram/clients/agent/v1/__init__.py (2)

38-40: Core settings classes renamed

The agent settings classes have been renamed for clarity and consistency:

  • SettingsConfigurationOptions → SettingsOptions
  • UpdateInstructionsOptions → UpdatePromptOptions

46-61: Provider model refactored with explicit provider types

The agent architecture has been refactored to use specific provider types instead of a general-purpose Provider:

  • ListenProvider: Manages speech recognition configuration
  • SpeakProvider: Manages speech synthesis configuration
  • ThinkProvider: Manages language model configuration
  • Endpoint: New abstraction for custom API integrations

This refactoring improves code organization and API clarity while allowing for more granular configuration of each agent capability.

deepgram/clients/agent/v1/websocket/async_client.py (7)

34-40: Import statements updated to reflect renamed classes

Imports have been updated to use the new class names:

  • SettingsOptions replaces SettingsConfigurationOptions
  • UpdatePromptOptions replaces UpdateInstructionsOptions

82-82: Class attribute type updated

The _settings attribute type has been updated to SettingsOptions.


182-182: Method signature updated

The start method parameter type has been updated to use SettingsOptions.


198-201: Settings validation using updated class type

The instance check and validation now use SettingsOptions.


216-225: Settings object instantiation updated

The code for instantiating settings objects from different input types has been updated to use SettingsOptions.


228-229: Validation logic updated for nested provider structure

The validation logic for keyterms has been updated to check through the nested provider structure:

  • Now validates using self._settings.agent.listen.provider.keyterms and self._settings.agent.listen.provider.model

This ensures that keyterms validation works with the new nested configuration structure.


280-288: Log message and error message references updated

References to "ConfigurationSettings" have been changed to "Settings" in log messages and error descriptions.

This maintains consistency with the class renaming from SettingsConfigurationOptions to SettingsOptions.

Comment on lines +359 to 361
```python
SettingsOptions,
UpdatePromptOptions,
UpdateSpeakOptions,
```
Contributor

⚠️ Potential issue

Breaking change: Core settings classes renamed

The renaming of SettingsConfigurationOptions to SettingsOptions and UpdateInstructionsOptions to UpdatePromptOptions represents a breaking change for client code. These core classes are used throughout the agent API for configuration.

When upgrading, clients will need to update imports and instantiations of these classes. The UpdatePromptOptions rename also indicates a conceptual shift from "instructions" to "prompt" for clarity.

Comment on lines +138 to 146
```diff
+options: SettingsOptions = SettingsOptions()
 options.agent.think.provider.type = "open_ai"
-options.agent.think.model = "gpt-4o-mini"
-options.agent.think.instructions = "You are a helpful AI assistant."
-
+options.agent.think.provider.model = "gpt-4o-mini"
+options.agent.think.prompt = "You are a helpful AI assistant."
+options.greeting = "Hello, this is a text to speech example using Deepgram."
+options.agent.listen.provider.keyterms = ["hello", "goodbye"]
+options.agent.listen.provider.model = "nova-3"
+options.language = "en"
 if dg_connection.start(options) is False:
```
Contributor

⚠️ Potential issue

Configuration structure updated to use nested provider model

The agent configuration has been significantly restructured:

  1. Uses SettingsOptions instead of SettingsConfigurationOptions
  2. Provider-specific configuration is now nested (e.g., agent.think.provider.type)
  3. Renamed instructions to prompt
  4. Added new top-level configuration fields (greeting, language)
  5. Listen configuration now uses provider-specific fields

This is a breaking change that requires clients to update their configuration structure. The new model is more organized but requires careful migration.

```diff
@@ -98,7 +98,7 @@ def __init__(self, config: DeepgramClientOptions):
         self._config = config
 
         # needs to be "wss://agent.deepgram.com/agent"
-        self._endpoint = "agent"
+        self._endpoint = "v1/agent/converse"
```
Contributor

⚠️ Potential issue

WebSocket endpoint path updated

The WebSocket endpoint has been changed from "agent" to "v1/agent/converse", explicitly versioning the API.

This endpoint change is a breaking change that impacts connection handling. The new path uses versioning (v1), which is a good practice for API lifecycle management.

Comment on lines +123 to +127
```python
if "endpoint" in _dict:
    _dict["endpoint"] = [
        Endpoint.from_dict(endpoint) for endpoint in _dict["endpoint"]
    ]
return _dict[key]
```
Contributor

🛠️ Refactor suggestion

Function.__getitem__ treats endpoint as a list but the field is singular

endpoint is declared as Optional[Endpoint], yet the getter converts it using a list comprehension, which will fail with TypeError: 'Endpoint' object is not iterable. Convert the single dict directly.

-        if "endpoint" in _dict:
-            _dict["endpoint"] = [
-                Endpoint.from_dict(endpoint) for endpoint in _dict["endpoint"]
-            ]
+        if "endpoint" in _dict and isinstance(_dict["endpoint"], dict):
+            _dict["endpoint"] = Endpoint.from_dict(_dict["endpoint"])
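The runnable sketch below shows the suggested fix in isolation. `Endpoint` and `Function` here are hypothetical stand-ins, since the real SDK classes have more fields and use `dataclasses_json`. The sketch assumes, as the original getter implies, that `__getitem__` works on a dict form of the object where a nested `Endpoint` appears as a plain dict. `dataclasses.asdict` plays that role here.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class Endpoint:
    # Hypothetical stand-in; the SDK's Endpoint may have different fields.
    url: str = ""

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "Endpoint":
        return cls(url=d.get("url", ""))


@dataclass
class Function:
    name: str = ""
    endpoint: Optional[Endpoint] = None  # a single object, not a list

    def __getitem__(self, key: str) -> Any:
        # asdict() turns the nested Endpoint into a plain dict.
        _dict = asdict(self)
        # Convert the single dict directly instead of iterating over it.
        if "endpoint" in _dict and isinstance(_dict["endpoint"], dict):
            _dict["endpoint"] = Endpoint.from_dict(_dict["endpoint"])
        return _dict[key]
```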

Comment on lines +136 to +141
mode: str = field(
default=None, metadata=dataclass_config(exclude=lambda f: f is None)
)
id: str = field(
default=None, metadata=dataclass_config(exclude=lambda f: f is None)
)
⚠️ Potential issue

CartesiaVoice.mode and id default to None but are typed str – Mypy error

mypy rightfully complains that assigning None to a str violates the type contract (see pipeline failure). Mark the fields as Optional[str] or provide non-None defaults.

-    mode: str = field(
+    mode: Optional[str] = field(
...
-    id: str = field(
+    id: Optional[str] = field(
Suggested change

mode: Optional[str] = field(
default=None, metadata=dataclass_config(exclude=lambda f: f is None)
)
id: Optional[str] = field(
default=None, metadata=dataclass_config(exclude=lambda f: f is None)
)
🧰 Tools
🪛 GitHub Actions: Check - static

[error] 136-136: mypy: Incompatible types in assignment (expression has type "None", variable has type "str") [assignment]


[error] 139-139: mypy: Incompatible types in assignment (expression has type "None", variable has type "str") [assignment]
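Here is a stdlib-only sketch of the pattern mypy expects. The fields are `Optional[str]` because their default is `None`, and `None` values are dropped when serializing. The real class passes `dataclass_config(exclude=...)`, which in the SDK appears to be `dataclasses_json.config`. To keep the sketch dependency-free, `to_dict` is a hypothetical hand-rolled equivalent, and the class name `CartesiaVoiceSketch` is illustrative.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class CartesiaVoiceSketch:
    # Optional[...] matches the None default, so mypy accepts the assignment.
    mode: Optional[str] = None
    id: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # Mimic exclude=lambda f: f is None by omitting None-valued fields.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }
```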

"""
This class defines the provider for the Think model.
"""

-type: Optional[str] = field(
+type: str = field(default="open_ai")
@jpvajda jpvajda Apr 28, 2025

@naomi-lgbt I don't think think.provider has an actual default in the API. You might want to check with @RyanChimienti.

The API Spec doesn't currently have a default think.provider set. If we find this is the default I can specify that accordingly in the API Spec.

You are correct, John; it doesn't.

"""
This class defines the provider for the Think model.
"""

-type: Optional[str] = field(
+type: str = field(default="open_ai")
Suggested change
-type: str = field(default="open_ai")
+type: str = field()
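In a dataclass, `type: str = field` without a call makes the `field` function object itself the default. By contrast, `type: str = field()` with no arguments declares a required field, which matches an API that has no default think provider. The minimal sketch below uses hypothetical stand-in classes rather than the SDK's.

```python
from dataclasses import dataclass, field


@dataclass
class ThinkProviderBare:
    # Bug: `field` is not called, so the function object becomes the default.
    type: str = field  # type: ignore[assignment]


@dataclass
class ThinkProviderRequired:
    # field() with no default or default_factory: callers must pass `type`.
    type: str = field()
```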

@@ -0,0 +1,14 @@
Connection opened: {
I should probably add this file to .gitignore.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ad48ea and a38e670.

⛔ Files ignored due to path filters (1)
  • examples/agent/no_mic/output_0.wav is excluded by !**/*.wav
📒 Files selected for processing (2)
  • examples/agent/no_mic/chatlog.txt (1 hunks)
  • examples/agent/no_mic/main.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • examples/agent/no_mic/chatlog.txt
🔇 Additional comments (4)
examples/agent/no_mic/main.py (4)

10-21: Replace ad-hoc prints & unused imports with structured logging

verboselogs is imported but never used, and a series of print() statements are scattered throughout the file for diagnostic purposes. Prefer the std-lib logging module (or verboselogs if you really need verbose levels) so that:

  • log levels can be configured from the command line / env vars
  • output can be redirected or silenced by downstream users
  • unit tests aren’t polluted with console noise
-import requests
-import wave
-import io
-import time
-from datetime import datetime
-from deepgram.utils import verboselogs
-
-# Add debug prints for imports
-print("Checking imports...")
+import logging
+import requests
+import wave
+import io
+import time
+from datetime import datetime
+
+# Configure once, near the top-level of your application
+logging.basicConfig(level=logging.INFO)
+log = logging.getLogger(__name__)
+
+# Example replacement
+log.debug("Imports successful")

Later print(...) calls can be replaced with log.debug/info/warning/error(...) as appropriate.
[ suggest_optional_refactor ]
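The first bullet (log levels configurable from the environment) can be done with only the standard `logging` module. Below is a minimal sketch. The `LOG_LEVEL` variable name and the `configure_logging` helper are assumptions, not part of the example script.

```python
import logging
import os


def configure_logging(default: str = "INFO") -> logging.Logger:
    """Set up logging from the LOG_LEVEL env var (assumed name), e.g. DEBUG."""
    level_name = os.environ.get("LOG_LEVEL", default).upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.INFO  # fall back on unknown level names
    # basicConfig is a no-op if the root logger already has handlers,
    # so also set the level on this module's logger explicitly.
    logging.basicConfig(level=level)
    log = logging.getLogger(__name__)
    log.setLevel(level)
    return log
```

Running the script with `LOG_LEVEL=DEBUG` would then show the `log.debug(...)` diagnostics, while the default keeps them quiet.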


57-77: convert_audio_to_linear16 neither re-samples nor is it ever invoked

The helper promises “convert … to linear16” but:

  1. It simply writes/reads the same frames – no resampling or format conversion is done.
  2. The function is never called, leaving dead code in the example.

If you genuinely need resampling, use a DSP library (e.g. pydub, scipy, soundfile) or the Deepgram SDK’s built-in conversion. Otherwise, remove the function to reduce cognitive load.

[ suggest_optional_refactor ]
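If resampling is actually needed and adding a dependency is not an option, a naive standard-library sketch looks like the code below. `resample_linear16` is a hypothetical replacement for the example's helper. It assumes 16-bit PCM WAV input, averages channels to mono, and resamples by linear interpolation without an anti-aliasing filter, so a DSP library remains the better choice for real audio. The 16 kHz default target rate is also an assumption.

```python
import io
import struct
import wave


def resample_linear16(wav_bytes: bytes, target_rate: int = 16000) -> bytes:
    """Return mono 16-bit PCM (linear16) WAV bytes at target_rate.

    Naive sketch: expects 16-bit PCM input, downmixes by averaging
    channels, and uses linear interpolation with no anti-aliasing.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("sketch only handles 16-bit PCM input")
        channels = src.getnchannels()
        src_rate = src.getframerate()
        n_frames = src.getnframes()
        raw = src.readframes(n_frames)

    # Unpack interleaved little-endian int16 samples, then average to mono.
    samples = struct.unpack("<%dh" % (n_frames * channels), raw)
    mono = [
        sum(samples[i * channels:(i + 1) * channels]) / channels
        for i in range(n_frames)
    ]

    # Linearly interpolate the mono signal onto the target sample grid.
    out_len = int(n_frames * target_rate / src_rate) if n_frames else 0
    out = []
    for j in range(out_len):
        pos = j * src_rate / target_rate
        i = int(pos)
        frac = pos - i
        nxt = mono[i + 1] if i + 1 < n_frames else mono[i]
        value = mono[i] + (nxt - mono[i]) * frac
        out.append(max(-32768, min(32767, int(round(value)))))

    buf = io.BytesIO()
    with wave.open(buf, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(target_rate)
        dst.writeframes(struct.pack("<%dh" % len(out), *out))
    return buf.getvalue()
```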


226-235: Download loop lacks basic robustness

  • response.raise_for_status() is omitted – silent HTTP 4xx/5xx will yield empty audio.
  • The stream is never closed (with requests.get(..., stream=True) as response:).
  • If the file is large, you may want back-pressure or to respect response.iter_content(None) when the WebSocket is paused.
-response = requests.get(url, stream=True)
-for chunk in response.iter_content(chunk_size=8192):
+with requests.get(url, stream=True, timeout=15) as response:
+    response.raise_for_status()
+    for chunk in response.iter_content(chunk_size=8192):
         if chunk:
             dg_connection.send(chunk)

[ suggest_essential_refactor ]
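The last bullet mentions pacing the upload. One simple option is to throttle sends to roughly real-time speed. This is rate limiting, not true back-pressure, since it does not react to the WebSocket's state. `send_paced` is a hypothetical helper, and the sender and sleep function are injected so the sketch runs without the SDK or a network.

```python
import time
from typing import Callable, Iterable


def send_paced(
    chunks: Iterable[bytes],
    send: Callable[[bytes], object],
    bytes_per_second: float,
    sleep: Callable[[float], object] = time.sleep,
) -> int:
    """Forward non-empty chunks to `send`, sleeping after each one so the
    stream runs at roughly `bytes_per_second`. Returns total bytes sent."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    total = 0
    for chunk in chunks:
        if not chunk:  # skip empty chunks, as the original loop does
            continue
        send(chunk)
        total += len(chunk)
        # Sleep for the approximate playback duration of this chunk.
        sleep(len(chunk) / bytes_per_second)
    return total
```

Inside the suggested `with requests.get(...)` block, this could be called as `send_paced(response.iter_content(chunk_size=8192), dg_connection.send, bytes_per_second=32000)`. The 32000 bytes/s figure corresponds to 16-bit mono audio at 16 kHz, which is an assumption about the source file's format.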


234-241: Arbitrary time.sleep(5) may end the script before the agent finishes

A fixed delay is brittle – network latency and model inference time vary.
Instead, either:

  • Wait for a specific terminating event (AgentAudioDone + Close) before exiting, or
  • Call a (hypothetical) dg_connection.wait_until_closed() if provided by the SDK.

[ suggest_optional_refactor ]
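The first option can be built on `threading.Event`: the event handler sets the event, and the main thread blocks on it with a timeout instead of `time.sleep(5)`. `CompletionWaiter` is a hypothetical helper. The handler accepts arbitrary arguments because the exact handler signature the SDK uses is not shown here.

```python
import threading
from typing import Any


class CompletionWaiter:
    """Hypothetical helper: block until a terminating event handler fires."""

    def __init__(self) -> None:
        self._done = threading.Event()

    def on_terminating_event(self, *args: Any, **kwargs: Any) -> None:
        # Permissive signature: SDK handlers may receive extra arguments.
        self._done.set()

    def wait(self, timeout: float) -> bool:
        """Return True if the event fired before `timeout` seconds elapsed."""
        return self._done.wait(timeout)
```

In the example script, `on_terminating_event` would be registered as the handler for the closing event, and `waiter.wait(timeout=...)` would replace the fixed sleep. Waiting for both `AgentAudioDone` and `Close` would need two events or a counter. How handlers are registered depends on the SDK's event API.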
