
Conversation

@stbenjam
Member

@stbenjam stbenjam commented Nov 5, 2025

  • Add support for multiple LLM models including ChatVertexAnthropic
  • Implement models.yaml configuration system for model management
  • Add model selection UI in chat settings
  • Update environment configuration and documentation
  • Supports thought signatures (now required) for Gemini 3.0

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Multi-model support: backend model listing, per-request model selection, model shown on messages, settings UI to choose models, and model_id returned in responses.
    • Claude (Vertex AI) support with extended-thinking controls and runtime agent parameters (temperature, iterations, execution time, persona).
    • CLI/server options to configure Google/Vertex settings and external models config path.
  • Documentation

    • Expanded README with Vertex AI/Claude setup, extended-thinking guidance, and models.yaml examples.
  • Chores

    • Added dependencies and gitignore rule for models.yaml.

@openshift-ci-robot

Pipeline controller notification
This repository is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. Review these jobs and use /test <job> to manually trigger optional jobs most likely to be impacted by the proposed changes.

@openshift-ci
Contributor

openshift-ci bot commented Nov 5, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stbenjam

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 5, 2025
@coderabbitai
Contributor

coderabbitai bot commented Nov 5, 2025

Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key(s) in object: 'tools'
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Walkthrough

Adds multi-model support with a YAML model registry; Google Vertex AI (Claude) integration, including extended-thinking modes; async agent initialization; an AgentManager for per-model agents; per-request model selection and propagation (model_id); a backend /chat/models endpoint; and corresponding frontend model UI, state, and attribution.
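For orientation, the per-model agent cache described in the walkthrough can be sketched in a few lines. Everything below is an illustrative assumption, not the PR's implementation: the names AgentManager, SippyAgent, and _initialize come from the summary, but the fields, lock, and fallback logic are one reasonable guess at the shape.

```python
import asyncio


class SippyAgent:
    """Illustrative stand-in for the real agent class."""

    def __init__(self, model_id: str):
        self.model_id = model_id
        self.ready = False

    async def _initialize(self):
        # Heavy setup (LLM client, tools) deferred until first use.
        await asyncio.sleep(0)  # placeholder for real async work
        self.ready = True


class AgentManager:
    """Caches one agent per model; falls back to a default model."""

    def __init__(self, model_ids, default_model):
        self._model_ids = set(model_ids)
        self._default = default_model
        self._agents = {}
        self._lock = asyncio.Lock()  # guard concurrent creation

    async def get_agent(self, model_id=None):
        if model_id not in self._model_ids:
            model_id = self._default  # unknown/missing id -> default
        async with self._lock:
            if model_id not in self._agents:
                agent = SippyAgent(model_id)
                await agent._initialize()
                self._agents[model_id] = agent
        return self._agents[model_id]


async def demo():
    mgr = AgentManager(["gemini", "claude"], default_model="gemini")
    a1 = await mgr.get_agent("claude")
    a2 = await mgr.get_agent("claude")
    a3 = await mgr.get_agent(None)  # falls back to default
    return a1 is a2, a3.model_id

print(asyncio.run(demo()))  # (True, 'gemini')
```

The lock around creation matters because two concurrent requests for the same uncached model would otherwise both build and initialize an agent.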

Changes

Cohort / File(s) Summary
Env / Examples
chat/.env.example, chat/.gitignore, chat/models.yaml.example
Consolidates LLM config into a unified Sippy AI Agent section with new globals (TEMPERATURE, SIPPY_API_URL, MAX_ITERATIONS, MAX_EXECUTION_TIME, PERSONA), de-emphasizes SIPPY_READ_ONLY_DATABASE_DSN, adds models.yaml ignore and provides chat/models.yaml.example schema and entries.
Docs
chat/README.md
Adds Claude/Vertex AI setup, authentication options, "Claude Extended Thinking" guidance, and a "Multiple Model Configuration (Optional)" section with examples and server/CLI usage for models.yaml.
CLI / Bootstrap
chat/main.py
Adds CLI options --google-project, --google-location, --thinking-budget, --models-config; applies overrides to Config (google_project_id, google_location, extended_thinking_budget) and passes models_config_path to the web server.
Config & Loader
chat/sippy_agent/config.py
Adds ModelConfig (per-model overrides + to_config()), extends Config with google_project_id, google_location, extended_thinking_budget, is_claude_model(), and adds load_models_config() to parse/validate YAML registry and determine default model.
Agent & Manager
chat/sippy_agent/agent.py
Defers heavy setup via async _initialize(), extends _create_llm() to support ChatAnthropicVertex (Claude) with extended-thinking handling and streaming thinking, emits thinking callbacks, improves tool-loading warnings, and introduces AgentManager to load/cache per-model SippyAgent instances and expose listing/getter APIs.
API Models
chat/sippy_agent/api_models.py
Adds model_id to ChatRequest and ChatResponse; introduces ModelInfo and ModelsResponse to expose available models and default selection.
Web Server / Routes
chat/sippy_agent/web_server.py
Replaces single-agent wiring with AgentManager, adds GET /chat/models, resolves per-request agent via model_id (fallback to default), propagates model_id in HTTP/WebSocket/thinking callbacks, and accepts models_config_path in constructor.
Requirements
chat/requirements.txt
Adds dependencies: langchain-google-vertexai>=3.0.0, anthropic>=0.20.0, and pyyaml>=6.0.0 (and bumps several langchain-related packages).
Frontend — Store
sippy-ng/src/chat/store/modelsSlice.js, sippy-ng/src/chat/store/settingsSlice.js, sippy-ng/src/chat/store/useChatStore.js
Adds modelsSlice (models, defaultModel, loading/error + loadModels() fetching /api/chat/models), exposes useModels selector, and adds modelId to settings (initial null).
Frontend — WebSocket
sippy-ng/src/chat/store/webSocketSlice.js
Includes model_id in outgoing user payload and assistant final message; moves adding the user message to history after WS send; preserves per-message error recording.
Frontend — UI attribution
sippy-ng/src/chat/ChatMessage.js
Resolves modelName via useModels, shows tooltip "AI-generated by {modelName}" when available, replaces static "AI" chip with model name fallback, and adds model_id PropType.
Frontend — Settings UI
sippy-ng/src/chat/ChatSettings.js
Adds "AI Model" section using useModels, lazy-loads models, renders Select dropdown with model descriptions and icons, updates settings.modelId, and handles loading/error states.
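To make the table above concrete, here is a hypothetical chat/models.yaml using the fields the summaries mention (id, name, model_name, description, temperature, extended_thinking_budget, default); the authoritative schema is chat/models.yaml.example, and the exact layout may differ:

```yaml
models:
  - id: gemini-flash
    name: Gemini 2.5 Flash
    model_name: gemini-2.5-flash
    default: true
  - id: claude-sonnet
    name: Claude Sonnet (Vertex AI)
    model_name: claude-3-5-sonnet@20240620
    description: Claude via Google Vertex AI with extended thinking
    temperature: 1.0            # required when extended thinking is enabled
    extended_thinking_budget: 4096
```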

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Frontend
    participant ModelsStore as Frontend:Models Store
    participant WebServer as Backend:Web Server
    participant AgentManager as Backend:Agent Manager
    participant Agent as Backend:SippyAgent
    participant LLM as Backend:LLM (Vertex AI / Claude)

    User->>Frontend: Open chat UI
    Frontend->>ModelsStore: loadModels()
    ModelsStore->>WebServer: GET /chat/models
    WebServer->>AgentManager: list_models()
    AgentManager-->>WebServer: {models, default_model}
    ModelsStore->>Frontend: update models state

    User->>Frontend: Select model + send message
    Frontend->>WebServer: POST /chat (model_id, message)
    WebServer->>AgentManager: get_agent(model_id)
    AgentManager->>Agent: return cached or create + Agent._initialize()
    Agent->>Agent: _create_llm() (Claude -> ChatAnthropicVertex)
    Agent->>LLM: send message (project/location, thinking budget)
    LLM->>Agent: stream thinking & content
    Agent->>WebServer: thinking_callback(model_id, partial_thought)
    Agent->>WebServer: final response(model_id, content)
    WebServer->>Frontend: ChatResponse(model_id, content)
    Frontend->>User: render message with model attribution

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas to focus on:

  • Claude/Vertex AI initialization, credential handling, project/location validation, and langchain integration.
  • Extended thinking logic (budgeting, token limits, temperature adjustments) and streaming correctness.
  • Async initialization and AgentManager caching for race conditions under concurrent requests.
  • models.yaml parsing/validation (duplicate/default handling) and fallback to env-based config.
  • End-to-end propagation of model_id across HTTP, WebSocket, backend responses, and frontend UI.
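The duplicate-id and default handling called out above can be expressed as a small pure function over the already-parsed YAML dict. This is a hedged sketch with assumed field names (id, default), not the PR's actual load_models_config:

```python
def validate_models(registry: dict) -> str:
    """Validate a parsed models.yaml dict; return the default model id."""
    models = registry.get("models") or []
    if not models:
        raise ValueError("models.yaml must define at least one model")

    seen = set()
    defaults = []
    for m in models:
        mid = m.get("id")
        if not mid:
            raise ValueError("every model needs an 'id'")
        if mid in seen:
            raise ValueError(f"duplicate model id: {mid}")
        seen.add(mid)
        if m.get("default"):
            defaults.append(mid)

    if len(defaults) > 1:
        raise ValueError(f"multiple default models: {defaults}")
    # Fall back to the first entry when no default is flagged.
    return defaults[0] if defaults else models[0]["id"]
```

For example, `validate_models({"models": [{"id": "a"}, {"id": "b", "default": True}]})` returns `"b"`, while a registry with two `default: true` entries raises.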

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Single Responsibility And Clear Naming ⚠️ Warning — AgentManager uses the generic 'Manager' term, violating the naming guidelines, and the Config class accumulates multiple distinct concerns, exceeding the single-responsibility principle. Resolution: rename AgentManager to something specific such as MultiModelAgentRegistry or AgentFactory, and refactor Config into focused sub-types like GoogleVertexConfig and ExtendedThinkingConfig.
✅ Passed checks (6 passed)
  • Title check — The title accurately and concisely describes the main change: adding multi-model support with a configuration system, the primary focus across backend (agents, server, config) and frontend (chat settings, model store) changes.
  • Docstring Coverage — Docstring coverage is 85.71%, which meets the required threshold of 80.00%.
  • Go Error Handling — Not applicable; no Go source files are present in this pull request.
  • Sql Injection Prevention — The PR adds multi-model LLM support with configuration management; no SQL operations, database queries, or SQL injection vulnerabilities detected in the modified files.
  • Excessive Css In React Should Use Styles — React components use Material-UI and utility classes rather than large inline style objects, adhering to styling guidelines.
  • Description Check — Skipped; CodeRabbit's high-level summary is enabled.

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
chat/sippy_agent/web_server.py (2)

226-261: Avoid mutating shared agent config across concurrent requests

AgentManager caches a single SippyAgent per model, but both the HTTP and WebSocket paths mutate agent.config.show_thinking, agent.config.persona, and even rebuild agent.graph on that shared instance. With concurrent requests targeting the same model, those mutations race: persona or thinking overrides from request A can leak into request B before either finally block restores the original settings. A third request arriving in that window will run with the wrong persona/model behavior. Please isolate per-request overrides (e.g., clone the agent/config, add an agent-level async lock around these mutations, or extend SippyAgent.achat to accept override args so we never touch shared state).

Also applies to: 330-475
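One way to eliminate the shared-state mutation entirely is the clone approach suggested above: keep the cached agent's config immutable and derive a per-request copy. A minimal sketch with a hypothetical frozen config class (not the PR's Config):

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class AgentConfig:
    """Hypothetical immutable per-agent configuration."""
    model_name: str
    persona: str = "default"
    show_thinking: bool = False


def with_overrides(base: AgentConfig, **overrides) -> AgentConfig:
    """Return a per-request copy; the shared base is never mutated."""
    return dataclasses.replace(base, **overrides)


shared = AgentConfig(model_name="gemini-2.5-flash")
req_a = with_overrides(shared, persona="terse", show_thinking=True)
req_b = with_overrides(shared)  # concurrent request, unaffected by A

print(req_a.persona, req_b.persona, shared.persona)  # terse default default
```

Because `dataclasses.replace` never touches the shared instance, there is no `finally`-block restore and therefore no race window for a third request to observe partial overrides.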


153-163: Return status from the actual default agent config

/status now resolves the default agent via AgentManager, but the response still reports model_name, endpoint, show_thinking, and persona from the base Config, which will be wrong whenever the default model comes from models.yaml. Surface the values from default_agent.config instead so the endpoint reflects the real model in use.

-            return AgentStatus(
-                available_tools=default_agent.list_tools(),
-                model_name=self.config.model_name,
-                endpoint=self.config.llm_endpoint,
-                thinking_enabled=self.config.show_thinking,
-                current_persona=self.config.persona,
+            agent_config = default_agent.config
+            return AgentStatus(
+                available_tools=default_agent.list_tools(),
+                model_name=agent_config.model_name,
+                endpoint=agent_config.llm_endpoint,
+                thinking_enabled=agent_config.show_thinking,
+                current_persona=agent_config.persona,
🧹 Nitpick comments (6)
chat/.gitignore (1)

142-145: LGTM!

Correctly ignores the instance-specific models.yaml configuration file while keeping the example template in version control.

Minor: Consider using a single blank line instead of two (lines 142-143) for consistency with the rest of the file, though this is purely stylistic.

chat/requirements.txt (1)

5-13: Dependencies align with multi-model support.

The three new dependencies correctly support Claude via Vertex AI and YAML-based configuration.

Consider tightening the anthropic>=0.20.0 constraint to anthropic>=0.20.0,<1.0.0 to avoid potential breaking changes in major version updates, as the Anthropic SDK has had breaking API changes in the past.

chat/.env.example (1)

1-92: Consider documenting the models.yaml approach in comments.

The restructured configuration is well-organized and clearly documents different model provider options. However, since this PR's main feature is the multi-model models.yaml configuration system, consider adding a comment at the top mentioning that:

  • These environment variables provide defaults/fallbacks when models.yaml isn't present
  • For multi-model support, users should create a models.yaml file (see models.yaml.example)

Additionally, line 66's SIPPY_API_URL defaults to production. Consider whether this should be commented out or point to localhost to avoid accidental production API calls during development.
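The fallback relationship the comment suggests documenting — an explicit environment setting wins, otherwise a development-safe default applies — can be illustrated with a tiny helper (hypothetical helper name; the localhost URL is the reviewer's suggestion, not the file's current value):

```python
import os


def env_default(name: str, fallback: str) -> str:
    """Env vars in .env.example act as fallbacks; an explicit setting wins."""
    return os.environ.get(name, fallback)


# Defaulting to localhost means a missing SIPPY_API_URL never sends
# development traffic to production (placeholder URL for illustration).
api_url = env_default("SIPPY_API_URL", "http://localhost:8080")
```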

sippy-ng/src/chat/store/webSocketSlice.js (1)

225-246: Review the UX implications of delayed message display.

The reordering makes sense for chat history management—the current message shouldn't be included in its own history context. However, this creates a delay where the user's message only appears in the UI after the network send completes.

Traditional chat UX typically shows the user's message immediately (optimistic UI) and displays an error if the send fails. The current approach may feel less responsive since users won't see their message until after the network round-trip.

Consider:

  1. Testing the perceived responsiveness with this change
  2. If the delay is noticeable, consider adding the message optimistically before send, then removing/marking it as failed if send fails
  3. The current approach does guarantee consistency between what the user sees and what was actually sent
sippy-ng/src/chat/store/modelsSlice.js (1)

12-49: Consider adding concurrent call protection.

The loadModels function correctly fetches models and sets the default, but doesn't protect against concurrent calls. If loadModels is called multiple times in quick succession (e.g., on reconnect or user navigation), multiple fetches could race and cause state inconsistencies.

Consider one of these approaches:

  1. Track in-flight requests and return early if already loading
  2. Abort previous requests using AbortController when a new request starts
  3. Ensure the UI only calls loadModels once on mount/initialization

Example with early return:

 loadModels: () => {
+  // Skip if already loading
+  if (get().modelsLoading) {
+    return
+  }
+
   const apiUrl =
     process.env.REACT_APP_CHAT_API_URL || window.location.origin + '/api/chat'
chat/sippy_agent/agent.py (1)

708-716: Drop the redundant f-string

description=f"Model from environment configuration" has no interpolation. Please remove the f prefix for clarity.

-                description=f"Model from environment configuration",
+                description="Model from environment configuration",
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between f78f207 and 1d2bc37.

📒 Files selected for processing (16)
  • chat/.env.example (1 hunks)
  • chat/.gitignore (1 hunks)
  • chat/README.md (2 hunks)
  • chat/main.py (3 hunks)
  • chat/models.yaml.example (1 hunks)
  • chat/requirements.txt (1 hunks)
  • chat/sippy_agent/agent.py (8 hunks)
  • chat/sippy_agent/api_models.py (3 hunks)
  • chat/sippy_agent/config.py (5 hunks)
  • chat/sippy_agent/web_server.py (10 hunks)
  • sippy-ng/src/chat/ChatMessage.js (5 hunks)
  • sippy-ng/src/chat/ChatSettings.js (6 hunks)
  • sippy-ng/src/chat/store/modelsSlice.js (1 hunks)
  • sippy-ng/src/chat/store/settingsSlice.js (1 hunks)
  • sippy-ng/src/chat/store/useChatStore.js (3 hunks)
  • sippy-ng/src/chat/store/webSocketSlice.js (4 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
chat/README.md

46-46: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🪛 Ruff (0.14.3)
chat/sippy_agent/agent.py

710-710: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (7)
chat/README.md (1)

46-162: Excellent documentation of new multi-model features.

The documentation comprehensively covers:

  • Claude/Vertex AI authentication options
  • Extended thinking feature with important caveats (temperature=1.0 requirement, regional availability)
  • Multi-model configuration via models.yaml
  • Clear usage examples

The explanations are clear and include important warnings about limitations and requirements.

chat/models.yaml.example (1)

16-41: Well-structured model configuration examples.

The configuration examples are clear and demonstrate key features:

  • Multiple models from different providers
  • Default model selection
  • Thinking-enabled variant with required temperature: 1.0
  • Optional fields (description, temperature, extended_thinking_budget)

Ensure that the configuration loading code (likely in chat/sippy_agent/config.py) validates that only one model has default: true. The YAML format itself doesn't enforce this constraint.

sippy-ng/src/chat/store/settingsSlice.js (1)

10-10: LGTM!

Clean addition of modelId to the settings state. The comment clearly indicates the initialization strategy, and this integrates well with the modelsSlice which loads and sets the default model.

sippy-ng/src/chat/store/webSocketSlice.js (2)

55-55: LGTM!

Adding model_id to the assistant message metadata enables proper model attribution in the UI, aligning with the backend's multi-model support.


234-234: LGTM!

Correctly includes model_id in the outgoing payload, enabling the backend to use the user-selected model.

sippy-ng/src/chat/store/modelsSlice.js (2)

13-15: Base URL construction handles edge cases correctly.

The chained .replace() calls properly handle both trailing slashes and the /stream suffix.


33-37: Good UX: auto-setting default model.

Automatically setting the default model when the user hasn't selected one provides a smooth initial experience.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
sippy-ng/src/chat/ChatSettings.js (1)

339-417: Extract duplicated model resolution logic.

The model ID resolution logic is duplicated between the Select value (lines 362-376) and the description rendering (lines 389-414). This violates the DRY principle and creates a maintenance risk if one copy is updated but not the other.

Extract the resolution logic into a helper function as suggested in the past review:

+  const resolveSelectedModelId = () => {
+    if (models.length === 0) {
+      return ''
+    }
+
+    if (settings.modelId && models.find((m) => m.id === settings.modelId)) {
+      return settings.modelId
+    }
+
+    if (defaultModel && models.find((m) => m.id === defaultModel)) {
+      return defaultModel
+    }
+
+    return models[0].id
+  }
+
+  const selectedModelId = resolveSelectedModelId()
+  const selectedModel = models.find((m) => m.id === selectedModelId) || null

Then update the render to use these computed values:

               <Select
                 labelId="model-select-label"
-                value={
-                  (() => {
-                    // Resolve the effective model ID before rendering
-                    if (
-                      settings.modelId &&
-                      models.find((m) => m.id === settings.modelId)
-                    ) {
-                      return settings.modelId
-                    }
-                    if (defaultModel && models.find((m) => m.id === defaultModel)) {
-                      return defaultModel
-                    }
-                    return models.length > 0 ? models[0].id : ''
-                  })()
-                }
+                value={selectedModelId}
                 onChange={handleModelChange}

             ...

-            {(() => {
-              // Resolve model ID and look up the selected model once
-              const resolvedModelId =
-                settings.modelId && models.find((m) => m.id === settings.modelId)
-                  ? settings.modelId
-                  : defaultModel && models.find((m) => m.id === defaultModel)
-                    ? defaultModel
-                    : models.length > 0
-                      ? models[0].id
-                      : null
-
-              const selectedModel = resolvedModelId
-                ? models.find((m) => m.id === resolvedModelId)
-                : null
-
-              return (
-                selectedModel &&
-                selectedModel.description && (
-                  <Box className={classes.personaDescription}>
-                    <Typography variant="body2" color="textPrimary">
-                      {selectedModel.description}
-                    </Typography>
-                  </Box>
-                )
-              )
-            })()}
+            {selectedModel?.description && (
+              <Box className={classes.personaDescription}>
+                <Typography variant="body2" color="textPrimary">
+                  {selectedModel.description}
+                </Typography>
+              </Box>
+            )}
🧹 Nitpick comments (2)
sippy-ng/src/chat/ChatSettings.js (2)

187-189: Remove unused helper or refactor to use it.

The getSelectedModel() function is defined but never used in the render. The Model Selection UI (lines 362-376 and 389-414) uses inline resolution logic instead.

Either remove this unused function or refactor the duplicated resolution logic to use a proper helper (see next comment).

-  const getSelectedModel = () => {
-    return models.find((m) => m.id === settings.modelId) || models[0]
-  }

358-358: Consider renaming the reused style class.

The model FormControl reuses classes.personaSelect, which works functionally but is slightly misleading. Consider renaming it to a more generic name like selectControl or adding a separate modelSelect class for clarity.

This is a minor naming issue and can be deferred, but it would improve code clarity.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 1d2bc37 and eb36e37.

📒 Files selected for processing (3)
  • chat/README.md (3 hunks)
  • chat/sippy_agent/config.py (5 hunks)
  • sippy-ng/src/chat/ChatSettings.js (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • chat/sippy_agent/config.py
🔇 Additional comments (8)
chat/README.md (4)

46-46: Previous markdown issue resolved.

The MD036 violation flagged in the prior review (bold emphasis instead of heading) has been corrected. Line 46 now uses a proper level-4 heading (#### Optional: Claude Models via Google Vertex AI), which is appropriate as a sub-section within the level-3 "Configuration" section.


46-92: Comprehensive Claude/Vertex AI integration documentation.

The new section clearly documents setup requirements, authentication options (gcloud vs service account), extended thinking configuration, budget controls, and important caveats around temperature and regional availability. The examples with --thinking-budget and error-handling guidance are particularly helpful.


94-128: Well-structured models.yaml configuration documentation.

The section clearly explains the YAML configuration format, lists all relevant options (id, name, model_name, endpoint, temperature, extended_thinking_budget, default), and documents the fallback behavior when models.yaml is absent. The auto-loading behavior (line 128) is valuable context for users.


159-162: Clear example coverage for Claude/Vertex AI usage.

The new example at line 161 (python main.py serve --model claude-3-5-sonnet@20240620) complements the existing OpenAI and Gemini examples, making multi-model support discoverable to users.

sippy-ng/src/chat/ChatSettings.js (4)

27-27: LGTM - Imports follow existing patterns.

The new imports for ModelIcon and useModels are consistent with the existing persona implementation.

Also applies to: 40-40


114-115: LGTM - Consistent with persona loading pattern.

The useModels hook usage follows the same pattern as usePersonas, including loading state, error state, and load action.


135-139: LGTM - Correct on-demand loading.

The effect correctly loads models on demand using the same pattern as persona loading.


177-181: LGTM - Handler matches persona pattern.

The model change handler correctly updates settings.modelId, consistent with the persona handler.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
sippy-ng/src/chat/ChatSettings.js (1)

298-349: LGTM: AI Model UI is well-structured with proper state handling.

The model selection section correctly handles loading, error, and active states. The Select is properly bound to the resolved model ID, ensuring synchronization between the dropdown and description.

Optional: Consider simplifying the description rendering to match the persona pattern.

The IIFE pattern (lines 334-346) works correctly but is more verbose than necessary. For consistency with the persona section (line 389), consider this refactor:

-            {(() => {
-              const selectedModel = getSelectedModel()
-              return (
-                selectedModel &&
-                selectedModel.description && (
-                  <Box className={classes.personaDescription}>
-                    <Typography variant="body2" color="textPrimary">
-                      {selectedModel.description}
-                    </Typography>
-                  </Box>
-                )
-              )
-            })()}
+            {getSelectedModel()?.description && (
+              <Box className={classes.personaDescription}>
+                <Typography variant="body2" color="textPrimary">
+                  {getSelectedModel().description}
+                </Typography>
+              </Box>
+            )}

This matches the persona rendering style and eliminates the IIFE wrapper.

chat/sippy_agent/config.py (2)

28-42: Clarify the endpoint handling logic.

Line 34 uses if self.endpoint to check whether to override the endpoint. Since endpoint defaults to "" (empty string for Vertex AI), this condition is falsy for Vertex AI models, causing them to fall back to base_config.llm_endpoint. While this may be intentional (Vertex AI doesn't use OpenAI-compatible endpoints), the logic would be clearer with an explicit empty-string check.

Consider making the intent more explicit:

-        config_dict["llm_endpoint"] = self.endpoint if self.endpoint else base_config.llm_endpoint
+        config_dict["llm_endpoint"] = self.endpoint if self.endpoint != "" else base_config.llm_endpoint

Alternatively, document that empty string means "use base endpoint" or make endpoint Optional with None as the sentinel value.


174-225: Consider preserving exception context in error handling.

The function has excellent validation logic and structure. However, the broad except Exception as e catch on line 224 wraps all errors in ValueError, which can hide useful debugging information from different exception types (e.g., yaml.YAMLError, ValidationError).

Consider preserving the exception chain:

     except Exception as e:
-        raise ValueError(f"Error loading models configuration: {e}")
+        raise ValueError(f"Error loading models configuration: {e}") from e

This maintains the original traceback while still providing a clear high-level error message.
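The effect of the suggested `from e` can be seen in isolation (a generic example, not the PR's loader):

```python
def load():
    try:
        raise KeyError("models")      # stand-in for yaml/validation errors
    except Exception as e:
        # "from e" keeps the original exception reachable via __cause__,
        # so tracebacks show both the high-level and the root error.
        raise ValueError(f"Error loading models configuration: {e}") from e


try:
    load()
except ValueError as err:
    print(type(err.__cause__).__name__)  # KeyError
```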

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between eb36e37 and 0f86963.

📒 Files selected for processing (3)
  • chat/README.md (3 hunks)
  • chat/sippy_agent/config.py (5 hunks)
  • sippy-ng/src/chat/ChatSettings.js (8 hunks)
🔇 Additional comments (14)
chat/README.md (3)

36-93: Documentation update is well-structured and comprehensive.

The additions for Claude/Vertex AI setup, extended thinking configuration, and multi-model support are clearly organized, with options for both local development (gcloud auth) and production (service account). The past markdown linting issue has been properly resolved with heading-level formatting.


94-129: Multi-model configuration section clearly documents new feature.

The models.yaml schema documentation, backward-compatibility note (line 120), and auto-loading behavior (line 128) provide good context for users. The distinction between per-model configuration and shared environment variables is clear.


145-162: Usage examples cover key scenarios effectively.

Examples progress logically from OpenAI, Gemini, to Claude/Vertex AI models, maintaining consistent command-line syntax. The final example (lines 160–161) demonstrates multi-model support integration.

sippy-ng/src/chat/ChatSettings.js (6)

27-27: LGTM: Icon and hook imports are correct.

The ModelIcon and useModels additions properly support the new AI model selection feature.

Also applies to: 40-40


91-93: LGTM: Consistent styling applied.

The fullWidthSelect style properly ensures consistent width for both model and persona selectors.

Also applies to: 317-317, 372-372


114-115: LGTM: Model loading logic mirrors personas pattern.

The useModels hook integration and lazy-loading useEffect are implemented correctly and consistently with the existing personas approach.

Also applies to: 135-139


177-181: LGTM: Model change handler is correct.

The handleModelChange implementation properly updates settings when the user selects a different model.


187-202: LGTM: Fallback logic correctly resolves model selection.

The helper functions properly implement a three-tier fallback (user setting → default → first available) and handle all edge cases including empty model lists and stale settings. This addresses the previous review concern about Select/description synchronization.
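The three-tier fallback (user setting → backend default → first available) is easy to state as a pure function; here is a Python transliteration of that logic with hypothetical names, mirroring what the JS helper does:

```python
def resolve_model_id(models, selected_id, default_id):
    """Return the effective model id, or None when no models are loaded."""
    ids = [m["id"] for m in models]
    if not ids:
        return None
    if selected_id in ids:      # 1. user's explicit choice, if still valid
        return selected_id
    if default_id in ids:       # 2. backend-provided default
        return default_id
    return ids[0]               # 3. first available model
```

A stale selection (a model removed from models.yaml) silently falls through to the default rather than leaving the Select unbound.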


351-351: LGTM: Section divider maintains visual consistency.

The divider properly separates the AI Model and AI Persona sections, consistent with the rest of the settings drawer.

chat/sippy_agent/config.py (5)

6-8: LGTM!

The new imports are appropriate for the multi-model configuration functionality.


16-27: LGTM!

The ModelConfig fields are well-defined. The temperature and extended_thinking_budget are correctly declared as Optional with default=None, which properly addresses the past review concern about clobbering base configuration defaults.


66-74: LGTM!

The new Config fields for Google/Vertex AI integration are well-defined with appropriate defaults and environment variable bindings.

Also applies to: 119-122


138-140: LGTM!

The is_claude_model() method follows the established pattern and correctly identifies Claude models.
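A minimal sketch of the pattern being praised here; the real method lives on Config in chat/sippy_agent/config.py, and its exact matching rule may differ from this prefix-check assumption:

```python
# Hypothetical sketch of the model-family check pattern; the actual
# is_claude_model() implementation may match differently.
def is_claude_model(model_name: str) -> bool:
    """Identify Claude model IDs such as 'claude-sonnet-4-5@20250929'."""
    return model_name.lower().startswith("claude")
```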


157-164: LGTM!

The validation for Claude/Vertex AI configuration requirements is appropriate and provides clear error messaging.

@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e

stbenjam and others added 4 commits November 6, 2025 07:04
- Add support for multiple LLM models including ChatVertexAnthropic
- Implement models.yaml configuration system for model management
- Add model selection UI in chat settings
- Update environment configuration and documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- ChatSettings.js: Resolve effective model ID before rendering to prevent
  unbound Select value warnings and keep picker/description in sync
- config.py: Make ModelConfig temperature and extended_thinking_budget
  optional to avoid clobbering environment/CLI defaults in to_config
- README.md: Replace bold emphasis with proper Markdown headings for
  better document structure and accessibility
- Extract duplicated model resolution into getResolvedModelId() helper
- Update getSelectedModel() to use the resolution helper
- Rename personaSelect to fullWidthSelect for semantic clarity
- Move AI Model section above AI Persona (more important)

Addresses additional CodeRabbitAI feedback for DRY principle and
code clarity.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
chat/sippy_agent/web_server.py (1)

218-265: Shared agent config mutation is not request‑safe (pre‑existing, now per‑model)

chat() mutates agent.config.show_thinking and agent.config.persona per request, then restores them in finally. Because the same SippyAgent instance is reused across concurrent HTTP requests for a model, overlapping requests can race on these fields, leading to cross‑request leakage of persona or thinking settings (this pattern already existed with the single global agent; AgentManager now applies it per model). To make overrides request‑local, avoid mutating shared config and instead pass persona/show‑thinking as parameters into the graph/LLM layer, or gate access to a given agent behind an asyncio.Lock so only one request at a time can use it. The former keeps concurrency, the latter preserves current behavior but serializes usage.
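The serialization option described above (gating each agent behind a lock) could be sketched like this; the wrapper and its names are hypothetical, not the PR's code:

```python
import asyncio

class LockedAgent:
    """Wrap a shared agent so only one request at a time mutates its config.

    A sketch of the lock-based option described above; preserves the current
    mutate-and-restore behavior but serializes access per agent.
    """

    def __init__(self, agent):
        self._agent = agent
        self._lock = asyncio.Lock()

    async def chat(self, message, persona=None, show_thinking=False):
        async with self._lock:
            old_persona = self._agent.config.persona
            old_thinking = self._agent.config.show_thinking
            try:
                self._agent.config.persona = persona
                self._agent.config.show_thinking = show_thinking
                return await self._agent.achat(message)
            finally:
                self._agent.config.persona = old_persona
                self._agent.config.show_thinking = old_thinking
```

The parameter-passing alternative avoids the lock entirely and keeps concurrency, at the cost of threading persona/show_thinking through the graph/LLM layer.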

🧹 Nitpick comments (4)
sippy-ng/src/chat/ChatSettings.js (2)

187-202: Consider memoizing model resolution helpers.

The helper functions getResolvedModelId() and getSelectedModel() are called inline during render (lines 321, 335), causing them to re-execute on every render. For better performance, consider computing the resolved model ID once outside the render path.

Example approach using useMemo:

+  const resolvedModelId = React.useMemo(() => {
+    if (settings.modelId && models.find((m) => m.id === settings.modelId)) {
+      return settings.modelId
+    }
+    if (defaultModel && models.find((m) => m.id === defaultModel)) {
+      return defaultModel
+    }
+    return models.length > 0 ? models[0].id : ''
+  }, [settings.modelId, defaultModel, models])
+
+  const selectedModel = React.useMemo(() => {
+    return resolvedModelId ? models.find((m) => m.id === resolvedModelId) : null
+  }, [resolvedModelId, models])

334-346: Simplify description rendering.

The IIFE pattern here is unconventional and adds cognitive overhead. Since getSelectedModel() already handles the null case, you can simplify this block.

-            {(() => {
-              const selectedModel = getSelectedModel()
-              return (
-                selectedModel &&
-                selectedModel.description && (
-                  <Box className={classes.personaDescription}>
-                    <Typography variant="body2" color="textPrimary">
-                      {selectedModel.description}
-                    </Typography>
-                  </Box>
-                )
-              )
-            })()}
+            {getSelectedModel()?.description && (
+              <Box className={classes.personaDescription}>
+                <Typography variant="body2" color="textPrimary">
+                  {getSelectedModel().description}
+                </Typography>
+              </Box>
+            )}

Note: If you implement the memoization suggestion from the previous comment, you would use the memoized selectedModel variable instead of calling getSelectedModel() multiple times.

chat/sippy_agent/web_server.py (1)

17-31: Multi‑model wiring and /chat/models endpoint look coherent

The switch to AgentManager plus the /chat/models endpoint is wired cleanly: models are listed via AgentManager.list_models(), wrapped into ModelInfo, and the default model ID is exposed via ModelsResponse. Using agent_manager.get_agent() in /status ensures tools are initialized before calling list_tools(). One minor design question: metrics.agent_info still uses config.model_name/config.llm_endpoint, which may not reflect the default model once models.yaml is in use. Consider either documenting that these metrics refer to the base config only, or updating them to reflect get_default_model_id()/its config if that’s what you care about operationally.

Also applies to: 88-112, 139-147, 153-163

chat/sippy_agent/agent.py (1)

40-47: Async initialization pattern looks good; list_tools guard prevents misuse

Deferring tools/graph creation into _initialize() and calling it from both AgentManager.get_agent() and SippyAgent.achat() gives you lazy startup while keeping callers safe. The new _initialized check plus the list_tools() guard (returning [] when uninitialized) avoids attribute errors on status calls. If you ever expect heavy parallel startup, you may want an asyncio.Lock around the _initialize body to avoid duplicate tool/graph construction, but that’s an optimization rather than a correctness issue.

Also applies to: 48-54, 670-675

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 0f86963 and 80fe8b4.

📒 Files selected for processing (16)
  • chat/.env.example (1 hunks)
  • chat/.gitignore (1 hunks)
  • chat/README.md (3 hunks)
  • chat/main.py (3 hunks)
  • chat/models.yaml.example (1 hunks)
  • chat/requirements.txt (1 hunks)
  • chat/sippy_agent/agent.py (8 hunks)
  • chat/sippy_agent/api_models.py (3 hunks)
  • chat/sippy_agent/config.py (5 hunks)
  • chat/sippy_agent/web_server.py (10 hunks)
  • sippy-ng/src/chat/ChatMessage.js (5 hunks)
  • sippy-ng/src/chat/ChatSettings.js (8 hunks)
  • sippy-ng/src/chat/store/modelsSlice.js (1 hunks)
  • sippy-ng/src/chat/store/settingsSlice.js (1 hunks)
  • sippy-ng/src/chat/store/useChatStore.js (3 hunks)
  • sippy-ng/src/chat/store/webSocketSlice.js (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • sippy-ng/src/chat/store/modelsSlice.js
  • sippy-ng/src/chat/store/settingsSlice.js
  • sippy-ng/src/chat/store/webSocketSlice.js
  • sippy-ng/src/chat/ChatMessage.js
  • chat/.gitignore
  • chat/requirements.txt
🧰 Additional context used
🪛 Ruff (0.14.5)
chat/sippy_agent/agent.py

711-711: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (4)
chat/models.yaml.example (1)

34-46: No changes needed. Model version is current and temperature constraint is properly enforced.

The claude-sonnet-4-5@20250929 version is correct and current as of November 2025. The temperature constraint concern is mitigated by the agent initialization logic in chat/sippy_agent/agent.py (lines 80–86): extended thinking is only enabled when show_thinking=True AND extended_thinking_budget > 0, and when enabled, temperature is automatically set to 1.0. This prevents runtime errors and ensures the API requirement is satisfied.

chat/sippy_agent/agent.py (3)

176-204: Tool setup change (logging when DB DSN missing) is reasonable

Emitting a warning when SIPPY_READ_ONLY_DATABASE_DSN is not configured makes the absence of the DB tool explicit without changing behavior when it is present. This is a nice observability improvement and should help diagnose missing configuration in lower envs.


485-555: Review comment contains incorrect assumptions about thinking content stream format

The original review assumes thinking content arrives as chunk.content parts with {"type": "thinking", "thinking": "..."} for both Gemini and Claude. This is not accurate per current LangChain documentation:

  • Gemini: Streams thinking via parts marked with a part.thought boolean flag, not a type: "thinking" key with a separate thinking field
  • Claude-on-Vertex (extended thinking): Uses reasoning_content blocks, not type: "thinking" with a thinking field

The code at lines 485–555 checks part.get("type") == "thinking" and "thinking" in part, which won't match either model's actual format. This logic requires correction to extract thinking content successfully for these models.

Likely an incorrect or invalid review comment.


55-116: Concern unfounded: ChatAnthropicVertex supports per-call thinking overrides

The review comment's concern about extended thinking being "fixed for the lifetime of the agent" is not accurate. ChatAnthropicVertex exposes thinking options (thinking_budget, include_thoughts, etc.) that can be passed per-invocation via kwargs/invoke/predict/predict_messages. This means per-request control over extended thinking is available without needing separate agents or LLMs. The thinking configuration set at construction time serves as a default, but individual calls can override these settings, making the suggested workarounds unnecessary.

Likely an incorrect or invalid review comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
chat/sippy_agent/agent.py (1)

725-737: Fix Ruff F541 on synthetic model description; optional concurrency hardening in get_agent

The synthetic ModelConfig uses an f-string without interpolation, which Ruff flags as F541. You can resolve this by dropping the f prefix as previously suggested in the earlier review. This also keeps CI green. As this repeats an earlier comment, tagging as duplicate.

-            synthetic_model = ModelConfig(
-                id="default",
-                name=base_config.model_name,
-                description=f"Model from environment configuration",
-                model_name=base_config.model_name,
-                endpoint=base_config.llm_endpoint,
-                temperature=base_config.temperature,
-                extended_thinking_budget=base_config.extended_thinking_budget,
-                default=True,
-            )
+            synthetic_model = ModelConfig(
+                id="default",
+                name=base_config.model_name,
+                description="Model from environment configuration",
+                model_name=base_config.model_name,
+                endpoint=base_config.llm_endpoint,
+                temperature=base_config.temperature,
+                extended_thinking_budget=base_config.extended_thinking_budget,
+                default=True,
+            )

Separately, AgentManager.get_agent() is straightforward and readable, but if you expect high concurrency it might be worth guarding agent creation with a simple async lock or similar so two concurrent callers for the same model_id don’t both pay the initialization cost before the cache is populated. Not required for correctness, just an efficiency/robustness improvement. (As per static analysis hints and past review comments.)

Also applies to: 756-793

🧹 Nitpick comments (3)
chat/sippy_agent/agent.py (2)

40-55: Async lazy initialization looks good; consider guarding add_tool usage

The _initialized flag and _initialize() give you a clean lazy-init path for tools/graph, and list_tools() now safely returns [] when uninitialized. The only remaining sharp edge is add_tool(), which assumes self.tools is non‑None; if someone calls it before the first achat()/get_agent(), it will hit a NoneType append. Consider either documenting that add_tool requires a fully initialized agent or adding a defensive check there (e.g., raising a clear error if _initialized is False), so misuse fails loudly rather than with an attribute error.

Also applies to: 678-691


494-571: Thought streaming logic cleanly separates Gemini vs Claude; small optional guard

The updated _achat_streaming() correctly distinguishes Gemini’s complete-thought chunks from Claude’s token stream, buffers thoughts, and then injects each as a separate thinking_steps entry at the front, preserving chronological order. As a micro‑optimization, you could early‑exit the “thinking” parsing paths when both thinking_callback is None and show_thinking is False, since in that case the accumulated thought_buffer is never surfaced. Not a blocker, just a small potential perf/clarity win.

Also applies to: 661-668

chat/requirements.txt (1)

1-8: Upper-bound version constraints are optional best practice, not required.

LangChain and LangGraph follow Semantic Versioning principles, with the first stable releases (1.0.0) providing production-ready APIs. These 1.0 releases mark a commitment to stability and no breaking changes until 2.0. The pip dependency resolution test confirmed no conflicts among the specified versions.

While adding upper-bound constraints (e.g., langchain>=1.0.0,<2.0.0) would provide ultra-conservative stability and prevent unexpected major-version jumps, they are not required for safe operation. The current lower-bound-only approach is defensible given that:

  1. With LangChain 1.0, minor releases do not include breaking changes
  2. All core packages (langchain, langchain-core, langchain-openai, langgraph) are at 1.0+ with semantic versioning guarantees
  3. Legacy versions (LangChain 0.3 and LangGraph 0.4) are in MAINTENANCE mode until December 2026

Adding upper bounds is a valid optimization for reproducibility and risk-averse deployments but remains optional given the ecosystem's versioning commitments. Consider upper bounds only if: (a) stricter repeatability is required, or (b) you want to enforce explicit major-version control across your team's deployments.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 80fe8b4 and 7dbf01a.

📒 Files selected for processing (2)
  • chat/requirements.txt (1 hunks)
  • chat/sippy_agent/agent.py (8 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
chat/sippy_agent/agent.py

728-728: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (3)
chat/sippy_agent/agent.py (2)

197-203: DB tool warning improves observability

Adding an explicit warning when SIPPY_READ_ONLY_DATABASE_DSN is not configured is a nice touch; it makes it much clearer why the DB query tool is missing instead of silently omitting it.


457-459: Ensuring _initialize() runs before handling chats is the right safeguard

Awaiting self._initialize() at the top of achat() guarantees tools and graph are ready before streaming, and because _initialize() is idempotent, repeated calls are cheap. This aligns well with the new lazy-init pattern.

chat/requirements.txt (1)

5-5: Dependency versions are current, stable, and secure.

PyYAML 6.0.3 is the latest non-vulnerable version, fixing vulnerabilities that existed before version 5.4. Anthropic 0.20.0 has no known vulnerabilities, and the latest stable anthropic release is 0.73.0. PyYAML demonstrates a positive version release cadence with recent releases. All three dependencies use permissive >= version constraints, allowing automatic updates to the latest secure releases. The specified versions are current as of the PR's timeline and free from known security vulnerabilities.

Comment on lines +55 to +116
def _create_llm(self) -> Union[ChatOpenAI, ChatGoogleGenerativeAI, ChatAnthropicVertex]:
    """Create the language model instance."""
    if self.config.verbose:
        logger.info(f"Creating LLM with endpoint: {self.config.llm_endpoint}")
        logger.info(f"Using model: {self.config.model_name}")

    # Use ChatAnthropicVertex for Claude models via Vertex AI
    if self.config.is_claude_model():
        if not self.config.google_project_id:
            raise ValueError(
                "Google Cloud project ID is required for Claude models via Vertex AI"
            )

        # Set credentials file if provided, otherwise use Application Default Credentials (gcloud auth)
        if self.config.google_credentials_file:
            import os
            os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = self.config.google_credentials_file
            if self.config.verbose:
                logger.info(f"Using explicit credentials: {self.config.google_credentials_file}")
        else:
            if self.config.verbose:
                logger.info("Using Application Default Credentials (gcloud auth)")

        # Enable extended thinking for Claude if show_thinking is enabled
        # Note: Extended thinking requires temperature=1 and max_tokens > budget_tokens
        enable_extended_thinking = self.config.show_thinking and self.config.extended_thinking_budget > 0

        llm_kwargs = {
            "model_name": self.config.model_name,
            "project": self.config.google_project_id,
            "location": self.config.google_location,
            "temperature": 1.0 if enable_extended_thinking else self.config.temperature,
        }

        if enable_extended_thinking:
            # max_tokens must be greater than thinking budget
            # Claude's max output is 64K tokens
            max_tokens = 64000

            llm_kwargs["max_tokens"] = max_tokens
            llm_kwargs["model_kwargs"] = {
                "thinking": {
                    "type": "enabled",
                    "budget_tokens": self.config.extended_thinking_budget
                }
            }
            if self.config.verbose:
                logger.info(f"Extended thinking enabled with budget: {self.config.extended_thinking_budget} tokens")
                logger.info(f"Max tokens set to {max_tokens}")
                logger.info("Temperature automatically set to 1.0 (required for extended thinking)")
        elif self.config.show_thinking:
            if self.config.verbose:
                logger.info("Extended thinking disabled (budget=0)")

        if self.config.verbose:
            logger.info(
                f"Using ChatAnthropicVertex for Claude model: {self.config.model_name} "
                f"(project: {self.config.google_project_id}, location: {self.config.google_location})"
            )

        return ChatAnthropicVertex(**llm_kwargs)

Copy link
Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

Claude Vertex AI integration and extended-thinking wiring look coherent; please verify constructor API

The new Claude/Vertex branch correctly enforces google_project_id, handles ADC vs explicit credentials, and toggles temperature/max tokens when extended thinking is enabled. The only thing I’d double‑check is that ChatAnthropicVertex indeed expects model_name, project, location, temperature, max_tokens, and model_kwargs["thinking"] in this shape and that max_tokens=64000 is within the current limits for the chosen Claude model variant.

You might also consider clamping or validating extended_thinking_budget against the chosen max_tokens to fail fast if someone misconfigures the budget.


🌐 Web query:

Verify the current constructor signature and supported keyword arguments for `langchain_google_vertexai.model_garden.ChatAnthropicVertex` (including `model_name`, `project`, `location`, `temperature`, `max_tokens`, and `model_kwargs`) and how to configure Claude “extended thinking”/thinking budgets correctly.

💡 Result:

Short answer:

  • Current constructor/kwargs for langchain_google_vertexai.model_garden.ChatAnthropicVertex include (explicit highlights): model_name (alias model), project, location (default "us-central1"), max_output_tokens (alias max_tokens), temperature, top_k, top_p, max_retries, tuned_model_name, thinking_budget, full_model_name, additional_headers, access_token, api_endpoint, api_transport, disable_streaming, safety_settings, etc. The class also accepts arbitrary provider-specific options via **kwargs (passed to the invoke call). See the API reference for the full parameter list. [1]

  • To enable Claude “extended thinking” you must pass Anthropic’s thinking block in the model request: set thinking.type = "enabled" and provide thinking.budget_tokens (minimum ~1,024 tokens; recommended to start there and increase as needed). Budget tokens control the model’s internal reasoning budget (billed) and should be set with max_tokens / max_output_tokens larger than or equal to the budget; follow Anthropic guidance on limits (practical suggestions around 16k+, and caution above ~32k). Example form (JSON): { "thinking": { "type": "enabled", "budget_tokens": 10000 } }. Anthropic docs also describe interleaved-thinking headers for tool use and other considerations (billing, latency, min budget, recommendations). [2][3]

Notes / pointers:

  • Use ChatAnthropicVertex's model_name (or model) to select a Claude model (e.g., the Sonnet/Claude model id) and use project/location to target Vertex AI. The LangChain class documents the aliases: max_output_tokens == max_tokens and model_name == model. [1]
  • Extended thinking parameters are part of Anthropic’s request payload (not a LangChain-specific named keyword). When using ChatAnthropicVertex you supply them via model kwargs / extra request fields that LangChain forwards to the provider (i.e., include the thinking object in the **kwargs passed to invoke/generate). Confirm exact forwarding in your LangChain version if you need programmatic assurance. [1][2]

Sources:

  1. LangChain ChatAnthropicVertex API reference (parameter list, aliases). [LangChain docs]. (api.python.langchain.com)
  2. Anthropic “Building with extended thinking” (how to enable thinking, budget_tokens, min/recommendations, interleaved thinking notes). (anthropic.mintlify.app)

If you want, I can show a concrete code example for LangChain (Python) that sets model_name/project/location/max_tokens/temperature and passes the thinking block.

Citations:


Based on the web search results, I can now verify the review comment:

ChatAnthropicVertex constructor API is compatible; however, hardcoded max_tokens and missing budget validation need attention

The web search confirms that ChatAnthropicVertex accepts model_name, project, location, temperature, and max_tokens (aliased max_output_tokens), and the code's usage is correct.

However, two concerns remain:

  1. Hardcoded max_tokens=64000 may exceed limits. Anthropic recommends caution above ~32k tokens for thinking budgets. The hardcoded 64000 should be validated against the specific Claude model variant's limits and potentially made configurable.

  2. Missing extended_thinking_budget validation. The code does not validate that:

    • extended_thinking_budget meets the minimum (~1,024 tokens per Anthropic guidance)
    • extended_thinking_budget does not exceed max_tokens
    • The budget aligns with the chosen model's constraints

The original review's suggestion to validate extended_thinking_budget against max_tokens and fail fast on misconfiguration should be implemented before merging.

🤖 Prompt for AI Agents
In chat/sippy_agent/agent.py around lines 55-116, replace the hardcoded
max_tokens and add validation for extended_thinking_budget: derive max_tokens
from a configurable value (e.g., self.config.max_output_tokens or model-specific
limit) instead of 64000, ensure extended_thinking_budget is >=1024 and <=
max_tokens (and <= model limit if available), log the resolved max_tokens and
budget, and raise a clear ValueError if validation fails so misconfiguration
fails fast before constructing ChatAnthropicVertex.

@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e

@stbenjam
Copy link
Member Author

/hold

I think it's reviewable/testable but looking at one of the coderabbit comments about a race

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 18, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 18, 2025

@stbenjam: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e 7dbf01a link true /test e2e

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants