feat(memory): add STM condensation scheduling (STMO) #1400

Open

spalimpaaces-star wants to merge 19 commits into lightseekorg:main from spalimpaaces-star:feat/stmo-write

Conversation

@spalimpaaces-star
Contributor

spalimpaaces-star commented Apr 27, 2026

Description

Problem

When STM condensation (STMO) is enabled, the Responses API does not write a job entry to the queue after persisting a conversation turn. As a result, STM condensation never gets triggered regardless of the configuration.

A second class of problems was identified during review: even with the scheduling wired up, ConversationTurnInfo — the struct carrying user_turns and total_items used to decide when and where to schedule the job — was being computed from the wrong input in several paths. This caused jobs to be enqueued at the wrong turn boundary or with an incorrect target_item_end, making STMO fire too early, too late, or point at the wrong position in the conversation.

Note: The actual DB write for the ConversationMemory job row is a no-op in this PR — the ConversationMemoryWriter implementation is currently a stub and will be replaced with a real Postgres/Oracle write in a follow-up PR.

Solution

Core feature

  • Read stm_enabled and stm_condenser_model_id from the x-conversation-memory-config request header, gated by memory.runtime.enabled in server config
  • After each Responses API turn is persisted, evaluate the STMO boundary: user_turns >= 4 && (user_turns - 1) % 3 == 0 — fires at turns 4, 7, 10, 13, … (see the sketch after this list)
  • If triggered and stm_enabled is active, write a ConversationMemory row with memory_type = stmo and status = ready
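
As a standalone illustration, the boundary rule reduces to the predicate below; the real check lives alongside handle_stmo_after_persist in persistence_utils.rs and takes more context than a bare count.

// Sketch only: mirrors the boundary rule described above.
fn should_enqueue_stmo(user_turns: usize) -> bool {
    user_turns >= 4 && (user_turns - 1) % 3 == 0
}

fn main() {
    // Fires at 4, 7, 10, 13 and stays silent on every other turn.
    let firing: Vec<usize> = (1..=13).filter(|&t| should_enqueue_stmo(t)).collect();
    assert_eq!(firing, vec![4, 7, 10, 13]);
}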

ConversationTurnInfo correctness

ConversationTurnInfo carries user_turns and total_items. Both values must be accurate — wrong values cause STMO to fire on the wrong turn or write a job pointing at the wrong position in the conversation. The following decisions were made to guarantee correctness across all paths:

1. Compute from assembled input, not raw request
user_turns and total_items are counted after history has been fully loaded and prepended to request.input. Counting from the raw request alone gives user_turns = 1 regardless of how long the real conversation is.

2. Use raw DB item count for total_items
The conversation loaders filter stored items before building the inference window — gRPC regular keeps only messages, OpenAI drops reasoning items. Counting from the filtered window undercounts total_items. We capture stored_items.len() before any filtering and use raw_count + current_input_count for total_items so target_item_end points at the real end of the conversation.

3. Skip STMO when history load fails
If list_items returns an error, the loaders warn and continue with only the current message in request.input. Rather than compute a wrong ConversationTurnInfo, we return None so persistence skips STMO entirely for that request. The signal is conversation.is_some() && raw_stored_item_count.is_none().

4. Cap overflow detection
list_items is bounded by max_conversation_history_items. If the conversation has grown past the cap, even the raw count is wrong. We always fetch cap + 1 items — harmless since it is just a SQL LIMIT. If we get cap + 1 rows back, we reject with 400 conversation_too_large. This rejection only fires when stm_enabled && store.unwrap_or(true) — read-only calls (store=false) are never affected since persist_response_if_needed already short-circuits before any STMO enqueue.
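
A consolidated sketch of decisions 2-4, using stand-in types and names (StoredItem, classify_history_load) rather than the loaders' real signatures:

// Stand-in item type; the real loaders work with the storage crate's item types.
struct StoredItem;

enum HistoryLoadOutcome {
    // History loaded; the raw (pre-filter) DB count is kept for the total_items correction.
    Loaded { raw_stored_item_count: usize },
    // list_items failed: skip STMO for this request rather than compute wrong counts.
    SkipStmo,
    // More than cap rows came back: reject with 400 conversation_too_large.
    TooLarge,
}

fn classify_history_load(
    list_items_result: Result<Vec<StoredItem>, String>, // fetched with LIMIT cap + 1
    cap: usize,
    stm_enabled: bool,
    store: Option<bool>,
) -> HistoryLoadOutcome {
    match list_items_result {
        Err(_) => HistoryLoadOutcome::SkipStmo,
        Ok(items) if stm_enabled && store.unwrap_or(true) && items.len() > cap => {
            HistoryLoadOutcome::TooLarge
        }
        Ok(items) => HistoryLoadOutcome::Loaded {
            raw_stored_item_count: items.len(),
        },
    }
}

// Decision 2: total_items comes from the raw DB count plus the items carried
// on the current request, not from the filtered inference window.
fn corrected_total_items(raw_stored_item_count: usize, current_input_count: usize) -> usize {
    raw_stored_item_count + current_input_count
}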

5. Harmony path — request.conversation without history load
load_previous_messages only loads from previous_response_id. If request.conversation is set but no history is loaded for it, we return None for turn_info rather than count from partial input.

6. Skip raw-count correction when response chain is also loaded
conversation and previous_response_id are mutually exclusive by API contract but not enforced inside the loaders. If both paths ran, the chain merge overwrites modified_request.input last — count_conversation_turn_info already saw every item so applying the raw-count override would drop the chain items from total_items. Guard: only apply the correction when request.previous_response_id.is_none().
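
A small sketch of the guard from decision 6, with illustrative argument names:

// The raw-count correction from decision 2 is applied only when no response
// chain was merged into the input; otherwise the count taken from the fully
// assembled input already covers every item and is kept as-is.
fn final_total_items(
    previous_response_id: Option<&str>,
    raw_stored_item_count: Option<usize>,
    current_input_count: usize,
    counted_total_items: usize, // from count_conversation_turn_info on the assembled input
) -> usize {
    match (previous_response_id, raw_stored_item_count) {
        (None, Some(raw)) => raw + current_input_count,
        _ => counted_total_items,
    }
}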

7. MCP streaming path never persists
execute_tool_loop_streaming does not call persist_response_if_needed, so STMO is never enqueued there. To prevent the cap check from rejecting those requests, ensure_mcp_connection is moved before load_conversation_history in the streaming handler and stm_enabled = false is passed to the loader for the MCP streaming path.

Changes

Section 1 — Config

  • config/types.rs — added max_conversation_history_items: usize to router config (default 100)
  • config/builder.rs — wires the value from YAML config into the struct
  • config/validation.rs — rejects 0 at startup

Section 2 — Header parsing → MemoryExecutionContext

  • routers/common/header_utils.rs — parses stm_enabled and stm_condenser_model_id from x-conversation-memory-config
  • memory/context.rs — stm_enabled promoted to MemoryExecutionState (3-state enum: NotRequested / GatedOff / Active), matching the LTM pattern; stm_condenser_model_id added
  • routers/grpc/router.rs — builds a fresh MemoryExecutionContext per request from headers
  • routers/grpc/common/responses/context.rs — adds PersistenceHandles struct grouping the four storage backends; adds memory_execution_context to ResponsesContext
  • routers/openai/context.rs — same wiring for the OpenAI Responses path

Section 3 — History loading with correctness guards

  • routers/openai/responses/history.rs — load_input_history: cap overflow detection, skip-on-failure guard, raw-count correction, chain-overlap guard, store=false gate
  • routers/grpc/regular/responses/common.rs — load_conversation_history: same guards as above
  • routers/grpc/regular/responses/handlers.rs — streaming handler: ensure_mcp_connection moved before history loading; passes stm_enabled && !has_mcp_tools to loader
  • routers/grpc/regular/responses/streaming.rs — removed unused _conversation_turn_info parameter from execute_tool_loop_streaming
  • routers/grpc/harmony/responses/common.rs — load_previous_messages: early-return guard on request.conversation.is_none()

Section 4 — ConversationTurnInfo computation

  • routers/common/persistence_utils.rs — count_conversation_turn_info counts user_turns (role = "user") and total_items from the fully assembled input
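
A sketch of the counting rules on a stand-in item type; the real function operates on the router's ResponseInput, and the field shapes below are illustrative:

struct InputItem {
    // None for items that are not messages (reasoning, function_call, ...).
    role: Option<String>,
}

struct ConversationTurnInfo {
    user_turns: usize,
    total_items: usize,
}

fn count_conversation_turn_info(items: &[InputItem]) -> ConversationTurnInfo {
    let user_turns = items
        .iter()
        .filter(|item| {
            item.role
                .as_deref()
                .map(|r| r.eq_ignore_ascii_case("user"))
                .unwrap_or(false)
        })
        .count();
    ConversationTurnInfo {
        user_turns,
        total_items: items.len(),
    }
}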

Section 5 — Execute and persist (plumbing)

  • routers/grpc/common/responses/handlers.rs — gRPC persist call site
  • routers/grpc/common/responses/mod.rs — export update
  • routers/grpc/common/responses/utils.rs — persist helper utilities
  • routers/grpc/harmony/responses/non_streaming.rs — passes conversation_turn_info to persist
  • routers/grpc/harmony/responses/streaming.rs — same for streaming path
  • routers/grpc/regular/responses/non_streaming.rs — same for regular non-streaming path
  • routers/openai/responses/non_streaming.rs — OpenAI persist call site
  • routers/openai/responses/streaming.rs — same for streaming path
  • routers/openai/responses/route.rs — OpenAI entry point

Section 6 — STMO boundary check and job write

  • routers/common/persistence_utils.rs — handle_stmo_after_persist: evaluates boundary, writes ConversationMemory row with memory_type = stmo, status = ready, and target_item_end from total_items
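
The job row itself is sketched below; the field names follow the description above, but the concrete NewConversationMemory type and the writer it is handed to are illustrative stand-ins for the storage-crate types:

struct ConversationTurnInfo {
    user_turns: usize,
    total_items: usize,
}

struct NewConversationMemory {
    memory_type: &'static str,   // "stmo"
    status: &'static str,        // "ready"
    target_item_end: usize,      // absolute end position in the conversation
    condenser_model_id: Option<String>,
}

fn build_stmo_job(
    info: &ConversationTurnInfo,
    condenser_model_id: Option<String>,
) -> Option<NewConversationMemory> {
    let at_boundary = info.user_turns >= 4 && (info.user_turns - 1) % 3 == 0;
    if !at_boundary {
        return None;
    }
    Some(NewConversationMemory {
        memory_type: "stmo",
        status: "ready",
        target_item_end: info.total_items,
        condenser_model_id,
    })
}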

Test Plan

Unit tests (persistence_utils.rs):

  • stmo_fires_at_boundary_turns — verifies trigger fires at turns 4, 7, 10 and not between
  • count_user_turns_case_insensitive — verifies role matching is case-insensitive
  • total_items_exceeds_user_turns — verifies target_item_end in the STMO payload accounts for non-user items
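
For illustration, the role-matching test might look roughly like the sketch below; the helper is a local stand-in, not the crate's actual counting function:

#[cfg(test)]
mod counting_sketch {
    // Stand-in for the role check used when counting user turns.
    fn is_user_turn(role: Option<&str>) -> bool {
        role.map(|r| r.eq_ignore_ascii_case("user")).unwrap_or(false)
    }

    #[test]
    fn count_user_turns_case_insensitive() {
        assert!(is_user_turn(Some("user")));
        assert!(is_user_turn(Some("USER")));
        assert!(is_user_turn(Some("User")));
        assert!(!is_user_turn(Some("assistant")));
        assert!(!is_user_turn(None)); // items without a role are excluded
    }
}
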
Checklist
  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

  • New Features

    • Configurable per-request conversation history limit with a sensible default.
    • Short-term memory (STM) controls per request, including optional condenser selection and gated runtime behavior.
    • Automatic STM condensation scheduling that enqueues condensation jobs at conversation boundaries.
    • Conversation turn tracking propagated through request handling to improve memory indexing and persistence.
  • Bug Fixes / Validation

    • Configuration validation now rejects non-positive conversation history limits.

@github-actions github-actions Bot added grpc gRPC client and router changes model-gateway Model gateway crate changes openai OpenAI router changes labels Apr 27, 2026
@coderabbitai

coderabbitai Bot commented Apr 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds per-request short-term memory (STM) gating, STM condenser handling, a configurable conversation-history cap, STMO scheduling/enqueueing during persistence, consolidation of persistence handles, and threads memory execution state plus conversation-turn metadata through routing, history loading, and persistence flows.

Changes

  • Configuration & Validation
    Files: model_gateway/src/config/builder.rs, model_gateway/src/config/types.rs, model_gateway/src/config/validation.rs
    Adds max_conversation_history_items: usize (default 100) to RouterConfig, a fluent builder setter, and validation rejecting 0.
  • Memory Context & Header Parsing
    Files: model_gateway/src/memory/context.rs, model_gateway/src/routers/common/header_utils.rs
    Extends MemoryExecutionContext and MemoryHeaderView with STM fields (stm_enabled, stm_condenser_model_id) and derives STM gating from headers + runtime config.
  • Persistence: STMO Scheduling & API
    Files: model_gateway/src/routers/common/persistence_utils.rs
    Adds STMO scheduling predicate (user-turn counting, periodicity), ConversationTurnInfo and count_conversation_turn_info; updates persist_conversation_items API to accept a ConversationMemoryWriter, MemoryExecutionContext, and optional ConversationTurnInfo; enqueue errors are logged and ignored.
  • Persistence Handles Consolidation
    Files: model_gateway/src/routers/grpc/common/responses/context.rs, model_gateway/src/routers/grpc/common/responses/mod.rs
    Introduces PersistenceHandles bundling storage/writer handles; ResponsesContext carries persistence, max_conversation_history_items, and memory_execution_context; re-exports PersistenceHandles.
  • Handlers & Utils Wiring
    Files: model_gateway/src/routers/grpc/common/responses/handlers.rs, model_gateway/src/routers/grpc/common/responses/utils.rs
    Rewires handlers/utils to use &PersistenceHandles and thread MemoryExecutionContext / ConversationTurnInfo into persistence calls.
  • History Loaders & Turn Info
    Files: model_gateway/src/routers/grpc/harmony/.../common.rs, model_gateway/src/routers/grpc/regular/.../common.rs, model_gateway/src/routers/openai/responses/history.rs
    Loaders accept stm_enabled and return wrapper structs (LoadedRequest/LoadedInputHistory) including optional ConversationTurnInfo; history loading uses max_conversation_history_items and may fetch +1 item when STM enabled to detect overflow.
  • Streaming & Non-streaming Routes
    Files: model_gateway/src/routers/grpc/.../streaming.rs, model_gateway/src/routers/grpc/.../non_streaming.rs, model_gateway/src/routers/openai/responses/{streaming,non_streaming}.rs
    Threads conversation_turn_info and memory_execution_context through streaming/non-streaming flows into final persistence; replaces cloned individual storage handles with PersistenceHandles.
  • gRPC Router Core
    Files: model_gateway/src/routers/grpc/router.rs
    Builds per-request MemoryExecutionContext from headers, stores max_conversation_history_items and memory_runtime_config, and constructs request-scoped ResponsesContext with persistence and memory state rather than reusing prebuilt contexts.
  • OpenAI Context Types & Routing
    Files: model_gateway/src/routers/openai/context.rs, model_gateway/src/routers/openai/responses/route.rs
    Adds conversation_turn_info to payload/streaming contexts, computes stm_enabled once per request, and passes it into history loading and memory injection logic.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Router as GrpcRouter
    participant Context as ResponsesContext
    participant Persistence as PersistenceHandles
    participant Queue as ConversationMemoryQueue

    Client->>Router: request + headers
    Router->>Context: build per-request MemoryExecutionContext (stm_enabled)
    Context->>Context: load history (limit = max_conversation_history_items [+1 if STM])
    Context->>Persistence: persist_response_if_needed(..., memory_execution_context, conversation_turn_info)
    Persistence->>Persistence: persist conversation items
    alt STM enabled & STMO boundary met
        Persistence->>Queue: enqueue NewConversationMemory { Stmo, Ready, condenser_model? }
        Queue-->>Persistence: ack / error (errors ignored)
    end
    Persistence-->>Router: persist result
    Router-->>Client: response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

tests

Suggested reviewers

  • CatherineSue
  • key4ng
  • slin1237

"I hop through headers, nibble on state,
Counting user turns to schedule what's late.
Condense short-term, queue memories right,
Bundled handles tidy, contexts set light.
A rabbit's patch — soft, quick, and bright." 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check: ✅ Passed. The title 'feat(memory): add STM condensation scheduling (STMO)' accurately describes the main feature added: STM condensation scheduling. It is specific, concise, and directly reflects the primary objective of the changeset.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@mergify
Contributor

mergify Bot commented Apr 27, 2026

Hi @spalimpaaces-star, this PR has merge conflicts that must be resolved before it can be merged. Please rebase your branch:

git fetch origin main
git rebase origin/main
# resolve any conflicts, then:
git push --force-with-lease

@mergify mergify Bot added the needs-rebase PR has merge conflicts that need to be resolved label Apr 27, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces Short-Term Memory (STM) condensation (STMO) functionality, enabling the gateway to trigger background condensation jobs at specific conversation boundaries (turns 4, 7, 10, etc.). It adds a configurable max_conversation_history_items setting, refactors the persistence layer to use a bundled PersistenceHandles struct, and propagates a per-request MemoryExecutionContext through the gRPC and OpenAI router pipelines. Feedback identifies critical logic issues in the STMO trigger implementation: the turn and item counts are currently calculated relative to the current request's input rather than the absolute conversation state. This will prevent STMO from firing correctly for managed conversations and provide incorrect indices to the condensation worker.

Comment thread model_gateway/src/routers/common/persistence_utils.rs Outdated
Comment thread model_gateway/src/routers/common/persistence_utils.rs

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 859aa814b9

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread model_gateway/src/routers/common/persistence_utils.rs Outdated
spalimpaaces-star added 3 commits April 27, 2026 11:15
Replace hardcoded MAX_CONVERSATION_HISTORY_ITEMS=100 with a configurable
max_conversation_history_items field in RouterConfig. Defaults to 100 so
existing deployments are unaffected. Validated at startup to reject zero.
Threaded through GrpcRouter, ResponsesContext, and OpenAI history path.

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>
… after persistence

- Add stm_enabled + stm_condenser_model_id to MemoryHeaderView and
  MemoryExecutionContext; gate stm_enabled by runtime.enabled so STM
  and LTM honour the same runtime memory switch
- Add memory_execution_context and max_conversation_history_items to
  ResponsesContext; build a fresh per-request context from headers in
  route_responses_impl for both Harmony and regular gRPC paths
- Store memory_runtime_config on GrpcRouter for per-request header
  parsing without re-reading router config on every call
- Thread conversation_memory_writer + memory_execution_context through
  persist_response_if_needed and persist_conversation_items across all
  gRPC and OpenAI response paths (streaming + non-streaming)
- After link_items_to_conversation succeeds, count user_turns from
  in-memory input_items (role eq_ignore_ascii_case "user") and
  total_items = input + output; enqueue STMO when
  user_turns >= 4 && (user_turns - 1) % 3 == 0
- Enqueue runs best-effort in a tokio::spawn with tracing span
  propagation so it never blocks or fails the response path

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>
- stmo_fires_at_boundary_turns: verifies trigger at 4, 7, 10 and
  absence between boundaries
- count_user_turns_case_insensitive: role matching is case-insensitive
  and items without a role field are excluded
- total_items_exceeds_user_turns: asserts target_item_end reflects all
  items (input + output), not just user turns

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>
@spalimpaaces-star spalimpaaces-star marked this pull request as draft April 27, 2026 18:16
Bundle the four storage backends (response, conversation, conversation_item,
conversation_memory_writer) into a PersistenceHandles struct on ResponsesContext,
mirroring the pattern from PR lightseekorg#1357 so both code paths stay consistent and
future merges remain clean.

- Add PersistenceHandles (Clone) to grpc/common/responses/context.rs
- Replace four individual fields on ResponsesContext with persistence: PersistenceHandles
- Reduce ResponsesContext::new() from 10 to 7 args (drops too_many_arguments suppress)
- Update persist_response_if_needed to take (&PersistenceHandles, MemoryExecutionContext)
  instead of four individual storage handles — drops from 8 to 5 args
- Update all gRPC call sites: harmony/non_streaming, harmony/streaming (x2),
  regular/non_streaming, regular/streaming
- Update direct storage field accesses in handlers.rs, harmony/common.rs,
  regular/common.rs, and router.rs

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@model_gateway/src/routers/common/persistence_utils.rs`:
- Around line 385-394: Replace the request-local counting logic (the
count_user_turns helper and any uses that compute target_item_end from
input_items/output_items) with a conversation-level count obtained from
ConversationItemStorage::count_items_and_user_turns(...) after
link_items_to_conversation completes; specifically, call the storage method to
get total items and user_turns and derive target_item_end from those
conversation-level counts, and preserve the current best-effort skip/fallback
behavior when the storage reports STMO as unsupported (handle the Result/Option
the storage returns and only enqueue when the storage indicates support). Update
both the place where count_user_turns is used and the other similar block
(currently computing target_item_end) to use the storage API and remove reliance
on request-local input/output slices.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9c30252d-478e-4b58-b821-82944944a522

📥 Commits

Reviewing files that changed from the base of the PR and between a99af9c and 859aa81.

📒 Files selected for processing (20)
  • model_gateway/src/config/builder.rs
  • model_gateway/src/config/types.rs
  • model_gateway/src/config/validation.rs
  • model_gateway/src/memory/context.rs
  • model_gateway/src/routers/common/header_utils.rs
  • model_gateway/src/routers/common/persistence_utils.rs
  • model_gateway/src/routers/grpc/common/responses/context.rs
  • model_gateway/src/routers/grpc/common/responses/handlers.rs
  • model_gateway/src/routers/grpc/common/responses/mod.rs
  • model_gateway/src/routers/grpc/common/responses/utils.rs
  • model_gateway/src/routers/grpc/harmony/responses/common.rs
  • model_gateway/src/routers/grpc/harmony/responses/non_streaming.rs
  • model_gateway/src/routers/grpc/harmony/responses/streaming.rs
  • model_gateway/src/routers/grpc/regular/responses/common.rs
  • model_gateway/src/routers/grpc/regular/responses/non_streaming.rs
  • model_gateway/src/routers/grpc/regular/responses/streaming.rs
  • model_gateway/src/routers/grpc/router.rs
  • model_gateway/src/routers/openai/responses/history.rs
  • model_gateway/src/routers/openai/responses/non_streaming.rs
  • model_gateway/src/routers/openai/responses/streaming.rs

Comment thread model_gateway/src/routers/common/persistence_utils.rs Outdated
The DB write for the STMO job is lightweight (single INSERT) and
should be awaited inline rather than spawned fire-and-forget.
Removes the tokio::spawn + .instrument() wrapper and calls
handle_stmo_after_persist directly within persist_conversation_items_inner.

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>
@mergify mergify Bot removed the needs-rebase PR has merge conflicts that need to be resolved label Apr 27, 2026
spalimpaaces-star added 4 commits April 27, 2026 13:58
Introduce ConversationTurnInfo { user_turns, total_items } computed
from the fully-assembled input (history + new turn) at load time, and
thread it through every persistence call site across OpenAI and gRPC
paths (harmony + regular, streaming + non-streaming).

Previously, handle_stmo_after_persist derived user_turns and
target_item_end from original_body.input, which for managed
conversations only contains the latest user message. This caused
should_enqueue_stmo to always see user_turns=1, so the 4/7/10
boundaries were never reached and STMO jobs were never enqueued.

count_conversation_turn_info is now called on modified_request.input
after load_conversation_history / load_previous_messages /
load_input_history assembles the full conversation, giving both
values their correct absolute position within the conversation.

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>
Mirrors the pattern introduced in PR lightseekorg#1357 — clones the context while
swapping in a request-scoped StorageRequestContext, removing the need
for duplicated constructor wiring at call sites that attach per-request
storage hook context.

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>
…context

- Remove with_request_context() from ResponsesContext — no current call
  site exists since per-request builds also swap memory_execution_context;
  will be re-added after PR lightseekorg#1357 rebase if still needed
- Add #[expect(clippy::too_many_arguments)] to persist_conversation_items_inner
  and execute_mcp_tool_loop_streaming to satisfy -D warnings

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>
Replace stm_enabled: bool in MemoryExecutionContext with
MemoryExecutionState, matching the pattern used for store_ltm and recall.

This lets call sites distinguish three states:
  NotRequested — client did not send the STM flag
  GatedOff     — client requested STM but runtime.enabled = false
  Active        — client requested STM and runtime allows it

Previously the bool collapsed NotRequested and GatedOff to false,
making it impossible to observe when STM was silently disabled at
the server level. All call sites updated to use .active().

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>
@spalimpaaces-star spalimpaaces-star marked this pull request as ready for review April 27, 2026 22:52

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 79aca7fb8f


Comment thread model_gateway/src/routers/openai/responses/history.rs Outdated
Comment thread model_gateway/src/routers/grpc/regular/responses/common.rs Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
model_gateway/src/routers/grpc/regular/responses/streaming.rs (1)

430-437: ⚠️ Potential issue | 🟠 Major

conversation_turn_info is dropped in MCP streaming, blocking STMO scheduling on that path.

At Line 436 the parameter is renamed to _conversation_turn_info and then not forwarded at Line 451-Line 458. This means MCP streaming requests cannot carry turn metadata into persistence/STMO logic, even though callers now compute and pass it.

Suggested direction
 pub(super) fn execute_tool_loop_streaming(
     ctx: &ResponsesContext,
     current_request: ResponsesRequest,
     original_request: &ResponsesRequest,
     params: ResponsesCallContext,
     mcp_servers: Vec<McpServerBinding>,
-    _conversation_turn_info: Option<ConversationTurnInfo>,
+    conversation_turn_info: Option<ConversationTurnInfo>,
 ) -> Response {
@@
         let result = execute_tool_loop_streaming_internal(
             &ctx_clone,
             current_request,
             &original_request_clone,
             params,
             mcp_servers,
+            conversation_turn_info,
             tx.clone(),
         )
         .await;

Then thread conversation_turn_info into the internal MCP completion/persistence point the same way non-MCP streaming already does.

Also applies to: 451-458

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/regular/responses/streaming.rs` around lines
430 - 437, The parameter conversation_turn_info is being dropped in
execute_tool_loop_streaming (renamed to _conversation_turn_info) and not
forwarded into the MCP streaming completion/persistence path; restore and thread
the ConversationTurnInfo through the MCP-specific completion/persistence call
the same way non-MCP streaming does by removing the underscore from the
parameter name and passing that variable into the internal MCP
completion/persistence invocation (the function/method where MCP responses are
finalized/persisted) so STMO scheduling and persistence receive the turn
metadata.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@model_gateway/src/routers/grpc/regular/responses/common.rs`:
- Around line 372-381: The current turn_info is computed from
modified_request.input which only contains rehydrated "message" rows and misses
function_call, function_call_output, reasoning, and MCP items; update the logic
so count_conversation_turn_info is run against the fully assembled ResponseInput
(i.e., after the conversation history/replay has been loaded and merged) or
alternatively supply a separate total_items value that includes the skipped
rows; specifically adjust the flow around LoadedRequest construction
(modified_request and turn_info) so ConversationTurnInfo { user_turns,
total_items } comes from the complete replayed input rather than the partial
modified_request.input to avoid undersized target_item_end in STMO.

In `@model_gateway/src/routers/openai/responses/history.rs`:
- Around line 240-244: The STMO turn counts are being computed from the filtered
request_body.input (which omits persisted reasoning/tool rows), so call
count_conversation_turn_info on the fully assembled replay/response input after
conversation history is loaded (not on request_body.input), or alternatively
build ConversationTurnInfo using the absolute persisted total_items (including
skipped rows) plus user_turns from the assembled input; update the stm_enabled
branch that currently computes conversation_turn_info to use that assembled
ResponseInput (or the persisted total) so ConversationTurnInfo.total_items
reflects the true conversation size.

---

Outside diff comments:
In `@model_gateway/src/routers/grpc/regular/responses/streaming.rs`:
- Around line 430-437: The parameter conversation_turn_info is being dropped in
execute_tool_loop_streaming (renamed to _conversation_turn_info) and not
forwarded into the MCP streaming completion/persistence path; restore and thread
the ConversationTurnInfo through the MCP-specific completion/persistence call
the same way non-MCP streaming does by removing the underscore from the
parameter name and passing that variable into the internal MCP
completion/persistence invocation (the function/method where MCP responses are
finalized/persisted) so STMO scheduling and persistence receive the turn
metadata.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2078e788-90fb-4fbe-b70a-59ea9f8da01d

📥 Commits

Reviewing files that changed from the base of the PR and between 859aa81 and 79aca7f.

📒 Files selected for processing (23)
  • model_gateway/src/config/builder.rs
  • model_gateway/src/config/types.rs
  • model_gateway/src/config/validation.rs
  • model_gateway/src/memory/context.rs
  • model_gateway/src/routers/common/header_utils.rs
  • model_gateway/src/routers/common/persistence_utils.rs
  • model_gateway/src/routers/grpc/common/responses/context.rs
  • model_gateway/src/routers/grpc/common/responses/handlers.rs
  • model_gateway/src/routers/grpc/common/responses/mod.rs
  • model_gateway/src/routers/grpc/common/responses/utils.rs
  • model_gateway/src/routers/grpc/harmony/responses/common.rs
  • model_gateway/src/routers/grpc/harmony/responses/non_streaming.rs
  • model_gateway/src/routers/grpc/harmony/responses/streaming.rs
  • model_gateway/src/routers/grpc/regular/responses/common.rs
  • model_gateway/src/routers/grpc/regular/responses/handlers.rs
  • model_gateway/src/routers/grpc/regular/responses/non_streaming.rs
  • model_gateway/src/routers/grpc/regular/responses/streaming.rs
  • model_gateway/src/routers/grpc/router.rs
  • model_gateway/src/routers/openai/context.rs
  • model_gateway/src/routers/openai/responses/history.rs
  • model_gateway/src/routers/openai/responses/non_streaming.rs
  • model_gateway/src/routers/openai/responses/route.rs
  • model_gateway/src/routers/openai/responses/streaming.rs

Comment thread model_gateway/src/routers/grpc/regular/responses/common.rs
Comment thread model_gateway/src/routers/openai/responses/history.rs
…on path

The gRPC and OpenAI conversation loaders filter stored items before
assembling the inference window (gRPC keeps only messages; OpenAI
also keeps function_call/function_call_output but drops reasoning).
count_conversation_turn_info on the filtered input undercounts
total_items for any conversation with prior tool or reasoning turns,
causing target_item_end to point at the wrong range.

Fix: capture stored_items.len() from list_items before filtering and
use raw_count + current_input_count as total_items, giving the correct
absolute position in the conversation regardless of which item types
the inference window excludes.

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>
When stm_enabled is active, request max_conversation_history_items + 1
from list_items. If cap + 1 items are returned the conversation has grown
past the configured ceiling, which would corrupt STMO turn-count math.
Both the gRPC regular and OpenAI Responses paths now return a 400
conversation_too_large error in that case, with no overhead for
deployments not using STMO.

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
model_gateway/src/routers/openai/responses/history.rs (1)

259-261: ⚠️ Potential issue | 🟠 Major

Skip STMO scheduling when conversation history falls back.

If list_items fails here, the request intentionally proceeds without loaded history, but the stm_enabled block still builds ConversationTurnInfo from the unhydrated request_body.input. That undercounts user_turns and total_items, so persistence can enqueue STMO on the wrong boundary or with the wrong target_item_end after a transient history-store failure. Return None for conversation_turn_info when a conversation-backed history load failed, and let this request skip STMO instead.

Suggested fix
-    let conversation_turn_info = if stm_enabled {
+    let conversation_history_loaded = conversation.is_none() || raw_stored_item_count.is_some();
+
+    let conversation_turn_info = if stm_enabled && conversation_history_loaded {
         let mut info = count_conversation_turn_info(&request_body.input);
         // Correct total_items when loaded from conversation storage: reasoning
         // items are skipped during load but are stored in the DB, so the
         // assembled input underestimates the true conversation size. Use the
         // raw DB count + current input item count for an accurate target_item_end.
         if let Some(raw_count) = raw_stored_item_count {
             info.total_items = raw_count + current_input_count;
         }
         Some(info)
     } else {
         None
     };

Based on learnings: load_input_history intentionally uses graceful degradation when storage operations fail, and count_conversation_turn_info(input: &ResponseInput) is designed for callers that pass the fully assembled absolute conversation input.

Also applies to: 275-287

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/openai/responses/history.rs` around lines 259 -
261, The error branch that just warns on failed history load should also null
out the computed ConversationTurnInfo so STMO logic doesn't use unhydrated
request input; in the Err(e) arm where you log "Failed to load conversation
history", set conversation_turn_info = None (or otherwise return None from the
load_input_history caller) instead of leaving a stale value, and ensure the
stm_enabled block checks that conversation_turn_info is Some before calling
count_conversation_turn_info or using user_turns/total_items; reference
load_input_history, conversation_turn_info, ConversationTurnInfo,
count_conversation_turn_info, and stm_enabled when making this change so callers
skip STMO when history load falls back.
model_gateway/src/routers/grpc/regular/responses/common.rs (1)

358-360: ⚠️ Potential issue | 🟡 Minor

Skip STMO turn-info generation when conversation history load degrades.

When list_items fails (Line 358), the code still computes turn_info from modified_request.input (Line 398), which may only contain request-local items. That can produce non-absolute counts for STMO scheduling in degraded mode.

🔧 Proposed fix
 pub(super) async fn load_conversation_history(
     ctx: &ResponsesContext,
     request: &ResponsesRequest,
     stm_enabled: bool,
 ) -> Result<LoadedRequest, Response> {
     let mut modified_request = request.clone();
     let mut conversation_items: Option<Vec<ResponseInputOutputItem>> = None;
+    let mut conversation_history_loaded = false;
     // Tracks the raw DB item count (all types) for the conversation path so
     // that total_items in ConversationTurnInfo is not undercounted when
     // function_call/function_call_output/MCP items are present in storage
     // but filtered out of the inference window.
     let mut raw_stored_item_count: Option<usize> = None;
@@
         match ctx
             .persistence
             .conversation_item_storage
             .list_items(&conv_id, params)
             .await
         {
             Ok(stored_items) => {
+                conversation_history_loaded = true;
                 if stm_enabled && stored_items.len() > cap {
                     return Err(error::bad_request(
                         "conversation_too_large",
@@
     let turn_info = if stm_enabled {
+        if request.conversation.is_some() && !conversation_history_loaded {
+            None
+        } else {
         let mut info = count_conversation_turn_info(&modified_request.input);
         // If we loaded from conversation storage, total_items from the
         // assembled (message-only) input undercounts — function_call,
         // function_call_output, and MCP items are in the DB but filtered
         // out of the inference window. Use the raw DB count instead so
         // target_item_end points at the correct absolute position.
         if let Some(raw_count) = raw_stored_item_count {
             let current_input_count = match &request.input {
                 ResponseInput::Text(_) => 1,
                 ResponseInput::Items(items) => items.len(),
             };
             info.total_items = raw_count + current_input_count;
         }
         Some(info)
+        }
     } else {
         None
     };

Based on learnings: count_conversation_turn_info(input: &ResponseInput) / ConversationTurnInfo are intended to run on a fully assembled ResponseInput so counts stay absolute.

Also applies to: 398-413

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/regular/responses/common.rs` around lines 358
- 360, When list_items fails while loading conversation history (the Err(e)
branch), avoid computing turn_info from the partially assembled
modified_request.input; instead record that history load degraded and skip/leave
ConversationTurnInfo unset. Specifically, in the block handling list_items
errors, set a flag or option (e.g., history_load_ok = false or turn_info = None)
and only call count_conversation_turn_info(&modified_request.input) and populate
ConversationTurnInfo when history_load_ok is true (i.e., when list_items
succeeded). Ensure any downstream uses expect an absent/None turn_info when
history load degraded so STMO scheduling uses only absolute counts computed from
full history.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@model_gateway/src/routers/grpc/regular/responses/common.rs`:
- Around line 358-360: When list_items fails while loading conversation history
(the Err(e) branch), avoid computing turn_info from the partially assembled
modified_request.input; instead record that history load degraded and skip/leave
ConversationTurnInfo unset. Specifically, in the block handling list_items
errors, set a flag or option (e.g., history_load_ok = false or turn_info = None)
and only call count_conversation_turn_info(&modified_request.input) and populate
ConversationTurnInfo when history_load_ok is true (i.e., when list_items
succeeded). Ensure any downstream uses expect an absent/None turn_info when
history load degraded so STMO scheduling uses only absolute counts computed from
full history.

In `@model_gateway/src/routers/openai/responses/history.rs`:
- Around line 259-261: The error branch that just warns on failed history load
should also null out the computed ConversationTurnInfo so STMO logic doesn't use
unhydrated request input; in the Err(e) arm where you log "Failed to load
conversation history", set conversation_turn_info = None (or otherwise return
None from the load_input_history caller) instead of leaving a stale value, and
ensure the stm_enabled block checks that conversation_turn_info is Some before
calling count_conversation_turn_info or using user_turns/total_items; reference
load_input_history, conversation_turn_info, ConversationTurnInfo,
count_conversation_turn_info, and stm_enabled when making this change so callers
skip STMO when history load falls back.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d656d7c9-bfce-47ac-90e5-22c163660165

📥 Commits

Reviewing files that changed from the base of the PR and between 79aca7f and 09c5213.

📒 Files selected for processing (2)
  • model_gateway/src/routers/grpc/regular/responses/common.rs
  • model_gateway/src/routers/openai/responses/history.rs


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 09c5213c1c


Comment thread model_gateway/src/routers/openai/responses/history.rs Outdated
Comment thread model_gateway/src/routers/grpc/regular/responses/common.rs Outdated
When list_items returns an error the assembled input only contains the
current request, so turn counts would be wrong. Guard both the OpenAI
and gRPC regular conversation paths: if a conversation was requested but
raw_stored_item_count is still None after the match, return None for
conversation_turn_info so persistence skips STMO entirely instead of
enqueueing a job with a drifted last_index/target_item_end.

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@model_gateway/src/routers/grpc/regular/responses/common.rs`:
- Around line 398-423: The override of info.total_items using
raw_stored_item_count + current_input_count can drop replayed response-chain
items that are already present in modified_request.input (when
previous_response_id was loaded); change the logic in the turn_info block so you
only replace info.total_items with raw_count + current_input_count when
request.previous_response_id is None (i.e., no replayed response-chain was
added), otherwise preserve the total_items computed by
count_conversation_turn_info(&modified_request.input); refer to turn_info,
stm_enabled, request, raw_stored_item_count, modified_request.input,
count_conversation_turn_info, and total_items to locate and update the
conditional.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d9b30e52-4828-4f0d-94f9-0708a4cc19c0

📥 Commits

Reviewing files that changed from the base of the PR and between 09c5213 and f1dec4d.

📒 Files selected for processing (2)
  • model_gateway/src/routers/grpc/regular/responses/common.rs
  • model_gateway/src/routers/openai/responses/history.rs

Comment thread model_gateway/src/routers/grpc/regular/responses/common.rs

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f1dec4da8d


Comment thread model_gateway/src/routers/grpc/harmony/responses/common.rs Outdated
…loaded

When previous_response_id is absent and request.conversation is set,
load_previous_messages returns early without loading conversation history.
Computing turn_info from request.input alone in that case gives wrong
user_turns and total_items. Guard the early-return branch so turn_info
is None whenever conversation is set but history was not loaded.

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ea0e87ef00


Comment thread model_gateway/src/routers/openai/responses/history.rs Outdated
If a request somehow carries both previous_response_id and conversation,
the chain-item merge runs last and overwrites modified_request.input with
the full replayed history. Applying raw_stored_item_count + current_input
on top of that would drop the chain items from total_items. Guard the
correction with request.previous_response_id.is_none() in both the gRPC
regular and OpenAI paths. The two fields are mutually exclusive per API
contract, but the guard makes the function correct regardless.

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7925e8f69a


Comment thread model_gateway/src/routers/grpc/regular/responses/common.rs Outdated
The cap overflow check and the extra DB fetch (cap+1) were gated only
on stm_enabled, so requests with store=false could fail with 400 even
though persist_response_if_needed short-circuits before any STMO enqueue.
Add request.store.unwrap_or(true) to both the fetch_limit calculation and
the rejection condition in the gRPC regular and OpenAI history loaders so
read-only calls are never affected by the STMO conversation size limit.

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e7d765252


Comment thread model_gateway/src/routers/grpc/regular/responses/common.rs Outdated
spalimpaaces-star added 2 commits April 28, 2026 01:09
The MCP streaming tool loop (execute_tool_loop_streaming) never calls
persist_response_if_needed so STMO is never enqueued there. Applying the
conversation_too_large cap check in that path rejects valid requests for
no reason. Fix by moving ensure_mcp_connection before load_conversation_history
in the streaming handler and passing stm_enabled && !has_mcp_tools to the
loader, so the cap check and rejection are skipped for the MCP streaming path.

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>
Fetching cap+1 items from list_items is harmless — it is just a SQL LIMIT.
Remove the conditional fetch_limit calculation and always pass
cap.saturating_add(1) directly. The rejection check already carries the
correct stm_enabled && store=true gate, so no behaviour changes.

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5410bb19c6


Comment thread model_gateway/src/routers/openai/responses/history.rs Outdated
Comment thread model_gateway/src/routers/grpc/regular/responses/common.rs Outdated
spalimpaaces-star added 2 commits April 28, 2026 02:23
…p_streaming

The MCP streaming path never calls persist_response_if_needed so
conversation_turn_info was accepted but immediately discarded (_prefix).
Remove the parameter from execute_tool_loop_streaming and its call site
in the streaming handler.

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>
Fetching cap+1 and checking > cap is equivalent to fetching cap and
checking >= cap, but the latter is simpler: the SQL LIMIT already
bounds the inference window so no post-fetch truncation is needed.

Applied to both load_input_history (OpenAI) and
load_conversation_history (gRPC regular).

Signed-off-by: spalimpaaces-star <saikiran.palimpati@oracle.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ff8c4c25d9


Comment thread model_gateway/src/routers/openai/responses/history.rs
Comment thread model_gateway/src/routers/grpc/regular/responses/common.rs
@github-actions

This pull request has been automatically marked as stale because it has not had any activity within 14 days. It will be automatically closed if no further activity occurs within 16 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions Bot added the stale PR has been inactive for 14+ days label May 14, 2026
@mergify
Contributor

mergify Bot commented May 14, 2026

Hi @spalimpaaces-star, this PR has merge conflicts that must be resolved before it can be merged. Please rebase your branch:

git fetch origin main
git rebase origin/main
# resolve any conflicts, then:
git push --force-with-lease

@mergify mergify Bot added the needs-rebase PR has merge conflicts that need to be resolved label May 14, 2026
@github-actions github-actions Bot removed the stale PR has been inactive for 14+ days label May 15, 2026