fix(task-store): retry on transient transport errors instead of dropping prompt #2090

Open
yashrajshuklaaa wants to merge 1 commit into kagent-dev:main from yashrajshuklaaa:fix/task-store-transport-retry

@yashrajshuklaaa

When the agent → controller HTTP hop raises httpx.TransportError (idle keep-alive connection reset by the Istio/HBONE mesh, controller pod reschedule, etc.), the error previously propagated uncaught out of KAgentTaskStore.get/save, silently dropping the user prompt with no error surfaced and no recovery short of a pod restart.

Fix:

Introduce _request_with_retry() in KAgentTaskStore, which catches TransportError, calls aclose() to flush the stale connection pool, and retries once on a fresh connection. Non-transport HTTP errors (4xx/5xx) are re-raised immediately without retrying. If the transport error persists after retrying, it is re-raised so the caller sees a real error rather than a silent drop.

The fix lives entirely in kagent-core/_task_store.py and covers all four framework adapters (langgraph, adk, openai, crewai) automatically, since they all share KAgentTaskStore.
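
For reviewers who want the control flow at a glance, here is a minimal sketch of the retry-once behavior described above. It is not the code in this PR: TaskStoreSketch, FakeClient, client_factory, max_retries, and the /api/tasks/t1 URL are hypothetical, and the two exception classes are local stand-ins for httpx.TransportError and an HTTP status error so the sketch runs without httpx installed. It also assumes the client has to be rebuilt after aclose() before it can send again.

```python
import asyncio
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


class TransportError(Exception):
    """Local stand-in for httpx.TransportError (assumption, see lead-in)."""


class HTTPStatusError(Exception):
    """Local stand-in for a non-transport 4xx/5xx failure."""


class FakeClient:
    """Hypothetical in-memory client. `script` is a shared list of outcomes:
    exceptions are raised, anything else is returned as the response."""

    def __init__(self, script: list) -> None:
        self._script = script
        self.closed = False

    async def request(self, method: str, url: str, **kwargs: Any) -> Any:
        outcome = self._script.pop(0)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    async def aclose(self) -> None:
        self.closed = True


class TaskStoreSketch:
    """Hypothetical, simplified stand-in for KAgentTaskStore."""

    def __init__(self, client_factory: Callable[[], Any], max_retries: int = 1) -> None:
        self._client_factory = client_factory
        self._client = client_factory()
        self._max_retries = max_retries

    async def _request_with_retry(self, method: str, url: str, **kwargs: Any) -> Any:
        for attempt in range(self._max_retries + 1):
            try:
                return await self._client.request(method, url, **kwargs)
            except TransportError as exc:
                if attempt == self._max_retries:
                    # Retries exhausted: surface the error to the caller.
                    raise
                logger.warning("transport error on %s %s (%s); retrying", method, url, exc)
                # Drop the stale pool and continue on a fresh client.
                await self._client.aclose()
                self._client = self._client_factory()
        raise AssertionError("unreachable")


def run(script: list) -> Any:
    """Drive one GET through the sketch using a scripted fake client."""
    store = TaskStoreSketch(lambda: FakeClient(script))
    return asyncio.run(store._request_with_retry("GET", "/api/tasks/t1"))
```

HTTPStatusError is deliberately not caught, so a 4xx/5xx outcome propagates on the first attempt, while a single TransportError followed by a good response is absorbed by the retry.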

Fixes #2086

Copilot AI review requested due to automatic review settings June 25, 2026 18:17
github-actions bot added the bug (Something isn't working) label Jun 25, 2026

Copilot AI left a comment

Pull request overview

Adds retry handling in the shared KAgentTaskStore HTTP layer to prevent BYO agents from silently dropping prompts when the agent→controller hop encounters transient httpx.TransportError conditions (e.g., stale keep-alive connections reset by the mesh).

Changes:

  • Introduce _request_with_retry() in KAgentTaskStore to retry once on httpx.TransportError.
  • Route save/get/delete through the new retry helper and document the new error behavior.
  • Add logging for transport retry attempts.

Comment thread python/packages/kagent-core/src/kagent/core/a2a/_task_store.py Outdated
…ing prompt

Fixes kagent-dev#2086

Signed-off-by: Yashraj Shukla <shuklayashraj68@gmail.com>
@yashrajshuklaaa yashrajshuklaaa force-pushed the fix/task-store-transport-retry branch from 0175791 to 084d78b Compare June 25, 2026 18:31
Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

BYO agent silently drops an incoming prompt when the agent→controller /api/tasks call fails (transient transport error)

2 participants