feat: migrate from Python Claude Agent SDK to Vercel AI SDK v6 (TypeScript) #1891

Open
AndyMik90 wants to merge 89 commits into develop from auto-claude/237-migrate-claude-agent-sdk-python-to-vercel-ai-sdk-t

@AndyMik90
Owner

Summary

  • Complete migration of all AI agent execution from Python claude-agent-sdk subprocess calls to a native TypeScript agent layer using Vercel AI SDK v6 (ai package)
  • 190+ new TypeScript files in apps/frontend/src/main/ai/ implementing providers, tools, security, orchestration, runners, memory, and MCP integration
  • Zero Python subprocess calls remain in production IPC handlers; the backend is reduced to a Graphiti memory sidecar and CLI utilities
  • Local-first memory system using libSQL (Turso) with embedding-based retrieval, replacing the external Graphiti dependency for core memory operations

What changed

Provider Layer (ai/providers/)

  • Multi-provider registry supporting 9 providers: Anthropic, OpenAI, Google, Bedrock, Azure, Mistral, Groq, xAI, Ollama
  • OAuth token detection (sk-ant-oa* / sk-ant-ort*) with automatic refresh
  • Provider-specific transforms for thinking token normalization and prompt caching
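
The OAuth detection bullet above can be illustrated with a simple prefix check. This is a minimal sketch, not the code in `ai/providers/`: the two prefixes come from the summary, while the function name `classifyCredential`, the `CredentialKind` type, and the reading of `*` as "any suffix" are assumptions made for illustration.

```typescript
// Prefixes are taken from the PR summary; everything else here is hypothetical.
const OAUTH_TOKEN_PREFIXES = ["sk-ant-oa", "sk-ant-ort"] as const;

type CredentialKind = "oauth" | "api-key";

// Classify a credential string by its prefix so callers can decide whether
// the token needs a refresh flow (OAuth) or can be used as a plain API key.
function classifyCredential(token: string): CredentialKind {
  const trimmed = token.trim();
  return OAUTH_TOKEN_PREFIXES.some((prefix) => trimmed.startsWith(prefix))
    ? "oauth"
    : "api-key";
}
```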

Session Runtime (ai/session/)

  • runAgentSession() using streamText() with stopWhen: stepCountIs(N) for agentic tool-use loops
  • Error classification (429 rate-limit, 401 auth, 400 bad-request) with automatic retry
  • Structured progress tracking from tool calls and text patterns
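
A rough sketch of how status-code classification might feed a retry decision. The three status codes come from the bullet above; the names `classifyStatus` and `decideRetry`, the choice of which classes are retried, and the backoff numbers are assumptions, since the PR does not spell them out.

```typescript
type ErrorKind = "rate-limit" | "auth" | "bad-request" | "unknown";

interface RetryDecision {
  kind: ErrorKind;
  retry: boolean;
  delayMs: number;
}

// Map an HTTP status code to one of the error classes named in the summary.
function classifyStatus(status: number | undefined): ErrorKind {
  switch (status) {
    case 429:
      return "rate-limit";
    case 401:
      return "auth";
    case 400:
      return "bad-request";
    default:
      return "unknown";
  }
}

// Decide whether to retry. `attempt` is zero-based. Assumed policy:
// rate limits back off exponentially (capped at 30s), auth failures retry
// immediately on the assumption that the caller refreshes credentials first,
// and everything else is surfaced without a retry.
function decideRetry(
  status: number | undefined,
  attempt: number,
  maxAttempts = 3,
): RetryDecision {
  const kind = classifyStatus(status);
  const attemptsLeft = attempt + 1 < maxAttempts;
  if (kind === "rate-limit" && attemptsLeft) {
    return { kind, retry: true, delayMs: Math.min(1000 * 2 ** attempt, 30000) };
  }
  if (kind === "auth" && attemptsLeft) {
    return { kind, retry: true, delayMs: 0 };
  }
  return { kind, retry: false, delayMs: 0 };
}
```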

Worker Threads (ai/agent/)

  • Agent sessions run in worker_threads to avoid blocking the Electron main process
  • WorkerBridge relays postMessage() events to the existing AgentManagerEvents interface
  • SerializableSessionConfig crosses the thread boundary; the LanguageModel is recreated inside the worker

Build Orchestration (ai/orchestration/)

  • Full planner → coder → QA pipeline ported from Python
  • Spec validator, conversation compactor, QA reports, pause handler
  • Parallel subagent execution via Promise.allSettled()
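
The parallel subagent bullet can be sketched as follows. `Promise.allSettled()` is the call named in the summary; the `runSubagentsInParallel` helper and the `SubagentOutcome` shape are hypothetical and only show why `allSettled` fits here: one failing subagent does not discard the results of the others.

```typescript
interface SubagentOutcome<T> {
  name: string;
  ok: boolean;
  value?: T;
  error?: string;
}

// Run every subagent task concurrently and report each outcome separately.
// Unlike Promise.all, a rejection in one task does not reject the whole batch.
async function runSubagentsInParallel<T>(
  tasks: Record<string, () => Promise<T>>,
): Promise<SubagentOutcome<T>[]> {
  const names = Object.keys(tasks);
  const settled = await Promise.allSettled(names.map((name) => tasks[name]()));
  return settled.map((result, i) => {
    if (result.status === "fulfilled") {
      return { name: names[i], ok: true, value: result.value };
    }
    const reason: unknown = result.reason;
    const error = reason instanceof Error ? reason.message : String(reason);
    return { name: names[i], ok: false, error };
  });
}
```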

Tools (ai/tools/)

  • 8 builtin tools (Read, Write, Edit, Bash, Glob, Grep, WebFetch, WebSearch) with Zod schemas
  • 6 Auto Claude custom tools (record_gotcha, get_session_context, etc.)
  • Tool registry with per-agent-type tool filtering
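
Per-agent tool filtering might look roughly like the sketch below. The method names `registerTool()` and `getToolsForAgent()` appear in the commit messages of this PR; the `ToolEntry` shape, the two agent types, and the allowlist passed to the constructor are illustrative assumptions rather than the real `AGENT_CONFIGS`.

```typescript
// Hypothetical subset of agent types; the PR mentions 27 but does not list them here.
type AgentType = "planner" | "coder";

interface ToolEntry {
  name: string;
  description: string;
}

class ToolRegistry {
  private readonly tools = new Map<string, ToolEntry>();
  private readonly allowlists: Record<AgentType, readonly string[]>;

  constructor(allowlists: Record<AgentType, readonly string[]>) {
    this.allowlists = allowlists;
  }

  registerTool(tool: ToolEntry): void {
    this.tools.set(tool.name, tool);
  }

  // Return only the registered tools that the agent type is allowed to use.
  // Allowlisted names with no registered tool are silently skipped.
  getToolsForAgent(agent: AgentType): Record<string, ToolEntry> {
    const result: Record<string, ToolEntry> = {};
    for (const name of this.allowlists[agent]) {
      const tool = this.tools.get(name);
      if (tool) result[name] = tool;
    }
    return result;
  }
}
```

An allowlist keyed by agent type keeps the filtering decision in one place, so a read-only agent never receives Write or Bash even if those tools are registered globally.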

Security (ai/security/)

  • 19 validators + secret scanner + tool input validator
  • Bash command parser and validator with identical allowlist behavior to Python
  • Path containment for filesystem boundary enforcement

Memory System (ai/memory/)

  • Local libSQL database with WAL mode, FTS5 full-text search, and vector embeddings
  • Embedding service with auto-detection (Ollama local, OpenAI cloud fallback)
  • Reranker, retrieval pipeline, and memory-aware context injection
  • 16 memory types across 6 categories (patterns, errors, decisions, insights, calibration)
  • UI: Memory browser with category filters, health metrics, search, and expandable cards
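
To make "embedding-based retrieval" concrete, here is a small in-memory ranking sketch. The PR does not state which similarity metric or query path the libSQL-backed implementation uses, so cosine similarity, the `MemoryRecord` shape, and the `topKMemories` helper are all assumptions.

```typescript
interface MemoryRecord {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length; 0 for zero vectors.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every record against the query embedding and keep the k best matches.
function topKMemories(
  query: readonly number[],
  records: readonly MemoryRecord[],
  k: number,
): MemoryRecord[] {
  return records
    .map((record) => ({ record, score: cosineSimilarity(query, record.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.record);
}
```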

Runners (ai/runners/)

  • Insights, roadmap, ideation, commit-message, changelog runners
  • GitHub PR review engine, triage engine, batch processor, bot detector
  • GitLab MR review engine

IPC Handlers Rewired

  • pr-handlers, mr-review-handlers, autofix-handlers, triage-handlers, insights-executor all call TypeScript runners directly instead of spawning Python subprocesses

Other

  • Worktree manager ported to TypeScript
  • Project analyzer, stack detector, framework detector, command registry (400+ commands)
  • Context builder with keyword extraction, service matching, categorization
  • Semantic merge analyzer with 80+ conflict detection rules and 8 merge strategies
  • electron-vite config updated for worker thread entry points and AI SDK bundling

Verification

  • 0 TypeScript errors (tsc --noEmit)
  • 3,869 tests passing (164 test files), 0 failures
  • electron-vite build clean
  • E2E tested via Electron MCP: memory system connects, renders memories, filters work, no console errors

Test plan

  • TypeScript compilation: cd apps/frontend && npx tsc --noEmit — 0 errors
  • Unit tests: cd apps/frontend && npm test — 3,869 passed
  • Build: cd apps/frontend && npx electron-vite build — clean
  • E2E: Memory panel renders, filters, health metrics, card expansion all verified via Electron MCP screenshots
  • Full agent task execution (spec → build → QA) in desktop app
  • Cross-platform verification (Windows, Linux)

🤖 Generated with Claude Code

AndyMik90 and others added 30 commits February 19, 2026 00:44
…der packages

Added dependencies: ai@^6, @ai-sdk/anthropic, @ai-sdk/openai, @ai-sdk/google,
@ai-sdk/amazon-bedrock, @ai-sdk/azure, @ai-sdk/mistral, @ai-sdk/groq, @ai-sdk/xai,
@ai-sdk/openai-compatible, @ai-sdk/mcp, @modelcontextprotocol/sdk. Verified zod/v3
compat works with existing zod v4.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Define SupportedProvider enum, ProviderConfig, ModelResolution, and
ProviderCapabilities types. Port MODEL_ID_MAP, THINKING_BUDGET_MAP,
MODEL_BETAS_MAP, and phase config types from phase_config.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…onfig) → LanguageModel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…iderRegistry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Port thinking token normalization, tool ID format transforms, prompt
caching thresholds, and adaptive thinking support from phase_config.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ty/parser

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ty/hooks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… boundary

Add path-containment.ts with assertPathContained() for filesystem boundary
enforcement including symlink resolution, traversal prevention, and
cross-platform normalization. Add security-profile.ts for loading and
caching project security profiles from .auto-claude config files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
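
The traversal-prevention part of the path-containment commit above can be sketched with purely lexical checks. `assertPathContained()` is the name used in the commit; its signature here is assumed, and this sketch deliberately omits the symlink resolution and case-insensitive comparison that a real cross-platform implementation would need.

```typescript
// Split a path on either separator and resolve "." and ".." lexically.
function normalizeSegments(p: string): string[] {
  const out: string[] = [];
  for (const seg of p.split(/[\\/]+/)) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      out.pop();
      continue;
    }
    out.push(seg);
  }
  return out;
}

// True when `candidate` stays inside `baseDir` after normalization.
// Relative candidates are resolved against baseDir; absolute ones stand alone.
function isPathContained(baseDir: string, candidate: string): boolean {
  const base = normalizeSegments(baseDir);
  const isAbsolute = /^([\\/]|[A-Za-z]:[\\/])/.test(candidate);
  const target = normalizeSegments(isAbsolute ? candidate : `${baseDir}/${candidate}`);
  // Comparing whole segments avoids treating /projectX as inside /project.
  return base.every((seg, i) => target[i] === seg);
}

function assertPathContained(baseDir: string, candidate: string): void {
  if (!isPathContained(baseDir, candidate)) {
    throw new Error(`Path escapes project boundary: ${candidate}`);
  }
}
```
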
…security layer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Define ToolContext interface (cwd, projectDir, specDir, securityProfile),
ToolPermission types, ToolExecutionOptions, and ToolDefinitionConfig.
Create Tool.define() that wraps AI SDK v6 tool() with Zod v3 inputSchema
and security hooks integration (bash validator pre-execution check).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dit, Glob)

Implements Read (line offset/limit, image base64, PDF support),
Write (content validation, mkdir -p), Edit (exact string replacement,
replace_all), and Glob (fs.globSync, mtime sort) with Zod schemas
and path-containment security integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add the 4 remaining built-in tools following the existing Tool.define() pattern:
- Bash: command execution with bashSecurityHook() integration, timeout, background support
- Grep: ripgrep-based search with output modes, file type/glob filtering
- WebFetch: URL fetching with timeout and content truncation
- WebSearch: web search with domain allow/block list filtering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ig registry

Port tool constants (BASE_READ_TOOLS, BASE_WRITE_TOOLS, WEB_TOOLS), MCP tool
lists, and AGENT_CONFIGS from Python models.py. Implement ToolRegistry with
registerTool(), getToolsForAgent(), and helper functions getAgentConfig(),
getDefaultThinkingLevel(), getRequiredMcpServers().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…t-configs.ts

Port all 27 agent type configurations from Python backend to TypeScript.
Includes tool lists, MCP server mappings, auto-claude tools, thinking
defaults, and helper functions (getAgentConfig, getRequiredMcpServers,
getDefaultThinkingLevel, mapMcpServerName).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lback chain

Add auth types and resolver that reuses existing claude-profile/credential-utils.ts.
Implements 4-stage fallback: profile OAuth token → profile API key → environment
variable → default provider credentials. Supports all providers with provider-specific
env var mappings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
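
The four-stage fallback described in the commit above could be expressed as an ordered list of candidates. The stage order is taken from the commit message; the `resolveCredential` function, the `CredentialInputs` shape, and the rule that blank strings count as missing are assumptions.

```typescript
type CredentialSource =
  | "profile-oauth"
  | "profile-api-key"
  | "environment"
  | "provider-default";

interface CredentialInputs {
  profileOAuthToken?: string;
  profileApiKey?: string;
  envValue?: string;
  providerDefault?: string;
}

interface ResolvedCredential {
  source: CredentialSource;
  value: string;
}

// Walk the stages in priority order and return the first non-empty value,
// together with the stage it came from so callers can log or branch on it.
function resolveCredential(inputs: CredentialInputs): ResolvedCredential | undefined {
  const stages: Array<[CredentialSource, string | undefined]> = [
    ["profile-oauth", inputs.profileOAuthToken],
    ["profile-api-key", inputs.profileApiKey],
    ["environment", inputs.envValue],
    ["provider-default", inputs.providerDefault],
  ];
  for (const [source, value] of stages) {
    if (value !== undefined && value.trim() !== "") {
      return { source, value };
    }
  }
  return undefined;
}
```
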
Add MCP integration layer using @ai-sdk/mcp with @modelcontextprotocol/sdk
for stdio/StreamableHTTP transports. Define server configs for context7,
linear, graphiti, electron, puppeteer, auto-claude. Implement
getMcpServersForAgent() via createMcpClientsForAgent() with dynamic server
resolution and graceful fallback on connection failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…, and transforms

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…g, and tool registry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add SessionConfig, SessionResult, StreamEvent, ProgressState types for the
agent session runtime. Add AgentClientConfig/Result and SimpleClientConfig/Result
types for the client layer. Implement createAgentClient() with full tool/MCP
setup and createSimpleClient() for utility runners with minimal tools.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add stream-handler.ts to process AI SDK v6 fullStream events (text-delta,
reasoning, tool-call, tool-result, step-finish, error) and emit structured
StreamEvents. Add error-classifier.ts ported from Python core/error_utils.py
with classification for rate limit (429), auth failure (401), concurrency
(400), tool execution, and abort errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion from tool calls + text patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ssion().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 78 tests across 4 test files covering:
- stream-handler: text-delta, reasoning, tool-call/result, step-finish, error, multi-step conversations
- error-classifier: 429/401/400 detection, abort errors, classification priority, sanitization
- progress-tracker: phase detection from tools/text, regression prevention, terminal locking
- runner: completion, max_steps, auth retry, cancellation, event forwarding, tool tracking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…orker bridge

Add the worker thread infrastructure for running AI agent sessions off the
main Electron thread:

- executor.ts: AgentExecutor class wrapping WorkerBridge with start/stop/retry
- worker.ts: Worker thread entry point receiving config via workerData,
  running runAgentSession(), posting structured messages back via parentPort
- worker-bridge.ts: Main-thread bridge spawning Worker, relaying postMessage
  events to EventEmitter matching AgentManagerEvents interface
- types.ts: WorkerConfig, SerializableSessionConfig, WorkerMessage protocol

Handles dev/production Electron paths, SecurityProfile serialization across
worker boundaries, and abort signal propagation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sManager

Replace Python subprocess spawn with Worker thread creation for AI SDK agents.
Add spawnWorkerProcess() using WorkerBridge for postMessage event handling.
Update killProcess/killAllProcesses to handle Worker thread termination.
Add optional worker field to AgentProcess interface. Keep spawnProcess()
and getPythonPath()/ensurePythonEnvReady() for backward compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…AgentEvents

Add handleStructuredProgress() and buildProgressData() methods that accept
typed progress events from worker threads via postMessage, bypassing text
matching. Includes phase regression prevention. Existing parseExecutionPhase()
preserved as fallback for backward compatibility during transition.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
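
Phase regression prevention for typed progress events might work along these lines. The phase names are borrowed from the BuildOrchestrator commit in this PR; the `PhaseTracker` class, the numeric ranks, the shared rank for the two QA phases, and the lock on `complete` are assumptions for illustration only.

```typescript
type Phase = "planning" | "coding" | "qa_review" | "qa_fixing" | "complete";

// Assumed ordering. The two QA phases share a rank so the review/fix loop
// can alternate without being treated as a regression.
const PHASE_RANK: Record<Phase, number> = {
  planning: 0,
  coding: 1,
  qa_review: 2,
  qa_fixing: 2,
  complete: 3,
};

interface ProgressEvent {
  phase: Phase;
  message: string;
}

class PhaseTracker {
  private current: Phase = "planning";

  get phase(): Phase {
    return this.current;
  }

  // Returns true when the event was accepted and the phase updated.
  apply(event: ProgressEvent): boolean {
    if (this.current === "complete") return false; // terminal phase is locked
    if (PHASE_RANK[event.phase] < PHASE_RANK[this.current]) return false; // would regress
    this.current = event.phase;
    return true;
  }
}
```
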
Tests cover: worker spawning, message relay (log/error/progress/stream-event),
result handling with exit code mapping, crash handling (worker error/exit events),
termination with abort signal, executor lifecycle (start/stop/retry), config
management, and AgentManagerEvents compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…terator.ts

Replaces Python run.py main build loop and agents/coder.py subtask iteration
with TypeScript equivalents for the Vercel AI SDK migration.

- BuildOrchestrator: drives planning → coding → qa_review → qa_fixing → complete
- SubtaskIterator: reads implementation_plan.json, iterates pending subtasks
- Phase transitions validated via phase-protocol.ts
- Retry tracking, stuck detection, abort signal support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
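
A transition table is one way to validate the phase flow listed in the commit above. The phase names and their order come from the commit message; the `ALLOWED_TRANSITIONS` table (including the `qa_fixing` back to `qa_review` loop and the direct `qa_review` to `complete` path) and the function names are assumptions, not the contents of phase-protocol.ts.

```typescript
type BuildPhase = "planning" | "coding" | "qa_review" | "qa_fixing" | "complete";

// Assumed transition table: QA may pass directly or loop through fixing.
const ALLOWED_TRANSITIONS: Record<BuildPhase, readonly BuildPhase[]> = {
  planning: ["coding"],
  coding: ["qa_review"],
  qa_review: ["qa_fixing", "complete"],
  qa_fixing: ["qa_review"],
  complete: [],
};

function canTransition(from: BuildPhase, to: BuildPhase): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}

// Return the next phase, or throw when the move is not in the table.
function transitionPhase(from: BuildPhase, to: BuildPhase): BuildPhase {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid phase transition: ${from} -> ${to}`);
  }
  return to;
}
```
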
@sentry

sentry bot commented Feb 26, 2026

🚧 Skipped: PR exceeds review size limit.

Please split into smaller PRs and re-run.
Reference ID: 10835398

Security: fix worker.ts unsafe cast, sanitize Bearer tokens in error classifier,
block --no-preserve-root in rm validator, deny unparseable shell -c commands,
redact OAuth tokens in debug logs.

Cross-platform: resolve shell dynamically in bash tool (Git Bash/cmd.exe),
use findExecutable for ripgrep in grep tool, handle CRLF in read/write/
worktree-manager/auto-merger, use killProcessGracefully for process cleanup.

Build: remove stale Python/Graphiti extraResources from package.json, update
spec_runner.py marker to session/runner.ts, deduplicate AGENT_CONFIGS in
tools/registry.ts, remove hollow test assertion.

i18n: add 11 missing FR translation keys in onboarding.json (Ollama config,
Voyage embedding model), add memory.info section to en/fr common.json,
replace 4 hardcoded strings in MemoriesTab.tsx with t() calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
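
Two of the security fixes in the commit above, redacting Bearer tokens from error text and rejecting `--no-preserve-root`, can be sketched as small pure functions. The function names, the regular expression, and the exact-match check are assumptions; the real validators may cover more token formats and argument spellings.

```typescript
// Replace the credential part of any "Bearer <token>" occurrence so error
// messages can be logged or shown without leaking the token.
function redactBearerTokens(message: string): string {
  return message.replace(/\bBearer\s+[A-Za-z0-9._~+\/=-]+/gi, "Bearer [REDACTED]");
}

// Reject an rm invocation whose arguments include --no-preserve-root.
// Only the exact flag is checked in this sketch.
function rmArgsAreAllowed(args: readonly string[]): boolean {
  return !args.some((arg) => arg === "--no-preserve-root");
}
```
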

2. **Account ID** — May be required. Test without it first. If needed, decode from JWT or read `~/.codex/auth.json`.
3. **CORS** — Not an issue (Electron main process = Node.js).
4. **Polling rate** — Unknown if OpenAI rate-limits `wham/usage`. Start conservatively (every 30-60s).
5. **Multi-account Codex** — Codex CLI doesn't support multiple accounts. We store one token file. If user has multiple Codex accounts, they'd need to re-auth each time (unlike Anthropic which supports multiple config dirs).

Temporary research file accidentally committed to repository

Medium Severity

CODEX_RATE_LIMITS_RESEARCH.md is explicitly marked on line 3 as "Temporary research file. Delete after implementation." yet it's committed to the repository root. It contains 348 lines of internal architecture notes, undocumented API endpoints, OAuth client IDs, token storage paths, and a detailed implementation plan for Codex rate-limit monitoring — none of which belongs in the committed codebase.

Contributor

@github-advanced-security github-advanced-security bot left a comment

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

AndyMik90 and others added 2 commits March 9, 2026 08:32
- Increase polling interval from 30s to 60s for active profile
- Increase inactive profile cache TTL from 60s to 5 minutes
- Add adaptive cache: drops to 60s when active usage >80% session or >90% weekly
- Add request coalescing for getAllProfilesUsage() to prevent duplicate fetches
- Stagger same-provider fetches with 15s delay (prevents burst-hitting same API)
- Add 10-minute backoff for 429 rate limits (vs 2min general failure cooldown)
- Stop force-refreshing on AccountSettings open (use cached data + push updates)
- Fix false "needs re-auth" flag: clear needsReauthProfiles when valid token obtained
- Remove noisy ProjectStore subtask completion diagnostic logging

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
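
Request coalescing and the two cooldown durations from the commit above can be sketched as follows. The 10-minute and 2-minute values come from the commit message; the `coalesce` wrapper, the `failureCooldownMs` helper, and the constant names are hypothetical and do not reflect how `getAllProfilesUsage()` is actually structured.

```typescript
// Wrap an async fetcher so that concurrent callers share one in-flight
// request instead of each triggering their own fetch.
function coalesce<T>(fetcher: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | undefined;
  return () => {
    if (!inFlight) {
      inFlight = fetcher().finally(() => {
        // Clear the slot once settled so the next call fetches fresh data.
        inFlight = undefined;
      });
    }
    return inFlight;
  };
}

const RATE_LIMIT_BACKOFF_MS = 10 * 60 * 1000; // 429 responses
const GENERAL_FAILURE_COOLDOWN_MS = 2 * 60 * 1000; // any other failure

// Pick how long to wait before polling again after a failed usage fetch.
function failureCooldownMs(status?: number): number {
  return status === 429 ? RATE_LIMIT_BACKOFF_MS : GENERAL_FAILURE_COOLDOWN_MS;
}
```
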

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

system: systemPrompt,
messages: conversationHistory,
tools: toolRegistry.getToolsForAgent(agentType),
stopWhen: stepCountIs(1000),

Documentation recommends excessively high step count limit

High Severity

The "Key Patterns" documentation in CLAUDE.md recommends stopWhen: stepCountIs(1000) as the canonical example for agent sessions. Each step is a separate LLM API call. Combined with the runtime default of 500 steps in runner.ts (which can be doubled to 1000 by calibration) and an absolute cap of 2000 in memory-stop-condition.ts, this contributes to the excessive API usage reported in PR discussion — a user burning through a 5-hour Claude limit in 20 minutes. The documented example normalizes an extremely high step count as the recommended pattern.
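
The arithmetic behind this concern can be made explicit. The sketch below assumes the three numbers in the comment compose as "default times calibration factor, clamped to the absolute cap"; that composition and the `effectiveStepLimit` name are assumptions, not a reading of runner.ts or memory-stop-condition.ts.

```typescript
// Worst-case number of steps (and therefore LLM API calls) for one session,
// assuming the calibrated default is clamped by an absolute cap.
function effectiveStepLimit(
  defaultSteps: number,
  calibrationFactor: number,
  absoluteCap: number,
): number {
  return Math.min(Math.floor(defaultSteps * calibrationFactor), absoluteCap);
}

// Values quoted in the review comment: default 500, doubled by calibration, cap 2000.
const worstCaseCalls = effectiveStepLimit(500, 2, 2000);
```

Under these assumptions a single session may issue up to 1000 model calls before the step limit stops it, which is the scale the comment is warning about.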

Labels

  • area/frontend: This is frontend only
  • area/fullstack: This is Frontend + Backend
  • feature: New feature or request
  • size/XL: Extra large (1000+ lines)

2 participants