fix: prevent nil pointer crash in OpenAI SSE tool call accumulation by duhd-vnpay · Pull Request #182 · nextlevelbuilder/goclaw

duhd-vnpay · 2026-03-13T15:04:05Z

Summary

Root cause fix: openai.go — accumulators map (keyed by SSE tool_call index) was iterated with for i := 0; i < len(map); i++, assuming contiguous keys 0,1,2... SSE indices from extended thinking models can be non-contiguous (e.g. {0, 2}), causing accumulators[i] → nil → panic on acc.rawArgs
Defense: lanes.go — add defer recover() in scheduler lane goroutine so panics log error + return semaphore token instead of crashing the process
Cleanup: tracing.go + gateway.go — add SweepOrphanTraces() to mark stuck running traces as error on startup (traces older than 1h with status running = orphan from crash)

Changed files

File	Change
`internal/providers/openai.go`	Iterate map by sorted keys instead of sequential 0..len-1
`internal/scheduler/lanes.go`	Add `recover()` in goroutine defer
`internal/store/tracing_store.go`	Add `SweepOrphanTraces` to interface
`internal/store/pg/tracing.go`	Implement `SweepOrphanTraces` (UPDATE running→error)
`cmd/gateway.go`	Call sweep on startup in background goroutine

Test results (18 tests, all PASS)

providers (9/9 PASS):

ContiguousToolCalls — indices 0,1 (happy path)
NonContiguousToolCallIndices — indices 0,2 skip 1 (the crash scenario)
LargeGapToolCallIndices — indices 0,5,10
SingleToolCallAtHighIndex — single call at index 3
ToolCallWithThoughtSignature — metadata preserved
TextOnly / EmptyToolCalls — no tool calls
HTTPError — 429 → proper HTTPError
CancelledContext — context cancel → error

scheduler (5/5 PASS):

NoPanic — normal operation
PanicRecovery — panic → lane recovers, accepts new work
MultiplePanics — 3 panics → all semaphore tokens returned
CancelledContext — cancelled ctx with full lane → error
StatsAfterPanic — active counter decremented after panic

store/pg (4/4 PASS, integration with PostgreSQL):

SweepsOldRunning — running trace 2h old → swept
IgnoresRecentRunning — running trace 5min old → NOT swept
IgnoresCompleted — completed trace → NOT swept
NoOrphans — no running traces → swept=0

🤖 Generated with Claude Code

…ing, photo handling - Fix zaloBotInfo to use account_name/display_name (not name) - Add Label() method for bot display name resolution - Handle 3 response formats in getUpdates: array, single object, wrapped - Add photo_url field to zaloMessage for Zalo CDN image URLs - Add display_name/is_bot to zaloFrom, chat_type to zaloChat - Use PhotoURL with fallback to Photo in handleImageMessage Co-Authored-By: Claude Opus 4.6 <[email protected]>

…ssing Zalo CDN URLs are auth-restricted and expire quickly, causing read_image tool failures. Now downloads photos to temp files (like Telegram channel) so the agent pipeline can base64-encode and process them normally. Falls back to passing the URL directly if download fails. Co-Authored-By: Claude Opus 4.6 <[email protected]>

… instance updates When encryption key is empty, credentials stayed as map[string]any from JSON unmarshal, causing pgx driver to fail encoding into bytea. Now credentials are always marshaled to []byte regardless of encryption. Co-Authored-By: Claude Opus 4.6 <[email protected]>

Add Party Mode to GoClaw: structured multi-persona AI discussions with Standard (single LLM call), Deep (parallel thinking + cross-talk), and Token-Ring (sequential turns) modes. Backend: PartyStore + PG implementation, party engine with parallel goroutines, 7 RPC methods (party.start/round/question/add_context/ summary/exit/list), 10 WebSocket events, migration 000014. Frontend: React dashboard page with session list, chat view, persona sidebar, mode controls, start dialog with 6 team presets, i18n (en/vi/zh). Co-Authored-By: Claude Opus 4.6 <[email protected]>

Previously, sanitizeHistory() only cleaned the in-memory copy for each LLM request but never persisted the fix — causing the same "dropping orphaned tool message" WARN to repeat on every single request forever. Changes: - sanitizeHistory() now returns drop count alongside cleaned messages - When orphans are detected, cleaned history is persisted back to the session store via new SetHistory() method, then saved to DB - Per-message WARN logs downgraded to DEBUG (cleanup is logged once at INFO level with total count) - Added SetHistory() to SessionStore interface + both implementations Co-Authored-By: Claude Opus 4.6 <[email protected]>

When a delegated agent (e.g. ui-ux-design-agent) spawns subagents, the announce session key uses the format delegate:{uuid8}:{agentKey}:{id}. The scheduler's RunFunc only handled agent:{agentId}:{rest} format, falling back to the hardcoded "default" agent — which doesn't exist in managed-mode deployments where the default agent has a custom key. Add delegate: prefix parsing to extract the target agent key from position 2 of the session key parts. Co-Authored-By: Claude Opus 4.6 <[email protected]>

- Add sender_id and channel to team task metadata for audit trail - Remove assistant prefill in team task reminder (thinking models reject it) - Add unit tests for team access control and sender_id tracking Co-Authored-By: Claude Opus 4.6 <[email protected]>

…, mobile UX, budget, traces) Resolved 13 file conflicts: - zalo.go: kept local struct tags + upstream's n==0 safety check - channel_instances.go: used upstream's credential merging (supersedes local bytea fix) - factory.go, stores.go: merged both Party + Contacts/Activity/Snapshots stores - loop_history.go: kept upstream skill inlining constants + local session persistence - session_store.go, sessions/manager.go, sessions_ops.go: deduplicated SetHistory() - sidebar.tsx, routes.tsx: added Party to upstream's restructured sidebar/routes - protocol.ts, i18n/index.ts, sidebar.json (×3): merged Party + upstream events/translations - version.go: bumped to schema version 18 Migration collision fix: renamed 000014_party_sessions → 000018_party_sessions Co-Authored-By: Claude Opus 4.6 <[email protected]>

- Add nullish coalescing for PERSONA_COLORS array index - Remove unused destructured `round` variable from PartyPage Co-Authored-By: Claude Opus 4.6 <[email protected]>

…ting Co-Authored-By: Claude Opus 4.6 <[email protected]>

…otocol Backend: - gateway_providers: read default_model from provider settings JSONB - party.go: getEngine() prefers providers with DefaultModel, alphabetical fallback for determinism (fixes random Go map iteration) - party.go: add slog.Error for round failures (was silently swallowed) Frontend: - use-party.ts: align all RPC params/events with snake_case wire format, add transformSession/mapStatus/selectSession helpers - party-start-dialog.tsx: use actual DB persona keys (morpheus-persona etc.) - party-page.tsx: use selectSession() for proper state hydration - connection-status.tsx: fix status text alignment Co-Authored-By: Claude Opus 4.6 <[email protected]>

transformSession() was dropping history/summary fields from backend response, and selectSession() reset messages to empty array. Old sessions appeared blank when clicked in sidebar. - transformSession: preserve _history and _summary from backend - hydrateMessages: new helper to convert RoundResult[] + SummaryResult into PartyMessage[] (round headers, persona messages, summary) - selectSession: call hydrateMessages() instead of setMessages([]) Co-Authored-By: Claude Opus 4.6 <[email protected]>

Root cause: `accumulators` map iterated with sequential `for i := 0; i < len(map)` but SSE tool_call indices can be non-contiguous (e.g. {0, 2}), causing nil dereference on `accumulators[i]` when key `i` doesn't exist. Fixes: 1. openai.go: iterate map by sorted keys instead of sequential 0..len-1 2. lanes.go: add defer recover() in scheduler goroutine to prevent panics from crashing the entire process — logs error and returns semaphore token 3. tracing: add SweepOrphanTraces() to mark stuck running traces as error on gateway startup (running > 1h = orphan from previous crash) Test results (18 tests): - providers: 9/9 PASS (contiguous, non-contiguous, large gap, high index, thought_signature, text-only, empty, HTTP error, cancelled context) - scheduler: 5/5 PASS (no panic, panic recovery, multiple panics, cancelled context, stats after panic) - store/pg: 4/4 PASS (sweeps old running, ignores recent, ignores completed, no orphans) Co-Authored-By: Claude Opus 4.6 <[email protected]>

… loops When LLM hits max_tokens, tool call arguments may be incomplete/malformed JSON. - Add truncation guard: skip tool execution when finish_reason=length, ask LLM to retry smaller - Wire per-agent max_tokens from other_config JSONB (default 8192) - Log warning on JSON parse failure for non-empty tool call arguments (truncation indicator) Co-Authored-By: Claude Opus 4.6 <[email protected]>

duhd-vnpay and others added 18 commits March 10, 2026 15:44

Merge branch 'nextlevelbuilder:main' into main

6d6a7c7

Merge remote-tracking branch 'origin/main'

68a9e95

Merge branch 'main' of https://github.com/duhd-vnpay/goclaw

925a5c1

fix: TypeScript strict mode errors in Party Mode UI

f26d52a

- Add nullish coalescing for PERSONA_COLORS array index - Remove unused destructured `round` variable from PartyPage Co-Authored-By: Claude Opus 4.6 <[email protected]>

test: add data-testid attributes to Party Mode components for E2E tes…

b332024

…ting Co-Authored-By: Claude Opus 4.6 <[email protected]>

Merge remote-tracking branch 'origin/main'

ba3871e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: prevent nil pointer crash in OpenAI SSE tool call accumulation#182

fix: prevent nil pointer crash in OpenAI SSE tool call accumulation#182
duhd-vnpay wants to merge 18 commits intonextlevelbuilder:mainfrom
duhd-vnpay:main

duhd-vnpay commented Mar 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

duhd-vnpay commented Mar 13, 2026

Summary

Changed files

Test results (18 tests, all PASS)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant