
feat: add transcript GC maintenance for summarized tool results #148

Open
jalehman wants to merge 2 commits into main from codex/lossless-claw-71a-transcript-gc-candidates

What

Add the first runtime-assisted transcript GC pass for summarized externalized tool results, and include the repo spec that explains the broader design and current implementation status.

Why

Lossless Claw already bounds model context growth, but long-lived, tool-heavy sessions could still accumulate oversized inline tool results in the active transcript. This change starts shrinking the hot JSONL transcript once those payloads have been safely externalized and summarized, which lowers restart/bootstrap cost and avoids repeatedly replaying giant tool output after crashes. The added spec documents what is already done and what still remains.

Changes

  • Add conservative maintain() transcript rewrite flow
  • GC only summarized externalized tool results
  • Match transcript entries by unique toolCallId
  • Rebuild compact replacements from stored message parts
  • Add focused transcript-GC unit coverage
  • Add repo spec for externalization/bootstrap/GC design
  • Add changeset for release notes
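The "match transcript entries by unique toolCallId" rule can be illustrated with a minimal TypeScript sketch. It assumes hypothetical GcCandidate and TranscriptEntry shapes and a "toolResult" role string (the real LCM and OpenClaw types are not shown in this PR), and it pairs a candidate with a transcript entry only when the toolCallId is unique on both sides. Anything missing or duplicated is skipped rather than guessed at.

```typescript
// Hypothetical shapes for illustration; the real LCM and OpenClaw types differ.
interface GcCandidate {
  messageId: string; // hypothetical LCM message id
  toolCallId: string; // tool call id recorded for the tool result
}

interface TranscriptEntry {
  entryId: string; // hypothetical transcript entry id
  role: string; // assumed "toolResult" for tool-result entries
  toolCallId?: string;
}

interface Alignment {
  candidate: GcCandidate;
  entry: TranscriptEntry;
}

// Pair each candidate with exactly one tool-result transcript entry.
// A toolCallId that is missing, or duplicated among entries or among
// candidates, is treated as ambiguous and skipped.
function alignByToolCallId(
  candidates: readonly GcCandidate[],
  entries: readonly TranscriptEntry[],
): { aligned: Alignment[]; skipped: GcCandidate[] } {
  // Group tool-result entries by toolCallId so duplicates are visible.
  const entriesById = new Map<string, TranscriptEntry[]>();
  for (const entry of entries) {
    if (entry.role !== "toolResult" || entry.toolCallId === undefined) continue;
    const bucket = entriesById.get(entry.toolCallId) ?? [];
    bucket.push(entry);
    entriesById.set(entry.toolCallId, bucket);
  }

  // Count candidates per toolCallId; two candidates sharing an id are ambiguous too.
  const candidateCounts = new Map<string, number>();
  for (const c of candidates) {
    candidateCounts.set(c.toolCallId, (candidateCounts.get(c.toolCallId) ?? 0) + 1);
  }

  const aligned: Alignment[] = [];
  const skipped: GcCandidate[] = [];
  for (const candidate of candidates) {
    const matches = entriesById.get(candidate.toolCallId) ?? [];
    const [only] = matches;
    const uniqueCandidate = candidateCounts.get(candidate.toolCallId) === 1;
    if (matches.length === 1 && only !== undefined && uniqueCandidate) {
      aligned.push({ candidate, entry: only });
    } else {
      skipped.push(candidate); // leave this transcript entry untouched
    }
  }
  return { aligned, skipped };
}
```

Skipping on any duplicate keeps the pass conservative; strengthening alignment beyond unique toolCallId is listed as a follow-up.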

Testing

  • npx vitest run test/engine.test.ts -t "(lists summarized externalized tool results as transcript GC candidates|maintain\(\) requests transcript rewrites for summarized externalized tool results|externalizes oversized tool-result payloads into large_files|externalizes oversized plain-text tool-result blocks from live exec-style messages)"
  • npx tsc -p tsconfig.json --noEmit is still not a clean local gate: it does not pass because of pre-existing repo baseline issues and stale installed openclaw typings

Follow-ups

This is intentionally a first pass, not the full end state. The remaining work is tracked in the repo spec at specs/tool-result-externalization-and-incremental-bootstrap.md:

  • handle legacy inline oversized tool results that predate ingest-time externalization
  • strengthen transcript-entry alignment beyond unique toolCallId
  • tighten fresh-tail and eligibility rules for GC
  • add end-to-end coverage against the merged OpenClaw maintenance lifecycle
  • optionally add more preventive write-time hygiene so giant inline tool blobs are avoided earlier


Add a summarized-tool candidate query in SummaryStore and implement LcmContextEngine.maintain() for the conservative first transcript-GC pass. This pass only rewrites tool-result transcript entries that were already externalized into large_files during ingest, are linked through summary_messages, and are no longer present as raw context items. Rebuild replacement toolResult messages from stored message_parts, align them to transcript entries by stable toolCallId, and request runtime-owned rewrites in small batches. Also export the minimal assembler helpers needed for replacement reconstruction and add focused engine tests for candidate selection and maintain()-driven rewrite requests.
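The three eligibility conditions in this commit message (externalized into large_files, linked through summary_messages, no longer a raw context item) can be sketched as an in-memory TypeScript filter. This is not the SummaryStore query the commit adds; ToolResultRow, largeFileId, summaryIds, and selectGcCandidates are hypothetical names chosen for illustration.

```typescript
// Hypothetical row shape; the real SummaryStore query and schema are not shown in the PR.
interface ToolResultRow {
  messageId: string;
  role: string; // assumed "toolResult" for tool results
  largeFileId: string | null; // non-null once ingest externalized the payload into large_files
  summaryIds: readonly string[]; // summaries linked through summary_messages
}

// Conservative eligibility: every condition must hold, otherwise the row is not a candidate.
function selectGcCandidates(
  rows: readonly ToolResultRow[],
  rawContextMessageIds: ReadonlySet<string>, // ids still present as raw context items
): ToolResultRow[] {
  return rows.filter(
    (row) =>
      row.role === "toolResult" &&
      row.largeFileId !== null && // already externalized at ingest
      row.summaryIds.length > 0 && // covered by at least one summary
      !rawContextMessageIds.has(row.messageId), // no longer a raw context item
  );
}
```

A store-backed version would presumably express the same rule as one query over large_files and summary_messages; the in-memory form just makes the "all conditions or nothing" rule explicit.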

Regeneration-Prompt: |
  Implement Phase 2 of the tool-result externalization spec now that upstream OpenClaw has merged the transcript maintenance hook and rewrite helper. Keep this first pass conservative and additive: do not redesign compaction or add new schema unless required. Select transcript-GC candidates from LCM state only when a tool-result message was already externalized into large_files, is covered by summaries, and is no longer present as a raw context item. Rebuild the compact replacement message from stored message_parts so the placeholder content stays canonical, then align candidates to active transcript entries by stable toolCallId and ask the runtime to rewrite them in bounded batches. Skip anything ambiguous instead of trying to be clever. Add focused tests that prove candidate discovery works and that maintain() requests the expected rewrite payload for a summarized externalized tool result.
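The final step, asking the runtime to rewrite entries in bounded batches, can be sketched as a small TypeScript helper. The real OpenClaw rewrite helper's name and signature are not given in this PR, so RewriteRequest, requestRewrite, chunk, and requestRewritesInBatches are hypothetical, and the batch size is left to the caller.

```typescript
// Hypothetical payload; the actual OpenClaw rewrite API is not reproduced here.
interface RewriteRequest {
  entryId: string; // transcript entry to replace
  replacement: unknown; // compact toolResult message rebuilt from message_parts
}

// Split items into consecutive slices of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("batch size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Send rewrite requests one bounded batch at a time; returns how many were sent.
async function requestRewritesInBatches(
  requests: readonly RewriteRequest[],
  batchSize: number,
  requestRewrite: (batch: RewriteRequest[]) => Promise<void>,
): Promise<number> {
  let sent = 0;
  for (const batch of chunk(requests, batchSize)) {
    await requestRewrite(batch); // sequential: a rejection stops the remaining batches
    sent += batch.length;
  }
  return sent;
}
```

Awaiting each batch before sending the next keeps every rewrite small, and a failure in one batch stops the loop instead of leaving many requests in flight.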

Document the current state of tool-result externalization,
incremental bootstrap, and transcript GC in the repo spec.
Add a changeset for the new runtime-assisted transcript GC
behavior so release notes capture the user-visible impact.

Regeneration-Prompt: |
  OpenClaw upstream landed the transcript rewrite maintenance API,
  and this branch already implements the first pass of transcript GC
  for summarized externalized tool results. Add the missing repo-side
  documentation so the PR is self-contained: a spec in specs/ that
  explains what is already implemented, why it matters operationally,
  and what still remains to finish the design. Also add a changeset,
  because this changes user-visible runtime behavior by shrinking
  active transcripts after safe condensation. Do not pretend the
  implementation is complete; call out the remaining work explicitly,
  including legacy inline tool results, stronger transcript alignment,
  tighter eligibility/fresh-tail rules, and end-to-end integration
  coverage.