feat: add transcript GC maintenance for summarized tool results #148
Open
…l results

Add a summarized-tool candidate query in SummaryStore and implement LcmContextEngine.maintain() for the conservative first transcript-GC pass. This pass only rewrites tool-result transcript entries that were already externalized into large_files during ingest, are linked through summary_messages, and are no longer present as raw context items.

Rebuild replacement toolResult messages from stored message_parts, align them to transcript entries by stable toolCallId, and request runtime-owned rewrites in small batches.

Also export the minimal assembler helpers needed for replacement reconstruction and add focused engine tests for candidate selection and maintain()-driven rewrite requests.

Regeneration-Prompt: |
  Implement Phase 2 of the tool-result externalization spec now that upstream OpenClaw has merged the transcript maintenance hook and rewrite helper. Keep this first pass conservative and additive: do not redesign compaction or add new schema unless required. Select transcript-GC candidates from LCM state only when a tool-result message was already externalized into large_files, is covered by summaries, and is no longer present as a raw context item. Rebuild the compact replacement message from stored message_parts so the placeholder content stays canonical, then align candidates to active transcript entries by stable toolCallId and ask the runtime to rewrite them in bounded batches. Skip anything ambiguous instead of trying to be clever. Add focused tests that prove candidate discovery works and that maintain() requests the expected rewrite payload for a summarized externalized tool result.
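The alignment and batching step described in this commit can be illustrated roughly as follows. This is a minimal sketch, not the code in the PR: `GcCandidate`, `TranscriptEntry`, `RewriteRequest`, and `planRewrites` are hypothetical names, and the real OpenClaw rewrite helper and transcript entry shape are not shown in this conversation. It assumes each transcript entry exposes an id and an optional `toolCallId`, and it treats zero or multiple matches as ambiguous and skips them.

```typescript
// Hypothetical shapes for illustration only; not the repo's or OpenClaw's types.
interface ReplacementMessage {
  role: "toolResult";
  toolCallId: string;
  content: string; // compact placeholder rebuilt from stored message_parts
}

interface GcCandidate {
  toolCallId: string;
  replacement: ReplacementMessage;
}

interface TranscriptEntry {
  entryId: string;
  role: string;
  toolCallId?: string;
}

interface RewriteRequest {
  entryId: string;
  replacement: ReplacementMessage;
}

// Align candidates to transcript entries by toolCallId and split the
// resulting rewrite requests into batches of at most `batchSize`.
function planRewrites(
  candidates: GcCandidate[],
  transcript: TranscriptEntry[],
  batchSize: number,
): RewriteRequest[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }

  // Index tool-result entries by toolCallId so ambiguity is easy to detect.
  const entriesByCallId = new Map<string, TranscriptEntry[]>();
  for (const entry of transcript) {
    if (entry.role !== "toolResult" || entry.toolCallId === undefined) continue;
    const bucket = entriesByCallId.get(entry.toolCallId) ?? [];
    bucket.push(entry);
    entriesByCallId.set(entry.toolCallId, bucket);
  }

  const requests: RewriteRequest[] = [];
  const planned = new Set<string>();
  for (const candidate of candidates) {
    const matches = entriesByCallId.get(candidate.toolCallId) ?? [];
    const [only] = matches;
    // Skip anything ambiguous: no match, several matches, or a call already planned.
    if (matches.length !== 1 || only === undefined || planned.has(candidate.toolCallId)) {
      continue;
    }
    planned.add(candidate.toolCallId);
    requests.push({ entryId: only.entryId, replacement: candidate.replacement });
  }

  // Bounded batches keep each runtime rewrite request small.
  const batches: RewriteRequest[][] = [];
  for (let i = 0; i < requests.length; i += batchSize) {
    batches.push(requests.slice(i, i + batchSize));
  }
  return batches;
}
```

In this sketch a candidate whose `toolCallId` appears on more than one active transcript entry is left untouched, which mirrors the "skip anything ambiguous" guidance in the prompt above.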
Document the current state of tool-result externalization, incremental bootstrap, and transcript GC in the repo spec. Add a changeset for the new runtime-assisted transcript GC behavior so release notes capture the user-visible impact.

Regeneration-Prompt: |
  OpenClaw upstream landed the transcript rewrite maintenance API, and this branch already implements the first pass of transcript GC for summarized externalized tool results. Add the missing repo-side documentation so the PR is self-contained: a spec in specs/ that explains what is already implemented, why it matters operationally, and what still remains to finish the design. Also add a changeset, because this changes user-visible runtime behavior by shrinking active transcripts after safe condensation. Do not pretend the implementation is complete; call out the remaining work explicitly, including legacy inline tool results, stronger transcript alignment, tighter eligibility/fresh-tail rules, and end-to-end integration coverage.
What
Add the first runtime-assisted transcript GC pass for summarized externalized tool results, and include the repo spec that explains the broader design and current implementation status.
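As a rough illustration of which tool results this first pass considers, the eligibility rule from the commit messages can be written as a predicate. The PR implements candidate selection as a query in SummaryStore; the in-memory version below is only a sketch, and `ToolResultState`, its field names, and `isTranscriptGcCandidate` are hypothetical stand-ins for the `large_files`, `summary_messages`, and context-item state.

```typescript
// Hypothetical in-memory view of one tool-result message's LCM state.
interface ToolResultState {
  messageId: number;
  externalizedFileId: string | null; // set when the payload was moved into large_files
  summaryIds: string[];              // summaries linked through summary_messages
  isRawContextItem: boolean;         // still present as a raw context item
}

// A message qualifies only if all three conservative conditions hold.
function isTranscriptGcCandidate(state: ToolResultState): boolean {
  const externalized = state.externalizedFileId !== null;
  const summarized = state.summaryIds.length > 0;
  return externalized && summarized && !state.isRawContextItem;
}

// Usage: filter a list of states down to GC candidates.
function selectCandidates(states: ToolResultState[]): ToolResultState[] {
  return states.filter(isTranscriptGcCandidate);
}
```

Anything that fails one of the three checks, such as a legacy inline tool result that was never externalized, is simply left in the transcript by this sketch.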
Why
Lossless Claw already bounded model context growth, but long-lived tool-heavy sessions could still accumulate oversized inline tool results in the active transcript. This change starts shrinking the hot JSONL once those payloads are safely externalized and summarized, which reduces restart/bootstrap cost and repeated replay of giant tool output after crashes. The added spec documents what is already done versus what still remains.
Changes
maintain()
transcript rewrite flow
toolCallId
Testing
npx vitest run test/engine.test.ts -t "(lists summarized externalized tool results as transcript GC candidates|maintain\(\) requests transcript rewrites for summarized externalized tool results|externalizes oversized tool-result payloads into large_files|externalizes oversized plain-text tool-result blocks from live exec-style messages)"
npx tsc -p tsconfig.json --noEmit is still not a clean local gate because of pre-existing repo baseline issues and stale installed openclaw typings
Follow-ups
This is intentionally a first pass, not the full end state. The remaining work is tracked in the repo spec at specs/tool-result-externalization-and-incremental-bootstrap.md:
toolCallId