perf(session): drop superseded routed_experts/indexer_topk blobs from…#1463
Open
guapisolo wants to merge 1 commit into
Open
perf(session): drop superseded routed_experts/indexer_topk blobs from…#1463guapisolo wants to merge 1 commit into
guapisolo wants to merge 1 commit into
Conversation
… session records Each chat-completion turn stored the full upstream response in its SessionRecord, including the all-token routed_experts/indexer_topk blob (~1KB/token over the whole prompt+output). Every turn's prompt is the full accumulated prefix, so these blobs overlap and grow per turn — a 64-trajectory x 50-turn x ~50k-context run retained tens of GB and the all-token run failed with 502. merge_samples folds per-turn samples last-wins, so only the most recent turn's blob is ever read. append_record now drops the blob from records that can no longer be the consumed tail, keeping the last MAX_ASSISTANT_ROLLBACK_STEPS + 1 (so a single rollback's promoted tail still carries its blob) and leaving logprobs and the consumer untouched. Retained size drops from O(turns*prefix) to O(prefix). This last-wins reasoning only holds when turns are merged, so SessionRegistry asserts generate_multi_samples is False; the agentic_tool_call_multi_samples test variant (session server + multi-sample output) is removed accordingly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
08bde32 to
9cf2a03
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Drop superseded routed_experts/indexer_topk blobs from linear-trajectory session records.
Motivation
Each chat-completion turn stored the full upstream response in its
SessionRecord, including the all-tokenrouted_experts/indexer_topkblob (~1KB/token over the whole prompt+output). Every turn's prompt is the full accumulated prefix, so the blobs overlap and grow per turn — retained payload is O(turns × prefix). A 64-trajectory x 50-turn x ~50k-context run retained tens of GB; the all-token run failed with 502. The merged training sample and the per-turnoutput_token_logprobsstay unchanged.Before / After
append_recorddrops the blob from records that can no longer be the merged tail, so retained payload is O(prefix).LinearTrajectory.append_record, retaining the lastMAX_ASSISTANT_ROLLBACK_STEPS + 1records' blobs.SessionRegistry.__init__gains agenerate_multi_samples is Falseassert; the last-wins reasoning only holds when turns are merged.agentic_tool_call_multi_samplestest variant is removed accordingly.Behavior Preservation
test_stripping_superseded_records_preserves_merged_r3replays a 3-turn chain, strips the first two records, then asserts the mergedrollout_routed_expertsis byte-identical to the all-present baseline.test_keeps_last_two_and_preserves_logprobsassertsoutput_token_logprobssurvive every strip.test_single_rollback_leaves_surviving_tail_with_blobasserts a rolled-back tail still carries its blob.Verification
Existing suite (
test_linear_trajectory+test_openai_endpoint_utils+test_sessions+test_session_race_conditions) → 82 passed.Retention microbenchmark drives the real
append_recordwith synthetic blobs sized to the workload profile (~1KB/token, ~1k tokens/turn, so context reaches ~50k at turn 50). Per trajectory, retained blob bytes grow with turns pre-fix but stay O(prefix) after:The session server is a singleton process (
router_manager.py:116), so all in-flight sessions' retained sets coexist in one process. Building 64 concurrent sessions x 50 turns in one process measures the aggregate directly (not extrapolated), via tracemalloc live Python heap:Live heap equals retained blobs in both runs, confirming the popped blobs are released rather than held by a lingering reference. tracemalloc is used instead of RSS because glibc keeps freed chunks resident, which would over-report the fixed run.
Review Focus
append_recordstrip indexlen(records) - 1 - (MAX_ASSISTANT_ROLLBACK_STEPS + 1)against_try_detect_and_rollback_to_assistant_checkpoint, which truncatesrecordson a single-step rollback.SessionRegistry.__init__assert: confirm the session-server subprocess receives the sameargs, sogenerate_multi_samples=Trueis rejected before any session runs.