feat: add keepAlive support and fix RemoteSandboxSnapshot client re-injection #1819

Open
Buktal wants to merge 7 commits into agentscope-ai:main from Buktal:feat/sandbox-keepalive

@Buktal commented Jun 18, 2026

AgentScope-Java Version

2.0.0-RC3 (based on f28e0a0)

Description

Problem

  1. Sandbox always destroyed after each turn. Every agent call creates a fresh Docker container or Kubernetes Pod, runs the call, then shuts it down. For stateful agents (coding agent with installed dependencies, data agent with cached datasets), this is wasteful - the same workspace could be reused across turns within a session.

  2. RemoteSnapshotClient null after deserialization. RemoteSandboxSnapshot loses its RemoteSnapshotClient when the sandbox state goes through JSON serialization/deserialization. SandboxClient.resume() was documented to re-inject it but never actually did - a latent bug that blocks any stateful resume path.

Solution

keepAlive mode (6 commits)

Add a keepAlive boolean to SandboxFilesystemSpec, propagated through SandboxContext to SandboxManager.release(). When enabled, release() calls stop() (persists workspace snapshot) but skips shutdown() (destroys container/Pod), preserving the sandbox resource for the next call.
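
The gating described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `FakeSandbox`, `SketchContext`, and `ReleaseSketch` are hypothetical stand-ins for the real sandbox, `SandboxContext`, and `SandboxManager`, and only the stop/shutdown ordering and the null-safe keepAlive check are taken from the description and the diff.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a sandbox; records lifecycle calls for illustration. */
class FakeSandbox {
    final List<String> calls = new ArrayList<>();

    void stop() { calls.add("stop"); }         // persists the workspace snapshot
    void shutdown() { calls.add("shutdown"); } // destroys the container/Pod
}

/** Hypothetical, reduced context carrying only the keepAlive flag. */
class SketchContext {
    private final boolean keepAlive;

    SketchContext(boolean keepAlive) { this.keepAlive = keepAlive; }

    boolean isKeepAlive() { return keepAlive; }
}

class ReleaseSketch {
    /** Always stops (snapshots); shuts down only when keepAlive is off. */
    static void release(FakeSandbox sandbox, SketchContext context) {
        // A missing context is treated as keepAlive == false.
        boolean keepAlive = context != null && context.isKeepAlive();
        sandbox.stop();
        if (!keepAlive) {
            sandbox.shutdown();
        }
    }

    public static void main(String[] args) {
        FakeSandbox kept = new FakeSandbox();
        release(kept, new SketchContext(true));
        System.out.println(kept.calls); // [stop]

        FakeSandbox destroyed = new FakeSandbox();
        release(destroyed, null);
        System.out.println(destroyed.calls); // [stop, shutdown]
    }
}
```

With keepAlive enabled the container or Pod outlives the call, so something else (for example the archive path) has to tear it down eventually.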

Key files:

  • SandboxFilesystemSpec - keepAlive field + fluent setter + toSandboxContext() propagation
  • SandboxContext - keepAlive field + isKeepAlive() + Builder setter
  • SandboxManager.release() - conditional shutdown() based on isKeepAlive()
  • SandboxManager.archive(key) - out-of-band graceful teardown: loads persisted state, resumes, stops, shuts down, deletes state, returns final serialized JSON for external persistence
  • SandboxManager.archiveForUser/session() - convenience wrappers
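
The archive sequence listed above (load, resume, stop, shut down, delete, return JSON) might look something like this in outline. Everything here is an assumption made for illustration: the `ArchiveSketch` class, the string-keyed map standing in for the state store, the `Optional` return type, and the trace that replaces real sandbox calls. Only the order of steps comes from the description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class ArchiveSketch {
    /** Hypothetical state store: isolation key -> serialized sandbox state (JSON). */
    final Map<String, String> store = new HashMap<>();

    /** Records the order of lifecycle steps in place of real sandbox calls. */
    final StringBuilder trace = new StringBuilder();

    /** Tears down a kept-alive sandbox and returns its final serialized state. */
    Optional<String> archive(String key) {
        String persisted = store.get(key);   // 1. load persisted state
        if (persisted == null) {
            return Optional.empty();         // nothing stored under this key
        }
        trace.append("resume,");             // 2. resume the sandbox from that state
        trace.append("stop,");               // 3. stop: persist the final snapshot
        String finalJson = persisted;        //    (a real stop() may update the state)
        trace.append("shutdown,");           // 4. shutdown: destroy the container/Pod
        store.remove(key);                   // 5. delete the stored state
        trace.append("delete");
        return Optional.of(finalJson);       // 6. hand the JSON to the caller
    }
}
```

Returning the final JSON after deleting the stored state lets the caller decide where, or whether, to keep it.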

Snapshot client re-injection fix (1 commit)

Both DockerSandboxClient and KubernetesSandboxClient now accept a SandboxSnapshotSpec at construction time. resume() uses the spec to rebuild the RemoteSnapshotClient after deserialization, fixing the documented-but-never-implemented gap.
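
A rough sketch of the re-injection idea, under stated assumptions: the classes below are hypothetical reductions (not the real `RemoteSandboxSnapshot`, `RemoteSnapshotClient`, or `SandboxSnapshotSpec`), the `buildClient()` method and `endpoint` field are invented for the example, and the JSON round trip is simulated by constructing a snapshot that carries only its id.

```java
/** Hypothetical snapshot client; the endpoint field is an illustrative assumption. */
class SnapshotClientSketch {
    final String endpoint;

    SnapshotClientSketch(String endpoint) { this.endpoint = endpoint; }
}

/** Hypothetical spec that knows how to build a snapshot client. */
class SnapshotSpecSketch {
    final String endpoint;

    SnapshotSpecSketch(String endpoint) { this.endpoint = endpoint; }

    SnapshotClientSketch buildClient() { return new SnapshotClientSketch(endpoint); }
}

/** Reduced snapshot: the id survives serialization, the client does not. */
class RemoteSnapshotSketch {
    final String snapshotId;
    transient SnapshotClientSketch client; // not part of the serialized form

    RemoteSnapshotSketch(String snapshotId) { this.snapshotId = snapshotId; }
}

class ResumeSketch {
    private final SnapshotSpecSketch snapshotSpec; // may be null if no spec was configured

    ResumeSketch(SnapshotSpecSketch snapshotSpec) { this.snapshotSpec = snapshotSpec; }

    /** Rebuilds the missing client on a freshly deserialized snapshot. */
    RemoteSnapshotSketch resume(RemoteSnapshotSketch deserialized) {
        if (deserialized.client == null && snapshotSpec != null) {
            deserialized.client = snapshotSpec.buildClient();
        }
        return deserialized;
    }
}
```

Holding the spec on the sandbox client, rather than in the serialized state, keeps connection details out of the persisted JSON while still letting `resume()` restore a usable snapshot.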

Public key API for out-of-band callers (1 commit)

  • SandboxIsolationKey.of(scope, value) - public factory method so external code (restore, archive, scavenger) can build a key without a RuntimeContext or guessing the internal slotSessionId format
  • SessionSandboxStateStore.load/save(scope, value) - convenience overloads so application code no longer needs to construct SandboxIsolationKey objects manually
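
For readers unfamiliar with the pattern, a public factory plus convenience overloads could be sketched as below. The names `ScopeSketch`, `KeySketch`, and `StateStoreSketch`, the `USER`/`SESSION` scope values, the in-memory map, and the extra JSON argument on `save` are all assumptions for illustration; the real signatures may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical scopes; the real set of scopes may differ. */
enum ScopeSketch { USER, SESSION }

/** Value-type key built through a public factory instead of an internal string format. */
final class KeySketch {
    final ScopeSketch scope;
    final String value;

    private KeySketch(ScopeSketch scope, String value) {
        this.scope = scope;
        this.value = value;
    }

    static KeySketch of(ScopeSketch scope, String value) {
        return new KeySketch(Objects.requireNonNull(scope), Objects.requireNonNull(value));
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof KeySketch)) {
            return false;
        }
        KeySketch other = (KeySketch) o;
        return scope == other.scope && value.equals(other.value);
    }

    @Override
    public int hashCode() { return Objects.hash(scope, value); }
}

class StateStoreSketch {
    private final Map<KeySketch, String> states = new HashMap<>();

    void save(KeySketch key, String json) { states.put(key, json); }

    Optional<String> load(KeySketch key) { return Optional.ofNullable(states.get(key)); }

    // Convenience overloads: callers pass scope and value, never a key object.
    void save(ScopeSketch scope, String value, String json) { save(KeySketch.of(scope, value), json); }

    Optional<String> load(ScopeSketch scope, String value) { return load(KeySketch.of(scope, value)); }
}
```

Because the key implements value equality, state saved through the overload can be loaded with an explicitly built key and vice versa.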

Changes

agentscope-harness/.../sandbox/SandboxFilesystemSpec.java          +20
agentscope-harness/.../sandbox/SandboxContext.java                 +12
agentscope-harness/.../sandbox/SandboxManager.java                 +78
agentscope-harness/.../sandbox/SandboxIsolationKey.java             +2
agentscope-harness/.../sandbox/SessionSandboxStateStore.java       +15
agentscope-harness/.../middleware/SandboxLifecycleMiddleware.java    +4
agentscope-harness/.../sandbox/impl/docker/DockerSandboxClient.java +29
agentscope-harness/.../sandbox/impl/docker/DockerFilesystemSpec.java +7
agentscope-extensions/.../kubernetes/KubernetesSandboxClient.java   +23
agentscope-extensions/.../kubernetes/KubernetesFilesystemSpec.java   +7

Testing

8 new test files / additions:

| Tests | File |
| --- | --- |
| keepAlive default/getter/propagation | SandboxContextTest |
| keepAlive gating of shutdown() | SandboxManagerIsolationTest |
| createClient() injection | DockerFilesystemSpecTest / KubernetesFilesystemSpecTest |
| Docker snapshot re-injection | DockerSandboxStateSerdeTest |
| K8s snapshot re-injection | KubernetesSandboxStateSerdeTest |
| middleware rollback on start failure | SandboxLifecycleMiddlewareTest |

All existing tests pass: harness (568 tests) + K8s sandbox extension (2 tests) = 570 tests, 0 failures.

Checklist

  • Code has been formatted with mvn spotless:apply
  • All tests are passing
  • Javadoc comments are complete
  • Documentation updated
  • Code is ready for review

@Buktal requested a review from a team June 18, 2026 03:03

@AgentScopeJavaBot left a comment

🤖 AI Review

This PR introduces two features: (1) keepAlive mode — .keepAlive(true) makes release() only call stop() (snapshot) but skip shutdown(), preserving the Pod across calls; (2) RemoteSandboxSnapshot client re-injection — both DockerSandboxClient and KubernetesSandboxClient now accept SandboxSnapshotSpec at construction and rebuild the snapshot client in resume(). An archive() method and 8 new tests are included. The code design is clean and tests are comprehensive. Two recommended issues: a breaking API change in SandboxManager.release() signature that needs a @Deprecated compatibility overload, and a 560-line AI implementation plan document that should not be committed to the main repository.

}

- public void release(SandboxAcquireResult result) {
+ public void release(SandboxAcquireResult result, SandboxContext sandboxContext) {

[recommended] Breaking API change in release() signature: release(SandboxAcquireResult) has become release(SandboxAcquireResult, SandboxContext), which breaks existing callers. Consider adding a @Deprecated compatibility overload that delegates to the new signature, giving downstream users a migration path:

@Deprecated
public void release(SandboxAcquireResult result) {
    release(result, null); // a null context is treated as keepAlive = false
}

@@ -0,0 +1,560 @@
# Sandbox keepAlive Implementation Plan

[recommended] AI implementation plan document should not be committed: This 560-line markdown file appears to be an AI-generated implementation plan/design document. While useful during development, such documents should not be committed to the main repository as they clutter the repo and become stale quickly. Consider removing this file or moving it to a design docs area outside the main source tree.

sandbox.shutdown();
} catch (Exception e) {
log.warn("[sandbox] Sandbox shutdown failed: {}", e.getMessage(), e);
boolean keepAlive = sandboxContext != null && sandboxContext.isKeepAlive();

[nitpick] Javadoc for keepAlive: Consider enriching the Javadoc with a usage scenario description (e.g., 'useful for multi-turn conversations where the same sandbox should be reused across agent calls to avoid cold-start latency').

}

// snapshotSpec used in resume() to re-inject snapshot client after deserialization
public DockerSandboxClient(ObjectMapper objectMapper, SandboxSnapshotSpec snapshotSpec) {

[nitpick] Missing Javadoc: The new constructor accepting SandboxSnapshotSpec should have a Javadoc explaining when to use it vs the simpler constructor.

@AgentScopeJavaBot added the enhancement, area/harness, and area/ext/integration labels Jun 19, 2026
…eStore scope-value overloads

Add a public factory method SandboxIsolationKey.of(scope, value) so that
out-of-band callers (restore, archive, scavenger) can build a key without
a RuntimeContext or guessing the internal slotSessionId format.

Add SessionSandboxStateStore.load(scope, value) / save(scope, value)
convenience overloads so that application code no longer needs to
construct SandboxIsolationKey objects manually.