Add PodSnapshot extension to Python client#249
Add PodSnapshot extension to Python client#249shrutiyam-glitch wants to merge 12 commits intokubernetes-sigs:mainfrom
Conversation
✅ Deploy Preview for agent-sandbox ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
Hi @shrutiyam-glitch. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
| self.controller_ready = False | ||
| return self.controller_ready | ||
|
|
||
| def checkpoint(self, trigger_name: str) -> ExecutionResult: |
There was a problem hiding this comment.
I recommend not to use the standard "run" function that exists in the client but to override it with a specific restore implementation:
Add the label + UUID as function parameters
If the parameter is empty, read the snapshot.json record and display existing template snapshots to the user “would you like to start from this checkpoint?”
clients/python/agentic-sandbox-client/agentic_sandbox/gke_extensions/podsnapshot_client.py
Outdated
Show resolved
Hide resolved
clients/python/agentic-sandbox-client/agentic_sandbox/gke_extensions/podsnapshot_client.py
Outdated
Show resolved
Hide resolved
clients/python/agentic-sandbox-client/agentic_sandbox/gke_extensions/podsnapshot_client.py
Outdated
Show resolved
Hide resolved
clients/python/agentic-sandbox-client/agentic_sandbox/gke_extensions/podsnapshot_client.py
Outdated
Show resolved
Hide resolved
clients/python/agentic-sandbox-client/agentic_sandbox/gke_extensions/podsnapshot_client.py
Outdated
Show resolved
Hide resolved
clients/python/agentic-sandbox-client/agentic_sandbox/gke_extensions/podsnapshot_client.py
Outdated
Show resolved
Hide resolved
clients/python/agentic-sandbox-client/agentic_sandbox/gke_extensions/podsnapshot_client.py
Outdated
Show resolved
Hide resolved
clients/python/agentic-sandbox-client/agentic_sandbox/gke_extensions/podsnapshot_client.py
Outdated
Show resolved
Hide resolved
clients/python/agentic-sandbox-client/test_podsnapshot_extension.py
Outdated
Show resolved
Hide resolved
clients/python/agentic-sandbox-client/test_podsnapshot_extension.py
Outdated
Show resolved
Hide resolved
clients/python/agentic-sandbox-client/agentic_sandbox/gke_extensions/podsnapshot_client.py
Outdated
Show resolved
Hide resolved
clients/python/agentic-sandbox-client/test_podsnapshot_extension.py
Outdated
Show resolved
Hide resolved
| if condition.type == "PodRestored" and condition.status == "True": | ||
| # Attempt to extract UUID from the message | ||
| # Message format: "pod successfully restored from pod snapshot namespace/uuid" | ||
| if condition.message: |
There was a problem hiding this comment.
Thinking more about this. Is there any other indication that is less brittle than the condition.message ? For example, an annotation being set by the pod snapshot controller to the restored pod or something similar ? Or checking the UUID from the PSP object and then looking for this specific UUID in the condition.message etc ?
There was a problem hiding this comment.
This is a sample restored pod yaml - https://paste.googleplex.com/5754659412246528 - line 78 mentions the snapshot UUID.
There are no annotations available as of now.
checking the UUID from the PSP object and then looking for this specific UUID in the condition.message
So, do you think we can make it is_restored_from(s_uid) ? And check if the pod is restored from this snapshot.
There was a problem hiding this comment.
Yeah, i think this is a more robust approach, to look for that UUID in this string message: pod successfully restored from pod snapshot sandbox-test/244589a9-f094-46ec-8dfe-78f1ab0b1cfa , thanks!!
| Cleans up the PodSnapshotManualTrigger Resources. | ||
| Automatically cleans up the Sandbox. | ||
|
|
||
| TODO: Add cleanup for PodSnapshot resources. |
There was a problem hiding this comment.
Are we planning to do this? How will the use list snapshots otherwise?
There was a problem hiding this comment.
Deletion will be done by the users using the delete_snapshots() (#312).
Just added a Todo to let reviewers that there will be a way to clean that up as well.
| ) | ||
|
|
||
| if not self.pod_name: | ||
| logger.warning("Cannot check restore status: pod_name is unknown.") |
There was a problem hiding this comment.
Can we get this condition? if so how?
| super().__enter__() | ||
| return self | ||
|
|
||
| def _parse_snapshot_result(self, obj) -> SnapshotResult | None: |
There was a problem hiding this comment.
Nit: returning None isn't ideal. I would prefer throwing error.
|
Overall lgtm, minor comments. Please implement smaller CLs in the following CLs. Harder go through all the files in one go. |
| super().__enter__() | ||
| return self | ||
|
|
||
| def _parse_snapshot_result(self, obj) -> SnapshotResult | None: |
There was a problem hiding this comment.
nit non blocking: you correctly type the return as SnapshotResult | None, but obj is untyped. You might want to type it as dict[str, Any] to make the linter happier
| trigger_name=trigger_name, | ||
| snapshot_uid=None, | ||
| error_reason="Snapshot controller is not ready. Ensure it is installed and running.", | ||
| error_code=1, |
There was a problem hiding this comment.
what does error_code=1 mean ?
| print("\n======= Testing Pod Snapshot Extension =======") | ||
| assert sandbox.controller_ready == True, "Sandbox controller is not ready." | ||
|
|
||
| time.sleep(wait_time) |
There was a problem hiding this comment.
super nit fyi non-blocking: main function is defined as an async def and run using asyncio.run(), but inside the function, you are using the blocking, synchronous time.sleep(wait_time). This won't break anything here since there are no other concurrent tasks running, it is generally considered an anti-pattern. You could change time.sleep(5) to await asyncio.sleep(5).
Any specific reason on why this method is async ?
| @@ -0,0 +1,37 @@ | |||
| # Copyright 2025 The Kubernetes Authors. | |||
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: shrutiyam-glitch, vicentefb The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/assign @janetkuo |
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Depends on: #339 #338 #337
This PR introduces the
PodSnapshotSandboxClientextension to theagentic-sandbox-clientPython SDK. This extension allows users to manage Pod Snapshots within the Agentic Sandbox environment, enabling stateful "checkpoint and restore" workflows.Key changes:
PodSnapshotSandboxClientClass: A specialized client that extendsSandboxClientto handle snapshot-specific operations.snapshot(trigger_name)method, which creates aPodSnapshotManualTrigger(PSMT) and waits for the controller to process it.snapshot_controller_ready()to detect GKE-managed (gke-managed-pod-snapshots) pod snapshot controllers.__exit__method now cleans up the PSMT resources sandboxes.constants.pyfile to improve maintainabilityTesting Done:
Integration Test: Added
test_podsnapshot_extension.pywhich verifies the full E2E flow:-- Starts a sandbox with a counter application.
-- Creates two sequential snapshots (test-snapshot-10 at 10 seconds and test-snapshot-20 at 20 seconds).
-- Restores a new sandbox from the most recent snapshot.
-- Verifies that the pod has been been restored from the recent snapshot.
Unit tests are added
Output:
Prerequisites:
PodSnapshotStorageConfig,PodSnapshotPolicy), SandboxTemplate to be installed and defined in the cluster.Note: Following PR will handle different aspects
--
SnapshotPersistenceManager,list_snapshots,delete_snapshots- #312-- Restoring from dedicated snapshot, interactive mode restoring - TBD