feat: add Pod Snapshot extension for Python SDK#338
feat: add Pod Snapshot extension for Python SDK#338k8s-ci-robot merged 11 commits intokubernetes-sigs:mainfrom
Conversation
✅ Deploy Preview for agent-sandbox canceled.
|
|
Hi @shrutiyam-glitch. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
clients/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/podsnapshot_client.py
Outdated
Show resolved
Hide resolved
clients/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/podsnapshot_client.py
Outdated
Show resolved
Hide resolved
|
lgtm |
clients/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/podsnapshot_client.py
Outdated
Show resolved
Hide resolved
|
/assign @janetkuo |
clients/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/podsnapshot_client.py
Outdated
Show resolved
Hide resolved
...ts/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/test_podsnapshot_client.py
Show resolved
Hide resolved
| for pod in pods.items: | ||
| if ( | ||
| pod.status.phase == "Running" | ||
| and pod_name_substring in pod.metadata.name |
There was a problem hiding this comment.
It's not reliable / less efficient to match pods by name substring. Would you use a label selector instead?
| @@ -0,0 +1,103 @@ | |||
| # Copyright 2026 The Kubernetes Authors. | |||
There was a problem hiding this comment.
Should we run this test in presubmits?
There was a problem hiding this comment.
The test_podsnapshot_extension.py is an integration test specifically for GKE-managed environments. Because our current test/e2e suite runs on a kind cluster, it cannot support this extension, which relies on the GKE pod snapshot controller.
I have documented the prerequisites and instructions for running this test manually in the podsnapshot.md guide.
Regarding unit tests, they should be part of the presubmits. Currently, the dev/tools/test-unit script is configured to only run Go tests using go list and gotestsum. In a follow-up PR, I will update this tool to include Python unit test execution so that the SDK logic is covered by automated presubmits.
| def snapshot_controller_ready(self) -> bool: | ||
| """ | ||
| Checks if the snapshot agent pods are running in a GKE-managed pod snapshot cluster. | ||
| Falls back to checking CRD existence if pod listing is forbidden. |
There was a problem hiding this comment.
It's more robust to check for CRD presence given that controller pods are implementation details and subject to change over time. Also, with strict RBAC, listing pods might be restricted.
Isn't just checking CRD enough?
There was a problem hiding this comment.
The lack of a CRD means a controller is definitely not running, but a CRD's presence doesn't mean that a controller is running.
Since GKE is a managed control plane, the SDK won't have the necessary RBAC permissions to view any controller pods.
As an alternative, how about designing the snapshot creation method to attempt to create the PodSnapshotManualTrigger Custom Resource? Then catch any 404 Not Found error from the API server if the CRD isn't installed. To handle cases where the CRD is present but the controller isn't ready, we can use exponential backoff when creating the resource or when polling its status.
There was a problem hiding this comment.
If the CRD exists, creating the CR should still succeed, even if the controller isn't ready. The controller should be able to handle the CR as soon as it becomes ready.
I suggest we first verify if the CRDs exist, and then create the PodSnapshotManualTrigger CR in the snapshot creation method. Then we get information from reading PodSnapshotManualTrigger .status field.
This way, the SDK/client only interacts with the API, not the controller implementation details.
There was a problem hiding this comment.
Makes sense. Updated the method to just check for the CRDs installed.
Thanks.
clients/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/podsnapshot_client.py
Outdated
Show resolved
Hide resolved
clients/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/podsnapshot_client.py
Show resolved
Hide resolved
|
|
||
| class PodSnapshotSandboxClient(SandboxClient): | ||
| """ | ||
| A specialized Sandbox client for interacting with the gke pod snapshot controller. |
There was a problem hiding this comment.
Is the client only checking whether the controller is ready? This is different from the description in the PR: "to support manual snapshot triggering via the GKE pod snapshot controller."
I'd expect the client to modify Snapshot CRs, instead of checking the Snapshot controller itself.
There was a problem hiding this comment.
I have defined the snapshot method here - https://github.com/kubernetes-sigs/agent-sandbox/pull/339/changes#diff-6535038b29a40cde2f558dd8bf85e28a67c1eee796fe718c04338884af9bddecR203.
The method will first check if the snapshot controller is ready as an initialization check before creating the snapshots.
Other methods added will be list, delete. I had to just split logic into multiple PRs.
For the PR description, I just meant to write what the purpose of the class in an overview is. Will update it.
Thanks.
clients/python/agentic-sandbox-client/test_podsnapshot_extension.py
Outdated
Show resolved
Hide resolved
clients/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/podsnapshot_client.py
Show resolved
Hide resolved
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aditya-shantanu, janetkuo, shrutiyam-glitch, vicentefb The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Depends on: #337
This PR introduces the initial implementation of the Pod Snapshot extension for the Agentic Sandbox Python SDK, enabling specialized interactions with GKE-managed pod snapshots.
Key Additions:
PodSnapshotSandboxClient: A new class that extends the baseSandboxClientwhose overall purpose is to support manual snapshot triggering via the GKE pod snapshot controller._check_snapshot_crd_installed(): Added a method to verify if the snapshot CRDs exist before performing operations.PodSnapshotSandboxClientpodsnapshot.mdguide detailing the key features, prerequisites, and instructions for running tests.Following PR: #339 , #249
Output: