
feat: add Pod Snapshot extension for Python SDK #338

Merged
k8s-ci-robot merged 11 commits into kubernetes-sigs:main from shrutiyam-glitch:pss_sdk_init
Mar 5, 2026

Conversation

@shrutiyam-glitch
Contributor

@shrutiyam-glitch shrutiyam-glitch commented Feb 21, 2026

Depends on: #337

This PR introduces the initial implementation of the Pod Snapshot extension for the Agentic Sandbox Python SDK, enabling specialized interactions with GKE-managed pod snapshots.

Key Additions:

  • PodSnapshotSandboxClient: A new class that extends the base SandboxClient. Its overall purpose is to support manual snapshot triggering via the GKE pod snapshot controller.
  • _check_snapshot_crd_installed(): Added a method that verifies whether the snapshot CRDs exist before performing operations.
  • Comprehensive Testing: Included both unit tests for the client logic and an integration test script to verify the end-to-end snapshot workflow.
  • Constants: Added the constants required by PodSnapshotSandboxClient.
  • Extension Documentation: Added a podsnapshot.md guide detailing the key features, prerequisites, and instructions for running tests.

Following PRs: #339, #249
Output:

$ python3 test_podsnapshot_extension.py --template-name python-counter-template --namespace sandbox-test
--- Starting Sandbox Client Test (Namespace: sandbox-test, Port: 8888) ---

***** Phase 1: Starting Counter *****

======= Testing Pod Snapshot Extension =======

--- Sandbox Client Test Finished ---

@netlify

netlify bot commented Feb 21, 2026

Deploy Preview for agent-sandbox canceled.

Name Link
🔨 Latest commit 7840d37
🔍 Latest deploy log https://app.netlify.com/projects/agent-sandbox/deploys/69a9e643815b39000875bb2a

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 21, 2026
@k8s-ci-robot
Contributor

Hi @shrutiyam-glitch. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 21, 2026
@janetkuo janetkuo added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 21, 2026
@SHRUTI6991
Contributor

lgtm

Member

@vicentefb vicentefb left a comment

/lgtm

I've already reviewed this in #249, and that feedback has been addressed here as well.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 26, 2026
@vicentefb
Member

/assign @janetkuo

for pod in pods.items:
    if (
        pod.status.phase == "Running"
        and pod_name_substring in pod.metadata.name
Member

Matching pods by name substring is unreliable and less efficient. Could you use a label selector instead?
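
A minimal sketch of the label-selector approach, assuming the official `kubernetes` Python client. The helper name `find_running_pods` and the example label are hypothetical; this thread does not say which labels the snapshot agent pods carry.

```python
from typing import Any, List


def find_running_pods(core_v1: Any, namespace: str, label_selector: str) -> List[str]:
    """Return the names of Running pods that match a label selector.

    `core_v1` is assumed to be a `kubernetes.client.CoreV1Api` instance.
    Both filters are evaluated by the API server, so the client does not
    need to scan every pod and compare name substrings.
    """
    pods = core_v1.list_namespaced_pod(
        namespace,
        label_selector=label_selector,          # e.g. "app=snapshot-agent" (hypothetical label)
        field_selector="status.phase=Running",  # server-side phase filter
    )
    return [pod.metadata.name for pod in pods.items]
```

Because the selector is applied server-side, the result does not depend on how the pods happen to be named.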

@@ -0,0 +1,103 @@
# Copyright 2026 The Kubernetes Authors.
Member

Should we run this test in presubmits?

Contributor Author

test_podsnapshot_extension.py is an integration test specifically for GKE-managed environments. Because our current test/e2e suite runs on a kind cluster, it cannot support this extension, which relies on the GKE pod snapshot controller.
I have documented the prerequisites and instructions for running this test manually in the podsnapshot.md guide.

Regarding unit tests, they should be part of the presubmits. Currently, the dev/tools/test-unit script is configured to only run Go tests using go list and gotestsum. In a follow-up PR, I will update this tool to include Python unit test execution so that the SDK logic is covered by automated presubmits.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 27, 2026
def snapshot_controller_ready(self) -> bool:
    """
    Checks if the snapshot agent pods are running in a GKE-managed pod snapshot cluster.
    Falls back to checking CRD existence if pod listing is forbidden.
Member

It's more robust to check for CRD presence given that controller pods are implementation details and subject to change over time. Also, with strict RBAC, listing pods might be restricted.

Isn't checking the CRD alone enough?

Contributor

The lack of a CRD means a controller is definitely not running, but a CRD's presence doesn't mean that a controller is running.

Since GKE is a managed control plane, the SDK won't have the necessary RBAC permissions to view any controller pods.

As an alternative, how about designing the snapshot creation method to attempt to create the PodSnapshotManualTrigger Custom Resource? Then catch any 404 Not Found error from the API server if the CRD isn't installed. To handle cases where the CRD is present but the controller isn't ready, we can use exponential backoff when creating the resource or when polling its status.
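
The alternative described above could be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the helper names `create_or_report_missing_crd` and `poll_with_backoff` are hypothetical, `custom_api` is assumed to be a `kubernetes.client.CustomObjectsApi`, and the error is assumed to expose an HTTP `status` attribute the way the client's `ApiException` does.

```python
import time
from typing import Any, Callable, Dict, Optional, TypeVar

T = TypeVar("T")


def create_or_report_missing_crd(
    custom_api: Any, group: str, version: str, namespace: str, plural: str, body: Dict[str, Any]
) -> Any:
    """Try to create the custom resource; translate a 404 into a clear error."""
    try:
        return custom_api.create_namespaced_custom_object(group, version, namespace, plural, body)
    except Exception as exc:
        # The API server answers 404 when the resource type is unknown.
        if getattr(exc, "status", None) == 404:
            raise RuntimeError("PodSnapshotManualTrigger CRD is not installed") from exc
        raise


def poll_with_backoff(
    check: Callable[[], Optional[T]],
    initial_delay: float = 1.0,
    factor: float = 2.0,
    max_delay: float = 30.0,
    max_attempts: int = 6,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `check` until it returns a non-None value, doubling the wait each time."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        result = check()
        if result is not None:
            return result
        if attempt < max_attempts:
            sleep(delay)
            delay = min(delay * factor, max_delay)  # cap the wait between attempts
    raise TimeoutError(f"condition not met after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the backoff logic testable without real waiting.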

Member

If the CRD exists, creating the CR should still succeed, even if the controller isn't ready. The controller should be able to handle the CR as soon as it becomes ready.

I suggest we first verify that the CRDs exist, and then create the PodSnapshotManualTrigger CR in the snapshot creation method. We can then get information by reading the PodSnapshotManualTrigger .status field.

This way, the SDK/client only interacts with the API, not the controller implementation details.
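
A rough sketch of that API-only flow, assuming the `kubernetes` Python client's `CustomObjectsApi`. The group, version, plural, and the `targetPod` spec field below are placeholders invented for illustration; the real values are not stated in this thread.

```python
from typing import Any, Dict, Optional

# Hypothetical identifiers: replace with the real API group, version, and plural.
GROUP = "podsnapshot.example.com"
VERSION = "v1alpha1"
PLURAL = "podsnapshotmanualtriggers"


def create_manual_trigger(custom_api: Any, namespace: str, name: str, target_pod: str) -> Dict[str, Any]:
    """Create a PodSnapshotManualTrigger CR through the Kubernetes API only."""
    body = {
        "apiVersion": f"{GROUP}/{VERSION}",
        "kind": "PodSnapshotManualTrigger",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"targetPod": target_pod},  # hypothetical spec field
    }
    return custom_api.create_namespaced_custom_object(GROUP, VERSION, namespace, PLURAL, body)


def read_trigger_status(custom_api: Any, namespace: str, name: str) -> Optional[Dict[str, Any]]:
    """Return the CR's .status, or None if the controller has not written one yet."""
    obj = custom_api.get_namespaced_custom_object(GROUP, VERSION, namespace, PLURAL, name)
    return obj.get("status")
```

A `None` status here simply means the controller has not reconciled the resource yet, so the caller can poll again later without knowing anything about the controller's pods.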

Contributor Author

Makes sense. Updated the method to just check whether the CRDs are installed.
Thanks.
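
For illustration, a CRD-presence check of this kind might look like the sketch below. It is not the merged `_check_snapshot_crd_installed()` implementation: the function name `crd_installed` is hypothetical, `apiext_api` is assumed to be a `kubernetes.client.ApiextensionsV1Api`, and the CRD name passed in is up to the caller.

```python
from typing import Any


def crd_installed(apiext_api: Any, crd_name: str) -> bool:
    """Return True if the named CRD exists, False if the API server reports 404.

    `crd_name` uses the usual "<plural>.<group>" form. Any error other than
    404 (for example 403 Forbidden) is re-raised so it is not mistaken for
    "CRD not installed".
    """
    try:
        apiext_api.read_custom_resource_definition(crd_name)
        return True
    except Exception as exc:
        if getattr(exc, "status", None) == 404:
            return False
        raise
```

Re-raising non-404 errors keeps an RBAC problem distinguishable from a missing CRD.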


class PodSnapshotSandboxClient(SandboxClient):
    """
    A specialized Sandbox client for interacting with the gke pod snapshot controller.
Member

Is the client only checking whether the controller is ready? This is different from the description in the PR: "to support manual snapshot triggering via the GKE pod snapshot controller."

I'd expect the client to modify Snapshot CRs, instead of checking the Snapshot controller itself.

Contributor Author

I have defined the snapshot method here: https://github.com/kubernetes-sigs/agent-sandbox/pull/339/changes#diff-6535038b29a40cde2f558dd8bf85e28a67c1eee796fe718c04338884af9bddecR203.
As an initialization check, the method first verifies that the snapshot controller is ready before creating snapshots.
The other methods to be added are list and delete. I had to split the logic across multiple PRs.

In the PR description, I only meant to give an overview of the class's purpose. I will update it.
Thanks.

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 5, 2026
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 5, 2026
Member

@janetkuo janetkuo left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 5, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aditya-shantanu, janetkuo, shrutiyam-glitch, vicentefb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 15026b7 into kubernetes-sigs:main Mar 5, 2026
9 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

7 participants