feat: implement pod snapshot method #339

Merged
k8s-ci-robot merged 4 commits into kubernetes-sigs:main from shrutiyam-glitch:pss_sdk_snapshot on Mar 13, 2026

@shrutiyam-glitch (Contributor) commented Feb 21, 2026

Depends on: #338 #337

This PR completes the Pod Snapshot extension for the Python SDK by implementing the snapshot() method and the underlying logic to monitor snapshot completion.

Core Logic Implementation:

  • Manual Trigger Workflow: The snapshot() method creates a PodSnapshotManualTrigger resource for the target pod, using a unique hash suffix to avoid name collisions.
  • Snapshot Status Watcher: Implements _wait_for_snapshot_to_be_completed, which uses the Kubernetes watch API to block until the snapshot controller marks the trigger as Complete. It specifically looks for the Triggered condition with a True status and Complete reason.
  • Response Handling: Captures the snapshot_uid from the resource status and returns a structured SnapshotResponse, providing the user with the unique identifier needed for future restores.
  • Lifecycle Cleanup: Updates the __exit__ method to ensure that all PodSnapshotManualTrigger resources created during the session are automatically deleted, maintaining a clean namespace.
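The manual trigger step above can be sketched roughly as follows. This is a minimal illustration, not the SDK's code: the API group and version values, the use of `uuid4` for the hash suffix, and the placement of `targetPod` under `spec` are all assumptions. Only the resource kind, the timestamp format, and the overall name shape (`<name>-<timestamp>-<8 hex chars>`) come from this PR.

```python
import uuid
from datetime import datetime

# Hypothetical values for illustration; the real constants live in the SDK.
PODSNAPSHOT_API_GROUP = "podsnapshot.example.com"
PODSNAPSHOT_API_VERSION = "v1alpha1"


def build_trigger_name(prefix: str) -> str:
    """Return '<prefix>-<YYYYmmdd-HHMMSS>-<8 hex chars>'."""
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    # Assumption: a random UUID fragment stands in for the PR's hash suffix.
    suffix = uuid.uuid4().hex[:8]
    return f"{prefix}-{timestamp}-{suffix}"


def build_trigger_manifest(trigger_name: str, pod_name: str, namespace: str) -> dict:
    """Build a PodSnapshotManualTrigger body for the target pod."""
    return {
        "apiVersion": f"{PODSNAPSHOT_API_GROUP}/{PODSNAPSHOT_API_VERSION}",
        "kind": "PodSnapshotManualTrigger",
        "metadata": {"name": trigger_name, "namespace": namespace},
        # Assumption: the target pod is referenced under spec.targetPod.
        "spec": {"targetPod": pod_name},
    }
```

A body built this way would then be passed to `create_namespaced_custom_object`, as the traceback in the unit-test output further down shows.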

Testing Done:

  • Integration Test: Added test_podsnapshot_extension.py which verifies the full E2E flow:
    -- Starts a sandbox with a counter application.
    -- Creates two sequential snapshots (test-snapshot-10 at 10 seconds and test-snapshot-20 at 20 seconds).

  • Unit Tests: Added test_podsnapshot_client.py, which includes the create-failure and timeout paths.

Follow-up PR:
A restore example and an is_restored() check are added in the follow-up PR (#249).
list_snapshots, delete_snapshots, and restoring from a dedicated snapshot will be added in subsequent PRs.

Output:

  • Integration Test (clients/python/agentic-sandbox-client/test_podsnapshot_extension.py):
$ python3 test_podsnapshot_extension.py --template-name python-counter-template --namespace sandbox-test
--- Starting Sandbox Client Test (Namespace: sandbox-test, Port: 8888) ---

***** Phase 1: Starting Counter *****

======= Testing Pod Snapshot Extension =======
Creating first pod snapshot 'test-snapshot-10' after 10 seconds...
Trigger Name: test-snapshot-10-20260306-190022-b6016fa2
Snapshot UID: 69debd3e-3518-4185-8d09-720f933b2100
Success: True
Error Code: 0
Error Reason: 

Creating second pod snapshot 'test-snapshot-20' after 10 seconds...
Trigger Name: test-snapshot-20-20260306-190035-874a66a1
Snapshot UID: a10f67a6-bd84-4684-b8bd-506cde0f7757
Success: True
Error Code: 0
Error Reason: 
Recent snapshot UID: a10f67a6-bd84-4684-b8bd-506cde0f7757
--- Pod Snapshot Test Passed! ---

--- Sandbox Client Test Finished ---
  • Unit test (clients/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/test_podsnapshot_client.py):
$ python3 k8s_agent_sandbox/gke_extensions/test_podsnapshot_client.py
.......ERROR:k8s_agent_sandbox.gke_extensions.podsnapshot_client:Failed to create PodSnapshotManualTrigger 'test-trigger-fe4e166a': (Create failed)
Reason: None
Traceback (most recent call last):
  File "/usr/local/google/home/shrutiyam/Documents/python-test/split1/agent-sandbox/clients/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/podsnapshot_client.py", line 214, in snapshot
    created_obj = self.custom_objects_api.create_namespaced_custom_object(
        group=PODSNAPSHOT_API_GROUP,
    ...<3 lines>...
        body=manifest,
    )
  File "/usr/lib/python3.13/unittest/mock.py", line 1169, in __call__
    return self._mock_call(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/unittest/mock.py", line 1173, in _mock_call
    return self._execute_mock_call(*args, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/unittest/mock.py", line 1228, in _execute_mock_call
    raise effect
kubernetes.client.exceptions.ApiException: (Create failed)
Reason: None

...ERROR:k8s_agent_sandbox.gke_extensions.podsnapshot_client:Snapshot creation timed out for trigger 'test-trigger-7faa61c6': Snapshot manual trigger 'test-trigger-7faa61c6' was not processed within 1 seconds.
Traceback (most recent call last):
  File "/usr/local/google/home/shrutiyam/Documents/python-test/split1/agent-sandbox/clients/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/podsnapshot_client.py", line 224, in snapshot
    snapshot_result = self._wait_for_snapshot_processed(
        trigger_name, resource_version
    )
  File "/usr/local/google/home/shrutiyam/Documents/python-test/split1/agent-sandbox/clients/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/podsnapshot_client.py", line 176, in _wait_for_snapshot_processed
    raise TimeoutError(
        f"Snapshot manual trigger '{trigger_name}' was not processed within {self.podsnapshot_timeout} seconds."
    )
TimeoutError: Snapshot manual trigger 'test-trigger-7faa61c6' was not processed within 1 seconds.
.
----------------------------------------------------------------------
Ran 11 tests in 1.452s

OK
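The timeout path exercised in the unit-test log above could be sketched as a loop over watch events. This is a hedged sketch, not the SDK's `_wait_for_snapshot_processed`: the function name, the `is_processed` predicate, and the injected `events` iterable are hypothetical. In real use the events would plausibly come from `kubernetes.watch.Watch().stream(...)` called with a `timeout_seconds` argument, so the stream ends on its own when nothing happens.

```python
import time
from typing import Callable, Iterable


def wait_until_processed(
    events: Iterable[dict],
    trigger_name: str,
    timeout_seconds: float,
    is_processed: Callable[[dict], bool],
) -> dict:
    """Return the first object accepted by is_processed, else raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    for event in events:
        obj = event.get("object") or {}
        if is_processed(obj):
            return obj
        if time.monotonic() >= deadline:
            break  # stop consuming events once the deadline has passed
    # Reached when the stream ends or the deadline passes without completion.
    raise TimeoutError(
        f"Snapshot manual trigger '{trigger_name}' was not processed "
        f"within {timeout_seconds} seconds."
    )
```

Injecting the event source keeps the loop testable without a cluster, which matches how the unit tests here rely on mocks.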

netlify bot commented Feb 21, 2026

Deploy Preview for agent-sandbox canceled.

Latest commit: fe57349
Latest deploy log: https://app.netlify.com/projects/agent-sandbox/deploys/69b39cdf05aae400087caba3

@k8s-ci-robot added the cncf-cla: yes and needs-ok-to-test labels on Feb 21, 2026

@k8s-ci-robot (Contributor) commented

Hi @shrutiyam-glitch. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

@k8s-ci-robot added the size/XL label on Feb 21, 2026

@janetkuo (Member) commented

/ok-to-test

@k8s-ci-robot added the ok-to-test and needs-rebase labels and removed the needs-ok-to-test label on Feb 21, 2026
@k8s-ci-robot added the size/L label and removed the needs-rebase and size/XL labels on Mar 6, 2026

@SHRUTI6991 (Contributor) commented

/lgtm

@k8s-ci-robot (Contributor) commented

@SHRUTI6991: changing LGTM is restricted to collaborators

@k8s-ci-robot removed the size/L label on Mar 9, 2026
@k8s-ci-robot added the size/XL label on Mar 9, 2026

import logging
from kubernetes import client
import os
Review comment (Member):

import os seems to be unused in this file. Consider removing it to keep the imports clean.

"""Result of a snapshot processing operation."""

snapshot_uid: str
snapshot_timestamp: str
Review comment (Member):

snapshot_timestamp is extracted and stored in SnapshotResult, but it is never used or returned in SnapshotResponse. Should we add it to the final SnapshotResponse so the user knows when the snapshot occurred?
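One way the suggestion above might look is sketched below. The `SnapshotResult` fields are from the diff, and the `SnapshotResponse` field names are inferred from the test output and assertions in this PR (`error_code` in particular is a guess). The `to_response` helper and the sample timestamp format are hypothetical, and whether the field was actually added is not confirmed by this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnapshotResult:
    """Result of a snapshot processing operation."""

    snapshot_uid: str
    snapshot_timestamp: str


@dataclass
class SnapshotResponse:
    """User-facing result; field names inferred from the PR's test output."""

    success: bool
    trigger_name: str
    snapshot_uid: Optional[str] = None
    snapshot_timestamp: Optional[str] = None  # field proposed in the review
    error_reason: str = ""
    error_code: int = 0


def to_response(trigger_name: str, result: SnapshotResult) -> SnapshotResponse:
    """Hypothetical helper: carry the timestamp through to the response."""
    return SnapshotResponse(
        success=True,
        trigger_name=trigger_name,
        snapshot_uid=result.snapshot_uid,
        snapshot_timestamp=result.snapshot_timestamp,
    )
```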

"""Parses the object to extract snapshot details."""
status = obj.get("status", {})
conditions = status.get("conditions", [])
for condition in conditions:
Review comment (Member):

If the API responds with an explicit null for conditions ({"conditions": null}), status.get("conditions", []) will return None. Iterating over None on the next line will raise a TypeError. Please use conditions = status.get("conditions") or [] so it always defaults to a list.
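The pitfall described above is easy to reproduce with a plain dict. The `status` payload here is a made-up example. The point is that the default passed to `dict.get` only applies when the key is missing, not when its value is `None`.

```python
# A status payload where the key exists but holds an explicit null.
status = {"conditions": None}

# The default is ignored because the key is present.
unsafe = status.get("conditions", [])

# Falling back with `or` covers both a missing key and a null value.
safe = status.get("conditions") or []

try:
    for _ in unsafe:
        pass
    iteration_failed = False
except TypeError:
    iteration_failed = True  # iterating over None raises TypeError
```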

and condition.get("status") == "True"
and condition.get("reason") == "Complete"
):
snapshot_uid = status.get("snapshotCreated", {}).get("name")
Review comment (Member):

It is safer to write: snapshot_created = status.get("snapshotCreated") or {} and then snapshot_uid = snapshot_created.get("name"). This is similar to the case above, where the API can return a null value.
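Putting both null-safety suggestions together, the parsing step might look like the sketch below. The function name `extract_snapshot_uid` is hypothetical. The condition fields (`Triggered`, `True`, `Complete`) and the `snapshotCreated.name` path are taken from the PR description and the diff context.

```python
from typing import Optional


def extract_snapshot_uid(obj: dict) -> Optional[str]:
    """Return the snapshot name once the trigger reports completion, else None."""
    status = obj.get("status") or {}
    conditions = status.get("conditions") or []
    for condition in conditions:
        if (
            condition.get("type") == "Triggered"
            and condition.get("status") == "True"
            and condition.get("reason") == "Complete"
        ):
            # Guard against an explicit null for snapshotCreated as well.
            snapshot_created = status.get("snapshotCreated") or {}
            return snapshot_created.get("name")
    return None
```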

Returns:
SnapshotResponse: The result of the operation.
"""
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
Review comment (Member):

Is this UTC?

Reply (Contributor Author):

No, it is the local time of the system where the code is running. Should it be UTC?

Reply (Member):

That's fine.
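For reference, the difference discussed in this thread comes down to one argument. The sketch below shows the local-time form used in the PR next to a UTC variant. The UTC variant is only an illustration of the alternative, since the reviewer accepted local time.

```python
from datetime import datetime, timezone

STAMP_FORMAT = "%Y%m%d-%H%M%S"

# As in the PR: local time of the machine running the client.
local_stamp = datetime.now().strftime(STAMP_FORMAT)

# Alternative: the same format, rendered from a timezone-aware UTC value.
utc_stamp = datetime.now(timezone.utc).strftime(STAMP_FORMAT)
```

Both strings have the same shape, so trigger names stay valid either way. Only their comparability across machines in different time zones changes.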

self.assertIn("test-trigger", result.trigger_name)

# Verify create call was made
self.client.custom_objects_api.create_namespaced_custom_object.assert_called_once()
Review comment (Member):

To improve test coverage, you could use assert_called_once_with(...) instead of assert_called_once(). This ensures that the generated manifest payload accurately includes the generated trigger_name and the correct targetPod.
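A minimal sketch of the stricter assertion suggested above, using a bare `MagicMock` in place of the client's `custom_objects_api`. The group, version, plural, pod name, and manifest layout are placeholder assumptions. Because the trigger name carries a generated suffix, the sketch matches the body with `ANY` and then inspects it through `call_args`.

```python
from unittest.mock import ANY, MagicMock

api = MagicMock()  # stands in for custom_objects_api

# Placeholder manifest; the real layout is defined by the SDK.
manifest = {
    "metadata": {"name": "test-trigger-fe4e166a"},
    "spec": {"targetPod": "counter-pod"},
}
api.create_namespaced_custom_object(
    group="podsnapshot.example.com",
    version="v1alpha1",
    namespace="sandbox-test",
    plural="podsnapshotmanualtriggers",
    body=manifest,
)

# Check the call arguments rather than only the call count.
api.create_namespaced_custom_object.assert_called_once_with(
    group="podsnapshot.example.com",
    version=ANY,
    namespace="sandbox-test",
    plural=ANY,
    body=ANY,
)

# Inspect the payload where an exact match is impractical.
body = api.create_namespaced_custom_object.call_args.kwargs["body"]
assert body["metadata"]["name"].startswith("test-trigger-")
assert body["spec"]["targetPod"] == "counter-pod"
```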


result = self.client.snapshot("test-retry")

self.assertTrue(result.success)
Review comment (Member):

When asserting result.success, we could use self.assertTrue(result.success, result.error_reason). This will print the actual error message in the test output if the test fails unexpectedly.
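The effect of passing a message to `assertTrue` can be seen in a small self-checking test. The `result` object is a stand-in built with `SimpleNamespace`, not the SDK's `SnapshotResponse`.

```python
import unittest
from types import SimpleNamespace


class AssertMessageExample(unittest.TestCase):
    def test_failure_message_includes_reason(self):
        # Stand-in for a failed snapshot response.
        result = SimpleNamespace(success=False, error_reason="Create failed")

        with self.assertRaises(AssertionError) as ctx:
            # The second argument is appended to the failure message.
            self.assertTrue(result.success, result.error_reason)

        self.assertIn("Create failed", str(ctx.exception))
```

Without the second argument the failure would only read `False is not true`, which says nothing about why the snapshot failed.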

@vicentefb (Member) left a comment

/lgtm

@k8s-ci-robot added the lgtm label on Mar 13, 2026

@k8s-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shrutiyam-glitch, vicentefb

@k8s-ci-robot added the approved label on Mar 13, 2026
@k8s-ci-robot merged commit 719301a into kubernetes-sigs:main on Mar 13, 2026
9 checks passed