Implement list and delete snapshot functionality in Python SDK #448
shrutiyam-glitch wants to merge 10 commits into kubernetes-sigs:main
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: shrutiyam-glitch.
/assign codebot-robot
@aditya-shantanu: GitHub didn't allow me to assign the following users: codebot-robot. Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/assign @codebot-robot
@barney-s: GitHub didn't allow me to assign the following users: codebot-robot. Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
/assign @codebot-robot
@aditya-shantanu: GitHub didn't allow me to assign the following users: codebot-robot. Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
Can we base this PR off of #467?
codebot-robot left a comment
Overall, this is a solid implementation that covers the core functionality and includes thorough unit and integration tests. I've left several comments focused on edge cases, defensive programming, and reliability.
The main areas to address are:
- Hardening `.get()` calls to safely handle `None` values returned by the Kubernetes API for `status` and `metadata`.
- Changing truthiness checks (`if snapshot_uid:`) to explicit `None` checks (`if snapshot_uid is not None:`) to prevent accidental mass deletion if an empty string is provided.
- Making the integration tests safer against concurrent execution by avoiding hardcoded labels and exact length assertions.
Please review the inline comments for detailed suggestions.
(This review was generated by Overseer)
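A minimal sketch of the first two points, for illustration only. The helper names `extract_conditions` and `select_uids` are hypothetical and do not come from this PR; the idea is that `or {}` covers a key that is present but set to `None`, and that an `is not None` check keeps an empty string from falling through to the delete-everything branch.

```python
from typing import Any, Optional


def extract_conditions(snapshot: dict[str, Any]) -> list[dict[str, Any]]:
    """Return status.conditions, tolerating missing or null fields."""
    # dict.get("status", {}) still returns None when the key exists with a
    # null value, so fall back with `or` instead of relying on the default.
    status = snapshot.get("status") or {}
    return status.get("conditions") or []


def select_uids(all_uids: list[str], snapshot_uid: Optional[str] = None) -> list[str]:
    """Pick the UIDs to delete: None means all of them, otherwise one UID."""
    if snapshot_uid is not None:
        if not snapshot_uid:
            # An empty string is most likely a caller bug, not "delete all".
            raise ValueError("snapshot_uid must be a non-empty string")
        return [snapshot_uid]
    return list(all_uids)
```

Raising on an empty string is one possible policy; returning a failed result object would serve the same purpose if that fits the SDK's conventions better.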
/hold
codebot-robot left a comment
Overall the PR introduces the requested list and delete functionality solidly and includes comprehensive test coverage.
There are a few areas we should refine before merging:
- Edge case handling when metadata or UIDs are missing/malformed.
- Preventing invalid labels from failing the `list` request.
- Consistency in error propagation and exception usage.
Please review the detailed inline comments for specific suggestions.
(This review was generated by Overseer)
codebot-robot left a comment
The PR effectively introduces list_snapshots and delete_snapshots along with comprehensive unit and integration tests.
Most suggestions focus on minor maintainability improvements such as avoiding magic strings, explicitly formatting docstrings, and improving observability in error scenarios.
Adding a request timeout for list API calls and verifying pagination handling would also improve resilience against non-ideal network states.
Thanks for the excellent contribution!
(This review was generated by Overseer)
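To make the timeout and pagination suggestion concrete, here is a rough sketch. It assumes the client lists snapshots through `CustomObjectsApi.list_namespaced_custom_object`, which accepts `limit`, `_continue`, `timeout_seconds` and `_request_timeout`. The function name and the group, version and plural values are placeholders, not the ones used in this PR.

```python
from typing import Any, Optional


def list_all_snapshots(
    custom_objects_api: Any,
    namespace: str,
    label_selector: Optional[str] = None,
    page_size: int = 100,
    timeout_seconds: int = 30,
) -> list[dict[str, Any]]:
    """Collect every page of snapshot objects, bounding each request in time."""
    items: list[dict[str, Any]] = []
    continue_token: Optional[str] = None
    while True:
        kwargs: dict[str, Any] = {
            "group": "example.gke.io",  # placeholder API group (assumption)
            "version": "v1alpha1",      # placeholder version (assumption)
            "namespace": namespace,
            "plural": "podsnapshots",   # placeholder plural (assumption)
            "limit": page_size,
            "timeout_seconds": timeout_seconds,   # server-side limit
            "_request_timeout": timeout_seconds,  # client-side socket limit
        }
        if label_selector:
            kwargs["label_selector"] = label_selector
        if continue_token:
            kwargs["_continue"] = continue_token
        page = custom_objects_api.list_namespaced_custom_object(**kwargs)
        items.extend(page.get("items") or [])
        # The API server sets metadata.continue while more pages remain.
        continue_token = (page.get("metadata") or {}).get("continue")
        if not continue_token:
            return items
```

A unit test can drive this with a fake API object that returns a `continue` token on the first call and none on the second, which verifies the pagination path without a cluster.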
codebot-robot left a comment
Overall, this is a solid implementation that correctly extends the SDK's capabilities to manage PodSnapshots. The addition of sorting logic and conditional bulk deletion are great features.
The review highlights a few areas for improvement:
- Defensive Programming: Hardening the parsing logic against unexpected or malformed Kubernetes API responses (e.g., missing status fields) to prevent unhandled `AttributeError`s.
- Error Handling & Logging: Refining error messages to explicitly distinguish between total and partial batch deletion failures, and cleaning up log outputs when optional parameters like `grouping_labels` are omitted.
- Edge Cases: Properly handling terminal failure states (so they aren't merely marked as "NotReady") and documenting the subtle behavior of passing an empty dictionary for grouping labels.
- Test Coverage: Adding a few targeted assertions to verify fallback paths and addressing hardcoded labels in shared templates.
Addressing these minor issues will make the client significantly more robust against cluster-side anomalies.
(This review was generated by Overseer)
codebot-robot left a comment
Thank you for the detailed PR. The list_snapshots and delete_snapshots functionality looks solid and significantly improves the lifecycle management capabilities of the Python SDK.
The testing coverage is also quite comprehensive. I've left a few comments mainly focused on refining the logging strategies (to prevent noisy logs in large clusters), fortifying the exception handling for K8s API errors, and making the integration test assertions slightly more resilient against timing fluctuations.
(This review was generated by Overseer)
The PR is now merged. We can remove the hold. @shrutiyam-glitch
/unhold
codebot-robot left a comment
Overall, excellent work on this solid PR! It successfully addresses the issue by expanding the SDK with robust list and batch-deletion capabilities. The use of Pydantic dataclasses for structuring K8s API responses is a great pattern.
While the implementation generally handles bulk operation errors appropriately, I have identified a few critical edge cases and areas for improvement:
- Unhandled `ValidationError`: The response parsing loop in `list()` is outside the `try...except` block, meaning malformed resources could crash the client instead of returning a failed result.
- Timeout ignored: The boolean return value of `wait_for_snapshot_deletion` is currently ignored in `delete()`, meaning timed-out deletions are incorrectly reported as successful.
- Watch Stream termination: K8s watch streams can silently close before the timeout, which would result in premature timeout failures.
- Inconsistent 404 handling: `deleted_snapshots` behaves differently if a 404 occurs immediately versus during the wait cycle.
- Race conditions: Ensure edge cases are hardened regarding race conditions when polling deletions.
Additionally, I have left a few minor suggestions for improving the resilience of the integration tests to avoid flakiness and slightly reducing log noise during the startup lifecycle.
Great work! Let me know if you'd like to discuss any of these.
(This review was generated by Overseer)
    """
    # Check if already deleted
    try:
        k8s_helper.custom_objects_api.get_namespaced_custom_object(
Calling get_namespaced_custom_object to check existence before starting the watch stream creates a small race condition. If the object gets deleted in the milliseconds before w.stream starts, the stream will hang until the 60s timeout. Consider extracting the resourceVersion from the get call and passing it to w.stream so no events are missed.
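One way this could look, as a sketch rather than the PR's actual code. `wait_for_deletion` and its parameters are hypothetical names. `api` is assumed to behave like `kubernetes.client.CustomObjectsApi`, and `watch_factory` like `kubernetes.watch.Watch`; it is injected only so the sketch can be exercised with fakes. The outer loop also re-opens the watch if the server closes the stream before the deadline.

```python
import time
from typing import Any, Callable


def wait_for_deletion(
    api: Any,
    watch_factory: Callable[[], Any],
    name: str,
    namespace: str,
    group: str,
    version: str,
    plural: str,
    timeout_seconds: int = 60,
) -> bool:
    """Return True once the object is gone, False if the timeout elapses."""
    try:
        obj = api.get_namespaced_custom_object(
            group=group, version=version, namespace=namespace, plural=plural, name=name
        )
    except Exception as e:
        # kubernetes.client.ApiException carries the HTTP code in `.status`.
        if getattr(e, "status", None) == 404:
            return True  # already deleted
        raise

    # Start the watch at the version just read, so a DELETED event that fires
    # between the GET and the watch is replayed instead of being missed.
    resource_version = (obj.get("metadata") or {}).get("resourceVersion")
    deadline = time.monotonic() + timeout_seconds

    while (remaining := int(deadline - time.monotonic())) > 0:
        watcher = watch_factory()
        for event in watcher.stream(
            api.list_namespaced_custom_object,
            group=group, version=version, namespace=namespace, plural=plural,
            field_selector=f"metadata.name={name}",
            resource_version=resource_version,
            timeout_seconds=remaining,
        ):
            if event["type"] == "DELETED":
                watcher.stop()
                return True
            # Remember progress so a re-opened watch resumes from here.
            meta = (event.get("object") or {}).get("metadata") or {}
            resource_version = meta.get("resourceVersion") or resource_version
        # The server may close the stream early; loop until the deadline.
    return False
```

The sketch does not handle a 410 Gone response for an expired `resourceVersion`; a fuller version would re-read the object and start over in that case.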
    errors = []
    for uid in snapshots_to_delete:
        # Delete PodSnapshot
        try:
wait_for_snapshot_deletion returns a boolean indicating whether the deletion was confirmed or timed out. Currently, this return value is ignored, and the uid is unconditionally appended to deleted_snapshots even if the deletion timed out and potentially failed. You should check the return value and handle the timeout case appropriately (e.g., by adding it to errors instead of deleted_snapshots).
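A sketch of what that handling could look like. The function name `delete_batch` and the two callables are hypothetical stand-ins for the delete request and for `wait_for_snapshot_deletion`; only the control flow is the point.

```python
from typing import Callable


def delete_batch(
    uids: list[str],
    delete_one: Callable[[str], None],
    wait_for_deletion: Callable[[str], bool],
) -> tuple[list[str], list[str]]:
    """Delete each UID and report (confirmed_deleted, errors) separately."""
    deleted: list[str] = []
    errors: list[str] = []
    for uid in uids:
        try:
            delete_one(uid)
        except Exception as e:  # keep going so one failure does not stop the batch
            errors.append(f"{uid}: delete request failed: {e}")
            continue
        if wait_for_deletion(uid):
            deleted.append(uid)
        else:
            # Timed out: the object may still exist, so do not report success.
            errors.append(f"{uid}: deletion not confirmed before timeout")
    return deleted, errors
```

With both lists returned, the caller can tell a total failure (`deleted` empty) from a partial one (both non-empty) when building the result message.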
            error_code=SNAPSHOT_ERROR_CODE,
        )
    except ValidationError as e:
        logger.error(f"Malformed snapshot data: {e}")
The current try...except ValidationError block wraps the entire loop and returns success=False for the list operation if any snapshot is malformed. This means a single bad snapshot prevents the user from listing or deleting any valid snapshots for that pod. Is this expected?
You could move this try...except inside the loop to simply skip and log malformed items instead of aborting the whole operation.
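Roughly, the per-item version could look like the sketch below. To keep it dependency-free, `parse_snapshot` is a plain-Python stand-in that raises `ValueError`; with the real models, the `except` clause would name Pydantic's `ValidationError` instead. All names here are illustrative.

```python
import logging
from dataclasses import dataclass
from typing import Any

logger = logging.getLogger(__name__)


@dataclass
class SnapshotInfo:
    snapshot_uid: str
    creation_timestamp: str


def parse_snapshot(item: dict[str, Any]) -> SnapshotInfo:
    """Stand-in for the Pydantic model; raises ValueError on bad input."""
    metadata = item.get("metadata") or {}
    uid, created = metadata.get("uid"), metadata.get("creationTimestamp")
    if not uid or not created:
        raise ValueError(f"missing uid or creationTimestamp in {metadata!r}")
    return SnapshotInfo(snapshot_uid=uid, creation_timestamp=created)


def parse_snapshots(items: list[dict[str, Any]]) -> tuple[list[SnapshotInfo], int]:
    """Parse what can be parsed; count and log the rest instead of aborting."""
    snapshots: list[SnapshotInfo] = []
    skipped = 0
    for item in items:
        try:
            snapshots.append(parse_snapshot(item))
        except ValueError as e:
            skipped += 1
            logger.warning("Skipping malformed snapshot: %s", e)
    return snapshots, skipped
```

Returning the skipped count lets the caller decide whether a partially malformed listing should still count as a success.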
codebot-robot left a comment
Overall, the PR introduces solid improvements for managing Pod snapshots through the Python SDK and follows existing conventions well. The abstractions for filtering and batch deletion are well-designed, and the addition of Pydantic models alongside comprehensive unit and integration tests greatly improves the SDK's robustness.
However, please review the provided inline comments carefully. A few areas need attention:
- Test Errors: The integration test is passing an incorrect parameter name (`filter_value` instead of `label_value`), which will cause a `TypeError`.
- Falsy Checks: Conditional checks around empty dictionaries/strings evaluate to falsy, leading to unexpected behaviors (e.g. `labels={}` or `snapshot_uid=""` silently doing nothing or performing mass deletions).
- Race Conditions: The Kubernetes watch setup in `wait_for_snapshot_deletion` introduces a race condition if `resource_version` is not extracted from the initial check.
- API Responses & Edge Cases: There are a few places where `None` responses from the K8s API could lead to `AttributeError`s. Care is also needed when handling specific object formats from the API.
- Typing Enhancements: Ensure common result formats share a base class to reduce code duplication.
- Logging: Optimize log levels to prevent spam during batch deletion operations.
Addressing these issues will ensure the SDK is highly robust.
(This review was generated by Overseer)
    print(f"\nDeleting all snapshots for sandbox '{sandbox.sandbox_id}'...")
    delete_result = sandbox.snapshots.delete_all(
        delete_by="labels", filter_value=grouping_labels
The argument filter_value is incorrect here. The delete_all method signature in snapshot_engine.py expects the parameter to be named label_value. Using filter_value will result in a TypeError.
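For reference, this is how the failure would surface. The `delete_all` below is a hypothetical stand-in that mirrors only the two keyword names discussed here; its default values and return type are assumptions, not the signature in `snapshot_engine.py`.

```python
from typing import Optional


def delete_all(delete_by: str = "all", label_value: Optional[dict[str, str]] = None) -> str:
    """Stand-in that only echoes its arguments."""
    return f"delete_by={delete_by}, label_value={label_value}"


error_message = ""
try:
    # Wrong keyword: Python rejects it before the function body runs.
    delete_all(delete_by="labels", filter_value={"team": "demo"})
except TypeError as e:
    error_message = str(e)

# Correct keyword name.
result = delete_all(delete_by="labels", label_value={"team": "demo"})
```

Because the error is raised at call time, the integration test would fail before any snapshot is touched.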
            error_code=SNAPSHOT_ERROR_CODE,
        )
        snapshots_to_delete = [s.snapshot_uid for s in snapshots_result.snapshots]
    elif labels:
If labels is an empty dictionary {}, elif labels: evaluates to False. Thus, calling delete_all(delete_by='labels', label_value={}) completely skips this block and returns success without attempting deletion. Consider elif labels is not None:.
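A small sketch of one way to separate "no label filter" from "empty label filter". `resolve_label_selector` is a hypothetical helper, and rejecting `{}` outright is a design choice for illustration; treating it as "match everything" would be equally valid if documented.

```python
from typing import Optional


def resolve_label_selector(labels: Optional[dict[str, str]]) -> Optional[str]:
    """Turn grouping labels into a selector string; None means no label filter."""
    if labels is None:
        return None
    if not labels:
        # An empty dict is an explicit but meaningless filter: fail loudly
        # instead of silently skipping the deletion branch.
        raise ValueError("labels must contain at least one key/value pair")
    return ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
```

Either way, the branch is then selected with `is not None`, so the empty-dict case is handled deliberately rather than by accident.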
    """Waits for the PodSnapshot to be deleted from the cluster."""
    # Check if already deleted
    try:
        k8s_helper.custom_objects_api.get_namespaced_custom_object(
The get_namespaced_custom_object call checks whether the object is already gone. However, the returned object is discarded. You should extract the resourceVersion from the returned object and pass it into the subsequent watch stream. Without this, if the snapshot is deleted between the get call and the stream starting, the stream will miss the DELETED event and hang until timeout.
This PR implements the ability to list and delete Pod Snapshots within the `PodSnapshotSandboxClient`.
Core Logic Implementation:
Snapshot Listing (`sandbox.snapshots.list`)
-- Support for a `filter_by` parameter to allow users to filter snapshots by state (e.g., ready_only) or grouping labels
-- Snapshots are returned sorted by creation timestamp (newest first) to simplify "latest-available" restoration logic
-- Leverages Pydantic models for reliable parsing of Kubernetes API responses, ensuring type safety for snapshot metadata and status fields.
Single Snapshot Deletion (`sandbox.snapshots.delete`)
-- Supports deleting snapshots by specific UID
Multiple Snapshot Deletion (`sandbox.snapshots.delete_all`)
-- Performs bulk deletion for all snapshots associated with the current Sandbox.
-- Implements `wait_for_snapshot_deletion` using Kubernetes watch streams. The logic has been hardened to correctly handle `resourceVersion` to avoid race conditions during the watch initialization.
-- Correctly propagates timeout results and distinguishes between successful deletions and partial batch failures.
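As an illustration of the newest-first ordering described above (not the code in this PR), the sketch below sorts raw snapshot dicts by `metadata.creationTimestamp` and picks the latest ready one. The function names and the `ready` flag are hypothetical; the timestamp format is the RFC 3339 UTC form the Kubernetes API returns.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def parse_k8s_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp such as 2024-05-01T12:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def sort_newest_first(snapshots: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a new list ordered by creation time, most recent first."""
    return sorted(
        snapshots,
        key=lambda s: parse_k8s_timestamp(s["metadata"]["creationTimestamp"]),
        reverse=True,
    )


def latest_ready(snapshots: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Pick the most recent snapshot whose (hypothetical) `ready` flag is set."""
    return next((s for s in sort_newest_first(snapshots) if s.get("ready")), None)
```

With this ordering, "restore from the latest available snapshot" reduces to taking the first ready element of the sorted list.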
Testing Done:
Integration Test: Added `test_podsnapshot_extension.py` which verifies the full E2E flow:
-- creating multiple snapshots, listing them, and performing a cleanup deletion
Unit tests are added
Output: