fix(python-client): harden tracing, uploads, and execution response handling#501

Open
wllbo wants to merge 2 commits into kubernetes-sigs:main from wllbo:python-sdk-fixes

@wllbo
Contributor

@wllbo wllbo commented Apr 1, 2026

Issues surfaced during the Go SDK review (#424) that also apply to the Python SDK.

  • Replace initialize_tracer() with a create_tracer_provider() factory; the SDK no longer calls trace.set_tracer_provider().
  • Raise ValueError on upload paths with directory separators instead of silently stripping to basename.
  • Reject execution responses larger than 16 MB with a clear error.
  • Use generateName for claim creation instead of manual uuid.uuid4() suffix.
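
The upload-path change in the list above can be sketched as follows. This is a minimal illustration, not the PR's code: `validate_upload_name` is a hypothetical helper name, and treating both `/` and `\` as directory separators is an assumption.

```python
def validate_upload_name(path: str) -> str:
    """Return `path` unchanged if it is a bare file name; raise otherwise."""
    # Reject the path instead of silently reducing "a/b.txt" to "b.txt".
    # Checking both "/" and "\\" is an assumption made for this sketch.
    if "/" in path or "\\" in path:
        raise ValueError(
            f"upload path {path!r} must not contain directory separators"
        )
    return path
```

Failing loudly means a caller who passes `dir/report.txt` learns that the directory part would not have been honored, rather than finding the file under an unexpected name.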

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 1, 2026
@netlify

netlify bot commented Apr 1, 2026

Deploy Preview for agent-sandbox canceled.

Name Link
🔨 Latest commit eddf245
🔍 Latest deploy log https://app.netlify.com/projects/agent-sandbox/deploys/69d9740cd02fc400087368be

@k8s-ci-robot
Contributor

Hi @wllbo. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 1, 2026
@aditya-shantanu
Contributor

/assign @SHRUTI6991

from k8s_agent_sandbox.trace_manager import trace_span, trace

# Maximum response size for command execution (16 MB).
MAX_EXECUTION_RESPONSE_SIZE = 16 * 1024 * 1024
Contributor

Why are we enforcing this limit?

Contributor Author

It guards against the client parsing and propagating unexpectedly large command output. 16 MB matches the Go SDK limit from #424. For larger outputs, callers should use file I/O instead.
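
A size guard of this kind might look like the sketch below. It is an illustration under stated assumptions, not the SDK's implementation: `read_capped` is a hypothetical helper, the chunked-read approach and the `ValueError` exception type are choices made here, and only the 16 MB constant comes from the diff.

```python
import io

# Maximum response size for command execution (16 MB), as in the diff above.
MAX_EXECUTION_RESPONSE_SIZE = 16 * 1024 * 1024


def read_capped(stream, limit: int = MAX_EXECUTION_RESPONSE_SIZE,
                chunk_size: int = 64 * 1024) -> bytes:
    """Read `stream` to the end, raising once more than `limit` bytes arrive."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(buf)
        buf.extend(chunk)
        # Stop early so an oversized body is never fully buffered or parsed.
        if len(buf) > limit:
            raise ValueError(
                f"execution response exceeds {limit} bytes; "
                "use file I/O for larger outputs"
            )


# Usage with an in-memory stream standing in for an HTTP response body.
small = read_capped(io.BytesIO(b"ok"), limit=16)
```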


@trace_span("run")
def run(self, command: str, timeout: int = 60) -> ExecutionResult:
    """Executes a command. Rejects responses larger than 16 MB."""
Contributor

Nit: You may want to parametrize this based on the MAX_EXECUTION_RESPONSE_SIZE value.

Contributor Author

updated, docstring now references the constant by name


class SandboxTracerConfig(BaseModel):
    """Configuration for tracer level information"""
    model_config = {"arbitrary_types_allowed": True}
Contributor

Nit: add a comment about this field.

Contributor Author

added a comment explaining it's needed for the tracer_provider field


  try:
-     self._create_claim(claim_name, template, namespace)
+     claim_name = self._create_claim(template, namespace)
Contributor

Nice change!
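
For readers unfamiliar with `generateName`: the client sends a name prefix in `metadata.generateName`, the Kubernetes API server appends a random suffix, and the final name comes back in `metadata.name` of the created object. That is why `_create_claim` now returns the name instead of receiving it. A rough sketch follows; the helper names, the `apiVersion` placeholder, the `kind`, the prefix, and the `spec` layout are all assumptions made for illustration and are not taken from the PR.

```python
def build_claim_manifest(template: str, namespace: str,
                         prefix: str = "sandbox-claim-") -> dict:
    """Build a claim manifest that lets the API server pick the final name."""
    return {
        "apiVersion": "example.com/v1",  # placeholder, not the real group/version
        "kind": "SandboxClaim",          # assumed kind name
        # No "name" key: the server appends a random suffix to generateName.
        "metadata": {"generateName": prefix, "namespace": namespace},
        "spec": {"templateRef": {"name": template}},  # hypothetical spec shape
    }


def created_name(response: dict) -> str:
    """Extract the server-assigned name from the create response."""
    return response["metadata"]["name"]


manifest = build_claim_manifest("python-runtime", "default")
```

Compared with a client-side `uuid.uuid4()` suffix, this leaves suffix generation and uniqueness to the API server, so the client only has to read the assigned name back.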

- Initializes the global OpenTelemetry TracerProvider using the singleton pattern.
-
- This function uses double-checked locking to ensure thread-safe, one-time initialization.
+ def create_tracer_provider(service_name: str) -> "TracerProvider | None":
Contributor

@igooch can you review this change as well?

provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter())
)
atexit.register(provider.shutdown)
Contributor

The tracer provider is shut down when the client goes out of scope, right?

Contributor Author

Not on client GC; atexit fires at process exit. Python's __del__ is unreliable, so atexit ensures BatchSpanProcessor flushes remaining spans before the process terminates.

If the caller passes their own provider via SandboxTracerConfig.tracer_provider, create_tracer_provider is never called, so we don't register anything. They own the lifecycle.
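
The ownership rule described in this reply can be sketched with the standard library alone. Everything here is a stand-in: `FakeProvider` replaces a real OpenTelemetry `TracerProvider`, and `resolve_provider` is a hypothetical helper. In the PR the `atexit.register` call sits inside `create_tracer_provider` itself; it is pulled out and made injectable here only so the behavior is easy to observe.

```python
import atexit
from typing import Callable, Optional


class FakeProvider:
    """Stand-in for a TracerProvider; only shutdown() matters in this sketch."""

    def __init__(self) -> None:
        self.shut_down = False

    def shutdown(self) -> None:
        self.shut_down = True


def resolve_provider(
    user_provider: Optional[FakeProvider],
    factory: Callable[[], Optional[FakeProvider]],
    register: Callable[[Callable[[], None]], object] = atexit.register,
) -> Optional[FakeProvider]:
    """Pick the provider to use and decide who owns its shutdown."""
    if user_provider is not None:
        # Caller-supplied provider: the caller owns the lifecycle, so
        # nothing is registered for process exit.
        return user_provider
    provider = factory()
    if provider is not None:
        # SDK-created provider: flush and shut down at process exit.
        register(provider.shutdown)
    return provider


# Usage: collect registrations in a list instead of the real atexit hook.
hooks: list = []
sdk_owned = resolve_provider(None, FakeProvider, register=hooks.append)
caller_owned = resolve_provider(FakeProvider(), FakeProvider, register=hooks.append)
```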

@aditya-shantanu
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 2, 2026
@wllbo wllbo force-pushed the python-sdk-fixes branch from 6296f2e to f8f171b Compare April 10, 2026 21:36
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aditya-shantanu, wllbo
Once this PR has been reviewed and has the lgtm label, please ask for approval from shruti6991. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 10, 2026
@wllbo wllbo force-pushed the python-sdk-fixes branch from f8f171b to eddf245 Compare April 10, 2026 22:04
Labels

area:rich-client cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

5 participants