Skip to content

feat: configurable WebSocket invocation payload limit across engine + SDKs#1593

Open
ytallo wants to merge 1 commit into
mainfrom
feat/ws-payload-size-limit
Open

feat: configurable WebSocket invocation payload limit across engine + SDKs#1593
ytallo wants to merge 1 commit into
mainfrom
feat/ws-payload-size-limit

Conversation

@ytallo
Copy link
Copy Markdown
Contributor

@ytallo ytallo commented May 3, 2026

Summary

Lands a single, configurable WebSocket payload ceiling end-to-end (default 16 MiB), replacing inconsistent per-language behavior with one engine-enforced limit:

  • Engine enforces the ceiling via iii-worker-manager.max_message_size, applied to both WebSocketUpgrade::max_message_size and max_frame_size
  • Every SDK (Node, Python, Rust) defaults to the same value via a new init option
  • Producers refuse oversized envelopes locally (fail-fast) instead of triggering server-side disconnects
  • New typed error code invocation_failed_payload_too_large lets callers branch on payload-size failures distinct from generic invocation_stopped
  • New SDK exception type per language: IIIPayloadTooLarge (Python, Node) / IIIError::PayloadTooLarge (Rust)

Behavior change worth flagging

Python users get a 15× larger default. Before this PR, the Python SDK silently inherited the websockets library's 1 MiB max-message-size while Node and Rust effectively had no ceiling. Python callers occasionally saw mysterious invocation_stopped errors that were actually the hidden 1 MiB cap firing.

After this PR, the Python default is 16 MiB, matching Node, Rust, and the engine. Payloads between 1 MiB and 16 MiB that previously failed in Python will now succeed.

For payloads consistently above 16 MiB, use channels — they chunk over the same WebSocket without the per-message limit.

See docs/changelog/0-11-0/payload-size-limit.mdx for the full migration checklist, sizing notes (base64 + JSON envelope inflation), and the new error-codes reference.

Verification

The commit message records local test results from the author. To re-verify on this PR:

  • CI green: engine build + tests, SDK CI for Node/Python/Rust, license-check
  • cargo test --tests --no-fail-fast clean
  • cd sdk/packages/python/iii && uv run pytest tests/test_payload_limits.py
  • pnpm vitest run tests/payload-limits.test.ts
  • cargo test --test payload_limits
  • npx mintlify validate for the docs build
  • Integration tests gated on III_URL pass against a live engine before release tag

Summary by CodeRabbit

Release Notes

  • New Features

    • Configurable WebSocket message size limits across SDKs (default 16 MiB)
    • New IIIPayloadTooLarge error for client-side payload validation
    • New invocation_failed_payload_too_large error code for oversized invocations
  • Documentation

    • Added comprehensive error codes API reference
    • Updated channel usage guidance with explicit payload limits
  • Bug Fixes

    • Python SDK: default message size limit increased from 1 MiB to 16 MiB

… SDKs

Replace the silent 1 MiB cliff with an engine-enforced 16 MiB default that's
configurable end-to-end. Adds a specific error code, producer-side guards,
and aligns Python/Node/Rust SDKs on a single ceiling.

Engine
- WorkerManagerConfig.max_message_size (default 16 MiB), applied via
  WebSocketUpgrade::max_message_size/max_frame_size on both ws_handler and
  otel_ws_handler.
- New error code invocation_failed_payload_too_large emitted from
  cleanup_worker when a worker disconnects due to a WS Capacity error.
  invocation_stopped continues to cover clean disconnects, shutdown, and EOF.
- WS recv errors logged at WARN with peer/worker_id/error.
- DisconnectReason enum + halt_invocation_with_reason path.

SDKs (Python / Node / Rust)
- max_message_size / maxMessageSize init option, default 16 MiB.
- IIIPayloadTooLarge / IIIError::PayloadTooLarge raised producer-side
  before the WS send when the envelope exceeds the limit.
- Cross-language error string aligned: "Payload {n} bytes exceeds invocation
  limit {limit} bytes. For binary blobs use channels: <docs URL>".
- Python ws.connect now passes max_size, replacing the silent 1 MiB default
  inherited from the websockets library.

Tests (TDD: red first, then green)
- engine/tests/ws_payload_limit_e2e.rs (4 e2e: oversize disconnect emits
  payload_too_large; clean close still emits invocation_stopped; configured
  limit; at-limit success).
- Unit tests for halt_invocation_with_reason and WorkerManagerConfig
  default + override.
- SDK unit tests verify option plumbing, default value, producer-guard
  raise, at-limit success. Each SDK ships an integration test gated on
  III_URL that validates the engine contract end-to-end.

Docs
- Rewrote use-channels rule-of-thumb with the actual ceiling, base64+JSON
  inflation math, and links to the new error-codes reference.
- New api-reference/error-codes.mdx enumerates engine error bodies grouped
  by area; codes pulled from engine/src/.
- Init-option rows added to sdk-python / sdk-node / sdk-rust references.
- changelog/0-11-0/payload-size-limit.mdx documents the new config, error
  code, SDK options, and the Python 1 MiB → 16 MiB behavior change.

Verification
- engine: cargo test --tests --no-fail-fast → 1805 passed.
- python: uv run pytest tests/test_payload_limits.py → 10 passed, 1 skipped.
- node: pnpm vitest run tests/payload-limits.test.ts → 4 passed, 1 skipped.
- rust: cargo test --test payload_limits → 4 passed, 1 ignored.
- docs: npx mintlify validate → success.
- Integration tests need III_URL pointing at a live engine; not blocking
  this commit.
@mintlify
Copy link
Copy Markdown

mintlify Bot commented May 3, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
iii 🟢 Ready View Preview May 3, 2026, 4:39 PM

💡 Tip: Enable Workflows to automatically generate PRs for you.

@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented May 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
iii-website Ready Ready Preview, Comment May 3, 2026 4:39pm

Request Review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 3, 2026

📝 Walkthrough

Walkthrough

This PR implements configurable WebSocket invocation payload size limits across the engine and SDKs (Node, Python, Rust). The engine now classifies disconnects based on payload-too-large events and halts in-flight invocations with specific error codes; each SDK adds configurable max_message_size initialization options and producer-side guards that refuse to send oversized payloads before WebSocket round-trips, paired with typed error classes.

Changes

End-to-End Payload Size Limits

Layer / File(s) Summary
Data Shape & Error Types
engine/src/worker_connections/mod.rs, sdk/packages/*/iii/src/error*, sdk/packages/rust/iii/src/lib.rs (InitOptions)
DisconnectReason enum (PayloadTooLarge/Other) added to engine; IIIPayloadTooLarge exception/error variants added across SDKs; InitOptions extended with maxMessageSize/max_message_size field (default 16 MiB).
Engine Halt & Classification
engine/src/invocation/mod.rs, engine/src/engine/mod.rs
New halt_invocation_with_reason method sends specific error code/message instead of generic invocation_stopped; classify_recv_error helper maps WebSocket errors to DisconnectReason, enabling reason-specific halting during cleanup.
SDK Producer Guards & Config
sdk/packages/node/iii/src/errors.ts, sdk/packages/python/iii/src/iii/errors.py, sdk/packages/rust/iii/src/error.rs, sdk/packages/*/iii/src/iii*.ts, sdk/packages/*/iii/src/iii.py, sdk/packages/*/iii/src/iii.rs
Each SDK adds pre-flight payload size validation that rejects oversized serialized messages with IIIPayloadTooLarge before sending; each stores and applies configured max_message_size from initialization.
Worker Manager & WebSocket Config
engine/src/workers/worker/mod.rs, sdk/packages/rust/iii/src/channels.rs
Engine's WorkerManagerConfig adds max_message_size field; apply_message_size_limit helper applies limit to WebSocket upgrade; each SDK wires max_message_size into WebSocket client configuration at connection time.
Integration & Wiring
engine/tests/rbac_infrastructure_e2e.rs, sdk/packages/python/iii/src/iii/__init__.py, sdk/packages/node/iii/src/index.ts, sdk/packages/rust/iii/src/lib.rs
register_worker resolves and applies max_message_size on SDK initialization; exports and module wiring updated across Node/Python/Rust; RBAC e2e test updated to use struct defaults.
Tests & Documentation
engine/tests/ws_payload_limit_e2e.rs, sdk/packages/*/iii/tests/*, docs/api-reference/*, docs/changelog/0-11-0/payload-size-limit.mdx, docs/how-to/use-channels.mdx, engine/config.yaml, docs/docs.json
E2E test suite validates engine disconnect/error-code behavior; SDK test suites verify guard enforcement and configuration wiring; docs updated with error codes reference, SDK API changes, changelog, and guidance on when to use channels vs. increasing limits.

Sequence Diagram(s)

(Skipped — changes introduce validation gates and error classification rather than new sequential component interactions; the control flow enhancement is self-evident.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Possibly related PRs

Suggested reviewers

  • sergiofilhowz
  • guibeira

🐰 A payload guard so snug and tight,
Sixteen mebibytes before the flight!
Now engines sing with classified cheer,
When oversized blobs start to appear!
Channels await those giants too,
A humble rabbit's code for you! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is comprehensive and well-structured. It includes a clear summary of changes, explains the behavior change in Python, provides verification steps, and includes a detailed migration checklist reference. However, the required Apache 2 license checkbox is not checked. Check the Apache 2 license checkbox at the end of the PR description to confirm licensing compliance before merging.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely describes the main change: introducing a configurable WebSocket payload limit across the engine and SDKs with a consistent 16 MiB default.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/ws-payload-size-limit

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sdk/packages/python/iii/src/iii/iii_constants.py (1)

76-90: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate max_message_size when InitOptions is constructed.

0 or negative values currently create a nonsensical ceiling and can make every non-empty envelope fail much later during send/connect. A small __post_init__ guard would fail fast on bad config.

Suggested fix
 `@dataclass`
 class InitOptions:
@@
     headers: dict[str, str] | None = None
     telemetry: TelemetryOptions | None = None
     max_message_size: int = DEFAULT_MAX_MESSAGE_SIZE
+
+    def __post_init__(self) -> None:
+        if self.max_message_size <= 0:
+            raise ValueError("max_message_size must be > 0")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/packages/python/iii/src/iii/iii_constants.py` around lines 76 - 90, Add
validation for max_message_size in the InitOptions constructor by implementing a
__post_init__ on the InitOptions dataclass (or equivalent initializer) that
checks the max_message_size field; if max_message_size is None or <= 0, raise a
ValueError with a clear message referencing max_message_size and the
DEFAULT_MAX_MESSAGE_SIZE constant, ensuring InitOptions (and related fields like
worker_name, reconnection_config) fail fast on invalid sizes instead of allowing
downstream send/connect errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@engine/src/engine/mod.rs`:
- Around line 1444-1455: In handle_otel, when matching Some(Err(_)) mirror the
new WARN path used for the main worker socket: call classify_recv_error on the
error, emit a tracing::warn including peer, worker.id, the error and the reason
(same fields as the other branch), then call
worker.set_disconnect_reason(reason).await and break; this ensures oversize
telemetry frames produce the same peer/error logging and disconnect reason as
the main worker recv handling.

In `@engine/tests/ws_payload_limit_e2e.rs`:
- Around line 283-292: The test currently guesses 256 bytes of slack when
constructing big_blob, which can break if the JSON envelope changes; instead
build the JSON envelope (result_msg) with a provisional blob, measure its
serialized length via to_string().len(), and reduce/truncate big_blob until the
serialized message length is <= the configured WS frame limit (use the same
limit constant used by production). Update the test to loop or compute the
correct blob size by measuring result_msg.to_string().len() and trimming
big_blob before calling ws.send(WsMessage::Text(result_msg.to_string().into()))
so the frame is precisely under the limit.
- Around line 59-60: Replace the fixed
tokio::time::sleep(Duration::from_millis(...)).await waits with state-based
waiting: instead of sleeping before returning (port, engine), poll for a
concrete readiness condition (e.g., attempt a TCP connect to the listener port
in a retry loop with a short backoff or call the engine’s registration/readiness
API) until it succeeds or a timeout is reached; locate the sleep calls
(tokio::time::sleep and Duration::from_millis) that use the local variables port
and engine and replace them with the retry/poll logic, and apply the same change
to the other occurrence around the later lines (the second sleep at 107-108).
Ensure the loop returns an error or panics on timeout to preserve test
determinism.

In `@sdk/packages/python/iii/src/iii/iii.py`:
- Around line 360-371: The _assert_within_limit helper is only used from
trigger_async but the handler result path can still send an oversized
InvocationResultMessage via _send; update the response path to call
_assert_within_limit before sending handler results. Specifically, in the code
path that constructs/sends InvocationResultMessage (referencing trigger_async,
InvocationResultMessage and _send), serialize the message payload the same way
(_to_dict -> json.dumps -> .encode("utf-8")) and call _assert_within_limit (or
inline the same check using self._options.max_message_size) and raise
IIIPayloadTooLarge if it exceeds the limit, then proceed to call _send only for
messages that pass the check.

In `@sdk/packages/python/iii/tests/test_payload_limits.py`:
- Around line 200-223: The test
test_oversize_invocation_returns_payload_too_large_code currently only creates a
single III client (client = III(...)) and never registers a real callee for
"noop", so on a fresh engine it can fail with function_not_found instead of
exercising the payload-too-large path; fix it by provisioning a separate
worker/registration that actually owns "noop" before calling client.trigger:
start a second III instance (or worker helper) with default InitOptions (no
increased max_message_size), register a noop handler for function_id "noop" and
ensure it is connected/registered (wait until connected) so the engine routes
the invocation to that callee, then run the trigger and finally clean up the
worker registration/connection in the test teardown or finally block.
- Around line 138-163: The test test_trigger_below_limit_does_not_raise
currently replaces client._send with fake_send but never asserts it was invoked;
modify the test to record and assert the call to client._send (e.g., replace
fake_send with an AsyncMock or a coroutine that sets an asyncio.Event/flag)
before returning, run the coroutine as before with
asyncio.run_coroutine_threadsafe(call(), client._loop).result(...), then assert
the AsyncMock was called or the Event/flag is set to confirm the send path was
exercised (referencing client._send, fake_send, and the inner call coroutine).

---

Outside diff comments:
In `@sdk/packages/python/iii/src/iii/iii_constants.py`:
- Around line 76-90: Add validation for max_message_size in the InitOptions
constructor by implementing a __post_init__ on the InitOptions dataclass (or
equivalent initializer) that checks the max_message_size field; if
max_message_size is None or <= 0, raise a ValueError with a clear message
referencing max_message_size and the DEFAULT_MAX_MESSAGE_SIZE constant, ensuring
InitOptions (and related fields like worker_name, reconnection_config) fail fast
on invalid sizes instead of allowing downstream send/connect errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8d8ac356-e8ed-4d9f-9368-acc4d371911e

📥 Commits

Reviewing files that changed from the base of the PR and between 19a6f7e and ea7aff8.

⛔ Files ignored due to path filters (1)
  • sdk/packages/python/iii/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (29)
  • docs/api-reference/error-codes.mdx
  • docs/api-reference/sdk-node.mdx
  • docs/api-reference/sdk-python.mdx
  • docs/api-reference/sdk-rust.mdx
  • docs/changelog/0-11-0/payload-size-limit.mdx
  • docs/docs.json
  • docs/how-to/use-channels.mdx
  • engine/config.yaml
  • engine/src/engine/mod.rs
  • engine/src/invocation/mod.rs
  • engine/src/worker_connections/mod.rs
  • engine/src/workers/worker/mod.rs
  • engine/tests/rbac_infrastructure_e2e.rs
  • engine/tests/ws_payload_limit_e2e.rs
  • sdk/packages/node/iii/src/errors.ts
  • sdk/packages/node/iii/src/iii.ts
  • sdk/packages/node/iii/src/index.ts
  • sdk/packages/node/iii/tests/payload-limits.test.ts
  • sdk/packages/python/iii/pyproject.toml
  • sdk/packages/python/iii/src/iii/__init__.py
  • sdk/packages/python/iii/src/iii/errors.py
  • sdk/packages/python/iii/src/iii/iii.py
  • sdk/packages/python/iii/src/iii/iii_constants.py
  • sdk/packages/python/iii/tests/test_payload_limits.py
  • sdk/packages/rust/iii/src/channels.rs
  • sdk/packages/rust/iii/src/error.rs
  • sdk/packages/rust/iii/src/iii.rs
  • sdk/packages/rust/iii/src/lib.rs
  • sdk/packages/rust/iii/tests/payload_limits.rs

Comment thread engine/src/engine/mod.rs
Comment on lines +1444 to +1455
Some(Err(err)) => {
let reason = classify_recv_error(&err);
tracing::warn!(
peer = %peer,
worker_id = %worker.id,
error = %err,
reason = ?reason,
"Worker WS recv error"
);
worker.set_disconnect_reason(reason).await;
break;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Mirror the new WARN path in handle_otel.

This adds recv-error logging for the main worker socket, but /otel still treats Some(Err(_)) as a silent break. Oversize telemetry frames will keep disconnecting without the peer/error context this PR now emits for normal worker sockets.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/engine/mod.rs` around lines 1444 - 1455, In handle_otel, when
matching Some(Err(_)) mirror the new WARN path used for the main worker socket:
call classify_recv_error on the error, emit a tracing::warn including peer,
worker.id, the error and the reason (same fields as the other branch), then call
worker.set_disconnect_reason(reason).await and break; this ensures oversize
telemetry frames produce the same peer/error logging and disconnect reason as
the main worker recv handling.

Comment on lines +59 to +60
tokio::time::sleep(Duration::from_millis(150)).await;
(port, engine)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace the fixed sleeps with state-based waiting.

These sleeps are the only synchronization before the helper starts connecting and invoking. On slower CI, that can race listener startup or function registration and make the suite nondeterministic. Waiting for a concrete signal/retry condition here would make the E2E contract much more stable.

Also applies to: 107-108

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/tests/ws_payload_limit_e2e.rs` around lines 59 - 60, Replace the fixed
tokio::time::sleep(Duration::from_millis(...)).await waits with state-based
waiting: instead of sleeping before returning (port, engine), poll for a
concrete readiness condition (e.g., attempt a TCP connect to the listener port
in a retry loop with a short backoff or call the engine’s registration/readiness
API) until it succeeds or a timeout is reached; locate the sleep calls
(tokio::time::sleep and Duration::from_millis) that use the local variables port
and engine and replace them with the retry/poll logic, and apply the same change
to the other occurrence around the later lines (the second sleep at 107-108).
Ensure the loop returns an error or panics on timeout to preserve test
determinism.

Comment on lines +283 to +292
// Build a payload that sits just under the limit. Reserve ~256 bytes
// for the JSON envelope.
let big_blob = "z".repeat(1024 * 1024 - 256);
let result_msg = json!({
"type": "invocationresult",
"invocation_id": invocation_id.to_string(),
"function_id": "echo",
"result": { "blob": big_blob },
});
ws.send(WsMessage::Text(result_msg.to_string().into()))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Measure the serialized frame instead of guessing 256 bytes of slack.

This boundary test is tied to an estimate, not the actual on-wire size. If the envelope grows by even a couple of fields, the “at limit” case can start failing spuriously while the production code is still correct. Build the JSON first, measure to_string().len(), and trim the blob until it is within the configured limit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/tests/ws_payload_limit_e2e.rs` around lines 283 - 292, The test
currently guesses 256 bytes of slack when constructing big_blob, which can break
if the JSON envelope changes; instead build the JSON envelope (result_msg) with
a provisional blob, measure its serialized length via to_string().len(), and
reduce/truncate big_blob until the serialized message length is <= the
configured WS frame limit (use the same limit constant used by production).
Update the test to loop or compute the correct blob size by measuring
result_msg.to_string().len() and trimming big_blob before calling
ws.send(WsMessage::Text(result_msg.to_string().into())) so the frame is
precisely under the limit.

Comment on lines +360 to +371
def _assert_within_limit(self, msg: Any) -> None:
"""Reject oversize invocation envelopes before they reach the WS.

Raises IIIPayloadTooLarge if the serialized message exceeds
``InitOptions.max_message_size``. Pre-flight rejection prevents one
oversize message from tearing the WS connection and halting every
in-flight invocation on the worker.
"""
limit = self._options.max_message_size
encoded = json.dumps(self._to_dict(msg)).encode("utf-8")
if len(encoded) > limit:
raise IIIPayloadTooLarge(payload_bytes=len(encoded), limit_bytes=limit)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Guard oversized handler results with the same preflight check.

This helper is only used from trigger_async(). A local handler that returns an oversize result still sends InvocationResultMessage through _send() unguarded, which can trip the engine-side WS limit and disconnect the worker. That leaves the Python SDK exposed to the same connection-tearing failure mode this change is trying to eliminate.

One way to wire the same protection into the response path
-            await self._send(
-                InvocationResultMessage(
-                    invocation_id=invocation_id,
-                    function_id=path,
-                    result=result,
-                    traceparent=response_traceparent,
-                )
-            )
+            result_msg = InvocationResultMessage(
+                invocation_id=invocation_id,
+                function_id=path,
+                result=result,
+                traceparent=response_traceparent,
+            )
+            try:
+                self._assert_within_limit(result_msg)
+            except IIIPayloadTooLarge as exc:
+                await self._send(
+                    InvocationResultMessage(
+                        invocation_id=invocation_id,
+                        function_id=path,
+                        error={
+                            "code": "invocation_failed_payload_too_large",
+                            "message": str(exc),
+                        },
+                        traceparent=response_traceparent,
+                    )
+                )
+                return
+            await self._send(result_msg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/packages/python/iii/src/iii/iii.py` around lines 360 - 371, The
_assert_within_limit helper is only used from trigger_async but the handler
result path can still send an oversized InvocationResultMessage via _send;
update the response path to call _assert_within_limit before sending handler
results. Specifically, in the code path that constructs/sends
InvocationResultMessage (referencing trigger_async, InvocationResultMessage and
_send), serialize the message payload the same way (_to_dict -> json.dumps ->
.encode("utf-8")) and call _assert_within_limit (or inline the same check using
self._options.max_message_size) and raise IIIPayloadTooLarge if it exceeds the
limit, then proceed to call _send only for messages that pass the check.

Comment thread sdk/packages/python/iii/tests/test_payload_limits.py
Comment thread sdk/packages/python/iii/tests/test_payload_limits.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant