fix(http): fallback when OS rejects large HTTP socket buffers#994

Open
ntny wants to merge 3 commits into ai-dynamo:main from ntny:fix(http)-fallback-if-os-rejects-large-socker-buffers

@ntny ntny commented May 26, 2026

AIPerf configures 10 MiB send/receive buffers for HTTP streaming sockets during startup. On some macOS systems, the default kern.ipc.maxsockbuf limit is lower than the requested size. In this case, socket initialization can fail with:

  OSError(55, 'No buffer space available')

This can be worked around by increasing the kernel limit:

sysctl -w kern.ipc.maxsockbuf=16777216

or by lowering AIPerf's socket buffer settings via AIPERF_HTTP_SO_RCVBUF / AIPERF_HTTP_SO_SNDBUF.
However, AIPerf should also work with default OS settings. The configured socket buffer size is a throughput optimization, not a correctness requirement, so startup should not fail solely because the OS rejects the requested optimization.

This PR adds a fallback mechanism that retries socket initialization with progressively smaller buffer sizes when the OS rejects the requested value with ENOBUFS. A warning is logged once per fallback tuple, a final error is logged if even the minimum fallback size cannot be applied, and unrelated socket errors continue to be raised normally.

It also adds tests covering the fallback and logging behavior.

Summary by CodeRabbit

  • Bug Fixes

    • Improved socket buffer handling: automatically retries smaller buffer sizes when OS rejects large values, logs clearer socket option names, and emits each fallback warning only once; includes a minimum fallback threshold to prevent infinite retries.
  • Tests

    • Added unit tests covering buffer fallback retries, non-retryable error propagation, minimum-fallback failure logging, option-label logging, and warning deduplication.

copy-pr-bot Bot commented May 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai Bot commented May 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Socket buffer sizing now goes through SocketDefaults._set_socket_buffer. It maps the option to a readable name, attempts the requested SO_RCVBUF/SO_SNDBUF size, and on ENOBUFS retries with halved sizes down to MIN_SOCKET_BUFFER_FALLBACK_BYTES, logging each distinct (option, requested, fallback) tuple once. apply_to_socket calls this helper, and the new tests validate the behavior.

Changes

Socket buffer fallback mechanism and integration

| Layer / File(s) | Summary |
|---|---|
| Socket buffer fallback mechanism (src/aiperf/transports/http_defaults.py) | Added MIN_SOCKET_BUFFER_FALLBACK_BYTES, reorganized imports and logging, added _get_socket_option_label(), _logged_buffer_fallbacks, and SocketDefaults._set_socket_buffer() which sets socket buffer sizes and on ENOBUFS retries with halved candidates while logging each distinct (option, requested, fallback) once. Non-ENOBUFS OSError is re-raised. |
| Integration into apply_to_socket (src/aiperf/transports/http_defaults.py) | Replaced direct setsockopt() calls in apply_to_socket() with SocketDefaults._set_socket_buffer() calls for SO_RCVBUF and SO_SNDBUF. |
| Fallback behavior validation (tests/unit/transports/test_tcp_connector.py) | Added errno and logging imports, expanded http_defaults imports to include SocketDefaults, introduced an autouse fixture to clear _logged_buffer_fallbacks between tests, and added tests validating ENOBUFS-triggered retry & warning, re-raising non-ENOBUFS errors, error when minimum fallback fails, readable option-label usage, numeric label fallback for unknown options, and deduplicated fallback warning logging across sockets. |
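
For illustration, the readable option label with a numeric fallback mentioned in the summary could look something like the sketch below. The name `socket_option_label` and the explicit lookup table are hypothetical; the PR's `_get_socket_option_label()` may build its mapping differently.

```python
import socket

# Hypothetical lookup table covering only the two options this PR touches.
_OPTION_LABELS = {
    socket.SO_RCVBUF: "SO_RCVBUF",
    socket.SO_SNDBUF: "SO_SNDBUF",
}


def socket_option_label(option: int) -> str:
    """Return a readable name for known options, else the number as text."""
    return _OPTION_LABELS.get(option, str(option))
```

Falling back to the numeric value keeps log messages usable even for options the table does not know about.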

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nudge a stubborn socket, try a size so grand,
The kernel shakes its head — I halve it by hand.
A single gentle warning for each tuple seen,
I try again, then settle where buffers are clean.
Hopping home, I log the tale, soft and keen.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: implementing a fallback mechanism for socket buffer configuration when the OS rejects large buffer sizes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

github-actions Bot commented May 26, 2026

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@653493e162ee6431aeb683fd86fba8a801123e3d

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@653493e162ee6431aeb683fd86fba8a801123e3d

Last updated for commit: 653493e

@github-actions github-actions Bot added the fix label May 26, 2026
Comment thread src/aiperf/transports/http_defaults.py Outdated
codecov Bot commented May 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/aiperf/transports/http_defaults.py`:
- Around line 53-76: The code currently forces a 1024 buffer size after the loop
which overrides requested values under 1024; change the tail so instead of
unconditionally calling sock.setsockopt(..., 1024) you attempt to set the
original requested value (value) if the loop was skipped or candidate dropped
below 1024: call sock.setsockopt(socket.SOL_SOCKET, option_name, value) and
handle OSError the same way as inside the loop (only swallow errno.ENOBUFS and
fall back by halving candidate), and only fall back to 1024 if all halved
candidates fail; keep using cls._logged_buffer_fallbacks and _logger for
warnings, and reference option_name, value, candidate, and sock.setsockopt in
the fix.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 861855c5-fc26-4bd0-85aa-67150072a754

📥 Commits

Reviewing files that changed from the base of the PR and between 8da0b8f and 80231e4.

📒 Files selected for processing (2)
  • src/aiperf/transports/http_defaults.py
  • tests/unit/transports/test_tcp_connector.py

Comment thread src/aiperf/transports/http_defaults.py
@dynamo-ops dynamo-ops left a comment

Previous review comments have been addressed. Approving.

@dynamo-ops dynamo-ops left a comment

Previous review comments have been addressed. Approving.

@dynamo-ops dynamo-ops left a comment

Previous review comments have been addressed. Approving.

@ntny ntny force-pushed the fix(http)-fallback-if-os-rejects-large-socker-buffers branch from 25237d2 to 07a474e Compare May 27, 2026 07:44
@ntny ntny force-pushed the fix(http)-fallback-if-os-rejects-large-socker-buffers branch from 07a474e to e17db11 Compare May 28, 2026 10:55
coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

ntny and others added 3 commits May 28, 2026 14:01
AIPerf configures 10 MiB send/receive buffers for HTTP streaming sockets by default. On macOS, the default `kern.ipc.maxsockbuf` can be lower than the requested buffer size, causing socket initialization to fail with `OSError(55, 'No buffer space available')`.

The socket buffer size is a performance optimization, not a correctness requirement, so failing startup is unnecessarily strict. This change retries with smaller buffer sizes when the OS rejects the requested value with `ENOBUFS`, logs the fallback, and keeps raising unrelated socket errors.

Signed-off-by: ntny <ntny1986@gmail.com>
- fix: suppress repeated socket buffer fallback warnings and signed commits

Signed-off-by: ntny <ntny1986@gmail.com>
@ntny ntny force-pushed the fix(http)-fallback-if-os-rejects-large-socker-buffers branch from e17db11 to 653493e Compare May 28, 2026 11:01
2 participants