feat(windows): native Windows support — platform abstraction layer #946

Open
lkomali wants to merge 3 commits into main from lkomali/aiperf-windows-port

Conversation

@lkomali
Contributor

@lkomali lkomali commented May 15, 2026

Summary

Foundational PR for native Windows support. Establishes the platform abstraction layer needed for aiperf profile to run on Windows for low-concurrency single-shot benchmarks.

What's in this PR

  • IS_WINDOWS / IS_MACOS / IS_LINUX centralized in common/constants.py
  • ZMQ TCP loopback fallback for Windows (pyzmq wheels don't support ipc://); deterministic port from sha256(path/ipc_filename) in range 28000–48000
  • signal.signal() fallback for SIGINT + SIGBREAK on Windows (asyncio add_signal_handler is POSIX-only)
  • WindowsSelectorEventLoopPolicy swap before loop creation (the add_reader/add_writer calls pyzmq relies on aren't implemented by the Proactor loop)
  • uvloop auto-disabled on Windows (not supported there)
  • ForkProcess dropped from type unions (Windows has no os.fork)
  • ASCII [OK] instead of U+2713 in tokenizer logs (cp1252 safety)
  • TCP port hash range widened to 20000 (collision math: 0.56% within-process at 14 sockets)
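
The collision figure in the last bullet can be sanity-checked with a standard birthday-problem calculation. The sketch below assumes the hashed ports are uniformly distributed over the range, which is an assumption and not something the PR states; `collision_probability` is a hypothetical helper name. Under that assumption, 14 sockets in a 20,000-port range come out at roughly 0.45%, the same order of magnitude as the quoted figure, which may rest on a different socket count or range.

```python
def collision_probability(num_sockets: int, port_range: int) -> float:
    """Chance that at least two of `num_sockets` uniformly hashed ports coincide."""
    p_all_distinct = 1.0
    for i in range(num_sockets):
        # The i-th socket must avoid the i ports already taken.
        p_all_distinct *= (port_range - i) / port_range
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # 14 sockets hashed into the 20,000-port range named in the description.
    print(f"{collision_probability(14, 20_000):.2%}")
```

The probability grows roughly with the square of the socket count, so the margin shrinks quickly if more endpoints are added later.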

Stacked follow-up

lkomali/aiperf-windows-test-cleanup is the follow-up PR — adds Windows production fixes that surfaced through testing (SO_SNDBUF skip, SIGKILL→SIGTERM, path.anchor, DCGM auto-skip, dashboard non-TTY fallback) + the integration test infrastructure. Windows CI should remain disabled until that one lands too.

Summary by CodeRabbit

  • New Features

    • Added Windows TCP fallback for IPC and improved Windows asyncio event-loop configuration.
  • Bug Fixes

    • Preserve backslashes in Windows CLI command parsing.
    • More reliable Windows signal handling for graceful async shutdown.
  • Improvements

    • Updated validation/status output markers to text labels (e.g. [OK]/(OK)/(FAIL)) and adjusted tokenizer summary separator.
  • Tests

    • Expanded platform-aware tests and added platform-specific skips.


@copy-pr-bot

copy-pr-bot Bot commented May 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

github-actions Bot commented May 15, 2026

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@a5313a619a5c5dc34bf2e758bfeb2776240989e7

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@a5313a619a5c5dc34bf2e758bfeb2776240989e7

Last updated for commit: a5313a6

@coderabbitai

coderabbitai Bot commented May 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5dcea13a-275b-49b0-978d-b422cb8057aa

📥 Commits

Reviewing files that changed from the base of the PR and between 84bc97b and a5313a6.

📒 Files selected for processing (1)
  • tests/unit/conftest.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/conftest.py

Walkthrough

Refactors OS detection to import-time constants; adds Windows asyncio event-loop policy setup and Windows-specific synchronous signal handlers that bridge into asyncio; implements a cross-platform ZMQ address helper (IPC vs. deterministic TCP); narrows process typing; adjusts command parsing and broadens platform handling across the test suite.
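
The signal bridging described in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `make_sync_handler`, `install_shutdown_handlers`, and the `shutdown` coroutine are hypothetical names, and the choice of SIGINT plus SIGTERM on the POSIX branch is an assumption.

```python
import asyncio
import signal
import sys
from collections.abc import Callable, Coroutine
from typing import Any

ShutdownFn = Callable[[int], Coroutine[Any, Any, None]]
_tasks: set[asyncio.Task] = set()  # keep references so tasks are not collected


def make_sync_handler(loop: asyncio.AbstractEventLoop, shutdown: ShutdownFn):
    """Build a signal.signal()-style handler that hands off to the event loop."""

    def handler(signum: int, frame: object) -> None:
        # The handler is synchronous, so schedule the coroutine through the
        # thread-safe entry point instead of awaiting it here.
        asyncio.run_coroutine_threadsafe(shutdown(signum), loop)

    return handler


def install_shutdown_handlers(
    loop: asyncio.AbstractEventLoop,
    shutdown: ShutdownFn,
    is_windows: bool = sys.platform == "win32",
) -> None:
    if is_windows:
        handler = make_sync_handler(loop, shutdown)
        signal.signal(signal.SIGINT, handler)
        sigbreak = getattr(signal, "SIGBREAK", None)  # only defined on Windows
        if sigbreak is not None:
            signal.signal(sigbreak, handler)
        return

    def spawn(sig: int) -> None:
        task = loop.create_task(shutdown(sig))
        _tasks.add(task)
        task.add_done_callback(_tasks.discard)

    for sig in (signal.SIGINT, signal.SIGTERM):  # assumed POSIX signal set
        loop.add_signal_handler(sig, spawn, sig)
```

Splitting the handler factory from the installer keeps the Windows branch testable on any platform, since the handler can be called directly without touching process-wide signal state.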

Changes

Windows Cross-Platform Support

Layer / File(s) Summary
Platform detection constants foundation
src/aiperf/common/constants.py, src/aiperf/common/bootstrap.py, src/aiperf/common/environment.py, src/aiperf/controller/system_mixins.py
Centralizes boolean platform detection as IS_WINDOWS, IS_MACOS, IS_LINUX constants computed at import time, replacing scattered platform.system() calls across bootstrap, environment, and signal handler modules.
Windows asyncio event loop policy configuration
src/aiperf/common/bootstrap.py, tests/unit/common/test_bootstrap_windows.py
_configure_event_loop_policy_for_platform() sets asyncio.WindowsSelectorEventLoopPolicy on Windows before event loop creation to enable asyncio socket reader/writer operations; called in bootstrap pre-run sequence.
Windows signal handler bridging
src/aiperf/controller/system_mixins.py, tests/unit/controller/test_system_mixins.py
setup_signal_handlers() conditionally installs SIGINT/SIGBREAK: on Windows via signal.signal() with asyncio.run_coroutine_threadsafe() bridging; on non-Windows via loop.add_signal_handler() with async task tracking.
Windows ZMQ socket address strategy
src/aiperf/config/comm/ipc.py, src/aiperf/config/comm/dual_bind.py
_build_socket_address helper returns ipc:// on non-Windows and deterministic tcp://127.0.0.1:<sha256-derived-port> on Windows; four ZMQ config classes delegate address construction to this helper.
Environment logging and output formatting updates
src/aiperf/common/environment.py, src/aiperf/common/tokenizer_display.py, src/aiperf/plugin/cli.py, tests/unit/common/test_tokenizer_display.py, tests/unit/plugin/test_cli.py
auto_disable_uvloop_on_windows() gates logging to parent process only; tokenizer validation changes from checkmark/bullet to [OK] prefix and `
Process type refinement and command parsing fixes
src/aiperf/controller/multiprocess_service_manager.py, tests/harness/utils.py, tests/unit/test_harness_utils.py
MultiProcessRunInfo.process type narrowed from Process | SpawnProcess | ForkProcess | None to Process | None; harness _parse_command() uses shlex.split(posix=(sys.platform != "win32")) to preserve Windows path backslashes.
Test infrastructure and platform-specific test skipping
tests/unit/conftest.py, tests/unit/common/test_environment.py, tests/unit/test_cli_runner_macos.py, tests/unit/transports/test_tcp_connector.py, tests/unit/zmq/test_dual_bind.py
_patch_platform_constants() helper patches IS_WINDOWS/IS_MACOS/IS_LINUX across modules; module-level platform skips for macOS-only and Linux capability tests; environment tests rewritten to patch constants; ZMQ IPC tests skip on Windows where IPC unavailable in pyzmq wheels.
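
A helper like the one in the last row has to patch each importing module separately, because `from ... import IS_WINDOWS` copies the value into that module's namespace at import time. The sketch below illustrates the idea with stand-in namespaces; `patch_platform_constants`, `constants`, and `bootstrap` are hypothetical and do not reproduce the PR's actual helper or module layout.

```python
from contextlib import ExitStack
from types import SimpleNamespace
from unittest.mock import patch

# Stand-ins for modules that each hold their own copy of the flags.
constants = SimpleNamespace(IS_WINDOWS=False, IS_MACOS=False, IS_LINUX=True)
bootstrap = SimpleNamespace(IS_WINDOWS=False, IS_MACOS=False, IS_LINUX=True)


def patch_platform_constants(
    targets: list[object],
    *,
    windows: bool = False,
    macos: bool = False,
    linux: bool = False,
) -> ExitStack:
    """Patch the three flags on every target; undo all patches on exit."""
    stack = ExitStack()
    for target in targets:
        stack.enter_context(patch.object(target, "IS_WINDOWS", windows))
        stack.enter_context(patch.object(target, "IS_MACOS", macos))
        stack.enter_context(patch.object(target, "IS_LINUX", linux))
    return stack
```

Used as `with patch_platform_constants([constants, bootstrap], windows=True): ...`, every listed namespace sees the simulated platform and all flags are restored when the block exits.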

🎯 3 (Moderate) | ⏱️ ~25 minutes

🐰 Hoppity hop across Windows and POSIX,
Platform constants make the path less opaque,
Asyncio loops dance on Selector gates,
ZMQ sockets find TCP on Windows' estates,
While signals bridge threadsafe through the asyncio straits! 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 57.95% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'feat(windows): native Windows support — platform abstraction layer' directly corresponds to the main objective of the PR: adding native Windows support with a platform abstraction layer (centralized platform flags, platform-specific handling for signals/event loops/IPC, and Windows compatibility fixes).
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/unit/zmq/test_dual_bind.py (1)

119-119: ⚡ Quick win

Use raw strings for regex patterns to fix RUF043 warnings.

The regex patterns contain metacharacters (square brackets) but are not raw strings. Ruff warns because the pattern should explicitly indicate it's a regex by using a raw string prefix.

🔧 Proposed fix
-        with pytest.raises(ValueError, match="[Pp]ath is required"):
+        with pytest.raises(ValueError, match=r"[Pp]ath is required"):

Apply the same change at line 629:

-        with pytest.raises(ValueError, match="[Pp]ath is required"):
+        with pytest.raises(ValueError, match=r"[Pp]ath is required"):

Also applies to: 629-629
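
As a quick check that the proposed change is lint-only, the snippet below compares the two literals and runs the pattern through `re.search`, which is what `pytest.raises(match=...)` uses. The sample message is made up for illustration.

```python
import re

plain = "[Pp]ath is required"
raw = r"[Pp]ath is required"

# The pattern contains no backslashes, so both literals are the same string.
same_literal = plain == raw

# Hypothetical error message, used only to exercise the pattern.
matches = re.search(raw, "path is required for IPC") is not None
```

Because the strings are identical, switching to the raw form changes nothing at runtime; it only signals that the literal is meant as a regex.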

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/zmq/test_dual_bind.py` at line 119, The pytest.raises calls that
use match="[Pp]ath is required" should use a raw string to avoid regex escaping
warnings: update the match argument in the pytest.raises(ValueError, match=...)
at the occurrence referencing ValueError (line with match "[Pp]ath is required")
and the other occurrence at the later test (around the second occurrence) to use
a raw string literal (r"...") so the pattern is r"[Pp]ath is required"; locate
these in tests/unit/zmq/test_dual_bind.py where pytest.raises is invoked and
replace the non-raw match strings with raw string equivalents.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/harness/utils.py`:
- Around line 610-615: The tokens returned by shlex.split(...,
posix=(sys.platform != "win32")) on Windows keep surrounding quote characters,
so after calling shlex.split (the args variable) when running on win32 iterate
over args and strip a matching surrounding pair of single or double quotes from
each token (e.g., if token[0] == token[-1] and token[0] in ("'", '"') then
remove them) before applying the existing aiperf check; update the code that
constructs/returns args to perform this quote-stripping only when posix is False
(Windows) so paths like "C:\Users\..." become unquoted tokens.

In `@tests/unit/conftest.py`:
- Line 639: Add an explicit return type annotation to _patch_platform_constants:
annotate it as returning an ExitStack (i.e., def _patch_platform_constants(... )
-> ExitStack:) and ensure ExitStack is imported (or use contextlib.ExitStack or
typing.ContextManager[None] if you prefer). This matches its actual return (a
context manager) as used by the callers (_patch_platform_constants).

In `@tests/unit/zmq/test_dual_bind.py`:
- Around line 15-27: The _skip_on_windows_ipc pytest marker is defined in the
middle of the import block (between stdlib and project imports) causing E402;
move the definition of _skip_on_windows_ipc so that all imports (stdlib,
third-party, and aiperf/project imports like those currently below line 27)
appear first at the top of the module, then define _skip_on_windows_ipc
immediately after the import block to resolve the lint error; ensure the
identifier _skip_on_windows_ipc and its pytest.mark.skipif(...) expression are
unchanged apart from relocation.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9777c7bd-21d3-4e54-8b43-7ce58b6218f7

📥 Commits

Reviewing files that changed from the base of the PR and between dbd93ec and 4a2d729.

📒 Files selected for processing (22)
  • src/aiperf/cli_runner.py
  • src/aiperf/common/bootstrap.py
  • src/aiperf/common/config/zmq_config.py
  • src/aiperf/common/constants.py
  • src/aiperf/common/environment.py
  • src/aiperf/common/tokenizer_display.py
  • src/aiperf/controller/multiprocess_service_manager.py
  • src/aiperf/controller/system_mixins.py
  • src/aiperf/plugin/cli.py
  • tests/harness/utils.py
  • tests/unit/common/config/test_service_config.py
  • tests/unit/common/config/test_zmq_config.py
  • tests/unit/common/test_bootstrap_windows.py
  • tests/unit/common/test_environment.py
  • tests/unit/common/test_tokenizer_display.py
  • tests/unit/conftest.py
  • tests/unit/controller/test_system_mixins.py
  • tests/unit/plugin/test_cli.py
  • tests/unit/test_cli_runner_macos.py
  • tests/unit/test_harness_utils.py
  • tests/unit/transports/test_tcp_connector.py
  • tests/unit/zmq/test_dual_bind.py

Comment thread tests/harness/utils.py
Comment on lines +610 to 615
# POSIX-mode shlex treats backslash as an escape character, which
# strips backslashes from Windows paths (C:\Users\... becomes
# C:Users...). On Windows we parse in non-POSIX mode so backslashes
# in interpolated paths are preserved.
args = shlex.split(cmd, posix=(sys.platform != "win32"))
return args[1:] if args and args[0] == "aiperf" else args

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
import shlex
cmd = r'aiperf profile --file "C:\Users\test user\data.jsonl" --request-count 5'
print("posix=False:", shlex.split(cmd, posix=False))
print("posix=True :", shlex.split(cmd, posix=True))
PY

Repository: ai-dynamo/aiperf

Length of output: 1955


🏁 Script executed:

python3 - <<'PY'
import shlex

# Test case: Windows path with spaces in quoted argument
cmd = r'aiperf profile --file "C:\Users\test user\data.jsonl" --request-count 5'

result_posix_false = shlex.split(cmd, posix=False)
result_posix_true = shlex.split(cmd, posix=True)

print("posix=False result:")
for i, arg in enumerate(result_posix_false):
    print(f"  [{i}] {repr(arg)}")

print("\nposix=True result:")
for i, arg in enumerate(result_posix_true):
    print(f"  [{i}] {repr(arg)}")

# Check if quotes are preserved
has_quotes_false = any('"' in arg for arg in result_posix_false)
print(f"\nQuotes preserved in posix=False: {has_quotes_false}")
print(f"Quotes preserved in posix=True: {any('"' in arg for arg in result_posix_true)}")
PY

Repository: ai-dynamo/aiperf

Length of output: 295


🌐 Web query:

shlex.split posix parameter behavior quotes non-POSIX mode documentation

💡 Result:

In Python's shlex module, the posix parameter in shlex.split(s, comments=False, posix=True) determines whether the parser follows POSIX shell parsing rules [1][2]. When posix=True (the default), shlex adheres to standard shell parsing rules, including the handling of quotes and escape characters [1][3]. In this mode, quotes are stripped from the resulting tokens, and they do not act as word separators (e.g., "Do"Not"Separate" is parsed as the single token DoNotSeparate) [1][4]. When posix=False (non-POSIX mode), the parser uses a legacy compatibility mode [3][5]. This mode is generally simpler and does not handle backslash escaping or complex quoting rules as accurately as POSIX mode [3]. Specifically, in non-POSIX mode, quote characters are often preserved in the output tokens rather than being stripped, and backslashes may not be treated as escape characters [3][6]. In summary, if you are parsing command lines that resemble standard Unix-like shell syntax, using posix=True is recommended to ensure correct handling of quotes and escapes [3]. If you require the legacy behavior—where quotes might be preserved or parsing rules differ—you may use posix=False [3][6].


🏁 Script executed:

python3 - <<'PY'
import shlex

# Test case: Windows path with spaces in quoted argument
cmd = r'aiperf profile --file "C:\Users\test user\data.jsonl" --request-count 5'

result_posix_false = shlex.split(cmd, posix=False)
result_posix_true = shlex.split(cmd, posix=True)

print("posix=False result:")
for i, arg in enumerate(result_posix_false):
    print(f"  [{i}] {repr(arg)}")

print("\nposix=True result:")
for i, arg in enumerate(result_posix_true):
    print(f"  [{i}] {repr(arg)}")

# Check if quotes are preserved
has_quotes_false = any('"' in arg for arg in result_posix_false)
has_quotes_true = any('"' in arg for arg in result_posix_true)
print(f"\nQuotes preserved in posix=False: {has_quotes_false}")
print(f"Quotes preserved in posix=True: {has_quotes_true}")
PY

Repository: ai-dynamo/aiperf

Length of output: 421


Strip quotes from tokens when using non-POSIX shlex.split() on Windows.

When posix=False, shlex.split() preserves quote characters in the resulting tokens. On Windows, this means a quoted argument like "C:\Users\test user\data.jsonl" becomes a token that still carries its literal quotes: '"C:\\Users\\test user\\data.jsonl"'. Downstream code expecting unquoted paths will fail. Strip matching surrounding quote pairs on Windows so the result matches what POSIX mode produces:

Proposed fix
-        args = shlex.split(cmd, posix=(sys.platform != "win32"))
+        if sys.platform == "win32":
+            args = shlex.split(cmd, posix=False)
+            # shlex non-POSIX keeps quote chars; normalize wrapped tokens.
+            args = [
+                arg[1:-1] if len(arg) >= 2 and arg[0] == arg[-1] == '"' else arg
+                for arg in args
+            ]
+        else:
+            args = shlex.split(cmd, posix=True)
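
The behavior behind this finding can be reproduced in isolation. The following is a minimal sketch, not the harness code: `parse_command` is a hypothetical stand-in for `_parse_command()`, and the `windows` flag replaces the `sys.platform` check so both branches can run on any OS. Unlike the diff above, it strips single quotes as well as double quotes.

```python
import shlex


def parse_command(cmd: str, *, windows: bool) -> list[str]:
    """Split a command line, keeping backslashes intact in Windows mode."""
    args = shlex.split(cmd, posix=not windows)
    if windows:
        # Non-POSIX mode leaves the surrounding quotes on each token.
        args = [
            a[1:-1] if len(a) >= 2 and a[0] == a[-1] and a[0] in ('"', "'") else a
            for a in args
        ]
    return args[1:] if args and args[0] == "aiperf" else args


if __name__ == "__main__":
    quoted = r'aiperf profile --file "C:\Users\test user\data.jsonl"'
    print(parse_command(quoted, windows=True))
    # POSIX mode drops backslashes from an unquoted path.
    print(parse_command(r"aiperf --file C:\Users\data.jsonl", windows=False))
```

Stripping only a matching outer pair keeps tokens with embedded or unbalanced quotes untouched, which is the conservative choice for a test harness.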

Comment thread tests/unit/conftest.py Outdated
Comment thread tests/unit/zmq/test_dual_bind.py Outdated
…ebased on v2 config)

Signed-off-by: lkomali <lkomali@nvidia.com>
@lkomali lkomali force-pushed the lkomali/aiperf-windows-port branch from 4bb3ba9 to bff6081 Compare May 15, 2026 20:30
if IS_WINDOWS:
    salt = f"{path}/{ipc_filename}"
    digest = hashlib.sha256(salt.encode()).hexdigest()
    port_offset = int(digest[:8], 16) % _WINDOWS_TCP_PORT_RANGE

Hashing each Windows IPC endpoint into a fixed 20,000-port range can map two production sockets in the same run to the same TCP port, causing intermittent "address already in use" failures at startup. Fix: allocate and share a collision-free per-run port map, or resolve collisions by probing/reserving ports before services start.
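
One way to read the first suggestion is a port map built once per run with deterministic collision resolution. The sketch below is an illustration under assumptions: `hashed_port` and `build_port_map` are hypothetical names, the base of 28000 is inferred from the range quoted in the PR description, and every process is assumed to know the full list of endpoint keys.

```python
import hashlib

PORT_BASE = 28000   # assumed from the 28000–48000 range in the description
PORT_RANGE = 20000


def hashed_port(key: str, base: int = PORT_BASE, size: int = PORT_RANGE) -> int:
    """Derive a port from the first 32 bits of the key's SHA-256 digest."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return base + int(digest[:8], 16) % size


def build_port_map(
    keys: list[str], base: int = PORT_BASE, size: int = PORT_RANGE
) -> dict[str, int]:
    """Give every key a unique port, resolving clashes by linear probing."""
    unique_keys = sorted(set(keys))  # sorted so every process builds the same map
    if len(unique_keys) > size:
        raise ValueError("more endpoints than available ports")
    taken: set[int] = set()
    ports: dict[str, int] = {}
    for key in unique_keys:
        port = hashed_port(key, base, size)
        while port in taken:
            port = base + (port - base + 1) % size  # wrap within the range
        taken.add(port)
        ports[key] = port
    return ports
```

This removes collisions only among the endpoints in the map. Ports already held by unrelated programs would still need the probing or reserving step mentioned in the comment.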

@codecov

codecov Bot commented May 19, 2026

Codecov Report

❌ Patch coverage is 89.47368% with 6 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/aiperf/config/comm/ipc.py | 73.68% | 5 Missing ⚠️
src/aiperf/common/environment.py | 80.00% | 0 Missing and 1 partial ⚠️


Signed-off-by: lkomali <lkomali@nvidia.com>
4 participants