Skip to content

fix(cli): bypass HTTP_PROXY for runner webhook via raw socket#563

Open
XWang20 wants to merge 1 commit intotiann:mainfrom
XWang20:fix/runner-webhook-bypass-bun-proxy
Open

fix(cli): bypass HTTP_PROXY for runner webhook via raw socket#563
XWang20 wants to merge 1 commit intotiann:mainfrom
XWang20:fix/runner-webhook-bypass-bun-proxy

Conversation

@XWang20
Copy link
Copy Markdown
Contributor

@XWang20 XWang20 commented May 5, 2026

Problem

In an environment with HTTP_PROXY set and a NO_PROXY that does not literally include 127.0.0.1, every "new session" / "resume session" from the web UI fails after a 15s wait with:

Runner last spawn error: Session webhook timeout for PID <n>

Reproducer: HTTP_PROXY=http://127.0.0.1:7890 + NO_PROXY="127.*,localhost" (the wildcard form glibc / wget accept but libcurl-style parsers don't). bun routes the loopback webhook through the proxy, the runner never sees it, and the parent gives up at the 15s mark. cloudflared escapes this only because its origin is the literal localhost, which NO_PROXY does match.

Root cause

bun honors HTTP_PROXY at process start for both fetch and node:http; neither API exposes a per-request bypass that actually takes effect (verified proxy: '', proxy: undefined, runtime mutation of process.env.NO_PROXY). The only transport in bun that ignores the proxy stack is node:net, since it is raw TCP.

Fix

Replace runnerPost's fetch with a minimal HTTP/1.1 client on node:net.connect. Public signature, return shape, error message format, and HAPI_RUNNER_HTTP_TIMEOUT semantics are unchanged, so all callers (notifyRunnerSessionStarted, listRunnerSessions, stopRunnerSession, spawnRunnerSession, stopRunnerHttp) keep working without modification.

The client sends Connection: close, so the runner control server closes after each response and the parser resolves on end.

Testing

  • cli unit tests pass (bun test runnerIdentity buildCliArgs)
  • End-to-end on a real spawn under bun + bad proxy env: webhook delivery 389ms vs the previous 15s timeout

Known follow-up

The minimal HTTP/1.1 parser does not honor Content-Length for early return and does not handle close without end. With Connection: close against the existing runner control server this is fine in practice, but a future patch should add a close handler so a mid-response runner crash fails immediately rather than waiting for HAPI_RUNNER_HTTP_TIMEOUT.

When HAPI runs in an environment with HTTP_PROXY/HTTPS_PROXY set and a
NO_PROXY that does not literally bypass 127.0.0.1, the loopback webhook
each spawned session sends back to the runner gets routed through the
user's proxy, fails to reach the runner's control port, and the parent
runner reports "Session webhook timeout for PID <n>" 15s later — every
"new session" / "resume session" from the web UI.

Reproducer: with HTTP_PROXY=http://127.0.0.1:7890 and the common
NO_PROXY="127.*,localhost" (the form glibc / wget accept but libcurl-
style parsers don't), bun's fetch and bun's node:http both forward
loopback requests to the proxy. cloudflared escapes this only because
its origin uses the literal "localhost", which NO_PROXY does match.
The runner uses 127.0.0.1 and gets caught.

bun honors HTTP_PROXY at process start for both fetch and node:http;
neither API exposes a per-request bypass that actually takes effect
(verified `proxy: ''`, `proxy: undefined`, runtime mutation of
`process.env.NO_PROXY`). The only transport in bun that ignores the
proxy stack is node:net, since it is raw TCP.

Replace `runnerPost`'s `fetch` call with a minimal HTTP/1.1 client
written on `node:net.connect`. Public signature, return shape, error
messages and `HAPI_RUNNER_HTTP_TIMEOUT` semantics are unchanged, so
all callers (`notifyRunnerSessionStarted`, `listRunnerSessions`,
`stopRunnerSession`, `spawnRunnerSession`, `stopRunnerHttp`) keep
working without modification.

Verified end-to-end on a real spawn under bun + bad proxy env: webhook
delivery 389ms vs the previous 15s timeout.
Copilot AI review requested due to automatic review settings May 5, 2026 06:30
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • No high-confidence issues found in the added/modified lines.

Summary

  • Review mode: initial
  • Reviewed cli/src/runner/controlClient.ts plus surrounding runner control server/client context. Residual risk: this hand-rolled HTTP path is covered only by the existing runner integration flow, which depends on the integration environment; a small focused unit test with a local node:net/HTTP server would better lock down response parsing and proxy-bypass behavior.

Testing

  • Not run (automation); static review only.

HAPI Bot

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the CLI’s runner control-plane HTTP client to reliably reach the local runner control server even when HTTP_PROXY is set and NO_PROXY is misformatted for Bun’s proxy parsing, by bypassing the proxy stack via a raw TCP socket.

Changes:

  • Replaced fetch()-based POSTs to the runner control server with a minimal HTTP/1.1 client over node:net.connect().
  • Preserved existing caller-facing API surface (return shape and error message patterns) while changing the underlying transport to avoid proxy interference.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


if (!response.ok) {
const errorMessage = `Request failed: ${path}, HTTP ${response.status}`;
const timeout = process.env.HAPI_RUNNER_HTTP_TIMEOUT ? parseInt(process.env.HAPI_RUNNER_HTTP_TIMEOUT) : 10_000;
`Content-Length: ${payload.length}\r\n` +
`Connection: close\r\n\r\n`;
socket.write(head);
socket.write(payload);

socket.on('error', (error) => {
settle(() => fail(error.message));
});
@tiann
Copy link
Copy Markdown
Owner

tiann commented May 6, 2026

Maybe you should set process.env.NO_PROXY earlier, i think this should work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants