fix(cli): bypass HTTP_PROXY for runner webhook via raw socket #563
Open
XWang20 wants to merge 1 commit into tiann:main from
When HAPI runs in an environment with HTTP_PROXY/HTTPS_PROXY set and a NO_PROXY that does not literally bypass 127.0.0.1, the loopback webhook each spawned session sends back to the runner gets routed through the user's proxy and fails to reach the runner's control port. The parent runner then reports "Session webhook timeout for PID <n>" 15s later. This happens on every "new session" / "resume session" from the web UI.

Reproducer: with HTTP_PROXY=http://127.0.0.1:7890 and the common NO_PROXY="127.*,localhost" (the form glibc / wget accept but libcurl-style parsers don't), bun's fetch and bun's node:http both forward loopback requests to the proxy. cloudflared escapes this only because its origin uses the literal "localhost", which NO_PROXY does match. The runner uses 127.0.0.1 and gets caught.

bun honors HTTP_PROXY at process start for both fetch and node:http; neither API exposes a per-request bypass that actually takes effect (verified `proxy: ''`, `proxy: undefined`, runtime mutation of `process.env.NO_PROXY`). The only transport in bun that ignores the proxy stack is node:net, since it is raw TCP.

Replace `runnerPost`'s `fetch` call with a minimal HTTP/1.1 client written on `node:net.connect`. Public signature, return shape, error messages and `HAPI_RUNNER_HTTP_TIMEOUT` semantics are unchanged, so all callers (`notifyRunnerSessionStarted`, `listRunnerSessions`, `stopRunnerSession`, `spawnRunnerSession`, `stopRunnerHttp`) keep working without modification.

Verified end-to-end on a real spawn under bun + bad proxy env: webhook delivery 389ms vs the previous 15s timeout.
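To illustrate why the reproducer's `NO_PROXY="127.*,localhost"` protects `localhost` but not `127.0.0.1`, here is a simplified model of two matching styles. Both helpers (`matchesSuffixStyle`, `matchesGlobStyle`) are hypothetical and written for this note; they are not bun's, curl's, or wget's actual code, and real parsers have more rules (ports, CIDR ranges, and so on).

```typescript
// Simplified, hypothetical models of NO_PROXY matching. Not taken from any real parser.

/** Split a NO_PROXY value into trimmed, lower-cased, non-empty entries. */
function entriesOf(noProxy: string): string[] {
  return noProxy
    .split(",")
    .map((entry) => entry.trim().toLowerCase())
    .filter((entry) => entry.length > 0);
}

/**
 * Suffix style (roughly how libcurl-like parsers behave): an entry matches
 * the exact host or a parent domain. A lone "*" matches everything, but "*"
 * inside an entry has no special meaning.
 */
function matchesSuffixStyle(host: string, noProxy: string): boolean {
  const h = host.toLowerCase();
  return entriesOf(noProxy).some((entry) => {
    if (entry === "*") return true;
    const bare = entry.startsWith(".") ? entry.slice(1) : entry;
    return h === bare || h.endsWith("." + bare);
  });
}

/** Glob style: "*" inside an entry is treated as a wildcard. */
function matchesGlobStyle(host: string, noProxy: string): boolean {
  return entriesOf(noProxy).some((entry) => {
    // Escape regex metacharacters, then turn "*" into ".*".
    const pattern = entry
      .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
      .replace(/\*/g, ".*");
    return new RegExp("^" + pattern + "$", "i").test(host);
  });
}

const noProxy = "127.*,localhost";
console.log(matchesSuffixStyle("127.0.0.1", noProxy)); // false: request goes to the proxy
console.log(matchesSuffixStyle("localhost", noProxy)); // true: request bypasses the proxy
console.log(matchesGlobStyle("127.0.0.1", noProxy));   // true
```

Under the suffix model, `127.*` is compared as a literal string, so only the `localhost` entry ever matches. That is consistent with the observation above that cloudflared (origin `localhost`) works while the runner (origin `127.0.0.1`) does not.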
Findings
- No high-confidence issues found in the added/modified lines.
Summary
- Review mode: initial
- Reviewed `cli/src/runner/controlClient.ts` plus surrounding runner control server/client context. Residual risk: this hand-rolled HTTP path is covered only by the existing runner integration flow, which depends on the integration environment; a small focused unit test with a local `node:net`/HTTP server would better lock down response parsing and proxy-bypass behavior.
Testing
- Not run (automation); static review only.
HAPI Bot
Contributor
Pull request overview
This PR updates the CLI’s runner control-plane HTTP client to reliably reach the local runner control server even when HTTP_PROXY is set and NO_PROXY is misformatted for Bun’s proxy parsing, by bypassing the proxy stack via a raw TCP socket.
Changes:
- Replaced `fetch()`-based POSTs to the runner control server with a minimal HTTP/1.1 client over `node:net.connect()`.
- Preserved existing caller-facing API surface (return shape and error message patterns) while changing the underlying transport to avoid proxy interference.
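The transport change summarized above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `buildPostRequest`, `parseResponse`, and `rawPost` are hypothetical names, the error strings are placeholders, and the sketch assumes a JSON body, a non-chunked response, and a server that closes the connection after replying.

```typescript
import * as net from "node:net";

interface RawResponse {
  status: number;
  body: string;
}

/** Serialize a JSON POST as raw HTTP/1.1 bytes (hypothetical helper). */
function buildPostRequest(host: string, port: number, path: string, body: unknown): Buffer {
  const payload = Buffer.from(JSON.stringify(body ?? {}), "utf8");
  const head =
    `POST ${path} HTTP/1.1\r\n` +
    `Host: ${host}:${port}\r\n` +
    `Content-Type: application/json\r\n` +
    // Content-Length counts bytes, so use the encoded payload's byteLength.
    `Content-Length: ${payload.byteLength}\r\n` +
    `Connection: close\r\n\r\n`;
  return Buffer.concat([Buffer.from(head, "latin1"), payload]);
}

/** Parse a fully buffered response: status code plus everything after the headers. */
function parseResponse(raw: Buffer): RawResponse {
  const sep = raw.indexOf("\r\n\r\n");
  if (sep === -1) throw new Error("Malformed HTTP response: no header terminator");
  const statusLine = raw.subarray(0, sep).toString("latin1").split("\r\n")[0] ?? "";
  const match = /^HTTP\/1\.[01] (\d{3})/.exec(statusLine);
  if (!match) throw new Error(`Malformed status line: ${statusLine}`);
  return { status: Number(match[1]), body: raw.subarray(sep + 4).toString("utf8") };
}

/** POST over a raw TCP socket, which never consults HTTP_PROXY / NO_PROXY. */
function rawPost(port: number, path: string, body: unknown, timeoutMs = 10_000): Promise<RawResponse> {
  const host = "127.0.0.1";
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    const socket = net.connect({ host, port });
    // Idle timeout: destroying with an error triggers the "error" handler below.
    socket.setTimeout(timeoutMs, () => socket.destroy(new Error(`Request timeout: ${path}`)));
    socket.on("connect", () => socket.write(buildPostRequest(host, port, path, body)));
    socket.on("data", (chunk: Buffer) => chunks.push(chunk));
    socket.on("end", () => {
      try {
        resolve(parseResponse(Buffer.concat(chunks)));
      } catch (error) {
        reject(error);
      }
    });
    socket.on("error", reject);
  });
}
```

Because the socket connects directly to the loopback address, no proxy configuration is involved at any point. The trade-off is that everything `fetch` normally handles (chunked encoding, redirects, keep-alive) has to be either implemented or deliberately excluded, which is why the sketch pins `Connection: close`.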
if (!response.ok) {
const errorMessage = `Request failed: ${path}, HTTP ${response.status}`;
const timeout = process.env.HAPI_RUNNER_HTTP_TIMEOUT ? parseInt(process.env.HAPI_RUNNER_HTTP_TIMEOUT) : 10_000;

`Content-Length: ${payload.length}\r\n` +
`Connection: close\r\n\r\n`;
socket.write(head);
socket.write(payload);

socket.on('error', (error) => {
settle(() => fail(error.message));
});
Owner
Maybe you should set
Problem
In an environment with `HTTP_PROXY` set and a `NO_PROXY` that does not literally include `127.0.0.1`, every "new session" / "resume session" from the web UI fails after a 15s wait with "Session webhook timeout for PID <n>".

Reproducer: `HTTP_PROXY=http://127.0.0.1:7890` + `NO_PROXY="127.*,localhost"` (the wildcard form glibc / wget accept but libcurl-style parsers don't). bun routes the loopback webhook through the proxy, the runner never sees it, and the parent gives up at the 15s mark. cloudflared escapes this only because its origin is the literal `localhost`, which NO_PROXY does match.

Root cause
bun honors `HTTP_PROXY` at process start for both `fetch` and `node:http`; neither API exposes a per-request bypass that actually takes effect (verified `proxy: ''`, `proxy: undefined`, runtime mutation of `process.env.NO_PROXY`). The only transport in bun that ignores the proxy stack is `node:net`, since it is raw TCP.

Fix
Replace `runnerPost`'s `fetch` with a minimal HTTP/1.1 client on `node:net.connect`. Public signature, return shape, error message format, and `HAPI_RUNNER_HTTP_TIMEOUT` semantics are unchanged, so all callers (`notifyRunnerSessionStarted`, `listRunnerSessions`, `stopRunnerSession`, `spawnRunnerSession`, `stopRunnerHttp`) keep working without modification.

The client sends `Connection: close`, so the runner control server closes after each response and the parser resolves on `end`.

Testing
bun test runnerIdentity buildCliArgs)

Known follow-up
The minimal HTTP/1.1 parser does not honor `Content-Length` for early return and does not handle `close` without `end`. With `Connection: close` against the existing runner control server this is fine in practice, but a future patch should add a `close` handler so a mid-response runner crash fails immediately rather than waiting for `HAPI_RUNNER_HTTP_TIMEOUT`.
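One possible shape for that follow-up is sketched below. It is not part of this PR: `tryParseComplete` and `watchResponse` are hypothetical names, and the sketch assumes the runner control server always sends a `Content-Length` header and never uses chunked encoding.

```typescript
import type { Socket } from "node:net";

interface ParsedResponse {
  status: number;
  body: string;
}

/**
 * Return the response once the headers and Content-Length bytes of body are
 * buffered; return null while more data is still needed.
 */
function tryParseComplete(buffered: Buffer): ParsedResponse | null {
  const sep = buffered.indexOf("\r\n\r\n");
  if (sep === -1) return null; // headers not complete yet
  const headerLines = buffered.subarray(0, sep).toString("latin1").split("\r\n");
  const status = /^HTTP\/1\.[01] (\d{3})/.exec(headerLines[0] ?? "");
  if (!status) throw new Error("Malformed status line");
  const lengthLine = headerLines.find((line) => /^content-length:/i.test(line));
  if (!lengthLine) return null; // assumption violated: no length to wait for
  const length = Number(lengthLine.slice(lengthLine.indexOf(":") + 1).trim());
  if (!Number.isInteger(length) || length < 0) throw new Error("Invalid Content-Length");
  const bodyStart = sep + 4;
  if (buffered.length - bodyStart < length) return null; // body still incomplete
  const body = buffered.subarray(bodyStart, bodyStart + length).toString("utf8");
  return { status: Number(status[1]), body };
}

/** Wire a socket so it settles exactly once: on a full response, an error, or an early close. */
function watchResponse(
  socket: Socket,
  resolve: (response: ParsedResponse) => void,
  reject: (error: Error) => void,
): void {
  let buffered: Buffer = Buffer.alloc(0);
  let settled = false;
  const settle = (fn: () => void) => {
    if (!settled) {
      settled = true;
      fn();
    }
  };
  socket.on("data", (chunk: Buffer) => {
    buffered = Buffer.concat([buffered, chunk]);
    try {
      const done = tryParseComplete(buffered);
      // Early return: no need to wait for the server to close the connection.
      if (done) settle(() => { socket.destroy(); resolve(done); });
    } catch (error) {
      settle(() => reject(error as Error));
    }
  });
  // "close" fires after "end" and also after an abrupt disconnect, so a runner
  // crash mid-response rejects here instead of waiting for the timeout.
  socket.on("close", () => settle(() => reject(new Error("Connection closed before a complete response"))));
  socket.on("error", (error) => settle(() => reject(error)));
}
```

The `settled` guard plays the same role as the `settle` wrapper in the diff: whichever of the three events fires first wins, and later events are ignored.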