Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 BUG: POST requests often fail with wrangler dev #5095

Closed
Cherry opened this issue Feb 26, 2024 · 4 comments · Fixed by #5106
Closed

🐛 BUG: POST requests often fail with wrangler dev #5095

Cherry opened this issue Feb 26, 2024 · 4 comments · Fixed by #5106
Assignees
Labels
bug Something that isn't working regression Break in existing functionality as a result of a recent change start-dev-worker Relating to the startDevWorker API

Comments

@Cherry
Copy link
Contributor

Cherry commented Feb 26, 2024

Which Cloudflare product(s) does this pertain to?

Wrangler core

What version(s) of the tool(s) are you using?

3.29.0

What version of Node are you using?

20.11.1

What operating system and version are you using?

Windows 11

Describe the Bug

Observed behavior

POST requests with data fail ~50% of the time, with a generic Error: Network connection lost.. I've had multiple users in Discord reproduce this bug.

Expected behavior

I would expect all requests to suceed.

Steps to reproduce

worker.js

export default {
	async fetch(request, env, ctx) {
		return new Response('HELLO');
	},
};
  • npx wrangler dev worker.js
  • Then run the following command over and over:

Windows: curl.exe http://127.0.0.1:8787 -d '{"foo": "bar"}'
Mac: curl http://127.0.0.1:8787 -d '{"foo": "bar"}'

Every other request (in my environment) will fail. I've had others reproduce this with the same failure rate on Windows 11, and a much lower rate (only about 2 or 3 every 20 reqs on MacOS).

PS E:\GitHub\white-water-2a7c> curl.exe http://127.0.0.1:8787 -d '{"foo": "bar"}'
Error: Network connection lost.
PS E:\GitHub\white-water-2a7c> curl.exe http://127.0.0.1:8787 -d '{"foo": "bar"}'
HELLO
PS E:\GitHub\white-water-2a7c> curl.exe http://127.0.0.1:8787 -d '{"foo": "bar"}'
Error: Network connection lost.
PS E:\GitHub\white-water-2a7c> curl.exe http://127.0.0.1:8787 -d '{"foo": "bar"}'
HELLO
PS E:\GitHub\white-water-2a7c> curl.exe http://127.0.0.1:8787 -d '{"foo": "bar"}'
Error: Network connection lost.
PS E:\GitHub\white-water-2a7c> curl.exe http://127.0.0.1:8787 -d '{"foo": "bar"}'
HELLO
PS E:\GitHub\white-water-2a7c> curl.exe http://127.0.0.1:8787 -d '{"foo": "bar"}'
Error: Network connection lost.
PS E:\GitHub\white-water-2a7c> curl.exe http://127.0.0.1:8787 -d '{"foo": "bar"}'
HELLO
PS E:\GitHub\white-water-2a7c> curl.exe http://127.0.0.1:8787 -d '{"foo": "bar"}'
Error: Network connection lost.
PS E:\GitHub\white-water-2a7c> curl.exe http://127.0.0.1:8787 -d '{"foo": "bar"}'
HELLO
PS E:\GitHub\white-water-2a7c> curl.exe http://127.0.0.1:8787 -d '{"foo": "bar"}'
Error: Network connection lost.
PS E:\GitHub\white-water-2a7c> curl.exe http://127.0.0.1:8787 -d '{"foo": "bar"}'
HELLO
PS E:\GitHub\white-water-2a7c> curl.exe http://127.0.0.1:8787 -d '{"foo": "bar"}'
HELLO
PS E:\GitHub\white-water-2a7c> curl.exe http://127.0.0.1:8787 -d '{"foo": "bar"}'
Error: Network connection lost.

cc @RamIdeas

This is something I've been observing with some contractors recently, working with HTTP clients within games. I've had a really hard time getting to the bottom of it, but with this issue, appear to have a reliably reproduction now.

Note: this is a major regression. If I use npx [email protected] dev worker.js, I can no longer reproduce the issue. Any wrangler version >= 3.19.0 and this breaks.

Likely related:

@MattIPv4
Copy link

MattIPv4 commented Feb 26, 2024

👀 I went to repro this using npx wrangler@latest dev index.js and ab -c 1 -n 100 -p payload.json http://127.0.0.1:8787/ to save needing to put curl http://127.0.0.1:8787 -d '{"foo": "bar"}' in a loop and noticed that ab would report a very consistent 5s per request, which seemed insanely slow (while the wrangler logs were reporting a few milliseconds for each request). I was also not able to reproduce the issue with ab.

I then took to putting curl in a loop and could immediately reproduce it, with curl was taking a few milliseconds (aligning with the wrangler logs), not 5 seconds, to exit each time. So I asked myself why? I suspected wrangler was holding the connection open, with ab waiting for it to close but curl exiting as soon as it saw the right number of bytes based on the response content-length header...

Yup, that's the reason ab is slower I believe:

time curl -vvv http://127.0.0.1:8787 -d '{"foo": "bar"}'

*   Trying 127.0.0.1:8787...
* Connected to 127.0.0.1 (127.0.0.1) port 8787
> POST / HTTP/1.1
> Host: 127.0.0.1:8787
> User-Agent: curl/8.4.1-DEV
> Accept: */*
> Content-Length: 14
> Content-Type: application/x-www-form-urlencoded
> 
< HTTP/1.1 200 OK
< Content-Length: 5
< Content-Type: text/plain;charset=UTF-8
< 
* Connection #0 to host 127.0.0.1 left intact
HELLOcurl -vvv http://127.0.0.1:8787 -d '{"foo": "bar"}'  0.01s user 0.01s system 36% cpu 0.045 total
time curl -vvv --ignore-content-length http://127.0.0.1:8787 -d '{"foo": "bar"}'

*   Trying 127.0.0.1:8787...
* Connected to 127.0.0.1 (127.0.0.1) port 8787
> POST / HTTP/1.1
> Host: 127.0.0.1:8787
> User-Agent: curl/8.4.1-DEV
> Accept: */*
> Content-Length: 14
> Content-Type: application/x-www-form-urlencoded
> 
< HTTP/1.1 200 OK
< Content-Length: 5
< Content-Type: text/plain;charset=UTF-8
* no chunk, no close, no size. Assume close to signal end
< 
* Closing connection
HELLOcurl -vvv --ignore-content-length http://127.0.0.1:8787 -d '{"foo": "bar"}'  0.01s user 0.01s system 0% cpu 5.034 total

Given this, I very much suspect that the connection timeout issue being seen might be due to something in wrangler being overwhelmed as it still has previous connections being held open while the loop has moved onto the next curl command and the next request.

Running the worker with npx [email protected] dev index.js and running the same two commands would suggest this issue was introduced sometime after 3.18.0:

time curl -vvv http://127.0.0.1:8787 -d '{"foo": "bar"}'

*   Trying 127.0.0.1:8787...
* Connected to 127.0.0.1 (127.0.0.1) port 8787
> POST / HTTP/1.1
> Host: 127.0.0.1:8787
> User-Agent: curl/8.4.1-DEV
> Accept: */*
> Content-Length: 14
> Content-Type: application/x-www-form-urlencoded
> 
< HTTP/1.1 200 OK
< Content-Length: 5
< Content-Type: text/plain;charset=UTF-8
< 
* Connection #0 to host 127.0.0.1 left intact
HELLOcurl -vvv http://127.0.0.1:8787 -d '{"foo": "bar"}'  0.01s user 0.01s system 30% cpu 0.078 total
time curl -vvv --ignore-content-length http://127.0.0.1:8787 -d '{"foo": "bar"}'

*   Trying 127.0.0.1:8787...
* Connected to 127.0.0.1 (127.0.0.1) port 8787
> POST / HTTP/1.1
> Host: 127.0.0.1:8787
> User-Agent: curl/8.4.1-DEV
> Accept: */*
> Content-Length: 14
> Content-Type: application/x-www-form-urlencoded
> 
< HTTP/1.1 200 OK
< Content-Length: 5
< Content-Type: text/plain;charset=UTF-8
* no chunk, no close, no size. Assume close to signal end
< 
* Closing connection
HELLOcurl -vvv --ignore-content-length http://127.0.0.1:8787 -d '{"foo": "bar"}'  0.01s user 0.01s system 62% cpu 0.033 total

Running the curl commands with the worker on 3.19.0 returns the same 5s wait for the connection to close as seen on latest, so it would seem this was introduced between 3.18.0 and 3.19.0.

@petebacondarwin petebacondarwin added start-dev-worker Relating to the startDevWorker API regression Break in existing functionality as a result of a recent change labels Feb 26, 2024
@RamIdeas
Copy link
Contributor

Hi @Cherry! Thanks for putting this together – it really helped us hone in on the issue and I can confirm we can reliably reproduce the bug.

We believe we're seeing something similar to either cloudflare/workerd#960 or cloudflare/workerd#1376 or a mixture of them both. We will open another issue in the workerd repo if needs be.

We have a workaround confirmed to fix the issue which we will be able to hotfix into wrangler. The workaround essentially boils down to ensuring the request body is drained in the UserWorker. We need some time to confirm the workaround won't cause any other issues (with body reuse, etc) before releaseing.

In the meantime, to unblock you and your team, please ensure your Worker drains the request body stream with await request.arrayBuffer() or ctx.waitUntil(request.arrayBuffer()); (or anything similar) in all codepaths of worker. Please let us know if this workaround does not work for you (or even if it does, it would be great to confirm).

We will aim to get the hotfix released today or in tomorrow's regular release.

@MattIPv4
Copy link

👀 If consuming the body is the issue here, then the connection remaining opening appears to be a different issue? It happens on both POST w/ body and GET requests.

export default {
	async fetch(request, env, ctx) {
		await request.arrayBuffer();
		return new Response('HELLO');
	},
};
npx [email protected] dev index.js
time curl -vvv --ignore-content-length http://127.0.0.1:8787 -d '{"foo": "bar"}'

*   Trying 127.0.0.1:8787...
* Connected to 127.0.0.1 (127.0.0.1) port 8787
> POST / HTTP/1.1
> Host: 127.0.0.1:8787
> User-Agent: curl/8.4.1-DEV
> Accept: */*
> Content-Length: 14
> Content-Type: application/x-www-form-urlencoded
> 
< HTTP/1.1 200 OK
< Content-Length: 5
< Content-Type: text/plain;charset=UTF-8
* no chunk, no close, no size. Assume close to signal end
< 
* Closing connection
HELLOcurl -vvv --ignore-content-length http://127.0.0.1:8787 -d '{"foo": "bar"}'  0.01s user 0.01s system 0% cpu 5.037 total
time curl -vvv --ignore-content-length http://127.0.0.1:8787

*   Trying 127.0.0.1:8787...
* Connected to 127.0.0.1 (127.0.0.1) port 8787
> GET / HTTP/1.1
> Host: 127.0.0.1:8787
> User-Agent: curl/8.4.1-DEV
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Length: 5
< Content-Type: text/plain;charset=UTF-8
* no chunk, no close, no size. Assume close to signal end
< 
* Closing connection
HELLOcurl -vvv --ignore-content-length http://127.0.0.1:8787  0.01s user 0.01s system 0% cpu 5.105 total

@lrapoport-cf lrapoport-cf moved this from Untriaged to Backlog in workers-sdk Feb 26, 2024
RamIdeas added a commit that referenced this issue Feb 26, 2024
potential workaround for #5095
@RamIdeas
Copy link
Contributor

Hi @MattIPv4 – that is indeed odd but, as you said, is likely unrelated. I have created this issue in workerd to follow up. If you have anymore info, please post there. Thanks for highlighting!

petebacondarwin pushed a commit that referenced this issue Feb 27, 2024
potential workaround for #5095
petebacondarwin pushed a commit that referenced this issue Feb 28, 2024
potential workaround for #5095
@github-project-automation github-project-automation bot moved this from Backlog to Done in workers-sdk Feb 29, 2024
petebacondarwin added a commit that referenced this issue Feb 29, 2024
… workerd (#5106)

* ensure request.body is drained

potential workaround for #5095

* regression test

* fixup! regression test

* fixup! regression test

Use larger body and ensure request read before response sent

* allow request draining to be disabled and add changeset

* remove unnecessary export

* fixup! ensure request.body is drained

---------

Co-authored-by: Peter Bacon Darwin <[email protected]>
Co-authored-by: bcoll <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something that isn't working regression Break in existing functionality as a result of a recent change start-dev-worker Relating to the startDevWorker API
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants