broken pipe on fetches still #1204
Comments
I've been running the previous reproducer in a loop for a while on the latest code, and for better or worse I don't see any broken pipes. So at the very least I don't think we've regressed on that particular fix. Next I'll look at the zstd:chunked case and see how that plays into things.
Ok, verified that zstd:chunked is consistently broken:
As an aside, I wasted entirely too much time this morning trying to debug podman before I realized I was hitting containers/storage#2055. I had to remove podman-machine from the loop entirely and do the compression directly on my host with 64GB in order for it to actually push 😭
Debug output from
I'd probably add
Here's the relevant bit, which includes some thus-far unsuccessful attempts to fix this. Most notably, I tried reapplying this patch because it feels like it's at least in the ballpark of the problem.
I'm thinking we might need to revisit ostreedev/ostree-rs-ext#672
Right, we're seeing EPIPE on both sides, which definitely implies we need to hold open one side of the pipe longer. It's really interesting, though, that zstd:chunked would somehow always trigger this.
So the problem seems to be in here: bootc/ostree-ext/src/container/store.rs, lines 930 to 957 at 244c1a0.
Specifically, we have bootc/ostree-ext/src/tar/write.rs, line 382 at 244c1a0.
The problem is that it can get dropped before the other end of the pipe is finished.
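To make that failure mode concrete, here is a minimal, self-contained sketch (my own illustration, not the actual bootc code; it assumes the `os_pipe` crate as a dependency): a writer thread stands in for skopeo and still has trailing bytes to deliver when the reader, standing in for the tar import, decides it is done and drops its end of the pipe. The writer's next write then fails with BrokenPipe/EPIPE, the same error that surfaces in FinishPipe.

```rust
// Hypothetical reproduction of the race described above; not taken from bootc.
use std::io::{Read, Write};

fn main() -> std::io::Result<()> {
    let (mut reader, mut writer) = os_pipe::pipe()?;

    let writer_thread = std::thread::spawn(move || -> std::io::Result<()> {
        // The payload the reader actually wants (think: the tar stream).
        writer.write_all(&[0u8; 64 * 1024])?;
        // Trailing bytes the reader never asks for (think: zstd skippable
        // frames after the tar EOF blocks). This is larger than the kernel
        // pipe buffer, so the write blocks until the reader drains it --
        // which it never does.
        writer.write_all(&vec![0u8; 1024 * 1024])?;
        Ok(())
    });

    // The reader consumes only the payload it cares about, then drops its
    // end of the pipe early, exactly like finishing the tar import while
    // trailing data is still queued.
    let mut payload = vec![0u8; 64 * 1024];
    reader.read_exact(&mut payload)?;
    drop(reader);

    // The writer's pending write now fails with EPIPE (BrokenPipe), the same
    // error the issue reports from skopeo's FinishPipe.
    match writer_thread.join().unwrap() {
        Err(e) if e.kind() == std::io::ErrorKind::BrokenPipe => {
            eprintln!("writer failed as expected: {e}");
        }
        other => eprintln!("unexpected result: {other:?}"),
    }
    Ok(())
}
```

Note that if the trailing bytes happened to fit in the kernel pipe buffer, the final write would succeed and the problem would stay hidden, which lines up with the buffering/timing-dependent behavior described later in the thread.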
Worth noting that it's not all zstd:chunked images that are broken. I've created a Compare working from the
To the failing image (local copy of
I generated two more. The 50MB one pulls successfully. The 200MB one fails. 🫠
Nope, doesn't seem to change anything.
We did this together live yesterday, but for the permanent record: no, it doesn't reproduce with just containers-image-proxy.
Also, another tidbit: I tried inserting a
When it fails, it fails like:
or
When it works:
or
I don't think that really helps clarify anything though.
I have a hunch, based on the following observations:
Based on that, what I think is happening is that there's extra zstd metadata (skippable frames, etc.) at the very end of the stream. Then we get into a situation where there's some racing depending on the order of futures, buffering, or the timing of how much we read out of the pipe. Sometimes the zstd decompressor is able to slurp up the entire stream. Sometimes we finish the blob future while there's still trailing data in the pipe, and this causes everything to break (or hang, if you don't drop the read end of the pipe prematurely).
That's 100% the problem; everything works after fixing that. I'll do some cleanup and post the PR shortly.
We need to read out anything past the tar stream; otherwise we'll deadlock via skopeo in FinishPipe. Also remove the bit where we hold the pipe open longer; we shouldn't need to do that anymore because we've ensured we've read everything out of it by this point.

Resolves: bootc-dev#1204
Signed-off-by: John Eckersberg <jeckersb@redhat.com>
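For the record, here is a rough sketch of the shape that fix takes (my own illustration under tokio, not the actual patch; assumes tokio with the "io-util", "rt", and "macros" features, and the helper name `drain_trailing` is made up): once the tar import is done with the entries it wants, keep reading the blob stream to EOF and discard the extra bytes, so nothing is left queued in the pipe when the reader is dropped and skopeo's FinishPipe can complete cleanly.

```rust
// Hypothetical helper, not the actual bootc/ostree-ext code.
use tokio::io::{AsyncRead, AsyncReadExt};

/// Read and discard whatever is left in `src` (e.g. zstd skippable frames
/// trailing the tar EOF blocks), returning how many bytes were thrown away.
async fn drain_trailing<R: AsyncRead + Unpin>(src: &mut R) -> std::io::Result<u64> {
    let mut buf = [0u8; 8192];
    let mut trailing = 0u64;
    loop {
        let n = src.read(&mut buf).await?;
        if n == 0 {
            return Ok(trailing); // EOF: now it's safe to drop the reader
        }
        trailing += n as u64;
    }
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Stand-in for a blob reader that still has bytes left over after the
    // tar import finished (here just an in-memory byte slice).
    let data = vec![0u8; 4096];
    let mut leftover: &[u8] = &data;
    let n = drain_trailing(&mut leftover).await?;
    println!("discarded {n} trailing bytes before dropping the reader");
    Ok(())
}
```

An equivalent one-liner would be `tokio::io::copy(&mut src, &mut tokio::io::sink()).await`; either way, the point is that the reader only goes away after the stream has hit EOF.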
We still see
ERROR Switching: Pulling: Importing: Parsing layer blob sha256:43373869bc293673e2dfc59a04c626501893eed1f7c1ec98004f2e7c0e41a245: failed to invoke method FinishPipe: failed to invoke method FinishPipe: write |1: broken pipe
occasionally, at least in CI runs, though I haven't seen it myself personally/interactively. Previously: ostreedev/ostree-rs-ext#657
cc also #963 where this seems to happen with zstd:chunked