Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fuse-overlayfs: use temporary mount wrapper #3878

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

amurzeau
Copy link
Contributor

runc (and containerd as it also uses runc it seems) will fail to mount mounts using fuse-overlayfs.

This makes tests to fail when using fuse-overlayfs snapshotter with this error:

failed to create shim task:
OCI runtime create failed:
runc create failed:
unable to start container process:
error during container init:
error mounting \"overlay\" to rootfs at \"/rw\":
    mount overlay:/rw (via /proc/self/fd/7),
    data:
        workdir=/tmp/bktest_containerd-fuse-overlayfs-grpc312066418/root/snapshots/12/work,
        upperdir=/tmp/bktest_containerd-fuse-overlayfs-grpc312066418/root/snapshots/12/fs,
        lowerdir=/tmp/bktest_containerd-fuse-overlayfs-grpc312066418/root/snapshots/10/fs:/tmp/bktest_containerd-fuse-overlayfs-grpc312066418/root/snapshots/8/fs
    : no such device: unknown

The failing fuse-overlayfs test was discovered on PR #3876 that adds fuse-overlayfs tests.
The error can be seen here: https://github.com/moby/buildkit/actions/runs/4996865039/jobs/8950612470#step:7:1002

With this commit, fuse-overlayfs tests enabled by PR #3876 should run successfully (at least for buildkitd workflow, see https://github.com/amurzeau/buildkit/actions/runs/5009224692/jobs/8977964559)

I'm not sure why this is required as fuse-overlayfs is working outside tests.
To ensure this is not a regression from my previous PRs, I've tried to cherry-pick fuse-overlayfs test PR #3876 on top of a old commit before any of my previous merged PR and tests fails too with the same error.
Maybe the code path is not always triggered ?

@amurzeau amurzeau force-pushed the contrib7-fix-fuse-overlayfs-mounts-runc branch 2 times, most recently from ba05b33 to ce85cf2 Compare May 18, 2023 01:07
@ktock
Copy link
Collaborator

ktock commented May 18, 2023

There are already fuse-based snapshotters working (e.g. stargz) without this exception. Why does fuse-overlayfs require this special exception?

@amurzeau
Copy link
Contributor Author

amurzeau commented May 19, 2023

I've investigated why things work for stargz and not for fuse-overlayfs.

The issue occurs when using NewContainer with non-root (/) mounts that are write-enabled.
When using NewContainer with write-enabled, non-root, mounts, github.com/moby/buildkit/frontend/gateway.PrepareMounts will create a mutable ref by doing this:

  • create a new mutable ref whose parent ref is the given mount, so any modification made to it will be discarded afterwards
  • this ref is created by the snapshotter using cm.Snapshotter.Prepare, which will:
    • use a fuse3.fuse-overlayfs type mount for fuse-overlayfs snapshotter (containerd can handle fuse3.fuse-overlayfs mounts)
    • use a overlay type mount for other snapshotters (including stargz snapshotter)
  • then, when the created container is started, buildkit will generate an OCI spec with mounts (except the rootfs mount)
  • this is done using github.com/moby/buildkit/executor/oci.GenerateSpec
  • this function will call sharableMountable.Mount on all of these "non-root" mounts, which will:
    • for overlay type mounts, it will create the temporary bind mount and put that bind mount in the OCI spec
    • for fuse3.fuse-overlayfs, it won't use the temporary bind mount and put the fuse3.fuse-overlayfs mount in the OCI spec
      • then runc will fail to mount that later

So it works with stargz because it returns overlay mounts (and not fuse-based mounts).

Using the fuse-overlayfs snapshotter works fine when using buildctl as long as we don't try to use a write-enabled mount with an Exec Op.
Here a Dockerfile that trigger this same issue:

FROM debian:unstable

RUN --mount=type=bind,target=/test,rw echo test
$ docker run --name buildkitd --privileged moby/buildkit:rootless  --oci-worker-snapshotter=fuse-overlayfs
$ buildctl --addr docker-container://buildkitd build --frontend=dockerfile.v0 --local context=. --local dockerfile=.
[+] Building 0.6s (6/6) FINISHED                                                                                                                                             
 => [internal] load .dockerignore                                                                                                                                       0.0s
 => => transferring context: 2B                                                                                                                                         0.0s
 => [internal] load build definition from Dockerfile                                                                                                                    0.0s
 => => transferring dockerfile: 107B                                                                                                                                    0.0s
 => [internal] load metadata for docker.io/library/debian:unstable                                                                                                      0.4s
 => [internal] load build context                                                                                                                                       0.0s
 => => transferring context: 107B                                                                                                                                       0.0s
 => CACHED [stage-0 1/2] FROM docker.io/library/debian:unstable@sha256:38ce00318c86cf6a65b5e9fceeff9d61516141fc0807bc7684b45d8b8dc1cd09                                 0.0s
 => => resolve docker.io/library/debian:unstable@sha256:38ce00318c86cf6a65b5e9fceeff9d61516141fc0807bc7684b45d8b8dc1cd09                                                0.0s
 => ERROR [stage-0 2/2] RUN --mount=type=bind,target=/test,rw echo test                                                                                                 0.2s
------
 > [stage-0 2/2] RUN --mount=type=bind,target=/test,rw echo test:
#0 0.147 runc run failed: unable to start container process: error during container init: error mounting "overlay" to rootfs at "/test": mount overlay:/test (via /proc/self/fd/6), data: workdir=/home/user/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/8/work,upperdir=/home/user/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/8/fs,lowerdir=/home/user/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/3/fs: no such device
------
Dockerfile:3
--------------------
   1 |     FROM debian:unstable
   2 |     
   3 | >>> RUN --mount=type=bind,target=/test,rw echo test
   4 |     
--------------------
error: failed to solve: process "/bin/sh -c echo test" did not complete successfully: exit code: 1

Overlayfs mount use a temporary bind mounts so it can be shared (using sharableMountable).
This is not done for immutable mounts, but fuse-overlayfs would still need to use a temporary bind mount to be accepted by runc.

I guess the code almost never pass a overlayfs/fuse-overlayfs immutable mount to runc, so that never happen and that why modifying sharableMountable.Mount is sufficient to make all tests pass successfully.

But it can be reproduced with this Dockerfile:

FROM debian:unstable as stage1
RUN touch /t

FROM debian:unstable as stage2
RUN --mount=type=bind,target=/stage1,from=stage1,ro echo test

/stage1 will be a fuse-overlayfs immutable mount from stage1, but runc can't handle it.

So I'm proposing to instead do this:

  • Move the code that create the temporary bind mount wrapper to a new function
  • Use that function:
    • by sharableMountable.Mount (existing behavior)
    • for all non-rootfs mounts when generating an OCI spec
    • for all non-rootfs mounts when using runc executor

@ktock
Copy link
Collaborator

ktock commented May 22, 2023

@amurzeau Thanks for the explanation.

Move the code that create the temporary bind mount wrapper to a new function
Use that function:

  • by sharableMountable.Mount (existing behavior)
  • for all non-rootfs mounts when generating an OCI spec
  • for all non-rootfs mounts when using runc executor

If we fix all non-rootfs mounts when generating an OCI spec, we don't need other items, right? How about including that change to this PR? oci.GenerateSpec seems to do the conversion from containerd's mount.Mount to OCI's specs.Mount so that function needs to be fixed not to propagate the mount type unsupported by runc (i.e. fuse3.fuse-overlayfs). cc @tonistiigi

@amurzeau
Copy link
Contributor Author

amurzeau commented May 22, 2023

If we fix all non-rootfs mounts when generating an OCI spec, we don't need other items, right? How about including that change to this PR? oci.GenerateSpec seems to do the conversion from containerd's mount.Mount to OCI's specs.Mount so that function needs to be fixed not to propagate the mount type unsupported by runc (i.e. fuse3.fuse-overlayfs). cc @tonistiigi

Yes, this should be enough, GenerateSpec is called for both containerd and OCI executors.

But I'm not sure where to put the code that create the temporary bind mount. I'm thinking about adding a folder in `util/, but there are already many folders here.

Also, the temporary bind mount done in refs.go use a directory within a cachemount dir.
In GenerateSpec, I don't seem to have access to this path variable unless I add an argument to GenerateSpec and both containerd and runc executor's Run functions.
os.MkdirTemp("", "buildkit-fuse-mount") might be sufficient ?

@tonistiigi
Copy link
Member

I'm not sure what alternative solutions you are suggesting.

The current PR looks fine but it would be better to check fuse-overlayfs snapshotter type directly rather than putting all fuse mounts into same category.

runc (and containerd as it also uses runc) will fail to mount mounts
using fuse-overlayfs.

This makes tests to fail when using fuse-overlayfs snapshotter with this
error:

failed to create shim task:
OCI runtime create failed:
runc create failed:
unable to start container process:
error during container init:
error mounting "overlay" to rootfs at "/rw":
    mount overlay:/rw (via /proc/self/fd/7),
    data:
        workdir=/tmp/bktest_containerd-fuse-overlayfs-grpc312066418/root/snapshots/12/work,
        upperdir=/tmp/bktest_containerd-fuse-overlayfs-grpc312066418/root/snapshots/12/fs,
        lowerdir=/tmp/bktest_containerd-fuse-overlayfs-grpc312066418/root/snapshots/10/fs:/tmp/bktest_containerd-fuse-overlayfs-grpc312066418/root/snapshots/8/fs
    : no such device: unknown

There are two cases handled:

 - When the mount is writable, it must be bind mounted by
   sharableMountable.Mount for the same reason than overlayfs: 2
   fuse-overlayfs mounts using the same upperdir won't work (mounted dirs
   won't see new files created in the other mounted dir).
   This is what TestIntegration/TestSharedCacheMountsNoScratch checks.

 - When the mount is immutable, the fuse-overlayfs mount must be bind-
   mounted when used in a OCI spec as runc is not able to handle fuse mounts.
   This is what TestIntegration/TestImmutableMountFromOtherImageCheckOCISpec
   checks.

Signed-off-by: Alexis Murzeau <[email protected]>
@amurzeau amurzeau force-pushed the contrib7-fix-fuse-overlayfs-mounts-runc branch from ce85cf2 to 676dbff Compare June 7, 2023 22:23
@amurzeau
Copy link
Contributor Author

amurzeau commented Jun 7, 2023

I've used LocalMounter instead of adding another new function that does bind mounts, so my questions are now irrelevant.
I've updated the commit message with the reasoning around the modifications.

@amurzeau
Copy link
Contributor Author

I think I've resolved all remarks in the previous push, is there anything I need to improve on this PR ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants