Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] hack/dockerfile: add dockerignore for vendor.dockerfile #3649

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

thaJeztah
Copy link
Member

Some of the Dockerfiles in this repository may needed the git source to do their work, but vendoring likely doesn't, and I noticed that sending over the build-context was considerably slowing down the process.

This patch adds a dockerignoe specific for this Dockerfile. It's based on the .dockerignore at the root of the repository, but adds the .git, .github, and vendor directories. Possibly other non-code paths could be excluded as well, but would probably likely only bring minor improvements.

before:

make vendor
...
[+] Building 100.6s (14/14) FINISHED
...
    => [internal] load build context                                23.5s
    => => transferring context: 99.02MB                             23.3s

After removing .git:

=> [internal] load build context                                    15.2s
=> => transferring context: 43.10MB                                 15.1s

After removing both .git and vendor:

make vendor
...
[+] Building 13.6s (13/13) FINISHED
...
    => [internal] load build context                                1.2s
    => => transferring context: 5.24MB                              1.0s

Some of the Dockerfiles in this repository may needed the git source to
do their work, but vendoring likely doesn't, and I noticed that sending
over the build-context was considerably slowing down the process.

This patch adds a dockerignoe specific for this Dockerfile. It's based on
the `.dockerignore` at the root of the repository, but adds the `.git`,
`.github`, and `vendor` directories. Possibly other non-code paths could be
excluded as well, but would probably likely only bring minor improvements.

before:

    make vendor
    ...
    [+] Building 100.6s (14/14) FINISHED
    ...
        => [internal] load build context                                23.5s
        => => transferring context: 99.02MB                             23.3s

After removing `.git`:

    => [internal] load build context                                    15.2s
    => => transferring context: 43.10MB                                 15.1s

After removing both `.git` and `vendor`:

    make vendor
    ...
    [+] Building 13.6s (13/13) FINISHED
    ...
        => [internal] load build context                                1.2s
        => => transferring context: 5.24MB                              1.0s

Signed-off-by: Sebastiaan van Stijn <[email protected]>
@thaJeztah
Copy link
Member Author

This might still need work / discussion. Currently the first stage of the Dockerfile will use the vendor/ directory from the repository as starting point, but this also means (partially due to the .git repository always invalidating the cache?) if will always have to send over those files. go mod vendor will still be making network connections in that case though, so I'm not sure if copying over the vendor/ directory has benefits in that case.

And maybe the "copy back to host" step would not even be needed if the starting point after go mod vendor did not update go.sum (and vendor/modules.txt)?

FROM golang:1.19-alpine AS vendored
RUN apk add --no-cache git
WORKDIR /src
RUN --mount=target=/src,rw \
--mount=target=/go/pkg/mod,type=cache \
go mod tidy && go mod vendor && \
mkdir /out && cp -r go.mod go.sum vendor /out

@thaJeztah
Copy link
Member Author

Ah, interesting, so I guess we have a step somewhere to check if .git modified? Wondering if we can somehow do that on the host, or is this something else?

#13 [validate 1/1] RUN --mount=target=.,rw   git add -A &&   rm -rf vendor &&   cp -rf /out/* . &&   ./hack/validate-vendor check
#0 0.096 fatal: not a git repository (or any parent up to mount point /)
#0 0.096 Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
#13 ERROR: process "/bin/sh -c git add -A &&   rm -rf vendor &&   cp -rf /out/* . &&   ./hack/validate-vendor check" did not complete successfully: exit code: 128

@crazy-max
Copy link
Member

Yes we need the working tree so we can validate vendoring. Same with https://github.com/moby/buildkit/blob/master/hack/dockerfiles/generated-files.Dockerfile. Maybe we could just use diff.

@thaJeztah
Copy link
Member Author

It's a bit confusing though, because CI runs hack/validate-vendor(without arguments), which means that it doesn't check if changes were made, so always runs a go mod vendor?

case ${1:-} in
'')
. $(dirname $0)/util
buildxCmd build \
--target validate \
--file ./hack/dockerfiles/vendor.Dockerfile \
.
;;
check)
status="$(git status --porcelain -- go.mod go.sum vendor 2>/dev/null)"

The check is confusing as well though because it prints "changes were made", but check itself doesn't do a go mod vendor to generate those changes

@crazy-max
Copy link
Member

crazy-max commented Feb 21, 2023

hack/validate-vendor calls validate target in the Dockerfile which depends on the vendored target that does go mod vendor and then calls hack/validate-vendor check. That SGTM but yeah that's confusing.

@tonistiigi
Copy link
Member

(...ignoring the current validation issue)

I tend to think this is a premature optimization. User shouldn't need to think about maintaining a list of files purely for performance. Hopefully, BuildKit in the future can optimize it further automatically. Ignoring some files can easily become a subtle bug, especially because 1) even in the CI the files will now be different as dockerignore only applies to local files 2) missing git internal files are usually silently ignored. If some files are better ignored, it is better if that definition is at least in Dockerfile side.

For buildkit maintainer, all the contexts are shared today and if you built any other target before, only small changes and metadata is sent. This is what I get with the first invocation.

=> [internal] load build context                                     0.6s
 => => transferring context: 1.70MB                                   0.6s

Disclaimer: this is coming from the guy who put .git back into docker-cli Dockerfile, and regularly 😡 when I build Moby and get broken command because .git is missing there.

@crazy-max lmk if you disagree on the perf side. I might be spoiled by having a relatively fast machine.

@crazy-max
Copy link
Member

@crazy-max lmk if you disagree on the perf side. I might be spoiled by having a relatively fast machine.

Tested on WSL with 9p protocol:

$ make vendor
...
#9 [internal] load build context
#9 transferring context: 1.05MB 5.1s
#9 transferring context: 95.49MB 10.2s
#9 transferring context: 95.57MB 15.2s
#9 transferring context: 95.65MB 20.3s
#9 transferring context: 95.73MB 25.3s
#9 transferring context: 95.80MB 30.4s
#9 transferring context: 95.87MB 35.4s
#9 transferring context: 95.94MB 39.1s done
#9 DONE 39.2s

ext4:

$ make vendor
...
#9 [internal] load build context
#9 transferring context: 145.66MB 1.0s done
#9 DONE 1.0s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants