Simplify caching mechanisms for CI and PROD images #45261

Draft
wants to merge 1 commit into
base: main


potiuk
Member

@potiuk potiuk commented Dec 28, 2024

For a long time we used a sophisticated mechanism to speed up
our CI jobs: building the images in the "pull_request_target" workflow
and pushing them to the GitHub registry. That, however, had several drawbacks:

  • The CI image had a complex layer setup (we had to pre-cache
    installed dependencies by installing them from the branch tip)

  • The pull_request_target workflow is very dangerous: we had a number
    of security problems with it (and it is difficult to debug)

  • Caching of pip and uv was not used because it increased the size of
    the image significantly

This PR significantly improves the caching mechanisms for image
builds, thanks to several advancements that were not available before:

  • The upload-artifact@v4 action and the improved stash action developed
    by @assignUser and published in "apache/infrastructure-actions"
    allow us to store all images (8GB per run) in artifacts rather
    than in the registry, so we can build the image once and share
    it with all the jobs.

  • uv is fast enough to allow occasional installation of Airflow
    locally. This lets us use cache mounts and build the uv cache
    locally, rather than rely on a "remote" cache when we are building
    local images for breeze. The first time you build a local breeze
    image it will take 2-5 minutes longer (depending on your network
    speed), but because we can use cache mounts, every subsequent
    build should be very fast, even if all dependencies change. Using
    uv also lets us "always" reinstall airflow when you build the
    image, even if only a single source file changed, because with the
    cache it takes less than a second to reinstall airflow and all
    dependencies.

  • Cache mounts are not included in the image size. Since we can
    export and import images as artifacts in CI and do not need to
    rebuild them, the images shared as compressed artifacts are
    relatively small (2GB; the uv cache is around 4GB on top of that),
    so sharing the image built in the "build image" job with other jobs
    in the same workflow is fast.

  • We still use the registry cache for the "non-python" parts of
    the image: both CI and breeze image builds benefit from using
    the image cache for system dependencies, database clients, etc.
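
To illustrate the "build once, share with all the jobs" idea from the list above, here is a minimal sketch of a workflow that exports an image with `docker save`, uploads it with `actions/upload-artifact@v4`, and loads it in a dependent job. The workflow name, job names, artifact name, image tag, and paths are all hypothetical and are not taken from this PR. The plain `docker build` stands in for the real build tooling, and the stash action is not shown.

```yaml
# Hypothetical sketch: names, tags, and paths below are illustrative
# assumptions, not the ones used in this PR.
name: build-once-share-image
on: pull_request

jobs:
  build-image:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: "Build CI image"
        # Plain docker build stands in for the project's own build tooling.
        run: docker build --tag example/ci-image:latest .
      - name: "Export image to a compressed tarball"
        run: docker save example/ci-image:latest | gzip > /tmp/ci-image.tar.gz
      - name: "Upload image as artifact"
        uses: actions/upload-artifact@v4
        with:
          name: ci-image
          path: /tmp/ci-image.tar.gz
          # The tarball is already gzipped, so skip the action's own compression.
          compression-level: 0
          retention-days: 1

  tests:
    needs: build-image
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: "Download image artifact"
        uses: actions/download-artifact@v4
        with:
          name: ci-image
          path: /tmp
      - name: "Load image into local docker"
        # docker load accepts gzip-compressed tarballs directly.
        run: docker load --input /tmp/ci-image.tar.gz
      - name: "Run a command in the shared image"
        # Placeholder command; a real job would run its tests here.
        run: docker run --rm example/ci-image:latest python --version
```

Because the artifact only lives for the duration of the workflow run, nothing has to be pushed to a registry from a privileged workflow, which is the property that lets the "pull_request_target" build go away.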

Fixes: #42999
Fixes: #43268



@boring-cyborg boring-cyborg bot added area:dev-tools area:production-image Production image improvements and fixes labels Dec 28, 2024
@potiuk potiuk force-pushed the redesign-image-caching branch 10 times, most recently from 84da531 to 0068c3f Compare December 28, 2024 11:35
.github/actions/prepare_breeze_and_image/action.yml (outdated, resolved)
.github/workflows/ci-image-build.yml (outdated, resolved)
.github/actions/prepare_breeze_and_image/action.yml (outdated, resolved)
.github/actions/prepare_breeze_and_image/action.yml (outdated, resolved)
@gopidesupavan
Member

Nice, looking forward to this :)

@potiuk potiuk force-pushed the redesign-image-caching branch 13 times, most recently from 962899a to 038b2b7 Compare December 28, 2024 20:53
- name: "Cleanup docker"
run: ./scripts/ci/cleanup_docker.sh
shell: bash
# TODO: Currently we cannot loop through the list of python versions and have dynamic list of
Member Author

cc: @assignUser -> will need an option to restore several keys in a single action for that. You might see some people from our team contributing to the stash action of yours :)

@potiuk potiuk force-pushed the redesign-image-caching branch 2 times, most recently from a4395bf to 8d6e1ac Compare December 28, 2024 21:16
@potiuk potiuk force-pushed the redesign-image-caching branch from 8d6e1ac to 585258e Compare December 28, 2024 22:41
@potiuk potiuk force-pushed the redesign-image-caching branch from 585258e to 3ca63eb Compare December 29, 2024 01:43