Skip to content

CI: reduce cache size and auto clean caches #10032

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

yyforyongyu
Copy link
Member

@yyforyongyu yyforyongyu commented Jul 4, 2025

This PR reduces the num of caches created per PR, and decreases individual cache size whenever possible by using the use-build-cache flag. It also adds a new action to auto clean caches older than 12 hours to free up space.

Most of the changes are suggested by Gemini.

No need to use build caches

  1. sqlc-check Job: This job only generates SQL models. It doesn't need a build cache.
  2. rpc-check Job: While this job does some building, it's for specific small packages. It's a good candidate to disable the full build cache to save space, as the performance gain is likely minimal compared to the cache cost.
  3. check-commits Job: This job only runs a script to check commit messages. It definitely doesn't need a build cache.
  4. lint Job: Linters perform static analysis and don't need a full build cache.
  5. sample-conf-check Job: This job just checks a configuration file.

Combine small jobs

So a single cache can be shared, this also reduces the overhead cost from starting new machines.

Auto cache cleanup

lnd CI is usually very busy, so we want to clean up old caches whenever possible (when the author to have write perms).

@yyforyongyu yyforyongyu self-assigned this Jul 4, 2025
@yyforyongyu yyforyongyu added CI continuous integration no-changelog labels Jul 4, 2025
Copy link

Note

Gemini is unable to generate a summary for this pull request due to the file types involved not being currently supported.

Copy link
Collaborator

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for looking into optimizing our CI 🙏

- name: Check commits
if: github.event_name == 'pull_request'
# ... steps from check-commits job ...
run: scripts/check-each-commit.sh upstream/${{ github.base_ref }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should do this in the middle. Perhaps we should run all other steps first, since we are going to rebase the branch in the script.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this would benefit from the build cache, as it's explicitly going to compile each commit. So this will take quite a bit longer I assume. But at least it should then not run out of space anymore...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding is that the check commits will rebase from the beginning, and when it finishes we should end up in the same last commit.

Also, this would benefit from the build cache, as it's explicitly going to compile each commit. So this will take quite a bit longer I assume.

You mean by moving it to the end will benefit or?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right, it should end up with the same diff.
But I think checking each commit is what usually takes a long time, so we should do all "cheap" and quick checks first. Otherwise we spend a lot of CI run time on this to then potentially run into a linter error.

You mean by moving it to the end will benefit or?

No, I mean the check commits step itself would benefit from the build cache (independent of the order). But we've removed the build cache for the whole step.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it does take 22m to run the Static Checks & Lint job, maybe it should be moved into its own job to speed up the feedback? Tho I think the bottom neck is native postgres itest as it takes 57m🤦🏻

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm further checked a bit - it looks like the usage is really case specific. For instance, this PR has 6 commits, but check commits took 13m, while this PR only takes less than 6m. And since we use make unit it should not use build cache but only package cache? anyway pushing a diff version where we use the build cache to see the difference now.

Copy link
Member Author

@yyforyongyu yyforyongyu Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm the check commits now takes longer to finish, roughly 8m with the build cache, maybe i don't understand it at all...the new Static Checks & Lint is faster than the old Static Checks & Lint tho (roughly 2m), the cache size, however, is 2.3 GB vs 440 MB.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok just realized this can also use the same cache with key prefix unit-test

@yyforyongyu yyforyongyu force-pushed the fix-ci-cache branch 7 times, most recently from 8dd7ad0 to 1d94e0b Compare July 9, 2025 17:54
@yyforyongyu yyforyongyu requested a review from Copilot July 9, 2025 17:58
@yyforyongyu
Copy link
Member Author

/gemini review

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR reduces unnecessary build caches, consolidates similar CI jobs, and adds an automatic cleanup action to remove caches older than 12 hours.

  • Disable or shrink build caches for jobs that don’t benefit from them
  • Merge multiple small jobs into a single static-checks job
  • Introduce an auto-cleanup-cache job and a reusable cleanup-space action

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
.github/workflows/stats.yml Added top-level permissions and restricted the job to base-repo PRs
.github/workflows/main.yml Disabled build caches, combined jobs, renamed steps, and added auto-cleanup-cache job
.github/actions/cleanup-space/action.yml New composite action to free runner disk space
Comments suppressed due to low confidence (2)

.github/workflows/stats.yml:9

  • This workflow posts stats as PR comments but no pull-requests: write permission is granted. Add pull-requests: write under permissions so it can create comments.
  actions: write

.github/workflows/main.yml:528

  • Rather than matching a comma-separated string, use contains(['OWNER','MEMBER','COLLABORATOR'], github.event.pull_request.author_association) for clearer membership testing.
      contains('OWNER, MEMBER, COLLABORATOR', github.event.pull_request.author_association)

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new GitHub action to clean up runner disk space by removing large, non-essential toolsets. I've suggested adding comments to explain the purpose of removing specific directories for better maintainability.

@yyforyongyu
Copy link
Member Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This PR introduces a new action to clean up runner disk space by removing large, non-essential toolsets. The action removes dotnet, android, and ghc to free up space. I've suggested adding comments to explain why these directories are being removed and the approximate disk space freed by each removal.

Copy link
Collaborator

@ellemouton ellemouton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice!

A couple of questions mostly just for my understanding

@@ -80,6 +81,7 @@ jobs:
uses: ./.github/actions/setup-go
with:
go-version: '${{ env.GO_VERSION }}'
use-build-cache: 'no'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be cool to have a comment either here or in the commit message explaining why it's ok to turn it off here even though we are doing some building. just for future readability.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool this is now removed - think I got it wrong in the first place

@@ -113,6 +115,7 @@ jobs:
uses: ./.github/actions/setup-go
with:
go-version: '${{ env.GO_VERSION }}'
use-build-cache: 'no'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one isnt super clear to me. The PR description says

check-commits Job: This job only runs a script to check commit messages. It definitely doesn't need a build cache.

but that is incorrect - it compiles each commit.

Like previous comment, either a comment here or in commit message would help 🙏

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah right - now removed, also added a commit to explain in details about the caching flow

@@ -139,6 +142,7 @@ jobs:
uses: ./.github/actions/setup-go
with:
go-version: '${{ env.GO_VERSION }}'
use-build-cache: 'no'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also not super clear to me: i think the linter does also analyse compiled code in addition to just pure static checks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@@ -537,6 +537,46 @@ jobs:
- name: 🛡️ backwards compatibility test
run: make backwards-compat-test

#########################################
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice -

do you know why GH doesnt already do this eviction for us? From searching around, it seems like it is meant to auto-evict once we reach 10GB. (1)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i guess it's due to our high frequency that the eviction flow cannot keep up,

The cache eviction process may cause cache thrashing, where caches are created and deleted at a high frequency. To reduce this, you can review the caches for a repository and take corrective steps, such as removing caching from specific workflows.

This follows GitHub's official recommendation to avoid re-downloading
essential tools that setup-go might depend on
@yyforyongyu yyforyongyu force-pushed the fix-ci-cache branch 3 times, most recently from 065eeb9 to 427c1ed Compare July 17, 2025 12:25
To unify the format used in job names.
This commit adds a more descriptive docs about the caching process. In
addition, the `github.job` is now removed from the cache key
construction to allow jobs sharing the same cache. In addition, given we
now use `inputs.key-prefix`, the `github.job` is no longer relevant as
the `key-prefix` already creates a unique name for the cache.

We also update the lint and check commits job to use the same cache from
the unit test, given they are running in the same environment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI continuous integration no-changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants