-
Notifications
You must be signed in to change notification settings - Fork 2.2k
CI: reduce cache size and auto clean caches #10032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Note Gemini is unable to generate a summary for this pull request due to the file types involved not being currently supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for looking into optimizing our CI 🙏
.github/workflows/main.yml
Outdated
- name: Check commits | ||
if: github.event_name == 'pull_request' | ||
# ... steps from check-commits job ... | ||
run: scripts/check-each-commit.sh upstream/${{ github.base_ref }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we should do this in the middle. Perhaps we should run all other steps first, since we are going to rebase the branch in the script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this would benefit from the build cache, as it's explicitly going to compile each commit. So this will take quite a bit longer I assume. But at least it should then not run out of space anymore...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my understanding is that the check commits
will rebase from the beginning, and when it finishes we should end up in the same last commit.
Also, this would benefit from the build cache, as it's explicitly going to compile each commit. So this will take quite a bit longer I assume.
You mean by moving it to the end will benefit or?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're right, it should end up with the same diff.
But I think checking each commit is what usually takes a long time, so we should do all "cheap" and quick checks first. Otherwise we spend a lot of CI run time on this to then potentially run into a linter error.
You mean by moving it to the end will benefit or?
No, I mean the check commits
step itself would benefit from the build cache (independent of the order). But we've removed the build cache for the whole step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah it does take 22m to run the Static Checks & Lint
job, maybe it should be moved into its own job to speed up the feedback? Tho I think the bottom neck is native postgres itest as it takes 57m🤦🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm further checked a bit - it looks like the usage is really case specific. For instance, this PR has 6 commits, but check commits took 13m, while this PR only takes less than 6m. And since we use make unit
it should not use build cache but only package cache? anyway pushing a diff version where we use the build cache to see the difference now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm the check commits
now takes longer to finish, roughly 8m with the build cache, maybe i don't understand it at all...the new Static Checks & Lint is faster than the old Static Checks & Lint tho (roughly 2m), the cache size, however, is 2.3 GB vs 440 MB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok just realized this can also use the same cache with key prefix unit-test
8dd7ad0
to
1d94e0b
Compare
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR reduces unnecessary build caches, consolidates similar CI jobs, and adds an automatic cleanup action to remove caches older than 12 hours.
- Disable or shrink build caches for jobs that don’t benefit from them
- Merge multiple small jobs into a single static-checks job
- Introduce an auto-cleanup-cache job and a reusable cleanup-space action
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
.github/workflows/stats.yml | Added top-level permissions and restricted the job to base-repo PRs |
.github/workflows/main.yml | Disabled build caches, combined jobs, renamed steps, and added auto-cleanup-cache job |
.github/actions/cleanup-space/action.yml | New composite action to free runner disk space |
Comments suppressed due to low confidence (2)
.github/workflows/stats.yml:9
- This workflow posts stats as PR comments but no
pull-requests: write
permission is granted. Addpull-requests: write
under permissions so it can create comments.
actions: write
.github/workflows/main.yml:528
- Rather than matching a comma-separated string, use
contains(['OWNER','MEMBER','COLLABORATOR'], github.event.pull_request.author_association)
for clearer membership testing.
contains('OWNER, MEMBER, COLLABORATOR', github.event.pull_request.author_association)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new GitHub action to clean up runner disk space by removing large, non-essential toolsets. I've suggested adding comments to explain the purpose of removing specific directories for better maintainability.
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This PR introduces a new action to clean up runner disk space by removing large, non-essential toolsets. The action removes dotnet, android, and ghc to free up space. I've suggested adding comments to explain why these directories are being removed and the approximate disk space freed by each removal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice!
A couple of questions mostly just for my understanding
.github/workflows/main.yml
Outdated
@@ -80,6 +81,7 @@ jobs: | |||
uses: ./.github/actions/setup-go | |||
with: | |||
go-version: '${{ env.GO_VERSION }}' | |||
use-build-cache: 'no' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be cool to have a comment either here or in the commit message explaining why it's ok to turn it off here even though we are doing some building. just for future readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool this is now removed - think I got it wrong in the first place
.github/workflows/main.yml
Outdated
@@ -113,6 +115,7 @@ jobs: | |||
uses: ./.github/actions/setup-go | |||
with: | |||
go-version: '${{ env.GO_VERSION }}' | |||
use-build-cache: 'no' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one isnt super clear to me. The PR description says
check-commits Job: This job only runs a script to check commit messages. It definitely doesn't need a build cache.
but that is incorrect - it compiles each commit.
Like previous comment, either a comment here or in commit message would help 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah right - now removed, also added a commit to explain in details about the caching flow
.github/workflows/main.yml
Outdated
@@ -139,6 +142,7 @@ jobs: | |||
uses: ./.github/actions/setup-go | |||
with: | |||
go-version: '${{ env.GO_VERSION }}' | |||
use-build-cache: 'no' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also not super clear to me: i think the linter does also analyse compiled code in addition to just pure static checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
@@ -537,6 +537,46 @@ jobs: | |||
- name: 🛡️ backwards compatibility test | |||
run: make backwards-compat-test | |||
|
|||
######################################### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice -
do you know why GH doesnt already do this eviction for us? From searching around, it seems like it is meant to auto-evict once we reach 10GB. (1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i guess it's due to our high frequency that the eviction flow cannot keep up,
The cache eviction process may cause cache thrashing, where caches are created and deleted at a high frequency. To reduce this, you can review the caches for a repository and take corrective steps, such as removing caching from specific workflows.
This follows GitHub's official recommendation to avoid re-downloading essential tools that setup-go might depend on
065eeb9
to
427c1ed
Compare
Thus saving us from the overhead cost, also they can share the same cache file.
427c1ed
to
a502e52
Compare
To unify the format used in job names.
This commit adds a more descriptive docs about the caching process. In addition, the `github.job` is now removed from the cache key construction to allow jobs sharing the same cache. In addition, given we now use `inputs.key-prefix`, the `github.job` is no longer relevant as the `key-prefix` already creates a unique name for the cache. We also update the lint and check commits job to use the same cache from the unit test, given they are running in the same environment.
We also make it cancelable.
a502e52
to
d1398f6
Compare
This PR reduces the num of caches created per PR, and decreases individual cache size whenever possible by using the
use-build-cache
flag. It also adds a new action to auto clean caches older than 12 hours to free up space.Most of the changes are suggested by Gemini.
No need to use build caches
rpc-check Job: While this job does some building, it's for specific small packages. It's a good candidate to disable the full build cache to save space, as the performance gain is likely minimal compared to the cache cost.check-commits Job: This job only runs a script to check commit messages. It definitely doesn't need a build cache.lint Job: Linters perform static analysis and don't need a full build cache.Combine small jobs
So a single cache can be shared, this also reduces the overhead cost from starting new machines.
Auto cache cleanup
lnd
CI is usually very busy, so we want to clean up old caches whenever possible (when the author to have write perms).