Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Add a new tool for calculating file checksums faster #3633

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

frankdavid
Copy link
Contributor

When producing disk images during the IC-OS build, the Bazel built-in checksum calculator is quite slow when those images contain a lot of holes (long sections of zero bytes). These holes are stored in an optimized form using sparse files in the file system but Bazel's checksum algorithm is not optimized for them.

This wasn't an issue previously because we processed the artifacts between build steps (tarring, compressing, decompressing, untarring), but this processing between build steps adds additional slowness and code/debugging complexity.

As a preliminary step to removing the artifact processing, this PR introduces a much faster way (up to 100x faster on our biggest artifacts) to calculate checksums.

The new icsum tool implements a fast checksum algorithm optimized specifically for sparse files. The calculated checksum is stored in the user.icsum extended attribute of IC-OS artifacts and the Bazel config has been changed to try to use this checksum when available.

@frankdavid frankdavid requested review from a team as code owners January 27, 2025 14:58
@github-actions github-actions bot added the perf label Jan 27, 2025
@basvandijk
Copy link
Collaborator

@frankdavid should we review after CI is green or do you expect things are mostly done?

frankdavid and others added 2 commits January 27, 2025 15:08
…876a44863225910e146f5e2d26de2c

Image tag: 9afdf4979b00656c1ad15198488721a1055170c1b8b45323c60a434574327670
Copy link
Contributor

Run URL: https://github.com/dfinity/ic/actions/runs/12991989539

New container image: sha256:5926f073e5d8c1b37d5905697107f13dd3876a44863225910e146f5e2d26de2c
New container tag: 9afdf4979b00656c1ad15198488721a1055170c1b8b45323c60a434574327670

@pr-automation-bot-public pr-automation-bot-public bot requested a review from a team as a code owner January 27, 2025 15:19
@mbjorkqvist
Copy link
Member

Impressive performance improvements - thanks for looking into this, @frankdavid!

I'm wondering if "the Bazel built-in checksum calculator" uses sha256, or some other cryptographic hash function? xxHash seems to be a non-cryptographic hash algorithm, so changing could have some security implications.

@frankdavid
Copy link
Contributor Author

should we review after CI is green

It's all green now.

I'm wondering if "the Bazel built-in checksum calculator" uses sha256, or some other cryptographic hash function?

It uses blake hash which is a cryptographic hash function. I wonder how important it is to have a cryptographic hash for a cache checksum. I thought that since the hash is only used to calculate if there has been a change in the file, it doesn't need to be safe against adversaries. Even if someone could create a version of a file which produces the same checksum, I don't see what they would gain from tricking the build system into not recompiling dependent targets.

The more problematic thing would be if we hit collisions by accident, meaning two subsequent builds would produce the same hash even though some files have changed. With 128-bit hash values the probability of this is very low. Furthermore, xxhash is widely used in the industry.

Please let me know what you think.

@andrewbattat
Copy link
Member

LGTM! I'll leave it to Eero to take a look and approve though. Nice job!

Out of curiosity, do you have numbers on the overall build time improvements?

@@ -0,0 +1,17 @@
[workspace]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi, can you clarify why this need to be a standalone workspace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The choice was motivated by how the tool is compiled. It's compiled in the Dockerfile (also included in this PR). I didn't want to copy the entire repo just to compile this one tool, so it seemed a better choice to make it into its own workspace. Let me know if you can recommend a better approach.

@mbjorkqvist
Copy link
Member

I'm wondering if "the Bazel built-in checksum calculator" uses sha256, or some other cryptographic hash function?

It uses blake hash which is a cryptographic hash function. I wonder how important it is to have a cryptographic hash for a cache checksum. I thought that since the hash is only used to calculate if there has been a change in the file, it doesn't need to be safe against adversaries. Even if someone could create a version of a file which produces the same checksum, I don't see what they would gain from tricking the build system into not recompiling dependent targets.

The more problematic thing would be if we hit collisions by accident, meaning two subsequent builds would produce the same hash even though some files have changed. With 128-bit hash values the probability of this is very low. Furthermore, xxhash is widely used in the industry.

Please let me know what you think.

Thanks for the insight! I pinged @dfinity/product-security on Slack to have a look. The changes where FI approval is needed are trivial, so if the approach otherwise is approved then I have no problem approving.

Copy link
Member

@Bownairo Bownairo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small comments and nits

ci/container/Dockerfile Show resolved Hide resolved
rs/ic_os/build_tools/icsum/src/main.rs Show resolved Hide resolved
rs/ic_os/build_tools/icsum/src/main.rs Show resolved Hide resolved
rs/ic_os/build_tools/icsum/src/main.rs Outdated Show resolved Hide resolved
rs/ic_os/build_tools/icsum/src/main.rs Outdated Show resolved Hide resolved
@frankdavid
Copy link
Contributor Author

I've updated the algorithm to be cryptographically secure. It runs a bit slower now (~half the speed of the previous implementation) but it's still fast enough for now (0.8s on our largest artifacts).

@frankdavid frankdavid requested a review from nmattia January 29, 2025 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants