-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Allow downloading remote patches in parallel #27333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
808ca9c to
d9d4670
Compare
tools/build_defs/repo/utils.bzl
Outdated
| name = patch_url.split("/")[-1] | ||
| patch_path = ctx.path(_REMOTE_PATCH_DIR).get_child(name) | ||
| download_info = ctx.download( | ||
| patch_path = ctx.path(_REMOTE_PATCH_DIR).get_child(str(hash(patch_url))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hash function is of low quality, so how about we concatenate name and hash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough, I also considered patch_url.replace("/", "_") but was worried about Windows path lengths...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why couldn't we keep using just the name? It should be unique since remote_patches is a dict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just the basename of the URL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, those basenames should be unique per repo right? If we just keep them, it'll be a bit easier to debug when the patch doesn't apply? I don't feel strongly against appending a hash to the downloaded file name, but I don't think it's really necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, if you had patches example.com/foo/file.patch and example.com/bar/file.patch, they would result in the same file.patch filename, but since they were downloaded and applied sequentially it was fine. Now we will have both downloads racing writes to the same file, resulting in non-deterministic patching (or maybe even corrupting the file). It's low probability but seems like an annoying-enough experience that we can trivially avoid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see, I was confused with how we specify patches in the BCR, this is a more low level feature in http_archive which might have this problem, sorry for the noise.
d9d4670 to
6e0bbc2
Compare
|
Oh, btw, can you please add a test for the name conflicting case? |
While here, we tweak the naming to avoid potential name collisions from URLs with the same suffixes