Pin package versions? Use snapshot archives? #154
Hello @ahachete, you are correct in assuming that Chisel fetches the "latest" version. See chisel/internal/archive/archive.go, lines 94 to 113 at commit 73e7a7c.
You are also right about the reproducible builds part. If a new version with modified contents becomes available, re-running Chisel will generate a different rootfs. Always getting the latest packages was ultimately a design choice, since they might include security updates. But your concern is also quite valid and we might need to think about it. Unfortunately, pinning packages is not currently supported. You can pin archives at a package level via the

About the hardcoded repository URLs, there aren't any immediate plans to make them configurable (via the chisel.yaml file). If it is very important to use a snapshot or a mirror, one workaround might be to somehow modify the network calls on the running host -- replacing

Let me know if you have any more queries! Cheers.
Thank you very much for your reply @rebornplusplus. It's very insightful.
I see. What would be the effect, though, of patching Chisel's code to replace that hardcoded string with a snapshot URL such as https://snapshot.ubuntu.com/ ? If a particular snapshot is selected, package selection should work the same way. If so, this doesn't look like a hard change to me. I'd love to contribute it, but I'm not a Go programmer. I can program, though, and I'm happy to give it a try if needed, but I'd first like to make sure this approach would be considered for adoption.

In general, I'm strongly focused on reproducible container images, and would love to use Chisel for an upcoming OSS project that wants to rely on Ubuntu as a base image but needs reproducible container images. Chisel seems like a perfect fit except for this issue. So far we've been building with Distroless and it works well, but we prefer Ubuntu for its longer LTS support window (the project is WIP and not public yet).
Yes, locally, you could replace the hardcoded string with a repository snapshot URL. As long as the relative paths (
This is nice to hear! All the best and please feel free to let us know if you have any more concerns!
Thank you for the information, it's very valuable and I'll try to fork it to support this functionality.
Just for the record, may I ask why not? It seems to me like it would be:
I'd be more than happy to contribute such an improvement.
Ah, it's mainly because we are focusing on other priority items currently. But I agree that this is an interesting and quite useful feature request. I will make sure to raise it to the team and track it. I will let you know of any updates! I will keep this issue open until then. Meanwhile, if you would like to do a proof-of-concept, I would be more than happy to take a look at it!
@ahachete Even though this is a useful feature (as @rebornplusplus said), it is not so much that we are not prioritizing it; rather, it goes against the fundamental design of Chisel, at least for now. There are several problems with pinning software versions; among the most important ones:
Lastly, you could pin your package versions by having a local archive and/or caching the packages. That is a feature that will come eventually, but we have other priorities at the moment. A workaround might be to build Chisel and point it to a local registry, for example.
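The local-cache idea can be sketched as a content-addressed store: each package file is kept under the SHA-256 digest that the archive index lists for it, so a later build asking for the same digest gets byte-identical content even after the archive has moved on. This is an illustration under assumptions, not a Chisel feature; `pkgCache`, `Fetch` and the `download` callback are hypothetical, and a real cache would live on disk rather than in a map.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// pkgCache is a hypothetical in-memory content-addressed store keyed by the
// hex SHA-256 digest of each package file.
type pkgCache struct {
	blobs map[string][]byte
}

func newPkgCache() *pkgCache { return &pkgCache{blobs: make(map[string][]byte)} }

// digest returns the hex SHA-256 of data.
func digest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// Fetch returns the cached blob for wantSHA256 if present. Otherwise it calls
// download and stores the result only when it matches the expected digest.
func (c *pkgCache) Fetch(wantSHA256 string, download func() ([]byte, error)) ([]byte, error) {
	if data, ok := c.blobs[wantSHA256]; ok {
		return data, nil
	}
	data, err := download()
	if err != nil {
		return nil, err
	}
	if got := digest(data); got != wantSHA256 {
		return nil, fmt.Errorf("digest mismatch: want %s, got %s", wantSHA256, got)
	}
	c.blobs[wantSHA256] = data
	return data, nil
}

var errOffline = errors.New("network unavailable")

func main() {
	c := newPkgCache()
	deb := []byte("pretend .deb contents")
	want := digest(deb)
	if _, err := c.Fetch(want, func() ([]byte, error) { return deb, nil }); err != nil {
		panic(err)
	}
	// The second fetch is served from the cache even though "downloading" fails.
	data, err := c.Fetch(want, func() ([]byte, error) { return nil, errOffline })
	fmt.Println(string(data), err)
}
```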
Thank you for your comments @rebornplusplus and @letFunny. From what I understand it looks like Chisel chose security (from the perspective of always having

Why I think reproducibility is as important as security: without reproducibility, once an image is built it becomes a "golden image". It is now the source of truth for deployments, and it needs to be properly backed up and copied/distributed everywhere. It cannot be lost. It probably has to be built on some special "golden servers" with restricted access to make sure the golden image is not tainted. If these provisions are not taken, you run many risks: inconsistent deployments (where different versions of the image are deployed to parts of the same fleet); development environments running different versions than the production images (which in turn breaks one of the main advantages of using containers); hard-to-troubleshoot problems (if you cannot reproduce the exact same environment elsewhere); etc.

It's worth noting that reproducibility doesn't mean running old versions of packages all the time. It just means (in this respect; it also requires other work, of course) that you source packages from a given snapshot or a set of pinned packages, and that you can build your image anywhere and at any time and get the same byte-by-byte output. No golden images, no special servers, no need to back up images or redistribute them. Just rebuild them whenever needed.

Now, to avoid running old software with CVEs, build and re-deploy images at a relatively high frequency, which anyone can choose depending on their operational needs. Therefore reproducibility and security (via using the latest versions) can, and in my opinion should, co-exist. If Chisel ignores reproducibility and focuses only on security, that's a choice I respect, but one I'd respectfully consider not the best, and therefore Chisel would be a project I doubt I will put to use.
But that's fine, it's just me (well, maybe others too). But if this feedback serves any purpose in rethinking this strategy, I'd be happy to have contributed it. In more practical terms:
I believe you have to do this anyway. Whenever a new version of an existing package introduces a change that requires updating the slice definitions, you are forced to do it. Actually, I'd say your burden is higher, since you need to update the slice definition within a very short timeframe, or else that package is temporarily broken (in contrast, if you pin and cherry-pick versions, you can choose when to update them and avoid temporarily broken packages). Therefore, package slice definitions need to be updated regardless, and the only difference when allowing version pinning would be that you need to keep the history of slice definitions explicitly (and not only in git history). That could be done by maintaining a list of slice definitions per package and adding version boundaries (lower and possibly upper) to each element of that list. Designing this is a bit of work now, but IMHO nothing really big, and it should not require additional maintenance work in the future (since, again, slice definitions need to be updated anyway).
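The "list of slice definitions per package with version boundaries" proposal could look roughly like this. Everything here is hypothetical: `sliceDefinition`, `selectDefinition` and the `libfoo` paths are invented for illustration. The comparator is a parameter because real Debian version ordering (epochs, revisions, `~`) follows dpkg's algorithm; `compareDotted` is only a stand-in that handles purely numeric dotted versions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sliceDefinition is a hypothetical record pairing one revision of a slice
// definition with the range of package versions it applies to.
type sliceDefinition struct {
	MinVersion string   // inclusive lower bound; "" means no lower bound
	MaxVersion string   // exclusive upper bound; "" means no upper bound
	Paths      []string // paths declared by the slice in this revision
}

// compareDotted is a stand-in comparator for numeric dotted versions such as
// "3.0.10". It is NOT Debian version ordering: no epochs, revisions or "~".
func compareDotted(a, b string) int {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) || i < len(bs); i++ {
		var x, y int // missing components count as 0
		if i < len(as) {
			x, _ = strconv.Atoi(as[i])
		}
		if i < len(bs) {
			y, _ = strconv.Atoi(bs[i])
		}
		if x != y {
			if x < y {
				return -1
			}
			return 1
		}
	}
	return 0
}

// selectDefinition returns the first definition whose [MinVersion, MaxVersion)
// range contains version, using cmp to order versions.
func selectDefinition(defs []sliceDefinition, version string, cmp func(a, b string) int) (sliceDefinition, bool) {
	for _, d := range defs {
		if d.MinVersion != "" && cmp(version, d.MinVersion) < 0 {
			continue
		}
		if d.MaxVersion != "" && cmp(version, d.MaxVersion) >= 0 {
			continue
		}
		return d, true
	}
	return sliceDefinition{}, false
}

func main() {
	defs := []sliceDefinition{
		{MaxVersion: "3.0.10", Paths: []string{"/usr/lib/libfoo.so.3"}},
		{MinVersion: "3.0.10", Paths: []string{"/usr/lib/libfoo.so.3", "/usr/lib/libfoo-extra.so.3"}},
	}
	d, ok := selectDefinition(defs, "3.0.2", compareDotted)
	fmt.Println(ok, d.Paths)
}
```

Using half-open ranges means adjacent entries can share a boundary version without overlapping, so each version maps to at most one definition as long as the list is maintained consistently.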
At no point would I have expected pinning to be the default behavior. It could just be an explicit opt-in option for those who are conscious of the advantages of combining the latest or recent package versions with reproducibility.
Hey @ahachete, thanks for your comment, you have a lot of valid points. I feel I didn't do a good job of explaining the reasoning in my previous comment so I will try to elaborate a bit more here.
I think this is the key point, because this use case is very different from "general" package version pinning. If I understand correctly, what we want is to be able to rebuild an image given an inventory of what went inside it. That is indeed a good idea and something that we could support, because we are already making the scripts deterministic (given the same package versions) and we are working on producing a manifest that could serve as an inventory. This does not contradict the design at all and, if you have a snapshot of the slice definitions used and the manifest, we could in fact add a feature to rebuild the same image. What goes against the current design is version pinning, or saying "I want version x of package A and version y of package B". That is indeed a much harder problem if we consider the guarantees that we make about compatibility and updates. I will elaborate on why by responding to your comments:
We see that as a feature and not a bug. In the same way that Ubuntu updates package versions without breaking the system, we attempt to do the same in a timely manner. Even if we pinned versions, we would have to update as fast as possible to get the latest security fixes. Lastly, we can rely on the fact that Ubuntu packagers are not going to change packages unexpectedly for an arbitrary reason; substantial changes only happen for new releases.
I think this is where we have the biggest disconnect. This is a valid point for traditional package managers, which support different compatibility matrices for packages. There, marking one version as compatible with another is more of a manual process, and packages are compatible by default if they do not interact. Chisel, however, makes much stronger guarantees. Because we specify the contents of the packages upfront and verify that there are no conflicts without downloading them, we have to be more restrictive. As a result, we can be sure that if there is no conflict in the slice definitions, there will never be a conflict in the final image. The problem with maintaining different versions of packages is that none of the versions can conflict with any other package or any of its versions. Coupled with the restrictiveness of the slice definitions, this makes it really hard to guarantee that there are no conflicts. To give you an example, we do not allow different slices to declare the same path (with some exceptions) because we cannot guarantee that the extracted content will be the same. That translates roughly to a one-slice-per-path rule. We would have a combinatorial explosion if we allowed all versions to co-exist, where some of them declare a path and are incompatible with some packages and some of them don't. But all of them have to be compatible, so we would need to change them in non-obvious ways. The alternative is to end up with groups of packages that are compatible with each other and to taint the rest. The moment one group has a package that is incompatible with a package in another group, both groups become incompatible as a whole, which is why we do not want to segment it. Then there is the question of backporting features and Chisel bug/security fixes.
If we maintained slices for each version of a package, we would effectively have to do that for each version, for no real benefit, as the outdated versions will not contain the latest security fixes for the package.

EDIT: Please tell me whether the conflict-resolution part is clear, because maybe we need to write something more detailed in the general documentation than what we currently have.
@letFunny agreed, but maybe there's an easier / simpler way forward. When you use an Ubuntu snapshot, the "pinning" would be "whatever was in that Ubuntu version at that point in time". Unless I'm mistaken, that's exactly what you have today, except that you only maintain the "latest snapshot" because

So, what if on top of the

I think (2) is easy because it's just a matter of building the snapshot URL and then pulling whatever version of the package is in the snapshot. And for (1), since snapshots are time-based, it would be a matter of knowing which snapshots are compatible with which slices, or rather, at which point in time, if any, a snapshot would break a given set of slices. Then, it would be a matter of keeping different

This is something that, by definition, you should have known anyway. Otherwise,
Thus I think the problem could be reduced to knowing which snapshots are allowed to work for the current

Would something like this work?
I think there are some common concepts shared between your comments, @letFunny and @jjmaestro, and my initial ideas. When I mentioned the option to "pin packages", I did not clearly state the limits within which that would happen. The general case, where a user may pin some packages to some versions, others to other versions, and leave some unpinned, would be quite challenging to solve (plus I don't think it brings any significant value). My line of thought was closer to what @jjmaestro says, something along the lines of having some

Actually, from this perspective, Chisel could do this even when this flag is not requested: if a user asks for "latest-latest", check which is the very latest snapshot available (if I'm not mistaken, they happen every 4 hours or so, so they would be very recent) and use that snapshot (internally; it could be exposed to the user or not) to build the chiselled container image. I understand Chisel's requirements on slices are stricter, to prevent them from overlapping (that makes sense). But if a snapshot feature could be accommodated within these bounds, it would be great, and I'd be happy to try to contribute (I'm not a Go programmer, but I'd do my best) and definitely test and evaluate it.
Thank you for these because they are indeed very good ideas and something that could, in theory, support reproducibility without any of the downsides I mentioned in the comments above. Tagging releases with the supported snapshot(s) without having to make any of the extra guarantees or sacrifice security looks like something that could also work well with the rest of the design. That being said, I think this feature would require a lot more discussion and a more detailed design (apart from the implementation), and right now we want to focus on the foundations for Chisel, at least for the next few months. So I will make a note and when we finish the priority features we will come back to study this use-case, thanks again!
Awesome! Feel free to ping here when time comes, happy to provide feedback. Thank you @letFunny
(for future reference, the idea of pinning the chisel-releases itself has also been discussed in #172)
I'm new to chisel, so my understanding may be wrong. But it seems like packages are always picked from the "latest" available version. While this may be good for experimenting, it doesn't help with making stable (read: reproducible) builds. Is it possible to pin package versions? Is it possible to use snapshot archives?
From a quick look at the source code, it seems like repositories are hardcoded. If so, are there any plans to offer alternatives in these areas?