Signature file layout specification #120
# Signature File Layout

A common file layout for storing and serving signatures provides a consistent way to reference image signatures. Signatures on a filesystem or a web server shall use this common layout. Signatures stored in a REST API are not required to use this common layout.

## Specification

This specification relies on [RFC 3986](https://tools.ietf.org/html/rfc3986), focusing on defining a [path component](https://tools.ietf.org/html/rfc3986#section-3.3) to compose a concise URI reference to a signature.

**SCHEME[AUTHORITY]/PATH_PREFIX/IMAGE@MANIFEST_DIGEST/signature-INT**

**Definitions**

* **SCHEME**: URI scheme per [RFC 3986](https://tools.ietf.org/html/rfc3986#section-3.1), e.g. **file://** or **https://**
* **AUTHORITY**: An optional authority reference per [RFC 3986](https://tools.ietf.org/html/rfc3986#section-3.2), e.g. **example.com**
* **PATH_PREFIX**: An arbitrary base path to the image component
* **IMAGE**: The name of the image per the [v2 API](https://docs.docker.com/registry/spec/api/#/overview). This typically takes the form registry/repository/image but is not required to have exactly three parts. A **:PORT** component is not required, but it should be included if it is part of the image reference.
* **MANIFEST_DIGEST**: The value of the manifest digest, including the hash function and the hash, e.g. **sha256:HASH**
* **INT**: The index of the signature, starting at 1. For multiple signatures, increment by 1, e.g. **signature-1**, **signature-2**.
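To make the template concrete, here is a minimal Go sketch that composes a signature reference from the components defined above. The `signatureRef` helper and its parameter names are illustrative assumptions, not part of the specification.

```go
package main

import "fmt"

// signatureRef composes a reference following
// SCHEME[AUTHORITY]/PATH_PREFIX/IMAGE@MANIFEST_DIGEST/signature-INT.
// The parameters mirror the definitions above; pathPrefix is given without
// a leading slash so the separators in the format string produce file:///…
// when the authority is empty.
func signatureRef(scheme, authority, pathPrefix, image, manifestDigest string, index int) string {
	return fmt.Sprintf("%s%s/%s/%s@%s/signature-%d",
		scheme, authority, pathPrefix, image, manifestDigest, index)
}

func main() {
	// Reproduces the first example below: a local file signature.
	fmt.Println(signatureRef(
		"file://", "",
		"var/lib/containers/signatures",
		"registry.example.com:5000/acme/myimage",
		"sha256:b1c302ecc8e21804a288491cedfed9bd3db972ac8367ccab7340b33ecd1cb8eb",
		1))
}
```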
**Inline review comment (Contributor):** Just a note: this differs from https://github.com/aweiteka/image/blob/sigspec/signature/spec/README.md#static-file-layout

**Reply (Author):** I'll update that WIP PR to this convention.
## Examples

1. A reference to a local file signature

   file:///var/lib/containers/signatures/registry.example.com:5000/acme/myimage@sha256:b1c302ecc8e21804a288491cedfed9bd3db972ac8367ccab7340b33ecd1cb8eb/signature-1

2. A reference to a signature on a web server

   https://sigs.example.com/signatures/registry.example.com:5000/acme/myimage@sha256:b1c302ecc8e21804a288491cedfed9bd3db972ac8367ccab7340b33ecd1cb8eb/signature-1

3. A reference to two signatures on a web server

   https://sigs.example.com/signatures/registry.example.com:5000/acme/myimage@sha256:b1c302ecc8e21804a288491cedfed9bd3db972ac8367ccab7340b33ecd1cb8eb/signature-1
   https://sigs.example.com/signatures/registry.example.com:5000/acme/myimage@sha256:b1c302ecc8e21804a288491cedfed9bd3db972ac8367ccab7340b33ecd1cb8eb/signature-2

## Signature Indexing and Discovery

There is no signature indexing mechanism or service defined. Signatures are obtained by iterating over increasing indexes and stopping at the first missing index.
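As an illustration of this discovery rule, the following Go sketch fetches signatures by incrementing the index until the first missing one, which is assumed here to be reported as HTTP 404. The `fetchSignatures` helper is a simplified assumption, not a client API defined by this specification.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// fetchSignatures iterates signature-1, signature-2, ... under baseURL
// (SCHEME[AUTHORITY]/PATH_PREFIX/IMAGE@MANIFEST_DIGEST) and stops at the
// first missing index. Purely illustrative: a real client would add
// timeouts, retries, and size limits.
func fetchSignatures(baseURL string) ([][]byte, error) {
	var sigs [][]byte
	for i := 1; ; i++ {
		resp, err := http.Get(fmt.Sprintf("%s/signature-%d", baseURL, i))
		if err != nil {
			return nil, err
		}
		if resp.StatusCode == http.StatusNotFound {
			resp.Body.Close()
			return sigs, nil // first missing index: stop
		}
		if resp.StatusCode != http.StatusOK {
			resp.Body.Close()
			return nil, fmt.Errorf("unexpected status %s for signature-%d", resp.Status, i)
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		sigs = append(sigs, body)
	}
}

func main() {
	base := "https://sigs.example.com/signatures/registry.example.com:5000/acme/myimage@sha256:b1c302ecc8e21804a288491cedfed9bd3db972ac8367ccab7340b33ecd1cb8eb"
	sigs, err := fetchSignatures(base)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("found %d signature(s)\n", len(sigs))
}
```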
I'm curious about why the registry is included here. I'm thinking about how it fits into a mirroring case, where a user has an on-premise mirror of remote content. I want to understand how images and signatures stay associated with each other as they move through different registries and sigstores.
If I have a private internal registry called `reg.mycorp.net`, and it contains Red Hat images, what is the expectation around a sigstore? Can my local clients still use Red Hat's sigstore directly? If so, does inclusion of the registry in the URL cause problems? Obviously Red Hat's sigstore won't know about `reg.mycorp.net`. Somehow my client would need to know to include the name of Red Hat's registry in the URL when accessing the sigstore. Is there a way for the client to even know that, if it pulled an image from `reg.mycorp.net`?

Or is the expectation that each registry should have its own sigstore, and it should mirror signatures from remote registries to correspond with mirrored content? That sounds more plausible, but it requires tight coordination between any particular registry and its sigstore. It also removes the value of having the registry in the URL here.
Or is there another workflow I'm not thinking of?
And do we want to retain the option to distribute images without using a registry at all, but still verify signatures against a sigstore?
Back to the original question, now that I've barraged you with a variety of scenarios and details, what is the value of specifying a registry in this URL?
You can address distribution, mirroring, and (somewhat) discovery by using a signature-list blob like the one I've proposed in opencontainers/image-spec#176. Then you can distribute an image-layout (or whatever) with a ref pointing at the `application/vnd.oci.image.signed.blob.v1+json` blob, and that blob would point at the signature blobs (via `signatures[]`) and the blob being signed (via `blob`). A registry keeping track of all known signatures would automatically build a new `application/vnd.oci.image.signed.blob.v1+json` blob whenever a new signature was submitted (potentially validating the signature first or performing other gate-keeping), and use a name-addressable location (like the `SCHEME[AUTHORITY]/PATH_PREFIX/IMAGE@MANIFEST_DIGEST/signature` proposed in this PR) to give users a way to get the most recent `application/vnd.oci.image.signed.blob.v1+json`.
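Purely for illustration, a rough Go model of the indirection described in that comment might look like the sketch below. The struct and its field names are assumptions made for this sketch, not the actual schema proposed in opencontainers/image-spec#176.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// signedBlob sketches a single name-addressable blob that points at the
// blob being signed and at every known signature blob. Field names are
// illustrative assumptions only.
type signedBlob struct {
	MediaType  string   `json:"mediaType"`  // e.g. application/vnd.oci.image.signed.blob.v1+json
	Blob       string   `json:"blob"`       // digest of the blob being signed (e.g. a manifest)
	Signatures []string `json:"signatures"` // digests of the individual signature blobs
}

func main() {
	b := signedBlob{
		MediaType: "application/vnd.oci.image.signed.blob.v1+json",
		Blob:      "sha256:b1c302ecc8e21804a288491cedfed9bd3db972ac8367ccab7340b33ecd1cb8eb",
		Signatures: []string{
			"sha256:<digest-of-signature-1>", // placeholder digests
			"sha256:<digest-of-signature-2>",
		},
	}
	out, err := json.MarshalIndent(b, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A registry could rebuild and re-publish such a blob at a stable, name-addressable location each time it accepts a new signature, which is the workflow the comment describes.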
@mhrivnak good questions. I think it's an "all or nothing" issue: we either fully qualify the signature reference (this proposal) or we have a flat list of signatures referenced by hash ID alone (possibly with some namespacing around different transports). It's a fair discussion to have.
Regarding the mirroring use case, it seems you are either mirroring the image AND the associated signatures or you are not mirroring at all. With this design one could NOT mirror the images and point to the original sigstore. This would seem to tip in favor of a flat signature layout.
Thinking on this a bit more, one of the value propositions of the signature approach is that signatures can be proliferated all over the place. Locking down a very specific namespace makes this more challenging.
Agreed on the proliferation expectation.
I do think there's a potentially common use case to mirror content but not the related signatures, especially if the tooling is separate for the two.
As a user, I want to position copies of large artifacts close to where they will be used, so deployment goes quickly. If signatures are only available from one source, that doesn't present such a bottleneck. So unless the tooling makes it easy for me to mirror signatures with images, I might not want to bother mirroring the signatures.
That said, mirroring becomes more valuable when you can eliminate dependence on remote data, so it would be a fine idea to focus on making it easy to mirror signatures with images.
Yes. As you say, the client cannot trivially determine that this is a mirror; it only knows the hostname/repo used to refer to the mirrored content.

Because it allows setting up a single sigstore server for multiple corporate registries.

Admittedly that is a fairly weak reason, but with the model of sigstores namespaced per-repo (not flat) within a single host name, removing the host name component would still not allow sharing signatures between differently-named repos. So, including the host name is a more symmetric / cleaner way to support the model in which the signatures are definitely namespaced, not shared; it prevents accidental unwanted sharing.
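As a sketch of how a client might select a sigstore under this model, the following Go fragment maps registry hosts to sigstore base URLs with a single catch-all default, which also covers the "one sigstore for the whole internet" case described below. The configuration shape and all names are invented for illustration; this is not the actual containers/image configuration format.

```go
package main

import "fmt"

// sigstoreConfig maps registry hosts to sigstore base URLs, with a
// catch-all default. Invented for illustration; not the real
// containers/image configuration format.
type sigstoreConfig struct {
	Default    string            // used when no per-registry entry matches
	Registries map[string]string // registry host (and optional :port) -> sigstore base URL
}

// baseFor returns the sigstore base URL for an image hosted on the given
// registry. Because the host name is part of the signature path, one
// sigstore server can serve many registries without collisions.
func (c sigstoreConfig) baseFor(registry string) string {
	if url, ok := c.Registries[registry]; ok {
		return url
	}
	return c.Default
}

func main() {
	cfg := sigstoreConfig{
		Default: "https://sigs.corp.example.com/signatures",
		Registries: map[string]string{
			"registry.example.com:5000": "https://sigs.example.com/signatures",
		},
	}
	fmt.Println(cfg.baseFor("registry.example.com:5000")) // per-registry sigstore
	fmt.Println(cfg.baseFor("docker.io"))                 // falls back to the single corporate default
}
```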
It can also be useful for a very locked-down site, where every allowed image needs to be individually approved with a company-private signature, to define a single sigstore for the whole internet; the image references would not change (so an upstream application which pulls `library/busybox:latest` does not need changing to point at a mirror), but the approving IT department would set up a single sigstore server “for the whole internet” serving these company-private signatures. Each host in the data center would be configured to require these private signatures, and to use this sigstore, but otherwise the applications can continue to use upstream references. [This is, more or less, an exact opposite of the “we mirror image layers but want to contact the original upstream for signatures”; see below for more on that.]

Hum, OK. So if I understand correctly you are considering a scenario where the user really wants to mirror the layers only because they are big and slow, and conceptually is perfectly fine with not having a mirror and fetching data directly.
If that is the desire, a transparent proxy (caching both layers and signatures invisibly to the clients, as/if necessary) might be a cleaner way to achieve that: it means fewer modifications in the clients, OTOH a different kind of administrative effort/overhead.
In other cases, “an authority hosting signatures” is not a natural way to think about signatures. If there should be an authority saying “yes this image is OK / no I don’t know this one”, persistent signature objects are not needed: it can be an API provided by the authority over HTTPS.
One very important way to use signatures is to provide “non-repudiation”, i.e. a persistent, verifiable record that somebody has signed something. If an ISV publishes a backdoored image, and their client installs it because it has been signed by the ISV, it is very valuable for the client to later be able to attribute the image to the ISV even if the ISV tries to deny this (stops publishing the image and signatures). In that case it is essential for the client to have a local copy of the signature, (along with a local copy of the signed image, or a local copy of enough other data to prove what the signature applies to).
And of course the client never knows that they will need to do such an investigation in advance. So, basically, the standard operating procedure should be that any time an image is copied, the signatures are copied along, and perhaps logged somewhere. Making a mirror of a remote repo? Mirror the signatures. Pulling a container image onto a cluster host? Copy the signatures. Extracting the container image and preparing to run the container? Also record the signatures, at least in a log, or preferably in a form allowing to verify the extracted image against the signature.
Everywhere the image goes, the signatures go. A separate distribution mechanism with different lifetimes and under a different control is a risk to the ability to conduct forensic investigations.
(I’m honestly not sure how much this PR is still free to design things, or whether it is just to document the already shipped implementation.)
The thing is, containers/image can only work and be useful if the philosophy is consistent between various ways to store and access signatures; that `ImageDestination.PutSignatures` does something reasonably consistent between `ImageDestination` implementations. Consider that a design constraint if you like. (Of course it can be argued that the c/image API is wrong, and yes we can and do change it. But if `ImageDestination` becomes a mess of optional methods and every client will basically need to individually implement special signature storage semantics for every transport, c/image loses a lot of its value.)

That does seem to be mistaken. It is “a string with the semantics and normalization conventions of `containers/image/docker/reference.Named`” (~upstream, not RH, `docker/docker/reference.Named`), and those strings do semantically specify host names, although the host name may be implicit.
Thanks for all of the discussion.
I understand the value of having the signatures be persistent artifacts that a user can hold on to. I didn't mean to give an impression otherwise. I am not advocating for an API.
I'll try to articulate the mirror use cases clearly, so we can get out of the weeds of the theoretical. I work on Pulp, which is used by Katello, which is basically the upstream for Satellite. Mirroring software and related artifacts, and managing the lifecycle of those (testing, promotion, distribution to remote sites, etc) is a core use case for each of those projects/products. I hope we can create data structures and access specs that make it easy and natural for these projects, and any other tools, to manage signatures in sigstores.
As a user, I want to mirror software from third parties. I want a full copy of everything I need on-site within my infrastructure, because I don't want to depend on the internet to do deployments.
As a user, I want to carefully restrict what software is available to which parts of my infrastructure. As new content is produced by third parties, I want to mirror it on-site, then walk it through a testing workflow before promoting it to a production-facing repository.
As a user doing the above, I want to manage my private repositories of software in one place, and then mirror those repositories out to remote locations within my private infrastructure. This is to achieve redundancy and/or scalability, often across multiple facilities.
As a user doing the above, I want to mirror that content to disconnected facilities, where content must be walked in on a disk.
I think those cover the basic requirements I'm hoping we can support, although obviously there are variations. As you can see, a caching proxy would not be sufficient.
At each step in these use cases, a local client needs to figure out how to access an image and any associated signatures.
So I'll ask a much more general question now: Given what you are building, can the above use cases be met? If not, what gaps are there specifically?
On Thu, Nov 03, 2016 at 12:05:36PM -0700, Michael Hrivnak wrote:
I think you can do this with a caching proxy, using @mtrmac's “manifest+layers+signatures as single atomic unit” paradigm. You have your central, in-house store as a caching proxy for the wider world (and maybe configure it to not push new content back to the world). You do your management on that central store. Individual components within your infrastructure have their own production stores, and you bring content to them with a workflow like:
You can get more performant if you track which objects are in the production stores so you can skip those when pushing (roughly like Git with its remote references).
@mhrivnak I’m not sure what issues you see, you must have much more insight into the usual mirroring workflows; so let me try to describe how I understand the situation, to establish a common ground and perhaps explicitly acknowledge some of the downsides:
Really the external/internal use cases and the transparent copy / copy which changes the client-visible image ID are intertwined but somewhat orthogonal. I'm afraid the notes below mix the two up a bit, but the point should be clear, and perhaps you have a better way to express the matrix of possibilities.
(ISV→customer) The clients within your infrastructure would refer to the images using the on-site identities, knowing whether they want to use the testing or production images? Then, whatever software is doing the copy from the source to the destination, must also copy the signatures between the source and destination sigstores, and it doesn’t matter whether the sigstore format includes host names because the software doing the copy by definition knows the destination host name and can place the signatures into the correct location.
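A minimal Go sketch of that copy step, assuming file-backed sigstores laid out per this specification: the copying software knows both the source and destination image names (including their registry host names) and places each signature under the destination path. All paths and names here are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// copySignatures copies signature-1, signature-2, ... for one manifest digest
// from a source sigstore root to a destination sigstore root, rewriting the
// image name (and therefore the registry host) in the path. Illustrative
// only; not a real tool.
func copySignatures(srcRoot, srcImage, dstRoot, dstImage, manifestDigest string) error {
	for i := 1; ; i++ {
		src := filepath.Join(srcRoot, srcImage+"@"+manifestDigest, fmt.Sprintf("signature-%d", i))
		in, err := os.Open(src)
		if os.IsNotExist(err) {
			return nil // first missing index: done
		}
		if err != nil {
			return err
		}
		dst := filepath.Join(dstRoot, dstImage+"@"+manifestDigest, fmt.Sprintf("signature-%d", i))
		if err := os.MkdirAll(filepath.Dir(dst), 0o755); err != nil {
			in.Close()
			return err
		}
		out, err := os.Create(dst)
		if err != nil {
			in.Close()
			return err
		}
		if _, err := io.Copy(out, in); err != nil {
			in.Close()
			out.Close()
			return err
		}
		in.Close()
		if err := out.Close(); err != nil {
			return err
		}
	}
}

func main() {
	// Hypothetical example: signatures for an image mirrored from an
	// upstream registry into reg.mycorp.net.
	err := copySignatures(
		"/srv/sigstore-upstream", "registry.example.com:5000/acme/myimage",
		"/srv/sigstore-mirror", "reg.mycorp.net/acme/myimage",
		"sha256:b1c302ecc8e21804a288491cedfed9bd3db972ac8367ccab7340b33ecd1cb8eb")
	if err != nil {
		fmt.Println("copy failed:", err)
	}
}
```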
The difficulty here is in the indirection necessary to convince various clients, which conceptually want the same image and conceptually refer to it using the same ID/location/hostname (`rhel:7.2.3`), to actually talk to the appropriate mirror, with three possible approaches (a mirror list, transparent indirection, or explicitly renamed mirrors, where instead of `busybox` clients need to pull `{Brno,Paris,Dakkar}.mirror/busybox`). Yes, then the host names in sigstore paths would need to be changed when doing copies between mirrors (and that is always possible; this is really the same as the “ISV→customer” case above), but by far the biggest difficulty is that with changing the image identity like that, the identities in image signatures no longer match (as they shouldn’t). Then it becomes necessary to maintain a specialized `policy.json` for name overrides so that the signatures are accepted, which is awkward all around; it would be overall much preferable to steer users towards the mirror list approach or transparent indirection instead.

I may very well be missing something of course.
Record from today’s meeting: we will drop the host name from the current specification (but perhaps support reading the hostname-qualified paths for compatibility); that specification change should happen together with other changes (perhaps flattening the namespace if necessary for Pulp, getting rid of special characters, e.g. #187).