**docs/signature-layout.md** (new file, 35 additions, 0 deletions)
# Signature File Layout

A common file layout for storing and serving signatures provides a consistent way to reference image signatures. Signatures on a filesystem or a web server shall use this common layout. Signatures stored in a REST API are not required to use this common layout.

## Specification

This specification relies on [RFC3986](https://tools.ietf.org/html/rfc3986), focusing on defining a [path component](https://tools.ietf.org/html/rfc3986#section-3.3) to compose a concise URI reference to a signature.

**SCHEME[AUTHORITY]/PATH_PREFIX/IMAGE@MANIFEST_DIGEST/signature-INT**

**Definitions**

* **SCHEME**: URI scheme per [RFC3986](https://tools.ietf.org/html/rfc3986#section-3.1), e.g. **file://** or **https://**
* **AUTHORITY**: An optional authority reference per [RFC3986](https://tools.ietf.org/html/rfc3986#section-3.2), e.g. **example.com**
* **PATH_PREFIX**: An arbitrary base path to the image component
* **IMAGE**: The name of the image per the [v2 API](https://docs.docker.com/registry/spec/api/#/overview). This would typically take the form of registry/repository/image, but is not required to have exactly three parts. There is no requirement to include a **:PORT** component, but it should be included if it is part of the image reference.

---

I'm curious about why the registry is included here. I'm thinking about how it fits into a mirroring case, where a user has an on-premise mirror of remote content. I want to understand how images and signatures stay associated with each other as they move through different registries and sigstores.

If I have a private internal registry called reg.mycorp.net, and it contains Red Hat images, what is the expectation around a sigstore?

Can my local clients still use Red Hat's sigstore directly? If so, does inclusion of the registry in the URL cause problems? Obviously Red Hat's sigstore won't know about reg.mycorp.net. Somehow my client would need to know to include the name of Red Hat's registry in the URL when accessing the sigstore. Is there a way for the client to even know that, if it pulled an image from reg.mycorp.net?

Or is the expectation that each registry should have its own sigstore, and it should mirror signatures from remote registries to correspond with mirrored content? That sounds more plausible, but it requires tight coordination between any particular registry and its sigstore. It also removes the value of having the registry in the URL here.

Or is there another workflow I'm not thinking of?

And do we want to retain the option to distribute images without using a registry at all, but still verify signatures against a sigstore?

Back to the original question, now that I've barraged you with a variety of scenarios and details, what is the value of specifying a registry in this URL?

---

**Contributor:**

You can address distribution, mirroring, and (somewhat) discovery by using a signature-list blob like the one I've proposed in opencontainers/image-spec#176. Then you can distribute an image-layout (or whatever) with a ref pointing at the application/vnd.oci.image.signed.blob.v1+json blob, and that blob would point at the signature blobs (via signatures[]) and the blob being signed (via blob). A registry keeping track of all known signatures would automatically build a new application/vnd.oci.image.signed.blob.v1+json blob whenever a new signature was submitted (potentially validating the signature first or performing other gate-keeping), and use a name-addressable location (like the SCHEME[AUTHORITY]/PATH_PREFIX/IMAGE@MANIFEST_DIGEST/signature proposed in this PR) to give users a way to get the most recent application/vnd.oci.image.signed.blob.v1+json.

---

**Contributor Author:**

@mhrivnak good questions. I think it's an "all or nothing" issue: we either fully qualify the signature reference (this proposal) or we have a flat list of signatures referenced by hash ID alone (possibly with some namespacing around different transports). It's a fair discussion to have.

Regarding the mirroring use case, it seems you are either mirroring the image AND the associated signatures or you are not mirroring at all. With this design one could NOT mirror the images and point to the original sigstore. This would seem to tip in favor of a flat signature layout.

---

**Contributor Author:**

Thinking on this a bit more, one of the value propositions of the signature approach is that signatures can be proliferated all over the place. Locking down a very specific namespace makes this more challenging.

---

Agreed on the proliferation expectation.

I do think there's a potentially common use case to mirror content but not the related signatures, especially if the tooling is separate for the two.

As a user, I want to position copies of large artifacts close to where they will be used, so deployment goes quickly. If signatures are only available from one source, that doesn't present such a bottleneck. So unless the tooling makes it easy for me to mirror signatures with images, I might not want to bother mirroring the signatures.

That said, mirroring becomes more valuable when you can eliminate dependence on remote data, so it would be a fine idea to focus on making it easy to mirror signatures with images.

---

**Collaborator** (@mtrmac, Oct 27, 2016):

> … on-premises registry at reg.hrivnak.org. … a local mirror … includes repositories like redhat/rhel7. A local client tries to pull reg.hrivnak.org/redhat/rhel7:latest …

> …The spec defined by this PR requires that the client insert the hostname of the original Red Hat registry into the URL. The client knows nothing about that registry, right? Unless the intent is that all signatures on reg-sigstore.hrivnak.org would be namespaced to reg.hrivnak.org?

Yes. As you say, the client cannot trivially determine that this is a mirror, it only knows the hostname/repo used to refer to the mirrored content.

> But if that's the case, why include a hostname at all?

Because it allows setting up a single sigstore server for multiple corporate registries.

Admittedly that is a fairly weak reason, but with the model of sigstores namespaced per-repo (not flat) within a single host name, removing the host name component would still not allow sharing signatures between differently-named repos. So, including the host name is a more symmetric / cleaner way to support the model in which the signatures are definitely namespaced, not shared; it prevents accidental unwanted sharing.

It can also be useful for a very locked-down site, where every allowed image needs to be individually approved with a company-private signature, to define a single sigstore for the whole internet; the image references would not change (so an upstream application which pulls library/busybox:latest does not need changing to point at a mirror), but the approving IT department would set up a single sigstore server “for the whole internet” serving these company-private signatures. Each host in the data center would be configured to require these private signatures, and to use this sigstore, but otherwise the applications can continue to use upstream references. [This is, more or less, an exact opposite of the “we mirror image layers but want to contact the original upstream for signatures”; see below for more on that.]

> there would need to be a single world-wide sigstore.

No, not at all!

The idea is that any individual sigstore could host whatever collection of signatures are valuable to its audience of users. If you are an authority, you can host a sigstore with your signatures. If you want to mirror one or more authorities, you mirror the signature files from those authoritative sigstores. There's nothing centralized about it.

Hum, OK. So if I understand correctly you are considering a scenario where the user really wants to mirror the layers only because they are big and slow, and conceptually is perfectly fine with not having a mirror and fetching data directly.

If that is the desire, a transparent proxy (caching both layers and signatures invisibly to the clients, as/if necessary) might be a cleaner way to achieve that: it means fewer modifications in the clients, OTOH a different kind of administrative effort/overhead.

In other cases, “an authority hosting signatures” is not a natural way to think about signatures. If there should be an authority saying “yes this image is OK / no I don’t know this one”, persistent signature objects are not needed: it can be an API provided by the authority over HTTPS.

One very important way to use signatures is to provide “non-repudiation”, i.e. a persistent, verifiable record that somebody has signed something. If an ISV publishes a backdoored image, and their client installs it because it has been signed by the ISV, it is very valuable for the client to later be able to attribute the image to the ISV even if the ISV tries to deny this (stops publishing the image and signatures). In that case it is essential for the client to have a local copy of the signature (along with a local copy of the signed image, or a local copy of enough other data to prove what the signature applies to).

And of course the client never knows that they will need to do such an investigation in advance. So, basically, the standard operating procedure should be that any time an image is copied, the signatures are copied along, and perhaps logged somewhere. Making a mirror of a remote repo? Mirror the signatures. Pulling a container image onto a cluster host? Copy the signatures. Extracting the container image and preparing to run the container? Also record the signatures, at least in a log, or preferably in a form that allows verifying the extracted image against the signature.

Everywhere the image goes, the signatures go. A separate distribution mechanism with different lifetimes and under a different control is a risk to the ability to conduct forensic investigations.

I think the conceptual hurdle is to ignore how the sigstore technically works and to see that containers/image generally treats manifest+layers+signatures as single atomic unit. Then the philosophy and access control issues etc. follow naturally.

The reason for this PR is to define aspects of how the sigstore technically works. I don't know what to do with your suggestion to ignore that.

(I’m honestly not sure how much this PR is still free to design things, or whether it is just to document the already shipped implementation.)

The thing is, containers/image can only work and be useful if the philosophy is consistent between various ways to store and access signatures; that ImageDestination.PutSignatures does something reasonably consistent between ImageDestination implementations. Consider that a design constraint if you like. (Of course it can be argued that the c/image API is wrong, and yes we can and do change it. But if ImageDestination becomes a mess of optional methods and every client will basically need to individually implement special signature storage semantics for every transport, c/image loses a lot of its value.)
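For readers who have not seen the c/image API, the following is a heavily simplified sketch of the kind of per-destination interface being described; only the PutSignatures name comes from this discussion, and the exact method set and signatures are illustrative rather than the actual containers/image definitions.

```go
package types

// ImageDestination is an illustrative sketch, not the real containers/image
// interface: the point is that every transport implements the same
// signature-storage entry point, whatever its backing store looks like.
type ImageDestination interface {
	// PutSignatures stores the given signature blobs for the image being
	// written, e.g. into a sigstore laid out as described in this document.
	PutSignatures(signatures [][]byte) error

	// Commit finalizes the destination once blobs, manifest and signatures
	// have been written. (Illustrative.)
	Commit() error
}
```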

(FWIW note that naively mirroring signatures along with names cannot “just work”, because the signatures claim the original (redhat/rhel7) identities, so pulling reg.hrivnak.org/redhat/rhel7:latest will see an identity mismatch. That is quite unrelated to the sigstore mechanism, and we are prepared to support that; see hostname:5000/vendor/product in https://github.com/containers/image/blob/master/docs/policy.json.md#examples .)

This is very interesting. Thanks.

But per https://github.com/containers/image/pull/59/files#diff-c44796aa0013711d899f7345d01b8185R37, the signature would only include the name of the repository, and not include a registry hostname. I'm not sure where the identity mismatch would happen. Or is that PR (which is still not merged as of this writing) not accurate?

That does seem to be mistaken. It is “a string with the semantics and normalization conventions of containers/image/docker/reference.Named” (~upstream docker/docker/reference.Named, not the RH fork), and those strings do semantically specify host names, although the host name may be implicit.

---

Thanks for all of the discussion.

I understand the value of having the signatures be persistent artifacts that a user can hold on to. I didn't mean to give an impression otherwise. I am not advocating for an API.

I'll try to articulate the mirror use cases clearly, so we can get out of the weeds of the theoretical. I work on Pulp, which is used by Katello, which is basically the upstream for Satellite. Mirroring software and related artifacts, and managing their lifecycle (testing, promotion, distribution to remote sites, etc.), is a core use case for each of those projects/products. I hope we can create data structures and access specs that make it easy and natural for these projects, and any other tools, to manage signatures in sigstores.

As a user, I want to mirror software from third parties. I want a full copy of everything I need on-site within my infrastructure, because I don't want to depend on the internet to do deployments.

As a user, I want to carefully restrict what software is available to which parts of my infrastructure. As new content is produced by third parties, I want to mirror it on-site, then walk it through a testing workflow before promoting it to a production-facing repository.

As a user doing the above, I want to manage my private repositories of software in one place, and then mirror those repositories out to remote locations within my private infrastructure. This is to achieve redundancy and/or scalability, often across multiple facilities.

As a user doing the above, I want to mirror that content to disconnected facilities, where content must be walked in on a disk.

I think those cover the basic requirements I'm hoping we can support, although obviously there are variations. As you can see, a caching proxy would not be sufficient.

At each step in these use cases, a local client needs to figure out how to access an image and any associated signatures.

So I'll ask a much more general question now: Given what you are building, can the above use cases be met? If not, what gaps are there specifically?

---

**Contributor** (@wking, Nov 3, 2016):

On Thu, Nov 03, 2016 at 12:05:36PM -0700, Michael Hrivnak wrote:

> As a user doing the above, I want to mirror that content to disconnected facilities, where content must be walked in on a disk.
>
> I think those cover the basic requirements I'm hoping we can support, although obviously there are variations. As you can see, a caching proxy would not be sufficient.
>
> At each step in these use cases, a local client needs to figure out how to access an image and any associated signatures.

I think you can do this with a caching proxy, using @mtrmac's “manifest+layers+signatures as single atomic unit” paradigm. You have your central, in-house store as a caching proxy for the wider world (and maybe configure it to not push new content back to the world). You do your management on that central store. Individual components within your infrastructure have their own production stores, and you bring content to them with a workflow like:

  1. Image foo-bar:1.0 is vetted by QA, and gets the QA sig in the central store.
  2. Set up a proxy pulling from your central store which caches into a local scratch directory.
  3. Perform the operation you'd like to support (e.g. fetching the image, verifying the signatures, and walking any referenced CAS DAGs) on your proxy from (2).
  4. Pack up the scratch directory created in (2) and filled by (3), and transmit it to the appropriate production store (e.g. via a sneakernet).
  5. Push the new content from the transmitted cache into the production store.

You can make this more performant if you track which objects are in the production stores, so you can skip those when pushing (roughly like Git with its remote references).

---

**Collaborator:**

@mhrivnak I’m not sure what issues you see; you must have much more insight into the usual mirroring workflows. So let me try to describe how I understand the situation, to establish common ground and perhaps explicitly acknowledge some of the downsides:

Really, the external/internal use cases and the transparent copy vs. the copy which changes the client-visible image ID are intertwined but somewhat orthogonal. I’m afraid the notes below mix the two up a bit, but the point should be clear; perhaps you have a better way to express the matrix of possibilities.

> As a user, I want to carefully restrict what software is available to which parts of my infrastructure. As new content is produced by third parties, I want to mirror it on-site, then walk it through a testing workflow before promoting it to a production-facing repository.

(ISV→customer) The clients within your infrastructure would refer to the images using the on-site identities, knowing whether they want to use the testing or production images? Then, whatever software is doing the copy from the source to the destination, must also copy the signatures between the source and destination sigstores, and it doesn’t matter whether the sigstore format includes host names because the software doing the copy by definition knows the destination host name and can place the signatures into the correct location.

> … mirror those repositories out to remote locations within my private infrastructure. This is to achieve redundancy and/or scalability, often across multiple facilities.

The difficulty here is in the indirection necessary to convince various clients, which conceptually want the same image and conceptually refer to it using the same ID/location/hostname (rhel:7.2.3), to actually talk to the appropriate mirror. There are three possible approaches:

  * Transparent indirection (different DNS responses for the same host name). No issues here; do a full file-by-file copy between mirrors, the clients won’t notice. Sigstores work exactly the same as any other mirrors or proxies.
  * A mirror list explicitly implemented in the clients, and clients contacting the mirrors instead of the official/primary host name (but this may be system-wide configuration invisible to human users, who still use the official image IDs). Yes, the clients may need a similar client-side mirror configuration system for the sigstores, and yes, that will be a tiny bit more code when the official/primary host names are included in the paths, but overall trivial I think. We don’t support mirror lists for sigstores right now at all, so if/when we do add that, we can handle the hostname component of the path appropriately.
  * User-visible reference changes, where the image pulling subsystem is not aware of the mirrors, and users, or users’ programs, need to manually update references (instead of pulling busybox they need to pull {Brno,Paris,Dakkar}.mirror/busybox). Yes, then the host names in sigstore paths would need to be changed when doing copies between mirrors (and that is always possible; this is really the same as the “ISV→customer” case above), but by far the biggest difficulty is that with changing the image identity like that, the identities in image signatures no longer match (as they shouldn’t). Then it becomes necessary to maintain a specialized policy.json for name overrides so that the signatures are accepted, which is awkward all around; it would be overall much preferable to steer users towards the mirror list approach or transparent indirection instead.

I may very well be missing something of course.

---

**Collaborator:**

Record from today’s meeting: we will drop the host name from the current specification (but perhaps support reading the hostname-qualified paths for compatibility); that specification change should happen together with other changes (perhaps flattening the namespace if necessary for Pulp, getting rid of special characters, e.g. #187).

* **MANIFEST_DIGEST**: The value of the manifest digest, including the hash function and hash, e.g. **sha256:HASH**
* **INT**: An integer index of the signature, starting at 1. For multiple signatures, increment by 1, e.g. **signature-1**, **signature-2** (a minimal sketch composing a full reference follows below).
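A minimal sketch, in Go, of how such a reference can be composed from the components defined above; the function and parameter names are illustrative and not part of the specification, and **base** stands for SCHEME[AUTHORITY]/PATH_PREFIX.

```go
package main

import "fmt"

// signatureRef composes SCHEME[AUTHORITY]/PATH_PREFIX/IMAGE@MANIFEST_DIGEST/signature-INT.
// The parameter names are illustrative; base covers SCHEME[AUTHORITY]/PATH_PREFIX.
func signatureRef(base, image, manifestDigest string, index int) string {
	return fmt.Sprintf("%s/%s@%s/signature-%d", base, image, manifestDigest, index)
}

func main() {
	fmt.Println(signatureRef(
		"https://sigs.example.com/signatures",
		"registry.example.com:5000/acme/myimage",
		"sha256:b1c302ecc8e21804a288491cedfed9bd3db972ac8367ccab7340b33ecd1cb8eb",
		1,
	))
	// Prints the second example from the Examples section below.
}
```
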
---

**Contributor Author:**

I'll update that WIP PR to this convention.


## Examples

1. A reference to a local file signature

        file:///var/lib/containers/signatures/registry.example.com:5000/acme/myimage@sha256:b1c302ecc8e21804a288491cedfed9bd3db972ac8367ccab7340b33ecd1cb8eb/signature-1

1. A reference to a signature on a web server

        https://sigs.example.com/signatures/registry.example.com:5000/acme/myimage@sha256:b1c302ecc8e21804a288491cedfed9bd3db972ac8367ccab7340b33ecd1cb8eb/signature-1

1. A reference to two signatures on a web server

        https://sigs.example.com/signatures/registry.example.com:5000/acme/myimage@sha256:b1c302ecc8e21804a288491cedfed9bd3db972ac8367ccab7340b33ecd1cb8eb/signature-1
        https://sigs.example.com/signatures/registry.example.com:5000/acme/myimage@sha256:b1c302ecc8e21804a288491cedfed9bd3db972ac8367ccab7340b33ecd1cb8eb/signature-2

## Signature Indexing and Discovery

No signature indexing mechanism or service is defined. Signatures are obtained by iterating over increasing indexes and stopping at the first missing index.
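For a web-server sigstore, that discovery rule amounts to requesting **signature-1**, **signature-2**, … until a request fails. A minimal sketch in Go, assuming a server where a missing index answers with a non-200 status (a real client would distinguish 404 from other failures):

```go
package sigstore

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

// FetchSignatures retrieves signature-1, signature-2, … beneath baseURL
// (SCHEME[AUTHORITY]/PATH_PREFIX/IMAGE@MANIFEST_DIGEST), stopping at the
// first missing index. Treating every non-200 response as "missing" is a
// simplification for the sketch.
func FetchSignatures(baseURL string) ([][]byte, error) {
	var signatures [][]byte
	for i := 1; ; i++ {
		resp, err := http.Get(fmt.Sprintf("%s/signature-%d", baseURL, i))
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusOK {
			resp.Body.Close()
			return signatures, nil // first missing index: stop iterating
		}
		sig, err := ioutil.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		signatures = append(signatures, sig)
	}
}
```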