-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] document how to add new sources #2807
base: master
Are you sure you want to change the base?
Conversation
There is a lot I do not understand here, and so could not capture. I will comment a lot inline. But fundamentally, what is it that the solver really expects of the
Clearly, it does not do that. It does more, but also differently. What, why, how? Who actually handles the caching? What are the responsibilities of the |
use what is in its cache. If it is different or does not exist, then it knows it needs to ask the source for the content referred to by the | ||
`id` | ||
|
||
This is expected to be idempotent, a read-only activity that returns a unique cache key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? Is it an idempotent "id->key" function?
The arguments to `CacheKey()` are: | ||
|
||
* `ctx context.Context`: A golang [context](https://pkg.go.dev/context#Context). | ||
* `g session.Group`: The [session.Group](https://pkg.go.dev/github.com/moby/[email protected]/session#Group) contains parameters to identify the specific buildkit client calling this build. These may be useful for things like credentials. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who creates these? Who consumes them?
docs/sources.md
Outdated
|
||
* `ctx context.Context`: A golang [context](https://pkg.go.dev/context#Context). | ||
* `g session.Group`: The [session.Group](https://pkg.go.dev/github.com/moby/[email protected]/session#Group) contains parameters to identify the specific buildkit client calling this build. These may be useful for things like credentials. | ||
* `index int`: The `CacheKey()` can be called multiple times for the same source item, each indicating that it is a separate subitem. The `index` indicates which iteration of call this is. Generally, if the `index > 0` then it is complete. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part really threw me for a loop. The solver
doesn't know more than, "here is an ID, give me a cache key". How can it know of multiple index integers? When does it change them? What is its meaning?
|
||
The expected return values are: | ||
|
||
* `cacheKey string`: the unique cache ID; this might be the combination of a hash with the source identifier. For example, `docker-image` uses the hash of the image config, while `local` combines the path to the directory and hash of a json that includes the session and search parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this unique across? For example, for container image in a registry, is this just going to be the hash of the root manifest, e.g. right now, docker.io/library/nginx:1.21.6-alpine
resolves to sha256:5a0df7fb7c8c03e4158ae9974bfbd6a15da2bdfdeded4fb694367ec812325d31
, so would the cache key just be sha256:5a0df7fb7c8c03e4158ae9974bfbd6a15da2bdfdeded4fb694367ec812325d31
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, how does this tie into the discussion above about index
and cacheDone
?
The expected return values are: | ||
|
||
* `cacheKey string`: the unique cache ID; this might be the combination of a hash with the source identifier. For example, `docker-image` uses the hash of the image config, while `local` combines the path to the directory and hash of a json that includes the session and search parameters. | ||
* `imgDigest string`: the digest of the root of the image. For example, `docker-image` uses the hash of the root manifest or index, while `local` returns the hash of the constructed json of the session and search parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the previous is the key, what is this?
|
||
* `cacheKey string`: the unique cache ID; this might be the combination of a hash with the source identifier. For example, `docker-image` uses the hash of the image config, while `local` combines the path to the directory and hash of a json that includes the session and search parameters. | ||
* `imgDigest string`: the digest of the root of the image. For example, `docker-image` uses the hash of the root manifest or index, while `local` returns the hash of the constructed json of the session and search parameters. | ||
* `cacheOpts solver.CacheOpts`: The [CacheOpts](https://pkg.go.dev/github.com/moby/[email protected]/solver#CacheOpts) contain metadata that can be passed from one iteration of `CacheKey()` to another, enabling context to be passed from call to call without the `Source` needing to store information. This is useful for items like progress streams and remote reference information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are examples of some opts here? And who consumes them, given that the SourceInstance
only returns them, so who consumes them and how?
docs/sources.md
Outdated
* `cacheKey string`: the unique cache ID; this might be the combination of a hash with the source identifier. For example, `docker-image` uses the hash of the image config, while `local` combines the path to the directory and hash of a json that includes the session and search parameters. | ||
* `imgDigest string`: the digest of the root of the image. For example, `docker-image` uses the hash of the root manifest or index, while `local` returns the hash of the constructed json of the session and search parameters. | ||
* `cacheOpts solver.CacheOpts`: The [CacheOpts](https://pkg.go.dev/github.com/moby/[email protected]/solver#CacheOpts) contain metadata that can be passed from one iteration of `CacheKey()` to another, enabling context to be passed from call to call without the `Source` needing to store information. This is useful for items like progress streams and remote reference information. | ||
* `cacheDone bool`: Indicates whether or not caching is complete. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this actually mean? What is an example?
`docker-image` can use lazy retrieval, and defer pulling layers until they really are needed. However, everything that is needed to retrieve the content gets set up | ||
by `Snapshot()`. | ||
|
||
For example, `docker-image` pulls the image, while `local` ??? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea what local
does, actually. I read the code, but I am quite unsure what it does.
And when docker-image
"pulls" the image, does it store it in some local cache? Or is it just passing the data to the solver? How does it handle various layers? It looks (from the code) like it merges them together to create a single filesystem view? And how does that all get garbage-collected?
The arguments to `Snapshot()` are: | ||
|
||
* `ctx context.Context`: golang [context](https://pkg.go.dev/context#Context) | ||
* `g session.Group`: The [session.Group](https://pkg.go.dev/github.com/moby/[email protected]/session#Group) contains parameters to identify the specific buildkit client calling this build. These may be useful for things like credentials. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as this question
|
||
The expected return values are: | ||
|
||
* `cache.ImmutableRef`: a [cache.ImmutableRef interface](../cache/refs.go#L48), which is used to read the actual content. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a io.Reader
or TarReader
, it is an ImmutableRef
. What does that mean? How does it use it to get the actual content in a useful way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so it isn't the SourceInstance
passing a stream of content that the solver will cache in its own cache, mount, etc.
Instead, it is that the SourceInstance is expected to cache it locally, returning instead a struct that implements ImmutableRef
, i.e. something that the solver can use to mount the content in the filesystem.
So then what does the whole ImmutableRef
interface do? The interface is:
type ImmutableRef interface {
Ref
Clone() ImmutableRef
// Finalize commits the snapshot to the driver if it's not already.
// This means the snapshot can no longer be mounted as mutable.
Finalize(context.Context) error
Extract(ctx context.Context, s session.Group) error // +progress
GetRemotes(ctx context.Context, createIfNeeded bool, cfg config.RefConfig, all bool, s session.Group) ([]*solver.Remote, error)
LayerChain() RefList
}
What is the SourceInstance
expected to provide here? There's a lot more than "give me a mount point and I will mount it."
@@ -0,0 +1,268 @@ | |||
# Sources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m feeling this should be placed in a directory like /docs/internal/, as this isn’t expected to be read by end users
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with moving this there. Let's get all of the missing content in first.
356d45c
to
195fe94
Compare
|
||
* `ctx context.Context`: A golang [context](https://pkg.go.dev/context#Context). | ||
* `g session.Group`: The [session.Group](https://pkg.go.dev/github.com/moby/[email protected]/session#Group) contains parameters to identify the specific buildkit client calling this build. These may be useful for things like credentials. | ||
* `index int`: The `CacheKey()` can be called multiple times for the same source item, each indicating that it is a separate subitem. The `index` indicates which iteration of call this is. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only place I found using cacheDone
was docker-image
source. But I could not figure out when and how it uses it. The entire cacheDone
and index
seem to boil down to these lines:
cacheDone = index > 0
if index == 0 || p.configKey == "" {
return p.manifestKey, p.manifest.MainManifestDesc.Digest.String(), cacheOpts, cacheDone, nil
}
return p.configKey, p.manifest.MainManifestDesc.Digest.String(), cacheOpts, cacheDone, nil
Which appears to mean: "the first time I am called, index=0
, so return cacheDone=false
. That means I will get called again with index=1
, at which time return cacheDone=true
."
There doesn't appear to be any other logic that differs from call 0 and call 1, so what purpose does it serve?
Signed-off-by: Avi Deitcher <[email protected]>
based on the extensive slack discussions with @sipsma and @tonistiigi
I have been trying to create an OCI layout source so it can be used as a build-context. Also potentially others. The maintainers have indicated they would accept it, but I struggled to understand the required behaviour and interface.
Once @sipsma is explaining it, let's capture it for future users.
When this actually captures everything in a useful way, we can remove the
draft
status and make it a real PR.