-
Notifications
You must be signed in to change notification settings - Fork 73
api, adaptation: implement pluggable validation API and a default validator plugin. #163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
/cc @chrishenzie PTAL. |
a8e4cb2
to
6243420
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very cool... like the builtin plugin approach to handle defaults...
think need to do a walk through but get the general gist and i like it.. nit turning off oci hooks heh.. but sure that should be configurable from the container runtime's nri config..
6243420
to
5c283c0
Compare
I opted to not restrict OCI hook injection (in the default configuration), to provide backward-compatible behavior. |
pkg/adaptation/default-validator.go
Outdated
|
||
type DefaultValidatorConfig struct { | ||
Enable bool `toml:"enable"` // enable default validator | ||
RejectOCIHooks bool `toml:"reject_oci_hooks"` // reject OCI hook injection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also give the option to disable validating that required plugins are present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would that be useful ? I mean, this is already by definition an opt-in feature since you need to actively annotate your containers/pods for the 'processed by required plugins' check to take place. Wouldn't adding a separate configuration knob make this more error prone to use ? Now it's enough to ensure that validation is enabled and then opt-in for this check by annotating. With an extra knob, you'd also need to make sure that the extra knob is in the right position.
One related thing that we should probably consider is at least logging a warning or an error, if the related annotation is present but we are running with disabled validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit: my comment applies to annotations, not to the DefaultValidatorConfig
struct.
I think yes.
If we configure required plugins also as part of the container runtime configuration, an annotation that allows the required plugins to be bypassed would allow those required plugins to be deployed as containers themselves.
// Plugins that made the adjustments and updates. | ||
OwningPlugins owners = 5; | ||
// Plugins consulted for adjustments and updates. | ||
repeated PluginInstance plugins = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this is a repeated
field, should this be plugin
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When translated to golang, that would become a Plugin []*PluginInstance
, whereas now we have Plugins []*PluginInstance
which I think is better.
But I think I know where your comment is stemming from, probably looking at update
in the same message which is a slice too, but named in singular. We have been (grammatically) a bit inconsistent, though consistently inconsistent at that, with naming repeated fields. Container updates/evictions, although repeated/slices, are always named in singular. All other repeated fields, except env
(for "environment") and the individual hook slices in Hooks
(which follow the naming of the corresponding OCI Spec fields) are in plural. Probably not ideal, but there is some method to the madness.
2c74b0a
to
d679a1f
Compare
6f67009
to
66ca9ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've only looked through the first couple of commits so far, but will continue to look later.
b55af0f
to
cbb4232
Compare
pkg/api/strip.go
Outdated
@@ -0,0 +1,343 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is better, but I'm still concerned about drift between the proto and the Strip functions. I'm fine going ahead with this for now, but once we merge let's get an issue open to revisit.
One thought I had would be to write a protoc plugin (like protoc-gen-go-strip
) to generate the code, and then we can assert that there's no drift. But I wouldn't consider that a prerequisite to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it wouldn't even need to be a protoc plugin. A plain go:generate
-helper to introspect the necessary types using reflect and then generate the necessary Strip()-functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to Sam's comment. For context, if we don't call Strip()
are we then left with a bunch of tests that break because of []
vs. nil
inequality? It seems like if that was the case, fixing those might prove simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it wouldn't even need to be a protoc plugin.
Yep, either. I just biased very slightly toward protoc plugins since we already have the plumbing to run them during proto generation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to Sam's comment. For context, if we don't call
Strip()
are we then left with a bunch of tests that break because of[]
vs.nil
inequality?
No. cmpopts.EquateEmpty()
option alone is enough to take care of that for maps and slices. It is some of the struct types that we want to have special equality comparison semantics for, because of the way we initialize things internally when collecting plugin responses versus how we want to write expected results in our test cases. Basically deeply-fully initialized empty structs (and maps) with some additional fields eventually set versus minimally initialized structs with the same fields set, and we want to consider these equal.
100699d
to
b76e9c1
Compare
b76e9c1
to
bafb6d9
Compare
pkg/api/strip.go
Outdated
@@ -0,0 +1,343 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to Sam's comment. For context, if we don't call Strip()
are we then left with a bunch of tests that break because of []
vs. nil
inequality? It seems like if that was the case, fixing those might prove simpler.
aa30021
to
59bac81
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this and bearing with me as I know it took a while for me to get back with comments. Some high level things:
- Overall I'm very happy with the approach and I like the interaction model
- @chrishenzie is going to share the design doc that we've been using publicly; thanks for collaborating on it with us
- We can defer some of the enhancements (better generated Strip logic, parallel validator execution, etc) to subsequent PRs
- I think this could easily be split into 3 separate PRs and that would make it easier to get parts in sooner
- The first two commits that cover changes to unit tests + protoizing conflict detection
- Core logic for the new validation interface, including the stub
- Default validating plugin (probably including the introduction of builtin plugins)
- This should help since I think (1) and (2) above would be fairly easy once the relatively minor comments I have are addressed, while (3) would take a bit more time to get in line with the doc. Splitting up also helps me as a reviewer since I don't need to re-check the earlier commits once they're merged.
pkg/adaptation/adaptation.go
Outdated
if len(r.plugins) > 0 { | ||
log.Infof(noCtx, "plugin invocation order") | ||
for i, p := range r.plugins { | ||
log.Infof(noCtx, " #%d: %q (%s)", i+1, p.name(), p.qualifiedName()) | ||
} | ||
} | ||
if len(r.validators) > 0 { | ||
log.Infof(noCtx, "validator plugin invocation order") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we were planning to invoke validators in parallel? (I don't think that has to be part of this PR, but I think we shouldn't promise a specific invocation order. Maybe we shouldn't sort these at all, or explicitly randomly sort them?)
// of plugins which must process containers of the pod during creation. | ||
// If enabled, the default validator checks and rejects the creation | ||
// of containers if they fail the check. | ||
RequiredPluginsAnnotation = "required-plugins.nri.io" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we're using nri.io
in other annotations already, but CNCF doesn't own that domain and it's currently priced at $17,488 private sale. Can we use nri.containerd.io
or nri.containerd.dev
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I checked that out too, quite a while ago and decided to pass as it certainly didn't sound like the deal of the century.
We definitely can use either of those (I think one of Antonio's already merged PRs used one of those which I missed earlier but spotted last week). And I can update this PR similarly. I just wish we could find a domain which is not containerd-specific with such a spelled out emphasis. I feel a pang of guilt towards CRI-O when I consider naming it like that. NRI is supposed to be a common interface between the two runtimes. And I know that exactly with annotations there are already differences apparent to plugins. But this is a bit different. This is an annotation which is interpreted in core NRI and in that sense common.
Then again I am currently out of ideas as what could be a better usable alternative...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel a pang of guilt towards CRI-O when I consider naming it like that. NRI is supposed to be a common interface between the two runtimes.
That's fair. I think we should figure out something that is available and ask CNCF to acquire it for this purpose though, rather than reusing nri.io
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened a service desk ticket with CNCF to acquire something suitable. Will update here once we get it registered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CNCF has acquired noderesource.dev
for us to use.
yaml "gopkg.in/yaml.v3" | ||
) | ||
|
||
type DefaultValidatorConfig struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per the conversation in the doc, we should add something like require_plugins
and bypass_validation_annotation
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, required_plugins
I understand, it is the set of mandatory plugins required for all workloads/containers.
With bypass_validation_annotation
, is your idea that it would configure the name of a (boolean) annotation key which then would be used to annotate workloads to 'tolerate' missing mandatory plugins, for instance for mandatory plugins when they are deployed as DaemonSets ?
I was planning to add a single well-known annotation (tolerate-missing-plugins.$OUR_NRI_ANNOTATION_KEY_DOMAIN
)...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With
bypass_validation_annotation
, is your idea that it would configure the name of a (boolean) annotation key which then would be used to annotate workloads to 'tolerate' missing mandatory plugins, for instance for mandatory plugins when they are deployed as DaemonSets ?
Pretty much, yes. I'm not sure whether it'd be boolean or accept a list of plugins to bypass (I can see pros/cons of each).
I was planning to add a single well-known annotation (
tolerate-missing-plugins.$OUR_NRI_ANNOTATION_KEY_DOMAIN
)...
There are two reasons I was suggesting a configurable annotation vs. well-known:
- I think this should be empty/off by default, so that workloads cannot arbitrarily declare that they can bypass required plugins without any ability to decide which plugins can bypass.
- A custom annotation gives a cluster administrator or a cloud provider the opportunity to have better policy controls around annotations potentially without any additional work. For example, cloud providers may already have Kubernetes validating webhooks that restrict users from specifying annotations prefixed with
cloud.example.com/
.
My hope with this is that it would be relatively safe both for users to upgrade and for cluster admins or cloud providers to ensure that policy continues to be followed. If the bypass feature was enabled by default, workloads could bypass intended policy during an upgrade to a version of the container runtime that supports the bypass feature. If the annotation is well-known, it would require the cluster admin or cloud provider to already know the annotation name ahead of any clusters upgrading in order to support the bypass feature.
Regarding the bypass_validation_annotation
, maybe it should be bypass_required_plugins_annotation
instead? Since required_plugins
could also include mutating plugins?
No problemo. I am a happy camper if I can help kick this forward in any way. And I really appreciate the time and effort both you and @chrishenzie have put into all of this.
Related to the necessity for Strip(), I think it is possible to eliminate it altogether. Now we need it, because during adjustment collection and conflict detection we 'deep-create' a full skeletal adjustment structure in the very beginning instead of creating it piece by piece only when and were necessary. (I can take a look once I've split up this PR).
Okay, I'll split this one up to a stacked set of 3 PRs according to the outline above. |
59bac81
to
fce7ea9
Compare
See here for the public version of the design doc. |
Implement pluggable container adjustment validation. When validator plugins are present, use them to validate the collected adjustments, failing container creation if any validation fails. For adjustment validation plugins receive the pod, the pristing un- adjusted container, the collected container adjustments, information about which plugins adjusted what container parameters, and the list of plugins consulted for the adjustments. The plugin can then choose to accept or reject the adjustments. Accepting or rejecting adjustments are transactional. Either all or none of the adjustments are accepted, together with the container creation request. IOW, rejecting an adjustment results in a failed container creation request. Signed-off-by: Krisztian Litkey <[email protected]>
Implement a new builtin plugin type. Builtin plugins can be registered from within the runtime during instantiation of the runtime adaptation. We'll use a builtin plugin for default container adjustment validation. Signed-off-by: Krisztian Litkey <[email protected]>
Implement default (container creation/adjustment) validation as a builtin plugin. The default validator can be configured to reject OCI hook injection. Additionally, containers can be annotated with a set of required plugins. If annotated, these plugins must be present during container creation or else the creation of the container is rejected by the validator. Signed-off-by: Krisztian Litkey <[email protected]>
Provide an externally comilable standalone version of the default validator plugin. In addition to an illustrative purpose, this provides an easier way of experimenting and developing new validation features. Instead of having to recompile the runtime, one can simply modify and run the externally compiled version. Signed-off-by: Krisztian Litkey <[email protected]>
fce7ea9
to
8a54e4f
Compare
This patch series implements configurable restrictions using the proposed pluggable validation API and a builtin default validator plugin.
The series
There are a few rough edges I'd like to take a second look at, and the PR lacks the necessary documentation bits, therefore the it is marked as a draft. But I think it should be close enough already for an initial review round.
/cc @chrishenzie