Skip to content

api, adaptation: implement pluggable validation API and a default validator plugin. #163

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

klihub
Copy link
Member

@klihub klihub commented Apr 21, 2025

This patch series implements configurable restrictions using the proposed pluggable validation API and a builtin default validator plugin.

The series

  • reworks conflict detection using protobuf messages to provide raw information for validation,
  • defines a validation API, implements validation in the runtime adaptation and the plugin stub,
  • implements builtin plugins,
  • adds a new builtin plugin for configurable default validation

There are a few rough edges I'd like to take a second look at, and the PR lacks the necessary documentation bits, therefore the it is marked as a draft. But I think it should be close enough already for an initial review round.

/cc @chrishenzie

@klihub klihub requested review from samuelkarp and mikebrow April 21, 2025 13:43
@klihub
Copy link
Member Author

klihub commented Apr 21, 2025

/cc @chrishenzie PTAL.

@klihub klihub force-pushed the devel/pluggable-validation-api branch 3 times, most recently from a8e4cb2 to 6243420 Compare April 21, 2025 14:10
Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very cool... like the builtin plugin approach to handle defaults...

think need to do a walk through but get the general gist and i like it.. nit turning off oci hooks heh.. but sure that should be configurable from the container runtime's nri config..

@klihub klihub force-pushed the devel/pluggable-validation-api branch from 6243420 to 5c283c0 Compare April 21, 2025 14:42
@klihub
Copy link
Member Author

klihub commented Apr 21, 2025

very cool... like the builtin plugin approach to handle defaults...

think need to do a walk through but get the general gist and i like it.. nit turning off oci hooks heh.. but sure that should be configurable from the container runtime's nri config..

I opted to not restrict OCI hook injection (in the default configuration), to provide backward-compatible behavior.


type DefaultValidatorConfig struct {
Enable bool `toml:"enable"` // enable default validator
RejectOCIHooks bool `toml:"reject_oci_hooks"` // reject OCI hook injection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also give the option to disable validating that required plugins are present?

Copy link
Member Author

@klihub klihub Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that be useful ? I mean, this is already by definition an opt-in feature since you need to actively annotate your containers/pods for the 'processed by required plugins' check to take place. Wouldn't adding a separate configuration knob make this more error prone to use ? Now it's enough to ensure that validation is enabled and then opt-in for this check by annotating. With an extra knob, you'd also need to make sure that the extra knob is in the right position.

One related thing that we should probably consider is at least logging a warning or an error, if the related annotation is present but we are running with disabled validation.

Copy link
Member

@samuelkarp samuelkarp May 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: my comment applies to annotations, not to the DefaultValidatorConfig struct.

I think yes.

If we configure required plugins also as part of the container runtime configuration, an annotation that allows the required plugins to be bypassed would allow those required plugins to be deployed as containers themselves.

// Plugins that made the adjustments and updates.
OwningPlugins owners = 5;
// Plugins consulted for adjustments and updates.
repeated PluginInstance plugins = 6;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is a repeated field, should this be plugin?

Copy link
Member Author

@klihub klihub Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When translated to golang, that would become a Plugin []*PluginInstance, whereas now we have Plugins []*PluginInstance which I think is better.

But I think I know where your comment is stemming from, probably looking at update in the same message which is a slice too, but named in singular. We have been (grammatically) a bit inconsistent, though consistently inconsistent at that, with naming repeated fields. Container updates/evictions, although repeated/slices, are always named in singular. All other repeated fields, except env (for "environment") and the individual hook slices in Hooks (which follow the naming of the corresponding OCI Spec fields) are in plural. Probably not ideal, but there is some method to the madness.

@klihub klihub force-pushed the devel/pluggable-validation-api branch 2 times, most recently from 2c74b0a to d679a1f Compare April 25, 2025 14:32
@klihub klihub marked this pull request as ready for review April 25, 2025 16:56
@klihub klihub requested review from chrishenzie and mikebrow April 25, 2025 17:09
@klihub klihub force-pushed the devel/pluggable-validation-api branch 3 times, most recently from 6f67009 to 66ca9ed Compare April 27, 2025 08:54
Copy link
Member

@samuelkarp samuelkarp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only looked through the first couple of commits so far, but will continue to look later.

@klihub klihub force-pushed the devel/pluggable-validation-api branch 4 times, most recently from b55af0f to cbb4232 Compare April 30, 2025 14:59
pkg/api/strip.go Outdated
@@ -0,0 +1,343 @@
/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is better, but I'm still concerned about drift between the proto and the Strip functions. I'm fine going ahead with this for now, but once we merge let's get an issue open to revisit.

One thought I had would be to write a protoc plugin (like protoc-gen-go-strip) to generate the code, and then we can assert that there's no drift. But I wouldn't consider that a prerequisite to merge.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it wouldn't even need to be a protoc plugin. A plain go:generate-helper to introspect the necessary types using reflect and then generate the necessary Strip()-functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to Sam's comment. For context, if we don't call Strip() are we then left with a bunch of tests that break because of [] vs. nil inequality? It seems like if that was the case, fixing those might prove simpler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it wouldn't even need to be a protoc plugin.

Yep, either. I just biased very slightly toward protoc plugins since we already have the plumbing to run them during proto generation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to Sam's comment. For context, if we don't call Strip() are we then left with a bunch of tests that break because of [] vs. nil inequality?

No. cmpopts.EquateEmpty() option alone is enough to take care of that for maps and slices. It is some of the struct types that we want to have special equality comparison semantics for, because of the way we initialize things internally when collecting plugin responses versus how we want to write expected results in our test cases. Basically deeply-fully initialized empty structs (and maps) with some additional fields eventually set versus minimally initialized structs with the same fields set, and we want to consider these equal.

@klihub klihub force-pushed the devel/pluggable-validation-api branch 2 times, most recently from 100699d to b76e9c1 Compare May 5, 2025 06:38
@klihub klihub force-pushed the devel/pluggable-validation-api branch from b76e9c1 to bafb6d9 Compare May 5, 2025 10:56
pkg/api/strip.go Outdated
@@ -0,0 +1,343 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to Sam's comment. For context, if we don't call Strip() are we then left with a bunch of tests that break because of [] vs. nil inequality? It seems like if that was the case, fixing those might prove simpler.

@klihub klihub force-pushed the devel/pluggable-validation-api branch 2 times, most recently from aa30021 to 59bac81 Compare May 7, 2025 11:27
Copy link
Member

@samuelkarp samuelkarp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this and bearing with me as I know it took a while for me to get back with comments. Some high level things:

  • Overall I'm very happy with the approach and I like the interaction model
  • @chrishenzie is going to share the design doc that we've been using publicly; thanks for collaborating on it with us
  • We can defer some of the enhancements (better generated Strip logic, parallel validator execution, etc) to subsequent PRs
  • I think this could easily be split into 3 separate PRs and that would make it easier to get parts in sooner
    1. The first two commits that cover changes to unit tests + protoizing conflict detection
    2. Core logic for the new validation interface, including the stub
    3. Default validating plugin (probably including the introduction of builtin plugins)
  • This should help since I think (1) and (2) above would be fairly easy once the relatively minor comments I have are addressed, while (3) would take a bit more time to get in line with the doc. Splitting up also helps me as a reviewer since I don't need to re-check the earlier commits once they're merged.

if len(r.plugins) > 0 {
log.Infof(noCtx, "plugin invocation order")
for i, p := range r.plugins {
log.Infof(noCtx, " #%d: %q (%s)", i+1, p.name(), p.qualifiedName())
}
}
if len(r.validators) > 0 {
log.Infof(noCtx, "validator plugin invocation order")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we were planning to invoke validators in parallel? (I don't think that has to be part of this PR, but I think we shouldn't promise a specific invocation order. Maybe we shouldn't sort these at all, or explicitly randomly sort them?)

// of plugins which must process containers of the pod during creation.
// If enabled, the default validator checks and rejects the creation
// of containers if they fail the check.
RequiredPluginsAnnotation = "required-plugins.nri.io"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we're using nri.io in other annotations already, but CNCF doesn't own that domain and it's currently priced at $17,488 private sale. Can we use nri.containerd.io or nri.containerd.dev instead?

Copy link
Member Author

@klihub klihub May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I checked that out too, quite a while ago and decided to pass as it certainly didn't sound like the deal of the century.

We definitely can use either of those (I think one of Antonio's already merged PRs used one of those which I missed earlier but spotted last week). And I can update this PR similarly. I just wish we could find a domain which is not containerd-specific with such a spelled out emphasis. I feel a pang of guilt towards CRI-O when I consider naming it like that. NRI is supposed to be a common interface between the two runtimes. And I know that exactly with annotations there are already differences apparent to plugins. But this is a bit different. This is an annotation which is interpreted in core NRI and in that sense common.

Then again I am currently out of ideas as what could be a better usable alternative...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel a pang of guilt towards CRI-O when I consider naming it like that. NRI is supposed to be a common interface between the two runtimes.

That's fair. I think we should figure out something that is available and ask CNCF to acquire it for this purpose though, rather than reusing nri.io.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened a service desk ticket with CNCF to acquire something suitable. Will update here once we get it registered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CNCF has acquired noderesource.dev for us to use.

yaml "gopkg.in/yaml.v3"
)

type DefaultValidatorConfig struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the conversation in the doc, we should add something like require_plugins and bypass_validation_annotation here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, required_plugins I understand, it is the set of mandatory plugins required for all workloads/containers.

With bypass_validation_annotation, is your idea that it would configure the name of a (boolean) annotation key which then would be used to annotate workloads to 'tolerate' missing mandatory plugins, for instance for mandatory plugins when they are deployed as DaemonSets ?

I was planning to add a single well-known annotation (tolerate-missing-plugins.$OUR_NRI_ANNOTATION_KEY_DOMAIN)...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With bypass_validation_annotation, is your idea that it would configure the name of a (boolean) annotation key which then would be used to annotate workloads to 'tolerate' missing mandatory plugins, for instance for mandatory plugins when they are deployed as DaemonSets ?

Pretty much, yes. I'm not sure whether it'd be boolean or accept a list of plugins to bypass (I can see pros/cons of each).

I was planning to add a single well-known annotation (tolerate-missing-plugins.$OUR_NRI_ANNOTATION_KEY_DOMAIN)...

There are two reasons I was suggesting a configurable annotation vs. well-known:

  1. I think this should be empty/off by default, so that workloads cannot arbitrarily declare that they can bypass required plugins without any ability to decide which plugins can bypass.
  2. A custom annotation gives a cluster administrator or a cloud provider the opportunity to have better policy controls around annotations potentially without any additional work. For example, cloud providers may already have Kubernetes validating webhooks that restrict users from specifying annotations prefixed with cloud.example.com/.

My hope with this is that it would be relatively safe both for users to upgrade and for cluster admins or cloud providers to ensure that policy continues to be followed. If the bypass feature was enabled by default, workloads could bypass intended policy during an upgrade to a version of the container runtime that supports the bypass feature. If the annotation is well-known, it would require the cluster admin or cloud provider to already know the annotation name ahead of any clusters upgrading in order to support the bypass feature.

Regarding the bypass_validation_annotation, maybe it should be bypass_required_plugins_annotation instead? Since required_plugins could also include mutating plugins?

@klihub
Copy link
Member Author

klihub commented May 14, 2025

Thanks for working on this and bearing with me as I know it took a while for me to get back with comments.

No problemo. I am a happy camper if I can help kick this forward in any way. And I really appreciate the time and effort both you and @chrishenzie have put into all of this.

Some high level things:

  • Overall I'm very happy with the approach and I like the interaction model
  • @chrishenzie is going to share the design doc that we've been using publicly; thanks for collaborating on it with us
  • We can defer some of the enhancements (better generated Strip logic, parallel validator execution, etc) to subsequent PRs

Related to the necessity for Strip(), I think it is possible to eliminate it altogether. Now we need it, because during adjustment collection and conflict detection we 'deep-create' a full skeletal adjustment structure in the very beginning instead of creating it piece by piece only when and were necessary. (I can take a look once I've split up this PR).

  • I think this could easily be split into 3 separate PRs and that would make it easier to get parts in sooner

    1. The first two commits that cover changes to unit tests + protoizing conflict detection
    2. Core logic for the new validation interface, including the stub
    3. Default validating plugin (probably including the introduction of builtin plugins)
  • This should help since I think (1) and (2) above would be fairly easy once the relatively minor comments I have are addressed, while (3) would take a bit more time to get in line with the doc. Splitting up also helps me as a reviewer since I don't need to re-check the earlier commits once they're merged.

Okay, I'll split this one up to a stacked set of 3 PRs according to the outline above.

@klihub
Copy link
Member Author

klihub commented May 14, 2025

This PR has been split up to a stacked set of PRs: #169, #170, and #171.

I will close this one once I've addressed all the existing review comments in the split up stack.

@chrishenzie
Copy link
Contributor

See here for the public version of the design doc.

klihub added 4 commits May 16, 2025 14:29
Implement pluggable container adjustment validation. When validator
plugins are present, use them to validate the collected adjustments,
failing container creation if any validation fails.

For adjustment validation plugins receive the pod, the pristing un-
adjusted container, the collected container adjustments, information
about which plugins adjusted what container parameters, and the list
of plugins consulted for the adjustments. The plugin can then choose
to accept or reject the adjustments.

Accepting or rejecting adjustments are transactional. Either all or
none of the adjustments are accepted, together with the container
creation request. IOW, rejecting an adjustment results in a failed
container creation request.

Signed-off-by: Krisztian Litkey <[email protected]>
Implement a new builtin plugin type. Builtin plugins can be registered
from within the runtime during instantiation of the runtime adaptation.
We'll use a builtin plugin for default container adjustment validation.

Signed-off-by: Krisztian Litkey <[email protected]>
Implement default (container creation/adjustment) validation
as a builtin plugin. The default validator can be configured
to reject OCI hook injection. Additionally, containers can be
annotated with a set of required plugins. If annotated, these
plugins must be present during container creation or else the
creation of the container is rejected by the validator.

Signed-off-by: Krisztian Litkey <[email protected]>
Provide an externally comilable standalone version of the
default validator plugin. In addition to an illustrative
purpose, this provides an easier way of experimenting and
developing new validation features. Instead of having to
recompile the runtime, one can simply modify and run the
externally compiled version.

Signed-off-by: Krisztian Litkey <[email protected]>
@klihub klihub force-pushed the devel/pluggable-validation-api branch from fce7ea9 to 8a54e4f Compare May 16, 2025 16:20
@klihub klihub marked this pull request as draft May 16, 2025 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants