Include linux device cgroups in adjustment logic #168

Open: wants to merge 1 commit into base: main

@karlbaumg karlbaumg commented May 11, 2025

This is a very useful feature for cases where we'd like to give wildcard device permissions to the cgroup, e.g. --device-cgroup-rule 'c *:* rwm', which is possible in both Podman and Docker.

Found out the hard way that Linux.Resources.Devices are not processed at all.

If I understood the claim logic correctly, it is there to prevent multiple plugins from overriding each other. But the cgroup rules array is a layered permission list: it starts with a deny-all rule and allow rules are appended, so different plugins may append different rules. That's why I didn't add a claimDeviceCgroup, but I'm happy to change that.
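To illustrate the layered list described above, here is a minimal Go sketch. It is not NRI or containerd code: `DeviceRule` is a local stand-in that mirrors the shape of the OCI runtime spec's device cgroup entry, and `Allowed` is a hypothetical helper that uses a simplified "last matching rule wins" model of how an ordered rule list is interpreted.

```go
package main

import "strings"

// DeviceRule is a local stand-in (an assumption for this sketch) that mirrors
// the shape of an OCI device cgroup entry. A nil Major or Minor means "*".
type DeviceRule struct {
	Allow  bool
	Type   string // "a" (all), "b" (block) or "c" (char)
	Major  *int64
	Minor  *int64
	Access string // any combination of "r", "w" and "m"
}

// matches reports whether the rule covers the given device and access mode.
func (r DeviceRule) matches(devType string, major, minor int64, access byte) bool {
	if r.Type != "a" && r.Type != devType {
		return false
	}
	if r.Major != nil && *r.Major != major {
		return false
	}
	if r.Minor != nil && *r.Minor != minor {
		return false
	}
	return strings.IndexByte(r.Access, access) >= 0
}

// Allowed walks the list in order and lets the last matching rule decide.
// This is a simplified model; if no rule matches, access is denied.
func Allowed(rules []DeviceRule, devType string, major, minor int64, access byte) bool {
	allowed := false
	for _, r := range rules {
		if r.matches(devType, major, minor, access) {
			allowed = r.Allow
		}
	}
	return allowed
}

// int64Ptr is a small helper for building rules with fixed device numbers.
func int64Ptr(v int64) *int64 { return &v }
```

With a deny-all entry first, each plugin can append its own allow rule without touching what another plugin added: appending `c 238:* rwm` after `{Allow: false, Type: "a", Access: "rwm"}` permits every minor under major 238 and leaves all other devices denied.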

I've validated this change with containerd 1.7.27 that has this patch on top of NRI 0.8.0.

@karlbaumg karlbaumg changed the title createContainer: include linux device cgroups in adjustment logic Include linux device cgroups in adjustment logic May 11, 2025

klihub (Member) commented May 12, 2025

> Found out the hard way that Linux.Resources.Devices are not processed at all.

For some context/background... Those device permission rules in Linux.Resources.Devices were one of the last additions to the revamped NRI API (v0.2.0 at the time), and added merely to allow near-identical emulation of the original v0.1.0 NRI API through the plugins/v010-adapter plugin. They are only input to a plugin at the moment.

That's why we don't have any corresponding Set/Add/Remove functions defined for manipulating them in pkg/api/adjustment.go, and why we don't process the rules when adjusting the OCI Spec (although we do add explicit entries there for any devices injected through NRI). It is also why the device rules are not listed as adjustable container parameters in the documentation, although I have to admit that this alone is not a reliable indicator, as the docs are not fully accurate yet.

> If I understood the claim logic correctly, it is to prevent multiple plugins from overriding each other. But the cgroup rules array is a layered permission list, e.g. it starts with a deny all rule and allow rules are appended so different plugins may append different rules.

Yes, in current HEAD the claims are only used for conflict detection. That is, however, changing with the pending introduction of pluggable validation (#163). There the claims, which are also used to record which plugins made which changes, are passed on to any registered validating plugins. Those plugins then decide whether the changes are accepted or rejected, and this decision can be based on which plugins made the changes.

> That's why I didn't add a claimDeviceCgroup but happy to change.

Can you tell a bit more about what you are trying to do (basically your 'use case')? Things like

  • do you now manually adjust the cgroup device rules in resources?
  • how do your containers gain the extra device nodes (not via NRI injection, because then you would also get the corresponding necessary access rules)?
    • bind mount?
    • mknod in the containers?

If we extend NRI to allow direct manipulation of cgroup device access rules, keeping the general patterns of NRI and #163 in mind, I think we'd need to at least

  • think through what a proper model for adjusting cgroup device rules (independently of adjusting devices) could be:
    • removing rules is easy, but
    • since order matters, how should rules be added (without overcomplicating things): prepend, append, something else, etc.?
  • add proper adjustment functions based on what we decide
  • always record in the claims which plugins touched which rules
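As a rough illustration of the three points above, here is a Go sketch of ordered rule adjustment with per-rule claims. Everything in it is hypothetical: `RuleAdjustment`, `Append`, `Prepend`, `Remove` and the claim map are names invented for this example and are not part of the NRI API or of #163.

```go
package main

import (
	"fmt"
	"strconv"
)

// DeviceRule is a local stand-in for a cgroup device access rule.
// A nil Major or Minor means "*". All names here are hypothetical.
type DeviceRule struct {
	Allow  bool
	Type   string
	Major  *int64
	Minor  *int64
	Access string
}

func num(p *int64) string {
	if p == nil {
		return "*"
	}
	return strconv.FormatInt(*p, 10)
}

// key renders a rule as text so it can be compared and used as a claim key.
func (r DeviceRule) key() string {
	verb := "deny"
	if r.Allow {
		verb = "allow"
	}
	return fmt.Sprintf("%s %s %s:%s %s", verb, r.Type, num(r.Major), num(r.Minor), r.Access)
}

// RuleAdjustment keeps the ordered rule list plus a record of which
// plugins touched which rules.
type RuleAdjustment struct {
	rules  []DeviceRule
	claims map[string][]string // rule key -> plugins that touched the rule
}

func NewRuleAdjustment(initial []DeviceRule) *RuleAdjustment {
	return &RuleAdjustment{
		rules:  append([]DeviceRule(nil), initial...),
		claims: map[string][]string{},
	}
}

func (a *RuleAdjustment) claim(plugin string, r DeviceRule) {
	a.claims[r.key()] = append(a.claims[r.key()], plugin)
}

// Append adds the rule at the end of the list, so it is applied last.
func (a *RuleAdjustment) Append(plugin string, r DeviceRule) {
	a.rules = append(a.rules, r)
	a.claim(plugin, r)
}

// Prepend adds the rule at the start of the list, so later rules can override it.
func (a *RuleAdjustment) Prepend(plugin string, r DeviceRule) {
	a.rules = append([]DeviceRule{r}, a.rules...)
	a.claim(plugin, r)
}

// Remove drops every rule identical to r and reports whether any was removed.
func (a *RuleAdjustment) Remove(plugin string, r DeviceRule) bool {
	kept := make([]DeviceRule, 0, len(a.rules))
	for _, cur := range a.rules {
		if cur.key() != r.key() {
			kept = append(kept, cur)
		}
	}
	removed := len(kept) != len(a.rules)
	a.rules = kept
	if removed {
		a.claim(plugin, r)
	}
	return removed
}

func (a *RuleAdjustment) Rules() []DeviceRule         { return a.rules }
func (a *RuleAdjustment) Claims() map[string][]string { return a.claims }
```

In this sketch the claims do not reject anything on their own; they only record who touched each rule, so that a separate validation step could decide whether to accept the result.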

karlbaumg (Author) commented

> Can you tell a bit more about what you are trying to do (basically your 'use case') ?

Sure. I'm running a program that creates new devices at startup, and their minor numbers can't be known beforehand, so we pass --device-cgroup-rule 'c 238:* rwm' to podman or docker. There is no equivalent in the Kubernetes Pod API, so we implemented an NRI plugin that adds those rules. Given that the flag exists in both tools, I don't think dynamically created devices are a rare use case, and that makes sense: with a wildcard rule you don't have to do any bookkeeping of devices.
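For reference, a rule string such as 'c 238:* rwm' maps naturally onto a structured rule with a wildcard minor. The Go sketch below shows one way to parse it. `ParseDeviceCgroupRule` and the local `DeviceRule` type are hypothetical helpers written for this example, not functions from NRI, Podman or Docker.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// DeviceRule is a local stand-in for a cgroup device access rule.
// A nil Major or Minor represents the "*" wildcard.
type DeviceRule struct {
	Allow  bool
	Type   string
	Major  *int64
	Minor  *int64
	Access string
}

// parseNumber turns "*" into nil and a non-negative decimal number into a pointer.
func parseNumber(s string) (*int64, error) {
	if s == "*" {
		return nil, nil
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return nil, err
	}
	if n < 0 {
		return nil, fmt.Errorf("negative device number %d", n)
	}
	return &n, nil
}

// ParseDeviceCgroupRule parses "type major:minor access", e.g. "c 238:* rwm",
// into an allow rule.
func ParseDeviceCgroupRule(s string) (DeviceRule, error) {
	fields := strings.Fields(s)
	if len(fields) != 3 {
		return DeviceRule{}, fmt.Errorf("expected 'type major:minor access', got %q", s)
	}
	devType, numbers, access := fields[0], fields[1], fields[2]
	if devType != "a" && devType != "b" && devType != "c" {
		return DeviceRule{}, fmt.Errorf("invalid device type %q", devType)
	}
	major, minor, ok := strings.Cut(numbers, ":")
	if !ok {
		return DeviceRule{}, fmt.Errorf("invalid device numbers %q", numbers)
	}
	// Trimming all valid access characters must leave nothing behind.
	if strings.Trim(access, "rwm") != "" {
		return DeviceRule{}, fmt.Errorf("invalid access %q", access)
	}
	rule := DeviceRule{Allow: true, Type: devType, Access: access}
	var err error
	if rule.Major, err = parseNumber(major); err != nil {
		return DeviceRule{}, fmt.Errorf("invalid major: %w", err)
	}
	if rule.Minor, err = parseNumber(minor); err != nil {
		return DeviceRule{}, fmt.Errorf("invalid minor: %w", err)
	}
	return rule, nil
}
```

Parsing "c 238:* rwm" this way yields an allow rule for character devices with major 238, a nil (wildcard) minor and access "rwm", which is the kind of entry a plugin would then append to the container's rule list.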

> how do your containers gain the extra device nodes

Using ioctl on binderfs.
