
Limit cluster access #219

Merged (3 commits) on Jan 7, 2025

Conversation

@guptaNswati guptaNswati commented Dec 12, 2024

I am not sure the ValidatingAdmissionPolicy is required right now. We may also want to add more rules, e.g., only allow updates when the driver is gpu.nvidia.com, something like:

object.spec.driver == 'gpu.nvidia.com'

I did a quick test with the original implementation here: https://github.com/kubernetes/kubernetes/blob/4aeaf1e99e82da8334c0d6dddd848a194cd44b4f/test/e2e/dra/test-driver/deploy/example/plugin-permissions.yaml#L38

1 resourceslicecontroller.go:313] "processing ResourceSlice objects" err="update resource slice: resourceslices.resource.k8s.io \"nvidia-dra-driver-k8s-dra-driver-controller-7f78c745f6-ngnjlfwt\" is forbidden: ValidatingAdmissionPolicy 'resourceslices-policy-nvidia-dra-driver-k8s-dra-driver' with binding 'resourceslices-policy-nvidia-dra-driver-k8s-dra-driver' denied request: this user running on node 'xxxxxxxxx' may not modify cluster resourceslices" logger="UnhandledError"
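
For illustration only, a driver check like the one mentioned above would typically be added as one more entry under spec.validations in the ValidatingAdmissionPolicy. This is a sketch, not code from this PR, and the message text is made up:

  # Fragment of spec.validations (sketch only):
  validations:
  - expression: object.spec.driver == 'gpu.nvidia.com'
    # Reject create/update of ResourceSlices that claim any other driver name.
    message: only ResourceSlices for driver gpu.nvidia.com may be created or updated here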

@guptaNswati guptaNswati requested a review from klueska December 12, 2024 02:12

klueska commented Dec 12, 2024

We do need it, but the rule for the controller needs to be slightly different, since it does not create a node-local ResourceSlice but rather a ResourceSlice associated with a group of nodes.

klueska commented Dec 12, 2024

Also, the DCO check is failing because you didn't sign off on your commits.

@guptaNswati (Contributor, Author) commented:

@bart0sh @pohly, what is the best way to add an admission policy for a pool of nodes rather than just a single node? The upstream work demonstrates node-level validation: https://github.com/kubernetes/kubernetes/blob/4aeaf1e99e82da8334c0d6dddd848a194cd44b4f/test/e2e/dra/test-driver/deploy/example/plugin-permissions.yaml#L41
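
For reference, the linked upstream example ties a node-local ResourceSlice to the identity of the node making the request. Paraphrased (a rough sketch, not a verbatim copy of the linked file), the core check looks something like:

  validations:
  # A kubelet plugin may only touch ResourceSlices whose spec.nodeName matches
  # the node name the API server recorded for the requesting user.
  - expression: >-
      'authentication.kubernetes.io/node-name' in request.userInfo.extra &&
      object.spec.?nodeName.orValue("") ==
      request.userInfo.extra['authentication.kubernetes.io/node-name'][0]

That pattern works for node-local slices, but not for a controller that publishes slices for a group of nodes, which is the question here.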

Why is this even needed? Help me understand.

Our ResourceSlice for the IMEX controller has these fields:

kubectl explain resourceslice.spec
GROUP:      resource.k8s.io
KIND:       ResourceSlice
VERSION:    v1alpha3

FIELD: spec <ResourceSliceSpec>


DESCRIPTION:
    Contains the information published by the driver.
    
    Changing the spec automatically increments the metadata.generation number.
    ResourceSliceSpec contains the information published by the driver in one
    ResourceSlice.
    
FIELDS:
  allNodes	<boolean>
    AllNodes indicates that all nodes have access to the resources in the pool.
    
    Exactly one of NodeName, NodeSelector and AllNodes must be set.

  devices	<[]Device>
    Devices lists some or all of the devices in this pool.
    
    Must not have more than 128 entries.

  driver	<string> -required-
    Driver identifies the DRA driver providing the capacity information. A field
    selector can be used to list only ResourceSlice objects with a certain
    driver name.
    
    Must be a DNS subdomain and should end with a DNS domain owned by the vendor
    of the driver. This field is immutable.

  nodeName	<string>
    NodeName identifies the node which provides the resources in this pool. A
    field selector can be used to list only ResourceSlice objects belonging to a
    certain node.
    
    This field can be used to limit access from nodes to ResourceSlices with the
    same node name. It also indicates to autoscalers that adding new nodes of
    the same type as some old node might also make new resources available.
    
    Exactly one of NodeName, NodeSelector and AllNodes must be set. This field
    is immutable.

  nodeSelector	<NodeSelector>
    NodeSelector defines which nodes have access to the resources in the pool,
    when that pool is not limited to a single node.
    
    Must use exactly one term.
    
    Exactly one of NodeName, NodeSelector and AllNodes must be set.

  pool	<ResourcePool> -required-
    Pool describes the pool that this ResourceSlice belongs to.
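
To make these fields concrete, here is a hypothetical ResourceSlice of the kind the IMEX controller might publish; every name, label key, and value below is illustrative rather than taken from the actual driver:

apiVersion: resource.k8s.io/v1alpha3
kind: ResourceSlice
metadata:
  name: imex-domain-0-slice            # illustrative name
spec:
  driver: gpu.nvidia.com
  pool:
    name: imex-domain-0                # pool shared by a group of nodes
    generation: 1
    resourceSliceCount: 1
  nodeSelector:                        # a group of nodes rather than a single nodeName
    nodeSelectorTerms:
    - matchExpressions:
      - key: nvidia.com/imex-domain    # illustrative label key
        operator: In
        values:
        - imex-domain-0                # the policy below compares this value to spec.pool.name
  devices: []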

And this is what I modified it to. I didn't find any direct way to get a node label in the ValidatingAdmissionPolicy spec. You can only request the node name, with request.userInfo.extra[?'authentication.kubernetes.io/node-name'][0], and it does not allow complex querying.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: resourceslices-policy-nvidia-dra-driver-k8s-dra-driver
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["resource.k8s.io"]
      apiVersions: ["v1alpha3"]
      operations: ["CREATE", "UPDATE"]
      resources: ["resourceslices"]
  variables:
  - name: nodeSelectorValue
    expression: >-
      (request.operation == "DELETE" ? oldObject : object).spec.nodeSelector.nodeSelectorTerms[0].matchExpressions[0].values[0].orValue("")
  - name: poolName
    expression: >-
      (request.operation == "DELETE" ? oldObject : object).spec.pool.name.orValue("")
  validations:
  - expression: variables.nodeSelectorValue == variables.poolName
    messageExpression: >-
      "The nodeSelector value '"+variables.nodeSelectorValue+"' does not match the pool name '"+variables.poolName+"'."

Should there be any driver-name validation in addition to the pool-of-nodes check?

Thank you for your time.
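
For the policy above to actually block anything, it also needs a ValidatingAdmissionPolicyBinding; the denial message quoted at the top of the thread references a binding with the same name as the policy. A minimal sketch (the validationActions choice is an assumption):

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: resourceslices-policy-nvidia-dra-driver-k8s-dra-driver
spec:
  # Reference the policy defined above.
  policyName: resourceslices-policy-nvidia-dra-driver-k8s-dra-driver
  # Reject requests that fail validation instead of only auditing them.
  validationActions: ["Deny"]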

@guptaNswati guptaNswati force-pushed the limit-cluster-access branch from 89abbec to 8b7a9d1 on January 6, 2025 22:16

klueska commented Jan 7, 2025

Looks good to me at first glance. I'll leave the final approval to @cdesiniotis, though, since he reviewed things more thoroughly.

@guptaNswati guptaNswati merged commit 572730b into NVIDIA:main Jan 7, 2025
6 checks passed
@guptaNswati guptaNswati self-assigned this Jan 9, 2025