[WIP] Implement NetworkPolicy support #3986
Why are these changes needed?
Pods within a namespace can communicate freely by default, so exposing the Ray head via an Ingress or Route without a NetworkPolicy allows any pod in the cluster to reach the Ray Dashboard and API. Managing NetworkPolicy objects manually is cumbersome and error-prone. This PR adds a controller that manages the lifecycle of a NetworkPolicy per RayCluster, ensuring that only the relevant pods can communicate with the cluster. It is behind a feature flag in this PR to ensure no disruption to existing users.
This is WIP to serve as a demonstration of how I might approach this.
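To illustrate the shape of the object the controller manages, here is a minimal sketch. The name, labels, and rules are assumptions drawn from the verification steps further down, not necessarily the exact policy this PR renders.

```yaml
# Illustrative sketch only; the real policy is whatever the controller generates.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: raycluster-kuberay          # assumed: one policy per RayCluster
  namespace: default
spec:
  podSelector:
    matchLabels:
      ray.io/cluster: raycluster-kuberay
  policyTypes:
    - Ingress
  ingress:
    # Pods belonging to the same RayCluster can talk to the head (e.g. GCS on 6379).
    - from:
        - podSelector:
            matchLabels:
              ray.io/cluster: raycluster-kuberay
    # The KubeRay operator can reach the dashboard / job submission port.
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: kuberay
      ports:
        - protocol: TCP
          port: 8265
```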
Verification Steps
Setup
1. Build the operator image locally from the `ray-operator` directory.
2. Adjust `ray-operator/config/manager/manager.yaml` to point to our new image and to enable the feature flag. You can do this by adjusting the `--feature-gates` argument on line 32 to include the network policy flag like below:
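   The gate name in this sketch is a placeholder; use the gate defined in this PR:

   ```yaml
   # manager.yaml, container args (sketch); the gate name is a placeholder
   args:
     - --feature-gates=NetworkPolicyController=true
   ```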
3. Set `imagePullPolicy` to `Never` (this ensures that it falls back to the local image rather than pulling from a registry). You can do this by making the following adjustment to lines 34 and 35:
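   Roughly like the following, with the image tag being whatever you built locally:

   ```yaml
   # manager.yaml (sketch); the image tag is a placeholder for your local build
   image: kuberay/operator:local
   imagePullPolicy: Never
   ```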
4. Check that the operator pod is running with `kubectl get pods | grep kuberay-operator`.
5. Deploy a RayCluster (the default `raycluster-kuberay` sample). You should see the `raycluster-kuberay-head` and `raycluster-kuberay-worker` pods come up. You can view the NetworkPolicy created for the cluster by running the below command:
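Assuming the cluster lives in the default namespace, a command along these lines should show it:

```shell
kubectl get networkpolicies -n default -o yaml
```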
Rule Verification

Rule 1

Pods that belong to the RayCluster (labelled `ray.io/cluster=raycluster-kuberay`) should be able to reach the head, e.g. the GCS port 6379 and the metrics port 8080:
kubectl run test-rule1-intra --image=busybox --rm -i --restart=Never -n default --labels="ray.io/cluster=raycluster-kuberay" -- timeout 5 nc -zv raycluster-kuberay-head-svc 6379
kubectl run test-rule1-metrics --image=busybox --rm -i --restart=Never -n default --labels="ray.io/cluster=raycluster-kuberay" -- timeout 5 nc -zv raycluster-kuberay-head-svc 8080
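If the policy allows intra-cluster traffic as intended, both probes should report the port as open, along the lines of (the IP will differ):

```
raycluster-kuberay-head-svc (10.96.0.10:6379) open
```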
Rule 2
Rule 3
Pods carrying the `app.kubernetes.io/name=kuberay` label (the KubeRay operator carries this label) should be able to reach the dashboard port 8265. We can verify this by running the below command:
kubectl run test-rule3-operator --image=busybox --rm -i --restart=Never -n default --labels="app.kubernetes.io/name=kuberay" -- timeout 5 nc -zv raycluster-kuberay-head-svc 8265
kubectl run test-cross-ns-operator --image=busybox --rm -i --restart=Never -n default --labels="app.kubernetes.io/name=kuberay" -- timeout 5 nc -zv raycluster-kuberay-head-svc.test-namespace 8265
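As a quick sanity check in the other direction, a pod without that label would be expected to be blocked from the dashboard port; the pod name here is hypothetical:

```shell
kubectl run test-rule3-unlabeled --image=busybox --rm -i --restart=Never -n default -- timeout 5 nc -zv raycluster-kuberay-head-svc 8265
```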
Rule 4
Create the monitoring namespaces used in this check:
kubectl create namespace openshift-monitoring && kubectl create namespace prometheus
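With those namespaces in place, one way to probe the rule is from a pod in the prometheus namespace against the head metrics port (the pod name and port are assumptions, mirroring the metrics check in Rule 1):

```shell
kubectl run test-rule4-prometheus --image=busybox --rm -i --restart=Never -n prometheus -- timeout 5 nc -zv raycluster-kuberay-head-svc.default 8080
```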
Rule 5
Related issue number
Closes #3987
Checks