Description
Search before asking
- I had searched in the issues and found no similar feature requirement.
Description
I’d like to propose an enhancement to the KubeRay operator aimed at users in production and enterprise environments. The idea is to add network-level security in the form of pod isolation and port configuration.
This would be a Kubernetes-native solution. Importantly, it would be an optional feature controlled by a flag, so everyone can keep using KubeRay as they do today and adopt this when it makes sense for them.
We would introduce a more granular approach to how Ray clusters are secured and monitored. The core of this is:
- Granular Access Control: Instead of a simple “all traffic allowed” setup, we’d add more specific rules for head and worker pods. This would restrict communication to only what’s necessary.
- Readiness for any future mTLS work: The update would also prepare KubeRay for mTLS-enabled clusters by securing specific ports, making it easier for users to adopt advanced security features down the road.
- Monitoring Support: Configuration is included that allows any Prometheus instance in the cluster to scrape metrics.
- Production Ready Template: Gives users in a production environment a single feature-flag toggle for a stricter networking setup.
- Retains flexibility if needed: If a user still wants a high level of flexibility, the work-in-progress implementation includes a rule that lets pods with a matching label communicate freely with the head and worker pods.
All in all, this gives platform providers a mechanism to switch on some of the recommendations here, removing the need for manual configuration.
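To make the rules above a little more concrete, here is a rough sketch of the kind of NetworkPolicy the operator could generate for a head pod. It is not the work-in-progress implementation. It assumes the `ray.io/cluster` and `ray.io/node-type` labels that KubeRay sets on pods and Ray's default metrics export port (8080). The opt-in label `example.com/ray-access` and the function name are hypothetical, and the sketch opens the metrics port to every namespace as a stand-in for "any Prometheus in the cluster".

```python
import json

# Ray's default metrics export port; a real cluster may override it.
METRICS_PORT = 8080


def head_network_policy(cluster_name, namespace, allow_label="example.com/ray-access"):
    """Build a NetworkPolicy manifest (as a dict) for a RayCluster head pod.

    `allow_label` is a hypothetical opt-in label, not an existing KubeRay name.
    """
    # Pods that belong to the same RayCluster (head and workers).
    same_cluster = {"podSelector": {"matchLabels": {"ray.io/cluster": cluster_name}}}
    # Pods in the same namespace that carry the opt-in label.
    opted_in = {"podSelector": {"matchLabels": {allow_label: "true"}}}
    # An empty namespaceSelector matches pods in every namespace.
    any_namespace = {"namespaceSelector": {}}

    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{cluster_name}-head", "namespace": namespace},
        "spec": {
            # Select only the head pod of this cluster.
            "podSelector": {
                "matchLabels": {
                    "ray.io/cluster": cluster_name,
                    "ray.io/node-type": "head",
                }
            },
            "policyTypes": ["Ingress"],
            "ingress": [
                # Cluster-internal traffic is allowed on all ports.
                {"from": [same_cluster]},
                # Labelled pods keep unrestricted access.
                {"from": [opted_in]},
                # Anyone in the cluster may scrape metrics, and nothing else.
                {
                    "from": [any_namespace],
                    "ports": [{"protocol": "TCP", "port": METRICS_PORT}],
                },
            ],
        },
    }
```

Calling `print(json.dumps(head_network_policy("demo", "ray"), indent=2))` emits a JSON manifest that could be inspected or piped to `kubectl apply -f -`. All other ingress to the head pod is denied once a policy like this selects it, which is the "restrict to what's necessary" behaviour described above.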
Let me know what you all think! We're happy to dive into the technical details and answer any questions you might have.
Use case
For our customers and our own internal use cases, network security is a major concern. Given the ephemeral nature of RayClusters and RayJobs, we would benefit greatly from an optional feature that ties the lifecycle of NetworkPolicies to the resources themselves. This would provide a simple, templated option for network security hardening and handle the cleanup of these resources.
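One way the cleanup could work, sketched here as an assumption rather than a description of the proposed implementation, is to set an owner reference on each NetworkPolicy that points at its RayCluster. Kubernetes garbage collection then deletes the policy when the cluster is deleted. The helper name and the sample UID below are hypothetical; in practice the UID comes from the live RayCluster object.

```python
import copy


def owned_by_raycluster(manifest, cluster_name, cluster_uid):
    """Return a copy of `manifest` with an ownerReference to a RayCluster.

    The owner and the dependent must live in the same namespace.
    """
    owned = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    owned.setdefault("metadata", {})["ownerReferences"] = [
        {
            "apiVersion": "ray.io/v1",
            "kind": "RayCluster",
            "name": cluster_name,
            "uid": cluster_uid,  # must match the UID of the live RayCluster
            "controller": True,
            "blockOwnerDeletion": True,
        }
    ]
    return owned


# Minimal placeholder policy, just enough to show the metadata change.
policy = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "demo-head", "namespace": "ray"},
    "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
}

# Hypothetical UID for illustration only.
owned_policy = owned_by_raycluster(policy, "demo", "00000000-0000-0000-0000-000000000000")
```

With this in place the user never deletes NetworkPolicies by hand: removing the RayCluster removes its policies.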
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!