[Feature] Add NetworkPolicy support

### Search before asking

- [x] I had searched in the [issues](https://github.com/ray-project/kuberay/issues) and found no similar feature requirement.


### Description

I’d like to propose an enhancement to the KubeRay operator aimed at users in production and enterprise environments: additional network security in the form of pod isolation and port-level configuration.
This would be a Kubernetes-native solution and, importantly, an optional feature controlled by a flag, so everyone can keep using KubeRay as they do today and adopt it when it makes sense for them.

We would introduce a more granular approach to how Ray clusters are secured and monitored. The core of this is:

* **Granular Access Control**: Instead of a simple “all traffic allowed” setup, we’d add more specific rules for head and worker pods. This would restrict communication to only what’s necessary.

* **Readiness for any future mTLS work**: The update would also prepare KubeRay for mTLS-enabled clusters by securing specific ports, making it easier for users to adopt advanced security features down the road.

* **Monitoring Support**: The configuration allows any Prometheus instance in the cluster to scrape metrics.

* **Production Ready Template**: Gives users in production environments a simple feature-flag toggle for a stricter networking setup.

* **Retains flexibility if needed**: If a user still wants a high level of flexibility, the WIP implements a rule that lets pods with a matching label communicate freely with the head and worker pods.
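
To make the points above concrete, here is a rough sketch of the kind of NetworkPolicy the operator could generate for a head pod. It is not the WIP implementation. The `ray.io/cluster` and `ray.io/node-type` selectors, the metrics port `8080`, the opt-in label `example.com/ray-access`, and the function name are all assumptions made for illustration.

```python
import json

# Assumed metrics port; the real value depends on how the cluster is configured.
METRICS_PORT = 8080


def head_network_policy(cluster_name, namespace, allow_label="example.com/ray-access"):
    """Build a NetworkPolicy manifest (as a dict) for the head pod of one Ray cluster."""
    # Pod labels assumed to be set on Ray pods; adjust to the labels actually in use.
    head_selector = {"ray.io/cluster": cluster_name, "ray.io/node-type": "head"}
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{cluster_name}-head", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": head_selector},
            "policyTypes": ["Ingress"],
            "ingress": [
                # Pods of the same Ray cluster may talk to the head on any port.
                {"from": [{"podSelector": {"matchLabels": {"ray.io/cluster": cluster_name}}}]},
                # Pods in this namespace carrying the (hypothetical) opt-in label
                # keep unrestricted access.
                {"from": [{"podSelector": {"matchLabels": {allow_label: "true"}}}]},
                # Any pod in any namespace (e.g. Prometheus) may reach the metrics port only.
                {
                    "from": [{"namespaceSelector": {}}],
                    "ports": [{"protocol": "TCP", "port": METRICS_PORT}],
                },
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(head_network_policy("demo", "ray"), indent=2))
```

A worker policy would look similar, with `ray.io/node-type: worker` in the pod selector. Once a policy selects a pod, ingress that matches none of the rules is dropped, which is what moves the cluster away from the "all traffic allowed" default.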

All in all, this aims to give platform providers a mechanism to switch on some of the recommendations in the [Ray security documentation](https://docs.ray.io/en/latest/ray-security/index.html#deploy-ray-clusters-in-a-controlled-network-environment), removing the need for manual configuration.

Let me know what you all think! We're happy to dive into the technical details and answer any questions you might have.

### Use case

For our customers and our own internal use cases, network security is a major concern. Given the ephemeral nature of RayClusters and RayJobs, we would benefit greatly from an optional feature that manages the lifecycle of NetworkPolicies alongside the resources themselves. This would provide a simple, templated option for network security hardening and handle the cleanup of these resources.
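
One way the cleanup could work is through Kubernetes owner references, so that garbage collection removes the policy when its RayCluster is deleted. Whether the WIP takes this route is an assumption; the helper name and the sample values below are hypothetical.

```python
import copy


def with_owner_reference(manifest, owner):
    """Return a copy of `manifest` whose ownerReferences point at `owner`.

    Kubernetes garbage collection deletes a dependent object once its owner is
    gone. For namespaced objects, owner and dependent must share a namespace.
    """
    owned = copy.deepcopy(manifest)
    owned.setdefault("metadata", {})["ownerReferences"] = [
        {
            "apiVersion": owner["apiVersion"],
            "kind": owner["kind"],
            "name": owner["metadata"]["name"],
            "uid": owner["metadata"]["uid"],
            "controller": True,
            "blockOwnerDeletion": True,
        }
    ]
    return owned


# Illustrative objects only; the uid is a placeholder, not a real value.
ray_cluster = {
    "apiVersion": "ray.io/v1",
    "kind": "RayCluster",
    "metadata": {"name": "demo", "namespace": "ray", "uid": "00000000-0000-0000-0000-000000000000"},
}
policy = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "demo-head", "namespace": "ray"},
}
owned_policy = with_owner_reference(policy, ray_cluster)
```

With this in place, deleting the RayCluster would be enough to remove its policies, which suits short-lived RayJobs that create and tear down clusters frequently.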

### Related issues

_No response_

### Are you willing to submit a PR?

- [x] Yes I am willing to submit a PR!

[Feature] Add NetworkPolicy support #3987

Search before asking

Description

Use case

Related issues

Are you willing to submit a PR?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Feature] Add NetworkPolicy support #3987

Description

Search before asking

Description

Use case

Related issues

Are you willing to submit a PR?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions