Make Infiniband counters visible inside worker pods #403

Open
rdjjke opened this issue Feb 6, 2025 · 5 comments

Comments

@rdjjke
Collaborator

rdjjke commented Feb 6, 2025

Soperator pods currently use hostNetwork: false. Because of that, Infiniband counters are not visible in the containers' sysfs (/sys/class/infiniband/mlx5_0/ports/1/counters).

To make them visible there, we can try installing the RDMA CNI plugin in K8s.
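
For reference, the counters in question are plain text files under that sysfs directory, so one quick way to see what a process can read is to list them. This is a minimal sketch, assuming each counter file holds a single decimal integer; the helper name `read_ib_counters` is hypothetical and not part of Soperator.

```python
from pathlib import Path


def read_ib_counters(counters_dir):
    """Return {counter_name: int_value} for every readable file in counters_dir.

    Hypothetical helper for illustration only. Returns an empty dict when the
    directory is missing, which is what we would expect to see if the counters
    are not exposed to the current container.
    """
    counters = {}
    root = Path(counters_dir)
    if not root.is_dir():
        return counters
    for entry in sorted(root.iterdir()):
        if not entry.is_file():
            continue
        try:
            counters[entry.name] = int(entry.read_text().strip())
        except (OSError, ValueError):
            # Skip entries that cannot be read or are not plain integers.
            continue
    return counters


if __name__ == "__main__":
    # Path taken from the issue description; the device and port may differ.
    path = "/sys/class/infiniband/mlx5_0/ports/1/counters"
    for name, value in read_ib_counters(path).items():
        print(f"{name}: {value}")
```

Running this on the host and then inside a worker pod should show the difference described above: a populated list on the host, and nothing inside the pod.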

@rdjjke rdjjke added this to Soperator Feb 6, 2025
@Uburro
Collaborator

Uburro commented Feb 7, 2025

@thien-lm

thien-lm commented Feb 9, 2025

Hi @rdjjke @Uburro,
Does this mean the slurmd pod cannot use Infiniband? I am wondering.

@asteny
Collaborator

asteny commented Feb 10, 2025

@thien-lm No, the pod can use Infiniband, everything is fine. This is only about the counters in /sys/class/infiniband/mlx5_0/ports/1/counters. They are a useful metric, but they are not visible inside the pod's container.
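
To illustrate the distinction between "device present" and "counters visible", here is a rough sketch that walks the sysfs tree and reports, per device and port, whether any counter entries can be seen. The function name `ib_visibility` is hypothetical, and the sketch assumes that hidden counters show up as a missing or empty `counters` directory, which has not been confirmed in this thread.

```python
from pathlib import Path


def ib_visibility(sysfs_root="/sys/class/infiniband"):
    """Map "device/port" to True if its counters directory has any entries.

    Hypothetical diagnostic for illustration only. A device that is listed
    with False is present in sysfs, but its counters are not visible here.
    """
    report = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return report
    for device in sorted(root.iterdir()):
        ports = device / "ports"
        if not ports.is_dir():
            continue
        for port in sorted(ports.iterdir()):
            counters = port / "counters"
            # Assumption: "not visible" means the directory is absent or empty.
            visible = counters.is_dir() and any(counters.iterdir())
            report[f"{device.name}/{port.name}"] = visible
    return report


if __name__ == "__main__":
    for key, visible in ib_visibility().items():
        print(f"{key}: counters {'visible' if visible else 'not visible'}")
```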

@Uburro
Collaborator

Uburro commented Feb 10, 2025

@thien-lm We do not support RDMA-CNI yet, nor some of the features that come with it: for example, the scheduling spec in requests and, as @asteny mentioned, the visibility of certain parameters.

@thien-lm

Thanks for your explanation.
As @Uburro said, Soperator currently does not support RDMA-CNI, so the Slurm worker pod cannot use the RDMA feature, am I right?


4 participants