Network Configuration plan for 48 H100 Nodes, what goes where #1488

Open
msdisme opened this issue Mar 5, 2025 · 2 comments


msdisme commented Mar 5, 2025

Motivation

We want the H100 nodes available as soon as possible. Knowing the required network configuration up front will let us set up each node correctly during its initial configuration.

Description

We have several different requirements for the H100 nodes during their initial configuration. At a high level, we are considering one rack for bare-metal use and one rack for OpenShift.

This issue is meant to capture specific variations so that they can be set up this way from the beginning, which will speed up their availability.

The information expected for each node includes:

  1. Planned use
  2. Label for the use/cluster
  3. Do you need the interfaces aggregated, or separated into 4 x 400G individual links?
  4. For Layer 2, which networks do you need between nodes (if you have more than one node)?
  5. Provisioning the node: we don't expect this to be ESI, so either we give users access to the BMCs, or they tell us what they want and, if we have it, we can provision it

In this initial table I am using n0-n48 rather than the node names. We will update it with the node names.

Node Information Collection (Row 4, Pod C)

Cage 02

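| Node Name | Planned Use | Label for Use/Cluster | Interface Aggregation (Aggregated / 4x400G) | Layer 2 Networking Needs | Provisioning Preference (BMC Access / Specific Provisioning Request) | Additional Networking Modifications Needed |
|---|---|---|---|---|---|---|
| MOC-R4PCC02U03 | | | | | | |
| MOC-R4PCC02U04 | | | | | | |
| MOC-R4PCC02U05 | | | | | | |
| MOC-R4PCC02U06 | | | | | | |
| MOC-R4PCC02U09 | | | | | | |
| MOC-R4PCC02U10 | | | | | | |
| MOC-R4PCC02U11 | | | | | | |
| MOC-R4PCC02U12 | | | | | | |
| MOC-R4PCC02U15 | | | | | | |
| MOC-R4PCC02U16 | | | | | | |
| MOC-R4PCC02U17 | | | | | | |
| MOC-R4PCC02U18 | | | | | | |
| MOC-R4PCC02U23 | | | | | | |
| MOC-R4PCC02U24 | | | | | | |
| MOC-R4PCC02U25 | | | | | | |
| MOC-R4PCC02U26 | | | | | | |
| MOC-R4PCC02U29 | | | | | | |
| MOC-R4PCC02U30 | | | | | | |
| MOC-R4PCC02U31 | | | | | | |
| MOC-R4PCC02U32 | | | | | | |
| MOC-R4PCC02U35 | | | | | | |
| MOC-R4PCC02U36 | | | | | | |
| MOC-R4PCC02U37 | | | | | | |
| MOC-R4PCC02U38 | | | | | | |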

Cage 04

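| Node Name | Planned Use | Label for Use/Cluster | Interface Aggregation (Aggregated / 4x400G) | Layer 2 Networking Needs | Provisioning Preference (BMC Access / Specific Provisioning Request) | Additional Networking Modifications Needed |
|---|---|---|---|---|---|---|
| MOC-R4PCC04U03 | | | | | | |
| MOC-R4PCC04U04 | | | | | | |
| MOC-R4PCC04U05 | | | | | | |
| MOC-R4PCC04U06 | | | | | | |
| MOC-R4PCC04U09 | | | | | | |
| MOC-R4PCC04U10 | | | | | | |
| MOC-R4PCC04U11 | | | | | | |
| MOC-R4PCC04U12 | | | | | | |
| MOC-R4PCC04U15 | | | | | | |
| MOC-R4PCC04U16 | | | | | | |
| MOC-R4PCC04U17 | | | | | | |
| MOC-R4PCC04U18 | | | | | | |
| MOC-R4PCC04U23 | | | | | | |
| MOC-R4PCC04U24 | | | | | | |
| MOC-R4PCC04U25 | | | | | | |
| MOC-R4PCC04U26 | | | | | | |
| MOC-R4PCC04U29 | | | | | | |
| MOC-R4PCC04U30 | | | | | | |
| MOC-R4PCC04U31 | | | | | | |
| MOC-R4PCC04U32 | | | | | | |
| MOC-R4PCC04U35 | | | | | | |
| MOC-R4PCC04U36 | | | | | | |
| MOC-R4PCC04U37 | | | | | | |
| MOC-R4PCC04U38 | | | | | | |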
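As a quick cross-check that the two cage tables together cover all 48 nodes, the node names can be regenerated from the pattern they share. This is a minimal sketch, assuming every name follows `MOC-R4PCC<cage>U<unit>` with exactly the cages (02, 04) and rack units listed in the tables; the helper `node_names` is hypothetical and not part of any existing tooling.

```python
# Hypothetical helper (an assumption, not existing tooling): rebuild the node
# list from the naming pattern shared by the Cage 02 and Cage 04 tables.
CAGES = ["02", "04"]
# Rack units used in each cage, as groups of four consecutive units.
UNIT_GROUPS = [(3, 6), (9, 12), (15, 18), (23, 26), (29, 32), (35, 38)]


def node_names(cages=CAGES, unit_groups=UNIT_GROUPS):
    """Return node names in table order: cage by cage, rack unit ascending."""
    names = []
    for cage in cages:
        for first, last in unit_groups:
            for unit in range(first, last + 1):
                # Rack units are zero-padded to two digits, e.g. U03.
                names.append(f"MOC-R4PCC{cage}U{unit:02d}")
    return names


if __name__ == "__main__":
    names = node_names()
    print(len(names))  # 48: 2 cages x 6 groups x 4 units
    print(names[0], names[-1])  # MOC-R4PCC02U03 MOC-R4PCC04U38
```

The same list could be used to emit empty rows for the collection table if the layout changes.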

Completion criteria

Definition of Done

User Personas

List of Personas and characteristics


Changelog

YYYY-MM-DD: Why you made the change

msdisme changed the title from "Configuration plan for 48 H100 Nodes" to "Network Configuration plan for 48 H100 Nodes" on Mar 5, 2025
msdisme changed the title from "Network Configuration plan for 48 H100 Nodes" to "Initial Network Configuration plan for 48 H100 Nodes" on Mar 5, 2025
hakasapl commented

@msdisme @joachimweyl I think this issue should turn into "what node is going where" rather than specific switchport configs. The initial switchport configs already exist; it's just a matter of duplicating them on the right nodes. Can we gather this info so I can finish setting up the switch for the remaining nodes?

joachimweyl changed the title from "Initial Network Configuration plan for 48 H100 Nodes" to "Network Configuration plan for 48 H100 Nodes, what goes where" on Mar 19, 2025

hakasapl commented Apr 2, 2025

@msdisme @joachimweyl are we still interested in filling this out? It won't really help me at this point, but would it still be useful for logistics?
