Motivation
We want the H100 nodes available as soon as possible. Having information about the necessary network configurations will enable us to configure the nodes as needed during their initial configuration.
Description
We have several different requirements for the H100 nodes as they are being initially configured. At a high level, we are considering one rack for Bare Metal usage and one rack for OpenShift configurations.
This issue is meant to capture specific variations so that they can be set up this way from the beginning, which will speed up their availability.
The information, per node, is expected to include:
Planned Use
Label for the use/cluster
Interface aggregation: do you need the interfaces aggregated, or separated into 4 x 400G individual links?
Layer 2 networking: if you have more than one node, which networks do you need between them?
Provisioning the node: we don't expect this to use ESI, so either we give folks access to the BMCs, or they tell us what they want and, if we have it, we provision it.
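The per-node fields listed above can be captured as a small record, which makes it easy to see which nodes still lack the information needed before their switchports are configured. This is a minimal Python sketch, not part of the original plan: the class, field, and function names are hypothetical, and it assumes each node has four 400G ports (the "4 x 400" option), so the aggregated option is modeled as one logical link with 4 x 400G of combined capacity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Aggregation(Enum):
    # The two link layouts named in the issue.
    AGGREGATED = "aggregated"
    SPLIT_4X400G = "4x400G"


@dataclass
class NodeNetworkRequest:
    """One row of the per-node collection table (hypothetical field names)."""
    node_name: str
    planned_use: str = ""
    cluster_label: str = ""
    aggregation: Optional[Aggregation] = None
    l2_networks: List[str] = field(default_factory=list)
    provisioning: str = ""    # e.g. "BMC access" or a specific provisioning request
    extra_changes: str = ""   # additional networking modifications, if any

    def is_complete(self) -> bool:
        # Treat a row as actionable once use, label, and link layout are known.
        return bool(self.planned_use and self.cluster_label and self.aggregation)

    def uplink_gbps(self) -> List[int]:
        # Assumption: four 400G ports per node.
        if self.aggregation is Aggregation.AGGREGATED:
            return [4 * 400]   # one logical link, combined capacity
        if self.aggregation is Aggregation.SPLIT_4X400G:
            return [400] * 4   # four independent links
        return []              # layout not chosen yet


def missing_rows(rows: List[NodeNetworkRequest]) -> List[str]:
    """Names of nodes whose row still needs input before switch setup."""
    return [r.node_name for r in rows if not r.is_complete()]
```

For example, using the placeholder names n0 and n1 from the table note and a made-up cluster label, `missing_rows([NodeNetworkRequest("n0", "OpenShift", "example-cluster", Aggregation.SPLIT_4X400G), NodeNetworkRequest("n1")])` returns `['n1']`.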
In this initial table, rather than the node names, I am using n0-n48. We will update that with node names.
Node Information Collection (Row 4, Pod C)
Cage 02
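| Node Name | Planned Use | Label for Use/Cluster | Interface Aggregation (Aggregated / 4x400G) | Layer 2 Networking Needs | Provisioning Preference (BMC Access / Specific Provisioning Request) | Additional Networking Modifications Needed |
| --- | --- | --- | --- | --- | --- | --- |
| MOC-R4PCC02U03 | | | | | | |
| MOC-R4PCC02U04 | | | | | | |
| MOC-R4PCC02U05 | | | | | | |
| MOC-R4PCC02U06 | | | | | | |
| MOC-R4PCC02U09 | | | | | | |
| MOC-R4PCC02U10 | | | | | | |
| MOC-R4PCC02U11 | | | | | | |
| MOC-R4PCC02U12 | | | | | | |
| MOC-R4PCC02U15 | | | | | | |
| MOC-R4PCC02U16 | | | | | | |
| MOC-R4PCC02U17 | | | | | | |
| MOC-R4PCC02U18 | | | | | | |
| MOC-R4PCC02U23 | | | | | | |
| MOC-R4PCC02U24 | | | | | | |
| MOC-R4PCC02U25 | | | | | | |
| MOC-R4PCC02U26 | | | | | | |
| MOC-R4PCC02U29 | | | | | | |
| MOC-R4PCC02U30 | | | | | | |
| MOC-R4PCC02U31 | | | | | | |
| MOC-R4PCC02U32 | | | | | | |
| MOC-R4PCC02U35 | | | | | | |
| MOC-R4PCC02U36 | | | | | | |
| MOC-R4PCC02U37 | | | | | | |
| MOC-R4PCC02U38 | | | | | | |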
Cage 04
| Node Name | Planned Use | Label for Use/Cluster | Interface Aggregation (Aggregated / 4x400G) | Layer 2 Networking Needs | Provisioning Preference (BMC Access / Specific Provisioning Request) |
| --- | --- | --- | --- | --- | --- |
@msdisme @joachimweyl I think this issue should turn into "what node is going where" rather than specific switchport configs, since the initial switchport configs already exist; it is now just a matter of duplicating them on the right nodes. Can we gather this info so I can finish setting up the switch for the remaining nodes?
joachimweyl changed the title from "Initial Network Configuration plan for 48 H100 Nodes" to "Network Configuration plan for 48 H100 Nodes, what goes where" on Mar 19, 2025.
Completion criteria
Definition of Done
User Personas
List of Personas and characteristics
References
Changelog
YYYY-MM-DD: Why you made the change