Commit 2f7f807

Allow up to 254 vCPUs to a VM (#9385)
This follows on from turning the crank on max vCPUs in Helios and Propolis; if the hardware has so many vCPUs available, what's to stop someone from allocating them all for a single VM?

As with creating a VM that requires more memory than is available, one can create (or resize) a VM to a size much larger than any hardware has, or than is available at runtime. Attempting to run such an instance will error because the instance can't get placed.

One could imagine a future operator control to limit max VM sizes for a silo; larger VMs are more difficult to migrate and can be more difficult to place. Without something like "anti-fragmentation" to group smaller VMs together, it's quite possible that a sled could have 255 CPUs, with 2 vCPUs taken by one small VM and 253 CPUs not spoken for, and be unable to fit a 254-vCPU VM.

Further, 254 busy vCPUs leaves zero to one CPUs available for Propolis, driving emulated hardware, processing I/O, co-located Crucible, sled-agent, other services, etc. There is no mechanism to earmark CPUs for control plane and I/O purposes, so this isn't any worse than the status quo. But when such a mechanism comes to exist, we'll need to gracefully tolerate the prior existence of sled-or-larger-size VMs.

Note that Helios is fine with being asked to oversubscribe hardware threads to vCPUs, and that's how I tested that a 254-vCPU VM works reasonably (on a 32-thread CPU). `test_cannot_provision_instance_beyond_cpu_capacity` is the demonstration that the control plane isn't willing to oversubscribe hardware in practice.

(Dan pointed out to me a bit ago that we *could* allow 255 vCPUs - my choice of 254 on the Helios side was really a fencepost error on my part. But I'd like to disallow odd vCPU counts in the first place, related to [Propolis#940](oxidecomputer/propolis#940), so 254 is fine.)
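The fragmentation scenario above can be sketched as a toy placement check. This is only an illustration under stated assumptions: the `Sled` struct, its fields, and the `place` function are hypothetical and are not Nexus's actual placement logic. It shows that the free CPUs across a rack can far exceed a request while no single sled can hold it.

```rust
/// Hypothetical model of a sled: hardware threads and vCPUs already reserved.
/// (Illustrative only; not the control plane's real data model.)
#[derive(Debug, Clone, Copy)]
struct Sled {
    total_cpus: u16,
    reserved_vcpus: u16,
}

impl Sled {
    /// CPUs not yet spoken for on this sled.
    fn free_cpus(&self) -> u16 {
        self.total_cpus.saturating_sub(self.reserved_vcpus)
    }
}

/// Returns the index of the first sled that can hold `requested` vCPUs
/// without oversubscription, or `None` if no single sled fits.
fn place(sleds: &[Sled], requested: u16) -> Option<usize> {
    sleds.iter().position(|s| s.free_cpus() >= requested)
}

fn main() {
    // Two sleds, each with one small 2-vCPU VM already running.
    let sleds = [
        Sled { total_cpus: 255, reserved_vcpus: 2 },
        Sled { total_cpus: 255, reserved_vcpus: 2 },
    ];

    let total_free: u32 = sleds.iter().map(|s| u32::from(s.free_cpus())).sum();
    println!("total free CPUs: {total_free}");
    println!("place 254 vCPUs: {:?}", place(&sleds, 254));
    println!("place 253 vCPUs: {:?}", place(&sleds, 253));
}
```

In this sketch, 506 CPUs are free in total, yet the 254-vCPU request gets no placement because each sled has only 253 free; grouping the two small VMs onto one sled would leave the other able to fit it.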
1 parent d5a1c81 commit 2f7f807

1 file changed: +1 -1 lines changed


nexus/src/app/mod.rs

Lines changed: 1 addition & 1 deletion
@@ -132,7 +132,7 @@ pub(crate) const MAX_EXTERNAL_IPS_PER_INSTANCE: usize =
     as usize;
 pub(crate) const MAX_EPHEMERAL_IPS_PER_INSTANCE: usize = 1;
 
-pub const MAX_VCPU_PER_INSTANCE: u16 = 64;
+pub const MAX_VCPU_PER_INSTANCE: u16 = 254;
 
 pub const MIN_MEMORY_BYTES_PER_INSTANCE: u32 = 1 << 30; // 1 GiB
 // This is larger than total memory (let alone reservoir) on some sleds; it is
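
As a rough illustration of what raising this constant permits, here is a minimal sketch of an upper-bound check. Only `MAX_VCPU_PER_INSTANCE` and its value come from the diff; `check_vcpus` and `VcpuLimitExceeded` are hypothetical names, and the real validation in Nexus may look quite different.

```rust
/// Value taken from the diff above.
pub const MAX_VCPU_PER_INSTANCE: u16 = 254;

/// Hypothetical error type for this sketch (not a Nexus type).
#[derive(Debug, PartialEq, Eq)]
pub struct VcpuLimitExceeded {
    pub requested: u16,
    pub max: u16,
}

/// Hypothetical helper: accept a requested vCPU count only if it does not
/// exceed the per-instance maximum. It checks the upper bound and nothing else.
pub fn check_vcpus(requested: u16) -> Result<u16, VcpuLimitExceeded> {
    if requested > MAX_VCPU_PER_INSTANCE {
        Err(VcpuLimitExceeded { requested, max: MAX_VCPU_PER_INSTANCE })
    } else {
        Ok(requested)
    }
}

fn main() {
    // 128 would have been rejected under the old limit of 64.
    println!("{:?}", check_vcpus(128));
    println!("{:?}", check_vcpus(254));
    println!("{:?}", check_vcpus(255));
}
```

Passing this check says nothing about whether the instance can be placed; as the commit message notes, an instance larger than any sled's available capacity still fails when it is started.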
