Support different types of Disk #9274
Conversation
In the same way that a higher level type was added to SiloUser and SiloGroup, add a type to Disks. Currently this is only Crucible, but that will change imminently to include local storage.

The main driver for this change was for different types of disks to occupy the same PCI BDF space. Having a higher level "Disk" that includes the runtime state for a disk (with information like what instance it's attached to and what slot it occupies) be separate from where the disk's blocks come from achieves this without any complicated database or diesel magic.

Call sites that used the Disk db model now take either a Disk datastore type, or directly take a CrucibleDisk (for functions that should only be performed on Crucible disks). Several sagas have changed to only accept CrucibleDisk in their parameters as well.

This commit also makes a small change to how the instance spec is computed with respect to storage, adding a new option for instances to have file-backed storage. This is currently unused, but the framework for creating different components depending on this new Disk type is there.

Other stuff in this commit:

- fix a bug where `project_create_disk` would insert a disk into the database, and _then_ perform some checks. If this failed in the context of a saga it would leave behind the inserted disk.
- delete `disk_refetch`, fulfilling a `TODO-cleanup`. Similarly, `disk_set_runtime` is deleted. Both of these will need to be revisited for hotplug, but the whole redesign of how sled agent and Nexus work with VMMs obviates the old code.
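For orientation, here's a condensed, compilable sketch of the new shape (the full definitions appear in the diff hunks below; the stub `model` types here are placeholders, not the real db-model structs):

```rust
// Stub stand-ins for the real nexus/db-model types, for illustration only.
mod model {
    pub struct Disk;             // shared fields: id, project, size, slot, runtime state
    pub struct DiskTypeCrucible; // Crucible-only fields: volume_id, pantry_address
}

// Backend-specific type: the shared model plus Crucible-only data.
pub struct CrucibleDisk {
    pub disk: model::Disk,
    pub disk_type_crucible: model::DiskTypeCrucible,
}

// Higher-level Disk: one variant per backend; local storage would be
// added as a second variant.
pub enum Disk {
    Crucible(CrucibleDisk),
}
```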
nexus/db-model/src/disk.rs
Outdated
```rust
    device_path,
}

pub fn slot(&self) -> Option<u8> {
    self.slot.map(|x| x.into())
```
style nit, considered unimportant: could be
```diff
-        self.slot.map(|x| x.into())
+        self.slot.map(Into::into)
```
done in f4425c6
```rust
    Crucible => b"crucible"
);

/// A Disk, where how the blocks are stored depends on the disk_type.
```
Maybe it would be worthwhile saying something here (or on the disk_type field?) that data specific to the particular disk type is stored in another table, depending on the value of disk_type. Maybe we also ought to explicitly list those tables here?
Good idea, done in 227268d
```rust
pub fn pantry_address(&self) -> Option<SocketAddrV6> {
    self.pantry_address.as_ref().map(|x| x.parse().unwrap())
}
```
I realize that this was here before, and we haven't changed this code, but: is the rationale for storing this as a string, rather than using Postgres' inet type plus a port, that we want to enforce that this is only ever an IPv6 address?
I believe so yeah, it's been a while.
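As a side note, a minimal sketch of the enforcement being discussed: parsing the stored string as a SocketAddrV6 only succeeds for IPv6 socket addresses, so an IPv4 value would fail at parse time (the addresses below are made up for illustration):

```rust
use std::net::SocketAddrV6;

fn main() {
    // An IPv6 socket address in the bracketed form parses successfully.
    let v6: Result<SocketAddrV6, _> = "[fd00:1122:3344::1]:17000".parse();
    assert!(v6.is_ok());

    // An IPv4 socket address is not a valid SocketAddrV6, so the parse
    // fails -- this is what restricts the column to IPv6-only values.
    let v4: Result<SocketAddrV6, _> = "192.168.1.1:17000".parse();
    assert!(v4.is_err());
}
```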
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum Disk {
    Crucible(CrucibleDisk),
}

impl Disk {
    pub fn model(&self) -> &model::Disk {
        match &self {
            Disk::Crucible(disk) => disk.model(),
        }
    }

    pub fn id(&self) -> Uuid {
        self.model().id()
    }

    pub fn name(&self) -> &api::external::Name {
        self.model().name()
    }

    pub fn time_deleted(&self) -> Option<DateTime<Utc>> {
        self.model().time_deleted()
    }

    pub fn project_id(&self) -> Uuid {
        self.model().project_id
    }

    pub fn runtime(&self) -> DiskRuntimeState {
        self.model().runtime()
    }

    pub fn state(&self) -> DiskState {
        self.model().state()
    }

    pub fn size(&self) -> model::ByteCount {
        self.model().size
    }

    pub fn slot(&self) -> Option<u8> {
        self.model().slot()
    }

    pub fn block_size(&self) -> model::BlockSize {
        self.model().block_size
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CrucibleDisk {
    pub disk: model::Disk,
    pub disk_type_crucible: DiskTypeCrucible,
}

impl CrucibleDisk {
    pub fn model(&self) -> &model::Disk {
        &self.disk
    }

    pub fn id(&self) -> Uuid {
        self.model().id()
    }

    pub fn name(&self) -> &api::external::Name {
        self.model().name()
    }

    pub fn time_deleted(&self) -> Option<DateTime<Utc>> {
        self.model().time_deleted()
    }

    pub fn project_id(&self) -> Uuid {
        self.model().project_id
    }

    pub fn runtime(&self) -> DiskRuntimeState {
        self.model().runtime()
    }

    pub fn state(&self) -> DiskState {
        self.model().state()
    }

    pub fn size(&self) -> model::ByteCount {
        self.model().size
    }

    pub fn slot(&self) -> Option<u8> {
        self.model().slot()
    }

    pub fn volume_id(&self) -> VolumeUuid {
        self.disk_type_crucible.volume_id()
    }

    pub fn pantry_address(&self) -> Option<SocketAddrV6> {
        self.disk_type_crucible.pantry_address()
    }
}
```
Hmm, this representation is interesting. It seems like storing the model within the types that represent different disk types and providing an accessor for the shared model results in a certain amount of boilerplate that has to be duplicated into every new disk type.
Is there a reason we've arrived at this representation rather than one where the shared disk model is present on the type that represents every kind of disk, like:

```rust
pub struct Disk {
    model: model::Disk,
    type_specific: DiskType,
}

enum DiskType {
    Crucible(model::DiskTypeCrucible),
}
```

Or, is this intended to allow us to always have the cross-model state accessible regardless of the disk type? If we need that, might also consider a representation that's like:

```rust
pub struct Disk<T> {
    pub model: model::Disk,
    pub type_specific: T,
}
```

so that the shared code need not be repeated every time?
For what it's worth, I'm certainly not trying to argue that the way you've done it here is wrong, just curious about what led us to settle on this way of representing it.
It shook out this way because I wanted not to have to deal with any Options (beyond what the actual db model fields have), have things like volume_id and pantry_address only be defined for a CrucibleDisk, and have the ability for functions to accept the specific type of CrucibleDisk instead of Disk. There is a regrettable amount of copy and paste of that boilerplate though.
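To illustrate the trade-off described above with stub types (hypothetical, not the real omicron code): under the chosen representation, a Crucible-only function takes CrucibleDisk directly and volume_id is always present, while the struct-plus-enum alternative forces a match with an error path:

```rust
// Stub stand-ins for the real models, for illustration only.
struct CrucibleDisk { volume_id: u64 }

// Chosen representation: this function is only callable with a Crucible
// disk, so volume_id is available unconditionally -- no Option, no match.
fn snapshot_crucible(disk: &CrucibleDisk) -> u64 {
    disk.volume_id
}

// Struct-plus-enum alternative: the same operation takes the general disk
// and must match on the type-specific data to get at Crucible-only fields.
enum DiskType { Crucible { volume_id: u64 } }
struct DiskAlt { type_specific: DiskType }

fn snapshot_alt(disk: &DiskAlt) -> Result<u64, String> {
    match &disk.type_specific {
        DiskType::Crucible { volume_id } => Ok(*volume_id),
        // a future local-storage variant would need an error arm here
    }
}

fn main() {
    let d = CrucibleDisk { volume_id: 7 };
    assert_eq!(snapshot_crucible(&d), 7);

    let a = DiskAlt { type_specific: DiskType::Crucible { volume_id: 7 } };
    assert_eq!(snapshot_alt(&a), Ok(7));
}
```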
```rust
    Ok(disk)
}

/// Return all the Crucible Disks matching a list of volume ids. Currently
```
nit:

```diff
-/// Return all the Crucible Disks matching a list of volume ids. Currently
+/// Return all the Crucible Disks matching a list of volume IDs. Currently
```

though I presume this was already there?
```rust
if self.slot_usage.contains(&slot) {
    return Err(Error::internal_error(&format!(
        "disk PCI slot {slot} used more than once",
```
nit: should this mention which disk we were trying to attach in that slot?
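A sketch of what the suggested message could look like, with hypothetical stand-in types and a made-up disk ID:

```rust
use std::collections::HashSet;

// Hypothetical helper; the point is only the error message's content.
fn check_slot(slot_usage: &HashSet<u8>, slot: u8, disk_id: &str) -> Result<(), String> {
    if slot_usage.contains(&slot) {
        // Including the disk's ID makes the collision attributable to a
        // specific attachment, not just a slot number.
        return Err(format!("disk {disk_id}: PCI slot {slot} used more than once"));
    }
    Ok(())
}

fn main() {
    let used: HashSet<u8> = [4u8].into();
    assert!(check_slot(&used, 4, "d8d5e2f7").is_err());
}
```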
```rust
// XXX here is where we would restrict the type of disk used for
// a boot disk. should we?
```
using a file-backed disk as the boot device would mean that the instance can never be restarted anywhere else, and is perma-wasted if the physical disk that boot device file was on goes away...
Yeah, using any local storage restricts the VMM to only ever appearing on one sled. I'm inclined to let users select local storage as a boot volume, but for it to work they would need to:
- boot first from a Crucible volume made from an image or snapshot
- prepare the local storage disk they want to use by installing an OS and/or formatting however they want
- then stop the instance, detach the crucible volume, start the instance again, selecting it as a boot disk
It's not impossible to do, just a bit annoying. Maybe possible to automate with Packer? Coming back to this after a few days, I can't come up with a reason for restricting boot disk type.
```sql
CREATE TYPE IF NOT EXISTS omicron.public.disk_type AS ENUM (
    'crucible'
);
```
it would be nice to have a comment here saying that this indicates which other table to query in order to look up disk-type-specific information...
Would the new comment for disk_type in nexus/db-model/src/disk.rs suffice? :)
It occurs to me that we might be able to reduce some of the boilerplate in code that queries the database for specific disk types if we made a SQL view for each disk type that's defined like
```sql
CREATE VIEW IF NOT EXISTS
    omicron.public.crucible_disk
AS SELECT
    -- ...
FROM
    omicron.public.disk
INNER JOIN
    omicron.public.disk_type_crucible
ON
    disk.id = disk_type_crucible.disk_id
WHERE
    disk.type = 'crucible'
```

or whatever?
hah, I did this exact thing first! When asking about it, we decided it was better that Nexus be aware of the different underlying tables rather than trying to mask that here. I also couldn't insert into a view so the disk slot allocation query didn't work.
```sql
 * table?
 */

/* Runtime state */
-- disk_state omicron.public.DiskState NOT NULL, /* TODO see above */
```
nit: this line now references "TODO see above" and there is no longer an "above" comment. should we just delete this?
I think the above comment this is referencing starts on line 1451?