PendingMgsUpdates does not impl Serialize/Deserialize correctly#7992
PendingMgsUpdates does not impl Serialize/Deserialize correctly#7992davepacheco merged 3 commits intomainfrom
Conversation
|
I should add: I believe there's zero risk with this change because there is no code in Omicron today that constructs a blueprint with a non-empty Besides what's in this PR, I also needed these two patches for my testing. The first patch makes diff --git a/dev-tools/reconfigurator-exec-unsafe/src/main.rs b/dev-tools/reconfigurator-exec-unsafe/src/main.rs
index cba59993e..c8abfec5e 100644
--- a/dev-tools/reconfigurator-exec-unsafe/src/main.rs
+++ b/dev-tools/reconfigurator-exec-unsafe/src/main.rs
@@ -109,12 +109,7 @@ impl ReconfiguratorExec {
info!(&log, "setting up database pool");
let pool = Arc::new(db::Pool::new(&log, &qorb_resolver));
- let datastore = Arc::new(
- DataStore::new_failfast(&log, pool)
- .await
- .context("creating datastore")?,
- );
-
+ let datastore = Arc::new(DataStore::new_unchecked(log.clone(), pool));
let result = self.do_exec(log, &datastore, &qorb_resolver).await;
datastore.terminate().await;
resultThis is dangerous in general but harmless in my case because this isn't doing anything with the changed part of the schema. And this patch is what I used to have diff --git a/nexus/reconfigurator/planning/src/blueprint_builder/builder.rs b/nexus/reconfigurator/planning/src/blueprint_builder/builder.rs
index 4780a2aef..7ce561cdf 100644
--- a/nexus/reconfigurator/planning/src/blueprint_builder/builder.rs
+++ b/nexus/reconfigurator/planning/src/blueprint_builder/builder.rs
@@ -83,7 +83,16 @@ use thiserror::Error;
use super::ClickhouseZonesThatShouldBeRunning;
use super::clickhouse::ClickhouseAllocator;
+use gateway_client::types::SpType;
+use nexus_types::deployment::ExpectedVersion;
+use nexus_types::deployment::PendingMgsUpdate;
+use nexus_types::deployment::PendingMgsUpdateDetails;
use nexus_types::deployment::PendingMgsUpdates;
+use nexus_types::inventory::BaseboardId;
+use std::sync::Arc;
+use tufaceous_artifact::ArtifactHashId;
+use tufaceous_artifact::ArtifactKind;
+use tufaceous_artifact::KnownArtifactKind;
/// Errors encountered while assembling blueprints
#[derive(Debug, Error)]
@@ -452,10 +461,33 @@ impl<'a> BlueprintBuilder<'a> {
.collect::<BTreeMap<_, _>>();
let num_sleds = sleds.len();
+ let mut pending_mgs_updates = PendingMgsUpdates::new();
+ let baseboard_id = Arc::new(BaseboardId {
+ part_number: String::from("913-0000019"),
+ serial_number: String::from("BRM27230037"),
+ });
+ let artifact_hash_id = ArtifactHashId {
+ kind: ArtifactKind::from_known(KnownArtifactKind::GimletSp),
+ hash: "47266ede81e13f5f1e36623ea8dd963842606b783397e4809a9a5f0bda0f8170".parse().unwrap(),
+ };
+ let update = PendingMgsUpdate {
+ baseboard_id,
+ sp_type: SpType::Sled,
+ slot_id: 15,
+ details: PendingMgsUpdateDetails::Sp {
+ expected_active_version: "1.0.36".parse().unwrap(),
+ expected_inactive_version: ExpectedVersion::Version(
+ "1.0.36".parse().unwrap(),
+ ),
+ },
+ artifact_hash_id,
+ artifact_version: "1.0.34".parse().unwrap(),
+ };
+ pending_mgs_updates.insert(update);
Blueprint {
id: rng.next_blueprint(),
sleds,
- pending_mgs_updates: PendingMgsUpdates::new(),
+ pending_mgs_updates,
parent_blueprint_id: None,
internal_dns_version: Generation::new(),
external_dns_version: Generation::new(),
and it serializes like this: {
"pending_mgs_updates": [
{
"baseboard_id": {
"part_number": "913-0000019",
"serial_number": "BRM27230037"
},
"sp_type": "sled",
"slot_id": 15,
"details": {
"component": "sp",
"expected_active_version": "1.0.36",
"expected_inactive_version": {
"kind": "version",
"version": "1.0.36"
}
},
"artifact_hash_id": {
"kind": "gimlet_sp",
"hash": "47266ede81e13f5f1e36623ea8dd963842606b783397e4809a9a5f0bda0f8170"
},
"artifact_version": "1.0.34"
}
],
}
|
| creator: OmicronZoneUuid::from_untyped_uuid(creator), | ||
| sender, | ||
| mgs_updates, | ||
| mgs_updates: mgs_updates.clone(), |
There was a problem hiding this comment.
This was needed because the driver shuts down when the sender side of its watch channel closes. We don't want that to happen until later, now at L299 (in the new code).
There was a problem hiding this comment.
could you add a comment here and in drop below?
| creator: OmicronZoneUuid::from_untyped_uuid(creator), | ||
| sender, | ||
| mgs_updates, | ||
| mgs_updates: mgs_updates.clone(), |
There was a problem hiding this comment.
could you add a comment here and in drop below?
| { | ||
| let mut map = PendingMgsUpdates::new(); | ||
| while let Some(u) = seq.next_element()? { | ||
| map.insert(u); |
There was a problem hiding this comment.
Is it legal for there to be duplicates?
There was a problem hiding this comment.
Good question. In practice:
- on the serialize side, this is impossible because IdMap uses a BTreeMap under the hood
- in JSON, this is possible but the behavior is unspecified
- on the deserialize side, we will ignore all entries having the same baseboard id except for the last one
I think this is about the same behavior as all the other maps that we have.
jmpesp
left a comment
There was a problem hiding this comment.
Looks fine, but we should add some unit tests 🚀
| loop { | ||
| let status = status_rx.borrow(); | ||
| debug!(&log, "MGS update status"; "status" => ?status); | ||
| info!(&log, "MGS update status"; "status" => ?status); |
There was a problem hiding this comment.
should these be changed back to debug?
There was a problem hiding this comment.
Good question but I don't think so. The intent was to print these every few seconds after execution finishes until the updates are complete. Otherwise you don't really know what's going on with the updates.
Agreed on having tests. These will be a lot easier once we do the |
Well, I realized how easy it would be to add a smoke test, so I just did it 😄 |
This PR fixes a few problems that I ran into when I went to go test #7978 using
reconfigurator-exec-unsafe.My basic plan was:
omdb reconfigurator exportto save the reconfigurator state to a filereconfigurator-clito export the system's target blueprint from that filePendingMgsUpdatesthat contains an example update to one of the sled SPsreconfigurator-exec-unsafeto execute the blueprintThe main problem I ran into was constructing a serialized
PendingMgsUpdatescontaining the update that I wanted to do. Since we don't have reconfigurator-cli support for this property yet, I wrote some code to do this. (I'll put that into a comment below for posterity.) But when I went to serialize it I got:The problem is that
PendingMgsUpdatesdidn't implSerializecorrectly. The derived impl tries to serialize a map whose keys (Arc<BaseboardId>) are objects. This is not allowed in JSON.This PR impls
Serialize/Deserializeby hand as a sequence ofPendingMgsUpdateobjects. This is closer to the spirit ofIdMapanyway.This also fixes a few small things in
reconfigurator-exec-unsafethat made it a little harder to test.With this PR, I'm able to complete the above test and verify that using
reconfigurator-exec-unsafe, we complete an SP update through blueprint execution 🎉 (This still won't work in Nexus until we do the database implementation of this field.)