
@DavidHurta
Contributor

@DavidHurta DavidHurta commented Oct 30, 2025

availableupdates: Unify capitalization when comparing architectures
Otherwise, the following condition in cincinnati.go in MultiArch clusters:

    if desiredArch == string(configv1.ClusterVersionArchitectureMulti) && currentArch != desiredArch {
        return current, []configv1.Release{current}, nil, nil
    }

gets evaluated to:

    if "Multi" == string(configv1.ClusterVersionArchitectureMulti) && "multi" != "Multi" {
        return current, []configv1.Release{current}, nil, nil
    }

This causes MultiArch clusters with a non-empty
ClusterVersion.Spec.DesiredUpdate.Architecture field to indefinitely
have their available updates set to []configv1.Release{current}, because
the "multi" != "Multi" check always evaluates to true.
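A minimal, self-contained sketch of why the condition above always fires once a cluster is already Multi. Here archMulti stands in for configv1.ClusterVersionArchitectureMulti (whose value is "Multi", per the evaluated condition), and onlyCurrentIsAvailable is a hypothetical helper that mirrors the comparison; it is not the CVO code itself.

```go
package main

import "fmt"

// archMulti stands in for configv1.ClusterVersionArchitectureMulti ("Multi").
const archMulti = "Multi"

// onlyCurrentIsAvailable (hypothetical) mirrors the short-circuit in
// cincinnati.go: when Multi is desired and the current architecture differs,
// only the current release is reported as an available update.
func onlyCurrentIsAvailable(currentArch, desiredArch string) bool {
	return desiredArch == archMulti && currentArch != desiredArch
}

func main() {
	// Before the fix: an already-Multi cluster reports "multi", so the
	// case-sensitive comparison always differs and the short-circuit fires.
	fmt.Println(onlyCurrentIsAvailable("multi", archMulti)) // true

	// After unifying capitalization, both sides use "Multi".
	fmt.Println(onlyCurrentIsAvailable("Multi", archMulti)) // false

	// A single-arch cluster migrating to Multi still short-circuits, which
	// appears to be the case the condition is meant for.
	fmt.Println(onlyCurrentIsAvailable("amd64", archMulti)) // true
}
```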


availableupdates: Refactor getting desired/current architecture

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 30, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 30, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: DavidHurta

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 30, 2025
@DavidHurta DavidHurta changed the title WIP: Unify capitalization when comparing architectures WIP: AvailableUpdates: Unify capitalization when comparing architectures Oct 30, 2025
@DavidHurta
Contributor Author

Installing a 4.18 cluster (my attempts at a multi 4.21 cluster with expanded logging were futile):

launch 4.18.10 aws

Available updates are as expected:

$ oc adm upgrade
Cluster version is 4.18.10

Upstream: https://api.integration.openshift.com/api/upgrades_info/graph
Channel: candidate-4.18 (available channels: candidate-4.18, candidate-4.19, candidate-4.20, eus-4.18, fast-4.18, fast-4.19, fast-4.20, stable-4.18, stable-4.19)

Recommended updates:

  VERSION     IMAGE
  4.18.27     quay.io/openshift-release-dev/ocp-release@sha256:4686c8d26194f890c2a241271d41a762d4be26af0be60e9cfd0c563f61b3beab
  4.18.21     quay.io/openshift-release-dev/ocp-release@sha256:9d1b107adad76f023493b8c2b74902639f66273cc120e255454ad447a9ef27d9
  4.18.20     quay.io/openshift-release-dev/ocp-release@sha256:5e06105a6ba80d04eb5d8d3f9a672fb743ce4710876d99a375c2d9f7b7eaa783
  4.18.19     quay.io/openshift-release-dev/ocp-release@sha256:e6d80b9ab85b17b47e90cb8de1b9ad0e3fe457780148629d329d532ef902d222
  4.18.18     quay.io/openshift-release-dev/ocp-release@sha256:eca2e3f7de2bd92b18f69547c8f0ad842fdb83f0821f76b8692f2716a86b0bde

Additional updates which are not recommended, or where the recommended status is "Unknown", for your cluster configuration are available, to view those re-run the command with --include-not-recommended.

Migrate to multi-arch:

$ oc adm upgrade --to-multi-arch
Requested update to multi cluster architecture

After waiting for the migration to complete, the available updates still show only the current version:

$ oc adm upgrade
Cluster version is 4.18.10

Upstream: https://api.integration.openshift.com/api/upgrades_info/graph
Channel: candidate-4.18 (available channels: candidate-4.18, candidate-4.19, candidate-4.20, eus-4.18, fast-4.18, fast-4.19, fast-4.20, stable-4.18, stable-4.19)

Recommended updates:

  VERSION     IMAGE
  4.18.10     quay.io/openshift-release-dev/ocp-release@sha256:0e63f36129991fe6fa25112ab12a56110d1660f21f3e2582d290f7820a3479d2

Running a local CVO with the PR's changes fixes the available updates:

$ oc adm upgrade 
info: An upgrade is in progress. Working towards 4.18.10: 509 of 901 done (56% complete)

Upgradeable=False

  Reason: PoolUpdating
  Message: Cluster operator machine-config should not be upgraded between minor versions: One or more machine config pools are updating, please see `oc get mcp` for further details

Upstream: https://api.integration.openshift.com/api/upgrades_info/graph
Channel: candidate-4.18

Recommended updates:

  VERSION     IMAGE
  4.18.27     quay.io/openshift-release-dev/ocp-release@sha256:edcedfff11a2f0a1dead9d818b3ee6cfd0d152c23537ca703237d40c007ac207
  4.18.21     quay.io/openshift-release-dev/ocp-release@sha256:7df246db514d1648bb62d6d9dfd2cdfa25e20e0d6fe457a263c85dd0a81a227a
  4.18.20     quay.io/openshift-release-dev/ocp-release@sha256:c78c1eafc0c4ea46a83e6fea53aed8272259415a3686281b3cb866242d9181f5
  4.18.19     quay.io/openshift-release-dev/ocp-release@sha256:a0d938131d24ad872c4d865885189f395c64fbf0c09a9c361cb497142ffa331c
  4.18.18     quay.io/openshift-release-dev/ocp-release@sha256:fd6c7ba61d366321e1bcbb21395e3e6a2aa00443e42df5c25a877c7ee28e4d69

Additional updates which are not recommended, or where the recommended status is "Unknown", for your cluster configuration are available, to view those re-run the command with --include-not-recommended.

even with spec.desiredUpdate.architecture set to "Multi":

$ oc get clusterversions.config.openshift.io -ojson | jq .items[0].spec
{
  "channel": "candidate-4.18",
  "clusterID": "eca1fe4a-d1f1-481b-a017-feed8926c977",
  "desiredUpdate": {
    "architecture": "Multi",
    "force": false,
    "image": "",
    "version": "4.18.10"
  },
  "upstream": "https://api.integration.openshift.com/api/upgrades_info/graph"
}
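For reference, a rough sketch of decoding that spec with hypothetical, simplified structs (not the real configv1 types). It only illustrates that the field the desiredArch comparison reads holds the capitalized "Multi", while the update service expects "multi".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// desiredUpdate and clusterVersionSpec are hypothetical, simplified stand-ins
// for the configv1 types; only fields from the output above are modeled.
type desiredUpdate struct {
	Architecture string `json:"architecture"`
	Force        bool   `json:"force"`
	Image        string `json:"image"`
	Version      string `json:"version"`
}

type clusterVersionSpec struct {
	Channel       string         `json:"channel"`
	ClusterID     string         `json:"clusterID"`
	DesiredUpdate *desiredUpdate `json:"desiredUpdate,omitempty"`
	Upstream      string         `json:"upstream"`
}

// specJSON is the spec printed by the jq command above.
const specJSON = `{
  "channel": "candidate-4.18",
  "clusterID": "eca1fe4a-d1f1-481b-a017-feed8926c977",
  "desiredUpdate": {"architecture": "Multi", "force": false, "image": "", "version": "4.18.10"},
  "upstream": "https://api.integration.openshift.com/api/upgrades_info/graph"
}`

// parseSpec decodes the JSON into the simplified spec struct.
func parseSpec(raw string) (clusterVersionSpec, error) {
	var spec clusterVersionSpec
	err := json.Unmarshal([]byte(raw), &spec)
	return spec, err
}

func main() {
	spec, err := parseSpec(specJSON)
	if err != nil {
		panic(err)
	}
	// The spec carries the capitalized form, not the lowercase query value.
	fmt.Println(spec.DesiredUpdate.Architecture) // Multi
}
```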

@DavidHurta
Contributor Author

Running the introduced tests without the changes in the availableupdates.go file:

=== RUN   TestSyncAvailableUpdatesMultiArchAfterMigration
I1031 18:43:21.389557   48480 availableupdates.go:68] First attempt to retrieve available updates
I1031 18:43:21.389754   48480 cincinnati.go:392] parsed metadata: URL "", architecture "", channels [], errors []
    availableupdates_test.go:472: available updates differ from expected:
          &cvo.availableUpdates{
          	... // 2 ignored and 3 identical fields
          	Current: {Version: "4.5.5", Image: "payload/4.5.5"},
        - 	Updates: nil,
        + 	Updates: []v1.Release{{Version: "4.5.5", Image: "payload/4.5.5"}},
        - 	ConditionalUpdates: []v1.ConditionalUpdate{
        - 		{
        - 			Release:    v1.Release{Version: "4.5.6", Image: "payload/4.5.6"},
        - 			Risks:      []v1.ConditionalUpdateRisk{{...}},
        - 			Conditions: []v1.Condition{{...}},
        - 		},
        - 	},
        + 	ConditionalUpdates: nil,
          	... // 1 ignored and 1 identical fields
          }
--- FAIL: TestSyncAvailableUpdatesMultiArchAfterMigration (0.00s)
=== RUN   TestSyncAvailableUpdatesMultiArchAfterMigrationDesiredUpdateNil
I1031 18:43:21.389944   48480 availableupdates.go:68] First attempt to retrieve available updates
I1031 18:43:21.390161   48480 cincinnati.go:392] parsed metadata: URL "", architecture "", channels [], errors []
I1031 18:43:21.390168   48480 cincinnati.go:392] parsed metadata: URL "", architecture "", channels [], errors []
    availableupdates_test.go:495: available updates differ from expected:
          &cvo.availableUpdates{
          	UpdateService: "http://127.0.0.1:36915",
          	Channel:       "channel",
        - 	Architecture:  "Multi",
        + 	Architecture:  "multi",
          	... // 3 ignored and 4 identical fields
          }
--- FAIL: TestSyncAvailableUpdatesMultiArchAfterMigrationDesiredUpdateNil (0.00s)

The second test is interesting because it shows that the wrong, lowercase "multi" value was being stored in the optr.availableUpdates.Architecture field instead of "Multi". However, I did not find any significant impact from this.

availableupdates: Unify capitalization when comparing architectures

Otherwise, the following condition in cincinnati.go in MultiArch clusters:

```
    if desiredArch == string(configv1.ClusterVersionArchitectureMulti) && currentArch != desiredArch {
        return current, []configv1.Release{current}, nil, nil
    }
```

gets evaluated to:

```
    if "Multi" == string(configv1.ClusterVersionArchitectureMulti) && "multi" != "Multi" {
        return current, []configv1.Release{current}, nil, nil
    }
```

This causes MultiArch clusters with a non-empty
`ClusterVersion.Spec.DesiredUpdate.Architecture` field to indefinitely
have their available updates set to `[]configv1.Release{current}`, because
the `"multi" != "Multi"` check always evaluates to true.

availableupdates: Refactor getting desired/current architecture

The commit changes the default behaviour of the
`getDesiredArchitecture` method. However, the method is only used once,
in the `syncAvailableUpdates` method, and nowhere else. The commit adds
the subsequent logic for evaluating a desired architecture to the method
itself and implements the `getCurrentArchitecture` method.

The goal is to introduce "getters" for such values that enforce
unified capitalization, so that their return values have the
same form (e.g., "Multi", "amd64", ...).
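A minimal sketch of the current-architecture getter described above. It assumes, as the original `currentArch := runtime.GOARCH` code does, that any non-Multi release falls back to the build's GOARCH. getCurrentArchitecture here is a hypothetical free function with goarch passed in for testability, not the method from the PR.

```go
package main

import (
	"fmt"
	"runtime"
)

// archMulti stands in for configv1.ClusterVersionArchitectureMulti.
const archMulti = "Multi"

// getCurrentArchitecture (hypothetical) reports the running release's
// architecture in the same form the ClusterVersion spec uses: "Multi", or a
// GOARCH value such as "amd64". releaseArch models optr.release.Architecture.
func getCurrentArchitecture(releaseArch, goarch string) string {
	if releaseArch == archMulti {
		return archMulti // the old code downcased this to "multi"
	}
	return goarch
}

func main() {
	fmt.Println(getCurrentArchitecture(archMulti, runtime.GOARCH)) // Multi
	fmt.Println(getCurrentArchitecture("", "amd64"))               // amd64 (empty release arch assumed for single-arch)
}
```

Returning the spec-style "Multi" keeps the comparison in cincinnati.go case-consistent; downcasing is then needed only where the value becomes a query parameter.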
@DavidHurta DavidHurta changed the title WIP: AvailableUpdates: Unify capitalization when comparing architectures OCPBUGS-57646: Unify capitalization when comparing architectures for available updates Oct 31, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 31, 2025
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Oct 31, 2025
@openshift-ci-robot
Contributor

@DavidHurta: This pull request references Jira Issue OCPBUGS-57646, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jiajliu

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from jiajliu October 31, 2025 18:09
@openshift-ci-robot
Contributor

@DavidHurta: This pull request references Jira Issue OCPBUGS-57646, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jiajliu


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

```
currentArch := runtime.GOARCH

if optr.release.Architecture == configv1.ClusterVersionArchitectureMulti {
	currentArch = "multi"
```
Member


We need this downcased form for Cincinnati, right? Here's Multi returning nothing, while multi returns 4.20.0:

$ curl -s 'https://api.openshift.com/api/upgrades_info/graph?channel=stable-4.20&arch=Multi' | jq .nodes
null
$ curl -s 'https://api.openshift.com/api/upgrades_info/graph?channel=stable-4.20&arch=multi' | jq -c '[.nodes[] | .version]'
["4.20.0"]

Contributor Author

@DavidHurta DavidHurta Oct 31, 2025


Yes. However, the currentArch variable is only used for the following comparison:

    if desiredArch == string(configv1.ClusterVersionArchitectureMulti) && currentArch != desiredArch {
        return current, []configv1.Release{current}, nil, nil
    }

We use desiredArch for the query parameter, and it is downcased before the query parameters are created, at:

    releaseArch := desiredArch
    if desiredArch == string(configv1.ClusterVersionArchitectureMulti) {
        releaseArch = "multi"
    }
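To illustrate why that downcasing matters (the curl output earlier in this thread shows arch=Multi returning no nodes while arch=multi does), here is a hypothetical graphURL helper that applies the same Multi -> multi conversion before encoding the query. It only models the arch and channel parameters; the CVO's real request also carries id and version, as the logs in this thread show.

```go
package main

import (
	"fmt"
	"net/url"
)

// archMulti stands in for configv1.ClusterVersionArchitectureMulti.
const archMulti = "Multi"

// graphURL (hypothetical) converts the spec-style desired architecture into
// the lowercase form the update service expects and adds it, together with
// the channel, to the graph URL's query string.
func graphURL(base, channel, desiredArch string) (string, error) {
	releaseArch := desiredArch
	if desiredArch == archMulti {
		releaseArch = "multi"
	}
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("arch", releaseArch)
	q.Set("channel", channel)
	u.RawQuery = q.Encode() // Encode sorts keys: arch, then channel
	return u.String(), nil
}

func main() {
	s, err := graphURL("https://api.openshift.com/api/upgrades_info/graph", "stable-4.20", archMulti)
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // https://api.openshift.com/api/upgrades_info/graph?arch=multi&channel=stable-4.20
}
```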

Member


Trying to check in multi-arch CI: 4.21 nightly multi CI -> 4.21.0-0.nightly-multi-2025-10-30-034235 -> but both e2e-ovn-serial-aws-multi-a-a-1of2 and e2e-ovn-serial-aws-multi-a-a-2of2 are failing to bootstrap.

Moving back to 4.20.0-0.nightly-multi: 4.20.0-0.nightly-multi-2025-10-30-035942 -> e2e-ovn-serial-aws-multi-a-a-1of2 -> Artifacts -> ... -> e2e artifacts:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-multiarch-master-nightly-4.20-ocp-e2e-ovn-serial-aws-multi-a-a-1of2/1983747935866720256/artifacts/ocp-e2e-ovn-serial-aws-multi-a-a/openshift-e2e-test/artifacts/e2e.log | grep 'upgrade recommend'
started: 0/4/45 "[Serial][sig-cli] oc adm upgrade recommend When the update service has conditional recommendations runs successfully with conditional recommendations to the --version target [Suite:openshift/conformance/serial]"
passed: (26.9s) 2025-10-30T05:58:16 "[Serial][sig-cli] oc adm upgrade recommend When the update service has conditional recommendations runs successfully with conditional recommendations to the --version target [Suite:openshift/conformance/serial]"
started: 0/24/45 "[Serial][sig-cli] oc adm upgrade recommend When the update service has no recommendations runs successfully [Suite:openshift/conformance/serial]"
passed: (22s) 2025-10-30T06:22:53 "[Serial][sig-cli] oc adm upgrade recommend When the update service has no recommendations runs successfully [Suite:openshift/conformance/serial]"

which means this test logic is happy about the CVO showing the recommended update to the sha256:cccc... release. Checking gather-extra pod logs:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-multiarch-master-nightly-4.20-ocp-e2e-ovn-serial-aws-multi-a-a-1of2/1983747935866720256/artifacts/ocp-e2e-ovn-serial-aws-multi-a-a/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-f89bdbd46-sln5x_cluster-version-operator.log | grep '1030 05:5[78].* \(cincinnati\|availableupdates\)'
I1030 05:57:50.548607       1 availableupdates.go:83] Retrieving available updates again, because more than 2m35.488910548s has elapsed since last attempt at 2025-10-30T05:54:37Z
I1030 05:57:56.533407       1 availableupdates.go:77] Retrieving available updates again, because the channel has changed from "" to "test-channel"
I1030 05:57:56.537343       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?arch=multi&channel=test-channel&id=021db8b4-7df4-4870-a035-24fc93f70d05&version=4.20.0-0.nightly-multi-2025-10-29-210751
I1030 05:57:56.800389       1 availableupdates.go:398] Update service https://api.openshift.com/api/upgrades_info/v1/graph could not return available updates: VersionNotFound: currently reconciling cluster version 4.20.0-0.nightly-multi-2025-10-29-210751 not found in the "test-channel" channel
I1030 05:57:56.811990       1 availableupdates.go:98] Available updates were recently retrieved, with less than 2m35.488910548s elapsed since 2025-10-30T05:57:56Z, will try later.
I1030 05:57:57.042846       1 availableupdates.go:103] Retrieving available updates again, because the update service has changed from "" to "http://172.30.166.91:8000/graph" from ClusterVersion spec.upstream
I1030 05:57:57.045682       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from http://172.30.166.91:8000/graph?arch=multi&channel=test-channel&id=021db8b4-7df4-4870-a035-24fc93f70d05&version=4.20.0-0.nightly-multi-2025-10-29-210751
I1030 05:57:57.101065       1 availableupdates.go:98] Available updates were recently retrieved, with less than 2m35.488910548s elapsed since 2025-10-30T05:57:57Z, will try later.
I1030 05:58:06.331303       1 availableupdates.go:98] Available updates were recently retrieved, with less than 2m35.488910548s elapsed since 2025-10-30T05:57:57Z, will try later.
I1030 05:58:16.883450       1 availableupdates.go:103] Retrieving available updates again, because the update service has changed from "http://172.30.166.91:8000/graph" to "https://api.openshift.com/api/upgrades_info/v1/graph" from the operator's default update service
I1030 05:58:16.886723       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?arch=multi&channel=test-channel&id=021db8b4-7df4-4870-a035-24fc93f70d05&version=4.20.0-0.nightly-multi-2025-10-29-210751
I1030 05:58:17.152154       1 availableupdates.go:398] Update service https://api.openshift.com/api/upgrades_info/v1/graph could not return available updates: VersionNotFound: currently reconciling cluster version 4.20.0-0.nightly-multi-2025-10-29-210751 not found in the "test-channel" channel
I1030 05:58:17.164073       1 availableupdates.go:98] Available updates were recently retrieved, with less than 2m35.488910548s elapsed since 2025-10-30T05:58:17Z, will try later.

Well, I can see arch=multi in there, and it's not reporting VersionNotFound when pulling from the e2e test's 172.30.166.91. It would be nice if it logged this single->multi transition branch. Ah, here is cincinnati.go doing the Multi -> multi downcasing, so that's one bit I'd missed earlier.

Member


Ah, the multi CI is passing because those tests aren't setting spec.desiredUpdate.architecture: Multi. We should grow some CI that exercises that.

@openshift-ci
Contributor

openshift-ci bot commented Oct 31, 2025

@DavidHurta: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@wking
Member

wking commented Nov 7, 2025

/cc

@openshift-ci openshift-ci bot requested a review from wking November 7, 2025 15:41
```
}

func (optr *Operator) getDesiredArchitecture(update *configv1.Update) string {
	if update != nil {
```
Member


Should this be if update != nil && len(update.Architecture) > 0? Otherwise, we could return an empty string here, and we'd rather fall back to getCurrentArchitecture, because your syncAvailableUpdates change is dropping the old:

    if desiredArch == "" {
        desiredArch = currentArch
    }

```
	}
}

func TestSyncAvailableUpdatesMultiArchAfterMigrationDesiredUpdateNil(t *testing.T) {
```
Member


nit: looks like this could be collapsed with TestSyncAvailableUpdatesMultiArchAfterMigration into a single TestSyncAvailableUpdatesMultiArch function that iterated over (test-case name, desired, expected) tuples, to reduce duplication, and make it easy to add a test-case for desired set, but architecture unset.
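The two review suggestions above (falling back to the current architecture when update.Architecture is empty, and driving the cases from (name, desired, expected) tuples) can be sketched together. This is a hypothetical, self-contained simplification, not the PR's code: Update stands in for configv1.Update, and the current architecture is passed in as a string rather than coming from a getCurrentArchitecture method.

```go
package main

import "fmt"

// Update is a simplified, hypothetical stand-in for configv1.Update.
type Update struct {
	Architecture string
	Version      string
}

// getDesiredArchitecture returns the spec's architecture only when it is set;
// otherwise it falls back to currentArch, so callers never see "".
func getDesiredArchitecture(update *Update, currentArch string) string {
	if update != nil && len(update.Architecture) > 0 {
		return update.Architecture
	}
	return currentArch
}

func main() {
	const currentArch = "amd64" // assumed single-arch cluster for these cases

	cases := []struct {
		name     string
		desired  *Update
		expected string
	}{
		{name: "desired update nil", desired: nil, expected: "amd64"},
		{name: "desired architecture Multi", desired: &Update{Architecture: "Multi"}, expected: "Multi"},
		{name: "desired update set, architecture unset", desired: &Update{Version: "4.18.10"}, expected: "amd64"},
	}

	for _, c := range cases {
		got := getDesiredArchitecture(c.desired, currentArch)
		status := "ok"
		if got != c.expected {
			status = "MISMATCH"
		}
		fmt.Printf("%s: got %q, expected %q (%s)\n", c.name, got, c.expected, status)
	}
}
```

In an actual _test.go file, each tuple would more naturally become a t.Run subtest.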
