OCPBUGS-62013, OCPBUGS-61742: DownStream Merge [11-19-2025] #2864
base: master
Conversation
Signed-off-by: Mykola Yurchenko <[email protected]>
- Do not update the DPU connection status annotation on a pod if the pod is about to be deleted.
- Return a failure if the ACL log meter could not be created.

Signed-off-by: Yun Zhou <[email protected]>
Signed-off-by: Mykola Yurchenko <[email protected]>
Remove --subresource=status from ovnkube.sh get_node_zone
Signed-off-by: Patryk Matuszak <[email protected]>
Fixes issues that were introduced by adb1fc8.

The core problem is that with the change to move the networkID from nodes to NADs, the upgrade logic left a window of time during which pods could not start. This is because cluster-manager was responsible for migrating the networkID from the node to the NAD, and in our upgrade strategy workers upgrade before control-plane nodes. This would leave worker nodes in a state where the new OVNK code only looked for the networkID on the NAD, but the ID had not yet been migrated.

This patch changes the behavior so that at startup any NAD controller (zone, node, or cluster manager) will attempt to find NADs that are missing networkIDs and search nodes for the legacy values. The node and zone NAD controllers will fall back to the legacy ID but will not annotate the NAD. The cluster manager will also use the legacy ID and update the NAD with it.

Unit tests added to cover the different scenarios.

Signed-off-by: Tim Rozet <[email protected]>
Fixes NAD Controller syncAll for networkID upgrade from node->NAD
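For readers following the upgrade gap described above, here is a minimal, hypothetical Go sketch of the fallback idea: at startup, each NAD controller fills in missing NAD network IDs from legacy node annotations, and only the cluster manager persists the value back to the NAD. The types and helpers (nad, node, syncNetworkIDs, annotateNAD) are illustrative assumptions, not the actual ovn-kubernetes code.

```go
// Hypothetical, simplified sketch of the syncAll fallback described above;
// the real ovn-kubernetes types and helpers differ.
package main

import "fmt"

type controllerRole int

const (
	roleNode controllerRole = iota
	roleZone
	roleClusterManager
)

type nad struct {
	name      string
	networkID int // 0 means "not set" in this sketch
}

type node struct {
	// legacy per-network IDs previously annotated on nodes, keyed by network name
	legacyIDs map[string]int
}

// syncNetworkIDs fills in missing NAD network IDs from legacy node annotations.
// Only the cluster manager persists the value back to the NAD; node and zone
// controllers fall back to it in memory so pods can keep starting during upgrade.
func syncNetworkIDs(role controllerRole, nads []*nad, nodes []*node) {
	for _, n := range nads {
		if n.networkID != 0 {
			continue // already migrated
		}
		for _, nd := range nodes {
			if id, ok := nd.legacyIDs[n.name]; ok {
				n.networkID = id // fall back to the legacy ID
				if role == roleClusterManager {
					annotateNAD(n, id) // cluster manager also updates the NAD
				}
				break
			}
		}
	}
}

func annotateNAD(n *nad, id int) {
	fmt.Printf("would update NAD %s with network ID %d\n", n.name, id)
}

func main() {
	nads := []*nad{{name: "net1"}}
	nodes := []*node{{legacyIDs: map[string]int{"net1": 7}}}
	syncNetworkIDs(roleClusterManager, nads, nodes)
	fmt.Println(nads[0].networkID) // 7
}
```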
When a namespace/pod/EgressIP label update causes it to move from one EIP to another,
the EIP controller may process the associated EIPs in an order that leads
to incorrect assignment behavior.
Example Scenario:
1. Two EIPs exist:
* eip1 matches namespace label test: qe
* eip2 matches namespace label test: dev
2. Namespace ns1 initially has label test: dev and is served by eip2.
3. The label on ns1 is updated from test: dev to test: qe.
4. The EIP controller processes the Namespace update event:
Step 1: eip1 is processed first but skips assignment since the pod is already
served by eip2.
* In reconcileEgressIPNamespace, eip1 is processed first and matches the new
Namespace object.
* It invokes addNamespaceEgressIPAssignments → addPodEgressIPAssignments for
the pod, detects that eip2 (not yet processed) is still serving the pod, adds
eip1 to podState.standbyEgressIPNames, and returns.
Step 2: eip2 is processed next, matches the old Namespace object, and deletes
the pod from the assignment cache.
* In deleteNamespaceEgressIPAssignment → deletePodEgressIPAssignments, it cleans
up OVN entries (LRP, SNAT, and address sets) and removes the pod’s status entry
from podAssignment cache.
As a result, ns1 is left without any EgressIP assignment: eip1 is never promoted
from standby to active.
Fix:
When eip2 is processed in the deletePodEgressIPAssignments method, promote eip1
from standby EgressIP to active.
The same issue might also occur during Pod label updates or EgressIP selector
label updates (Namespace or Pod selector).
Added unit tests to cover Namespace, Pod, and EgressIP label update scenarios
to reproduce the issue and verify the fix.
Signed-off-by: Periyasamy Palanisamy <[email protected]>
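Below is a minimal Go sketch of the standby-promotion fix described in the commit message above, assuming a simplified pod assignment cache; the names (podState, activeEgressIP, standbyEgressIPs, deletePodEgressIPAssignment) are hypothetical stand-ins for the real ovn-kubernetes structures.

```go
// Hypothetical, simplified sketch: when the active EgressIP is deleted for a
// pod, a standby EgressIP (if any) is promoted rather than leaving the pod
// unassigned. Not the real ovn-kubernetes podAssignment cache.
package main

import "fmt"

type podState struct {
	activeEgressIP   string
	standbyEgressIPs []string
}

// deletePodEgressIPAssignment removes eipName from the pod's assignment and,
// if it was the active EgressIP, promotes the first standby to active instead
// of dropping the pod from the cache.
func deletePodEgressIPAssignment(state *podState, eipName string) {
	if state.activeEgressIP != eipName {
		// eipName was only a standby; just drop it from the standby list.
		for i, s := range state.standbyEgressIPs {
			if s == eipName {
				state.standbyEgressIPs = append(state.standbyEgressIPs[:i], state.standbyEgressIPs[i+1:]...)
				break
			}
		}
		return
	}
	if len(state.standbyEgressIPs) > 0 {
		// Promote a standby EgressIP to active (eip1 in the scenario above).
		state.activeEgressIP = state.standbyEgressIPs[0]
		state.standbyEgressIPs = state.standbyEgressIPs[1:]
		return
	}
	state.activeEgressIP = "" // nothing left serving the pod
}

func main() {
	st := &podState{activeEgressIP: "eip2", standbyEgressIPs: []string{"eip1"}}
	deletePodEgressIPAssignment(st, "eip2")
	fmt.Println(st.activeEgressIP) // eip1
}
```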
With multiple subnets for a network, only the first one was being used for cluster subnet exclusion.

Signed-off-by: Tim Rozet <[email protected]>
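A tiny, hypothetical Go illustration of the fix above: cluster subnet exclusion should cover every subnet of the network rather than only the first entry. The network type and excludeAllSubnets helper are assumptions made for this example, not the actual code.

```go
// Hypothetical illustration: exclude all of a network's subnets, not just subnets[0].
package main

import "fmt"

type network struct {
	name    string
	subnets []string
}

// excludeAllSubnets returns every subnet of the network for cluster subnet
// exclusion, instead of only the first entry as the buggy code did.
func excludeAllSubnets(n network) []string {
	excluded := make([]string, 0, len(n.subnets))
	excluded = append(excluded, n.subnets...)
	return excluded
}

func main() {
	n := network{name: "blue", subnets: []string{"10.128.0.0/14", "fd01::/48"}}
	fmt.Println(excludeAllSubnets(n)) // both subnets excluded, not just the first
}
```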
miscellaneous fixes
Refresh PID when calling OVS/OVN binaries
Fixes a regression introduced by ecabc89
/ok-to-test
@openshift-pr-manager[bot]: This pull request explicitly references no jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
@openshift-pr-manager[bot]: trigger 5 job(s) of type blocking for the ci release of OCP 4.21
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/985871b0-c53f-11f0-95d3-3c26d7486622-0

trigger 13 job(s) of type blocking for the nightly release of OCP 4.21
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/985871b0-c53f-11f0-95d3-3c26d7486622-1
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: openshift-pr-manager[bot]

The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retitle OCPBUGS-62013, OCPBUGS-61742: DownStream Merge [11-19-2025]
@openshift-pr-manager[bot]: This pull request references Jira Issue OCPBUGS-62013, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker.

This pull request references Jira Issue OCPBUGS-61742, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/retest
/payload-job periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade
@pperiyasamy: trigger 4 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7cf90190-c612-11f0-9593-c177a4ba179c-0
@openshift-pr-manager[bot]: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/retest
/payload-job periodic-ci-openshift-release-master-ci-4.21-e2e-aws-upgrade-ovn-single-node
/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ipi-ovn-bm
@jluhrsen: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/70dedc20-c68c-11f0-98cb-d7913ab95c7d-0
We should retest; otherwise, CI is just not super healthy on the failing jobs. It doesn't look like anything to do with this PR, and the failures across the jobs are not common. Trying again:

/retest
@jluhrsen: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ac1aa910-c743-11f0-8023-39e6977616bb-0
/test 4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade
Automated merge of upstream/master → master.