
Conversation


@alifelan alifelan commented Oct 22, 2025

What this PR does / why we need it:
When a volume attachment cannot be deleted, KCM will mark the volume as uncertain. An uncertain volume is not considered attached to the node, and multi-attach errors are only triggered when the volume is confirmed as attached.

This leads to a problem with KubeVirt CSI that compounds. If a volume cannot be released from the previous VM (as in, the newest hotplug pod that releases the volume cannot start), KubeVirt CSI will time out waiting for the operation. Once the timeout happens, KCM marks the volume as uncertain and creates a new volume attachment for the new node. Worth noting here: the previous volume attachment is still in the cluster, it has an error, and it also has a deletion timestamp.
When KubeVirt CSI starts reconciling the second volume attachment, it issues a new virt addvolume command. The hotplug pod gets created, but it is never able to start due to a multi-attach error. This ends up impacting all future volumes that get attached (until the problem is addressed). Not only that, but any "released" volume from the problematic VM will propagate the error to any new VM that receives that volume.

This scenario is not ideal, and while there was a PR for it a year ago, it was not merged.

The CSI specification defines the error a plugin should return when the volume is already attached to a different node: PreconditionFailed. That is what this PR implements.

While a lookup map could be added, a loop over VirtualMachineInstance × VolumeStatus entries is performant enough for the KubeVirt CSI Driver. A test (albeit created without a real API server) with 10,000 VirtualMachineInstances of 40 VolumeStatus entries each took ~9 ms.
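
For illustration, here is a minimal, self-contained sketch of that loop and the PreconditionFailed return; the function name, the pre-listed VMI slice, and the kubevirt.io/api/core/v1 import path are assumptions for readability, not the PR's exact code:

package sketch

import (
  "fmt"

  "google.golang.org/grpc/codes"
  "google.golang.org/grpc/status"

  kubevirtv1 "kubevirt.io/api/core/v1" // assumed KubeVirt API import
)

// errIfAttachedToOtherVMI scans the given VirtualMachineInstances and returns the
// CSI PreconditionFailed error (gRPC FAILED_PRECONDITION) when the volume's
// DataVolume/PVC name still appears in the VolumeStatus of a VMI other than the
// publish target. Illustrative sketch only.
func errIfAttachedToOtherVMI(vmis []kubevirtv1.VirtualMachineInstance, dvName, targetVMIName string) error {
  for _, vmi := range vmis {
    if vmi.Name == targetVMIName {
      continue // attaching to the requested VMI is expected
    }
    for _, volumeStatus := range vmi.Status.VolumeStatus {
      if volumeStatus.Name == dvName {
        return status.Error(codes.FailedPrecondition,
          fmt.Sprintf("volume %s is still attached to VMI %s/%s", dvName, vmi.Namespace, vmi.Name))
      }
    }
  }
  return nil
}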

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:
This was built on top of #160; however, this change is independent of it.

Release note:

The KubeVirt CSI Driver now checks for multi-attach errors on RWO volumes.

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Oct 22, 2025
@kubevirt-bot kubevirt-bot requested review from aglitke and awels October 22, 2025 06:01
@kubevirt-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign davidvossel for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot

Hi @alifelan. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Due to an issue with Kubelet, a second volume attachment may be created
for an RWO volume if the first volume attachment is undergoing deletion.
Once the initial operation times out, the volume is marked as uncertain
in the Actual State of the World, and the multi-attach check does not
prevent a second node from starting the attachment.

This ends up causing a problem with the KubeVirt CSI Driver. This new
volume won't be attached until it's released from the previous VM, and
any new and unrelated volumes that we try to attach to this new VM will
fail since the hotplug pod is stuck in ContainerCreating due to the
multi-attach error.

Here, we introduce an initial check for volumes that are RWO (or, as in
the code, non-RWX), where we iterate through the available Virtual
Machine Instances and check whether our current volume is still in the
VolumeStatus of any of them.

Signed-off-by: Alí Felán <[email protected]>
@awels
Member

awels commented Oct 22, 2025

/test all
Thanks for the PR; it is on my list to take a look at.

Member

@awels awels left a comment


Looks pretty good, just a few questions.

// Returns:
// - bool: True if the capability represents an RWX volume.
// - bool: True if the capability represents an RWO volume.
func getCapabiltyAccessMode(cap *csi.VolumeCapability_AccessMode) (isRWX bool, isRWO bool) {
Member


I would probably switch this to

func hasRWXCapabiltyAccessMode(cap *csi.VolumeCapability_AccessMode) (bool, error) {
  switch cap.GetMode() {
    ...RWO CAPS
    return false, nil
    ...RWX CAPS
    return true, nil
  }
  return false, fmt.Errorf("unknown volume capability")
}

Since we don't seem to care about the isRWO bool, and the error will allow us to detect invalid capabilities.
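
For reference, one possible shape of that suggestion, using the access-mode constants from the CSI spec's Go bindings; which modes count as RWX here is an assumption for illustration, and the csi and fmt imports are presumed already present in the file:

// Illustrative sketch; the set of modes treated as RWX is an assumption, not the driver's final code.
func hasRWXCapabiltyAccessMode(cap *csi.VolumeCapability_AccessMode) (bool, error) {
  switch cap.GetMode() {
  case csi.VolumeCapability_AccessMode_SINGLE_NODE_WRITER,
    csi.VolumeCapability_AccessMode_SINGLE_NODE_READER_ONLY:
    // Single-node (RWO-style) modes.
    return false, nil
  case csi.VolumeCapability_AccessMode_MULTI_NODE_READER_ONLY,
    csi.VolumeCapability_AccessMode_MULTI_NODE_SINGLE_WRITER,
    csi.VolumeCapability_AccessMode_MULTI_NODE_MULTI_WRITER:
    // Multi-node (RWX-style) modes.
    return true, nil
  }
  return false, fmt.Errorf("unknown volume capability %v", cap.GetMode())
}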

Author


Sure, on it. I'd like to mention that we do care about isRWO when doing our checks in ControllerPublishVolume, but we can do a !isRWX. The returned error here will also bubble up through getAccessMode, which will now return (bool, bool, error).
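
Roughly, the aggregation could end up looking like this (a sketch only; the exact signature in the PR may differ):

// Sketch: collect RWX/RWO flags across all requested capabilities, surfacing
// the "unknown capability" error from the helper above.
func getAccessMode(caps []*csi.VolumeCapability) (bool, bool, error) {
  var isRWX, isRWO bool
  for _, c := range caps {
    rwx, err := hasRWXCapabiltyAccessMode(c.GetAccessMode())
    if err != nil {
      return false, false, err
    }
    if rwx {
      isRWX = true
    } else {
      isRWO = true
    }
  }
  return isRWX, isRWO, nil
}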

for _, volumeStatus := range vmi.Status.VolumeStatus {
  // If the name in the status matches our PVC name, it means the volume
  // is actively attached to this other VMI.
  if volumeStatus.Name == dvName {
Member


Do we want to be a little more explicit? The volume could be in the process of unplugging, or won't that matter in this case?

Author


I don't think we want to be more specific, since we want to cover a volume in the unplugging process. The error we had seen comes from a volumeStatus in the Detaching state, so it's being unplugged, but it cannot be released because the new hotplug pod has not started successfully.

From the VirtualMachineInstance, we can see the volume go through the following status updates while detaching:

  - hotplugVolume:
      attachPodName: hp-volume-5f2jw
      attachPodUID: f6523b31-2a7f-4dbf-865d-8fc1f4108efb
    message: Successfully attach hotplugged volume pvc-6c23f224-bc1a-489f-9104-1762cd9cbbff
      to VM
    name: pvc-6c23f224-bc1a-489f-9104-1762cd9cbbff
    persistentVolumeClaimInfo:
      accessModes:
      - ReadWriteOnce
      capacity:
        storage: 1Gi
      claimName: pvc-6c23f224-bc1a-489f-9104-1762cd9cbbff
      filesystemOverhead: "0"
      requests:
        storage: "1073741824"
      volumeMode: Block
    phase: Ready
    reason: VolumeReady
    target: sdb
---
  - hotplugVolume:
      attachPodName: hp-volume-5f2jw
      attachPodUID: f6523b31-2a7f-4dbf-865d-8fc1f4108efb
    message: Successfully attach hotplugged volume pvc-6c23f224-bc1a-489f-9104-1762cd9cbbff
      to VM
    name: pvc-6c23f224-bc1a-489f-9104-1762cd9cbbff
    persistentVolumeClaimInfo:
      accessModes:
      - ReadWriteOnce
      capacity:
        storage: 1Gi
      claimName: pvc-6c23f224-bc1a-489f-9104-1762cd9cbbff
      filesystemOverhead: "0"
      requests:
        storage: "1073741824"
      volumeMode: Block
    phase: Detaching
    reason: VolumeReady
    target: sdb
---
  - hotplugVolume:
      attachPodName: hp-volume-5f2jw
      attachPodUID: f6523b31-2a7f-4dbf-865d-8fc1f4108efb
    message: Deleted hotplug attachment pod hp-volume-5f2jw, for volume pvc-6c23f224-bc1a-489f-9104-1762cd9cbbff
    name: pvc-6c23f224-bc1a-489f-9104-1762cd9cbbff
    persistentVolumeClaimInfo:
      accessModes:
      - ReadWriteOnce
      capacity:
        storage: 1Gi
      claimName: pvc-6c23f224-bc1a-489f-9104-1762cd9cbbff
      filesystemOverhead: "0"
      requests:
        storage: "1073741824"
      volumeMode: Block
    phase: Detaching
    reason: SuccessfulDelete
    target: sdb

In our scenario, we are stuck in the Detaching phase with a VolumeReady reason. This happens because the current hotplug pod has not been deleted, since the new one is still in ContainerCreating. While we could check for a Detaching phase with a SuccessfulDelete reason, I think that state is only shown temporarily in the VirtualMachineInstance status, so a simple name check should be enough (one check would fail, the follow-up would succeed; IIUC the period of this state is short).

What do you think?

Member


Okay, sounds good. I was just making sure I fully understood the issue here. What you are saying makes sense.

This provides a function that, based on the access mode, returns whether
a volume is RWX and whether it is RWO.

Signed-off-by: Alí Felán <[email protected]>
@awels
Member

awels commented Oct 22, 2025

/test all

@alifelan
Author

I'm looking into the e2e failure, and I don't think that was caused by this commit.

The failure here happened when checking the events for the pod (and if we made it here, the pod has finished running, i.e., everything was mounted properly).

------------------------------
CreatePVC multi attach - creates 3 pvcs, attach all 3 to pod, detach all 3 from the pod Block volume mode [pvcCreation, Block]
/home/prow/go/src/github.com/kubevirt/csi-driver/e2e/create-pvc_test.go:376
  STEP: creating a pvc @ 10/22/25 20:21:19.589
  STEP: creating a pod that uses 3 PVCs @ 10/22/25 20:21:19.638
  STEP: Wait for pod to reach a completed phase @ 10/22/25 20:21:19.651
  [FAILED] in [It] - /home/prow/go/src/github.com/kubevirt/csi-driver/e2e/create-pvc_test.go:641 @ 10/22/25 20:21:54.749
  STEP: dumping k8s artifacts to /logs/artifacts/create-pvc_test.go:641 @ 10/22/25 20:21:54.749
  [FAILED] in [JustAfterEach] - /home/prow/go/src/github.com/kubevirt/csi-driver/e2e/e2e-suite_test.go:119 @ 10/22/25 20:21:54.75
• [FAILED] [35.353 seconds]
CreatePVC multi attach - creates 3 pvcs, attach all 3 to pod, detach all 3 from the pod [It] Block volume mode [pvcCreation, Block]
/home/prow/go/src/github.com/kubevirt/csi-driver/e2e/create-pvc_test.go:376
  [FAILED] Expected
      <string>: MapVolume.MapPodDevice failed for volume "pvc-eeddcd3e-4456-48c3-91d0-b0886f0f2702" : rpc error: code = Unknown desc = couldn't find device by serial id
  not to contain substring
      <string>: find device by serial id
  In [It] at: /home/prow/go/src/github.com/kubevirt/csi-driver/e2e/create-pvc_test.go:641 @ 10/22/25 20:21:54.749
  There were additional failures detected.  To view them in detail run ginkgo -vv
------------------------------

I haven't seen a scenario where the serial id shows up later, but I guess it could happen if udev is taking some time? The other possibility I can think of is ControllerPublishVolume returning success when the operation wasn't performed, but the newly introduced return statements all return errors.

I don't see anything pointing to the issue being caused by this PR. I'll retry the test.

@alifelan
Author

/test pull-csi-driver-e2e-k8s

@kubevirt-bot

@alifelan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

In response to this:

/test pull-csi-driver-e2e-k8s

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@alifelan
Author

Fair, I cannot rerun the test. I think this is unrelated to the PR; can we rerun the test? Is this test known to be flaky?

@awels
Member

awels commented Oct 23, 2025

/test pull-csi-driver-e2e-k8s

@awels
Member

awels commented Oct 23, 2025

I am not aware of the test being flaky, but let's run it again and find out.

@alifelan
Author

It seems like it may be flaky, probably because of udev timing out.

The way I understand it, this test was added to ensure we don't succeed a ControllerPublishVolume operation before the volume is available in the VirtualMachineInstance status. Here we are correctly going through that flow, but the serial id was not yet available, maybe because udev took longer than KCM took to trigger NodeStageVolume.

