-
Notifications
You must be signed in to change notification settings - Fork 39
Fix deadlock while reattaching volume #143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
08a6174 to
43c63ad
Compare
Signed-off-by: Andrei Kvapil <[email protected]>
|
/assign @awels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: awels The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Been crazy busy, finally got some time to look. Looks great, thanks for the PR. |
…1135) ## What this PR does This pr imports upstream fix for volume reattaching procedure - kubevirt/csi-driver#143 ### Release note <!-- Write a release note: - Explain what has changed internally and for users. - Start with the same [label] as in the PR title - Follow the guidelines at https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md. --> ```release-note [kubernetes] Fix dead-lock while reattaching a KubeVirt-CSI volume ``` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Improved volume management for virtual machines by adding checks to skip unnecessary attach or detach operations when the volume is already in the desired state. * **Tests** * Added new unit tests to verify optimized volume attach/detach workflows and ensure fast-path logic is functioning correctly. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
| // Fast-path: nothing to do if the volume is already attached | ||
| attached, err := c.virtClient.EnsureVolumeAvailableVM(ctx, c.infraClusterNamespace, vmName, dvName) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| if attached { | ||
| klog.V(3).Infof("Volume %s already attached to VM %s - skipping hot-plug", dvName, vmName) | ||
| return &csi.ControllerPublishVolumeResponse{}, nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apologies for only noticing this today. isn't this risky? the volume is only available to the kubevirt VM once the vmi status reflects that. so ctrlpublish may return success prior to that
|
any chance you could provide the reproducer? I don't fully understand tbh and I don't think the PR does what the description states
EDIT: |
|
Hey, I just added an additional check if volume still attached (is in vmi spec) do not remove finalizer from it if volume detached (is missing in vmi spec) safely remove finalizer |
Signed-off-by: Andrei Kvapil <[email protected]>
|
Hey! I was looking into recent commits, and stumbled upon this. I may lack some context, but I think this may be causing a regression, or at least, using different sources of truth ( For a
For a
This stems from using a different source of truth between the initial check ( |
Yeah I was alluding to the same with my review in #143 (review) and the following comment. I did not fully understand the problem/handling of it - #143 (comment). |
|
Was there anything that made you change your mind? Have we seen any issues related to this behaviour? While I don't have experience deploying a KubeVirt CSI Driver with these changes, at a first glance I believe we will be impacted by them. We run into timeouts frequently, and after the second attempt, our |
oh, I didn't change my mind - I never pulled this commit into the productized fork. |
…ozystack#1135) ## What this PR does This pr imports upstream fix for volume reattaching procedure - kubevirt/csi-driver#143 ### Release note <!-- Write a release note: - Explain what has changed internally and for users. - Start with the same [label] as in the PR title - Follow the guidelines at https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md. --> ```release-note [kubernetes] Fix dead-lock while reattaching a KubeVirt-CSI volume ``` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Improved volume management for virtual machines by adding checks to skip unnecessary attach or detach operations when the volume is already in the desired state. * **Tests** * Added new unit tests to verify optimized volume attach/detach workflows and ensure fast-path logic is functioning correctly. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
What this PR does / why we need it:
This PR adds logic that prevents the controller from trying to attach or detach a volume when the VMI status already shows the volume is in the desired state.
Which issue(s) this PR fixes
We ran into a problem while a volume was being attached. The attacher logged:
On the node the disk was already attached to the VM, yet the corresponding VolumeAttachment had a deletion timestamp.
So we got stuck: the hp-volume pod couldn’t start because it was waiting for the VolumeAttachment to disappear, and the VolumeAttachment couldn’t be removed because the VM was still using the disk.
This may be a race: kubevirt-csi-controller thought the disk had been detached (the hp-volume pod was gone), while the VM was still holding it. Kubernetes then created a new VolumeAttachment in the tenant cluster, triggering another attach operation.
Special notes for your reviewer:
Release note: