Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add proper implementation of UnprepareResources in kubelet plugin #109

Merged
merged 1 commit into from
May 7, 2024

Conversation

klueska
Copy link
Collaborator

@klueska klueska commented May 7, 2024

Previously, we always returned SUCCESS from every call to UnprepareResources, regardless of whether we successfully unprepared the claim or not. The idea being that we would always garbage collect it later, once the claim got deallocated by the controller.

This commit changes the semantics to carry out the unprepare at the point it is actually requested, returning an error if this is not possible for some reason. In the past, such errors would have been missed, and the resources could have been deallocated prematurely.

@klueska klueska requested a review from elezar May 7, 2024 08:13
Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments / questions. No blockers though.

@@ -114,6 +113,20 @@ func (d *driver) NodePrepareResources(ctx context.Context, req *drapbv1.NodePrep
return preparedResources, nil
}

func (d *driver) NodeUnprepareResources(ctx context.Context, req *drapbv1.NodeUnprepareResourcesRequest) (*drapbv1.NodeUnprepareResourcesResponse, error) {
klog.Infof("NodeUnPrepareResource is called: number of claims: %d", len(req.Claims))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
klog.Infof("NodeUnPrepareResource is called: number of claims: %d", len(req.Claims))
klog.Infof("NodeUnprepareResource is called: number of claims: %d", len(req.Claims))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

err = d.unprepare(ctx, claim.Uid)
if err != nil {
return &drapbv1.NodeUnprepareResourceResponse{
Error: fmt.Sprintf("error unpreparing devices for claim %v: %v", claim.Uid, err),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Should this be "devices" or "resources"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think either is fine since what we end up returning on success of the prepare calls is a list of CDI devices:

&drapbv1.NodePrepareResourceResponse{CDIDevices: prepared}

cmd/nvidia-dra-plugin/driver.go Show resolved Hide resolved
@klueska klueska force-pushed the fix-bug-on-unprepare branch from 850abba to 57acdf6 Compare May 7, 2024 11:14
Previously, we always returned SUCCESS from every call to UnprepareResources,
regardless of whether we successfully unprepared the claim or not. The idea
being that we would always garbage collect it later, once the claim got
deallocated by the controller.

This commit changes the semantics to carry out the unprepare at the point it is
actually requested, returning an error if this is not possible for some reason.
In the past, such errors would have been missed, and the resources could have
been deallocated prematurely.

Signed-off-by: Kevin Klues <[email protected]>
@klueska klueska force-pushed the fix-bug-on-unprepare branch from 57acdf6 to 0f8ce59 Compare May 7, 2024 11:17
@klueska klueska merged commit 917e1ce into NVIDIA:main May 7, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants