-
Notifications
You must be signed in to change notification settings - Fork 798
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DHCP lease maintenance should terminate when interface no longer exists. #1143
base: main
Are you sure you want to change the base?
DHCP lease maintenance should terminate when interface no longer exists. #1143
Conversation
3cb9c6b
to
20e6ba9
Compare
20e6ba9
to
0834450
Compare
plugins/ipam/dhcp/lease.go
Outdated
|
||
ctx, cancel := context.WithTimeout(ctx, linkCheckTotalTimeout) | ||
defer cancel() | ||
_, err := backoffRetry(ctx, linkCheckRetryMax, checkFunc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the only downside here is that backoffRetry()
sets a "base delay" using the resendDelay0
const which was from the DHCP RFC stuff, so, it's 4 seconds the first time. This could probably be quicker? Will happily rejig backoffRetry to take a base delay parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops missed a major part of this... fix incoming.
Soooo.... yeah what I have currently pushed has some things I totally whiffed on and I'm working on correcting it, but, in an ideal world, the netlink library release v1.3.0 doesn't include the commit to detect the ErrDumpInterrupted I'm looking into working around it until that can get released, which I also requested in vishvananda/netlink#1019 |
0834450
to
0249707
Compare
plugins/ipam/dhcp/lease.go
Outdated
return exists, err | ||
} | ||
|
||
func backoffRetryLinkExists(ctx context.Context, maxDelay time.Duration, f func() (bool, error)) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wound up creating another backoff method because I needed to have the retry method f func() (bool, error)
return a bool and an error, so, I additionally added linkCheckDelay0
plugins/ipam/dhcp/lease.go
Outdated
if err != nil { | ||
log.Printf("!bang: Error checking network link '%s': %v, type: %v", l.link.Attrs().Name, err, reflect.TypeOf(err)) | ||
var linkNotFoundErr *netlink.LinkNotFoundError | ||
if errors.As(err, &linkNotFoundErr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wasn't evaluating to true even though I was seeing logs as:
2025/01/21 21:36:46 Error checking network link 'net1': Link not found, type: netlink.LinkNotFoundError
I'll remove the reflect package and also the lines marked with !bang
once I get it sorted.
ea7c4f0
to
55262dc
Compare
I'm going to change to checking for the string
That would never evaluate as true, even when I could find the error as |
c0f73b9
to
78d100b
Compare
You're passing the address of an empty pointer (pointer of pointer) to This should do the job
Try it in https://go.dev/play/p/GZJO1RdVR8g |
78d100b
to
7758510
Compare
plugins/ipam/dhcp/lease.go
Outdated
_, err := netlink.LinkByName(l.link.Attrs().Name) | ||
if err != nil { | ||
var linkNotFoundErr *netlink.LinkNotFoundError = &netlink.LinkNotFoundError{} | ||
if errors.As(err, linkNotFoundErr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Owe you one @zeeke !!! Thanks. Appreciate that, it works a charm and much cleaner than the string check, thanks.
plugins/ipam/dhcp/lease.go
Outdated
ctx, cancel := context.WithTimeout(ctx, linkCheckTotalTimeout) | ||
defer cancel() | ||
|
||
exists, err := backoffRetryLinkExists(ctx, linkCheckRetryMax, checkFunc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linkCheckRetryMax
is constant variable, defined as global scope in this file. So we may remove it from function argument. Otherwise, remove constant variable, linkCheckRetryMax
and define it in function variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, simplified so that it isn't passed through the method param and just used directly there, good call
plugins/ipam/dhcp/lease.go
Outdated
func backoffRetryLinkExists(ctx context.Context, maxDelay time.Duration, f func() (bool, error)) (bool, error) { | ||
baseDelay := linkCheckDelay0 | ||
for { | ||
exists, err := f() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, this function is pretty for backoffRetryLinkExists
, not for common retry, hence we may not required to have f func() (bool, error))
as argument, I guess. Instead of that, having linkName
in function argument makes it simple, I feel.
func backoffRetryLinkExists(ctx context.Context, maxDelay time.Duration, linkName string) (bool, error) {
checkFunc := func() (bool, error) {
_, err := netlink.LinkByName(linkName)
if err != nil {
var linkNotFoundErr *netlink.LinkNotFoundError = &netlink.LinkNotFoundError{}
if errors.As(err, linkNotFoundErr) {
return false, nil
}
return false, err
}
return true, nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Tomo, nice simplification. I did wind up breaking out a method checkLinkByName
to use there, and then just check it in a for loop in a condensed version of the checkLinkExistsWithBackoff
method.
plugins/ipam/dhcp/lease.go
Outdated
@@ -344,6 +360,45 @@ func (l *DHCPLease) maintain() { | |||
} | |||
} | |||
|
|||
func (l *DHCPLease) checkLinkExistsWithBackoff(ctx context.Context) (bool, error) { | |||
checkFunc := func() (bool, error) { | |||
_, err := netlink.LinkByName(l.link.Attrs().Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we assume that this link is already removed, then l.link.Attrs()
may be invalid. We should store interface name ahead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting, I didn't really account for this, even though it seems it was pulling up the correct name. However, just in case this does happen in other situations, I instead opted to store linkName
as a property of the DHCPLease
object. Thanks!
Due to oberservations that threads can grow and the dhcp daemon uses an increasing amount of memory. This situation can happen organically when using say, bridge CNI, and the bridge has been removed outside of the bridge CNI lifecycle, and an interface no longer exists on a pod. Does so on a retry loop using the `backoffRetry()` method. Signed-off-by: dougbtv <[email protected]>
7758510
to
ce0c680
Compare
Due to oberservations that threads can grow and the dhcp daemon uses an increasing amount of memory.
This situation can happen organically when using say, bridge CNI, and the bridge has been removed outside of the bridge CNI lifecycle, and an interface no longer exists on a pod.
Also, for reproduction steps (admittedly, openshift biased), see my steps in https://gist.github.com/dougbtv/803316016c1e96a529725013402d5f0d