Skip to content

Conversation

@adrianmoisey
Copy link
Contributor

Fixes #1121
Heavily inspired by moby/moby#48407

Signed-off-by: Adrian Moisey <[email protected]>
A wrapper to add retry on for netlink when it receives a ErrDumpInterrupted

Signed-off-by: Adrian Moisey <[email protected]>
These are functions identified as potentially receiving ErrDumpInterrupted and needing to retry

Signed-off-by: Adrian Moisey <[email protected]>
github.com/pkg/errors v0.9.1
github.com/safchain/ethtool v0.5.10
github.com/vishvananda/netlink v1.3.0
github.com/vishvananda/netlink v1.3.1-0.20250303224720-0e7078ed04c8
Copy link
Contributor

@aojea aojea Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this contains the docker patch in the library that will alliviate the problem considerably, as it will not return early

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aha, interesting -- do you have a link to the commit / patch?

@@ -0,0 +1,321 @@
// Package netlinksafe wraps vishvandanda/netlink functions that may return EINTR.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you forking the code it is better to indicate the source , open source licenses and copyright should be respected

License is apache so it is ok, but I noticed they have copyright https://github.com/moby/moby/blob/6de8ba3bc52377ddb06d80c76b0c8d302cb81261/LICENSE#L179

// Copyright Docker, Inc.
// Copied from https://github.com/moby/moby/blob/edaa0eb56d7e749df81bd3dcc2945b81e911b52e/internal/nlwrap/nlwrap_linux.go

Different projects follow different rules, but this seems to be correct to the best of my knowledge

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, a copyright attribution is the right thing to do here. Something like

// SPDX-License-Identifier: Apache-2.0
// Copyright Docker, Inc.
// Copied from https://github.com/moby/moby/blob/edaa0eb56d7e749df81bd3dcc2945b81e911b52e/internal/nlwrap/nlwrap_linux.go

Given that the upstream files do not have a copyright header, we're on our own to Do The Right Thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made #1159 as a follow up to get this added

@aojea
Copy link
Contributor

aojea commented Mar 27, 2025

/assign @s1061123 @squeed

LGTM module the comment for respecting the copyright #1154 (comment)

Folks this is causing a lot of pain in CIs with errors spawning Pods, please take a look and prioritize if possible

Thanks

@MikeZappa87
Copy link
Contributor

I really wish the maintainers of the netlink pkg did a major version bump for this. It really had a large impact.

// by netlink functions along with the EINTR. So, it's not possible to do
// anything but retry. After maxAttempts the EINTR will be returned to the
// caller.
package netlinksafe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably bike-shedding, but I'm slightly wondering if netlinksafe is a good name for this; wondering if "safe" gives incorrect expectations for what it does.

Also similar comments as on the containerd PR;

  • should this be an internal/ package, or do we want external users to use it directly?
  • if we're implementing it in this repository as a package, it could be used by containerd (instead of duplicating it there)

Slightly wondering if (now that more projects start to fork this code) it would perhaps be a better idea to upstream the package from moby to the github.com/vishvananda/netlink repository (as a utility package), so that we don't end up with a gazillion forks of the same code.

To my understanding, it was not correct to do this by default, but having it as an optional utility package could still work?

Any thoughts, @robmry?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thaJeztah want to discuss this on slack? I sent you a dm on the cncf slack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably bike-shedding, but I'm slightly wondering if netlinksafe is a good name for this; wondering if "safe" gives incorrect expectations for what it does.

Agreed, when I picked the name it felt wrong. Happy to adjust it when a name is figured out

Slightly wondering if (now that more projects start to fork this code) it would perhaps be a better idea to upstream the package from moby to the github.com/vishvananda/netlink repository (as a utility package), so that we don't end up with a gazillion forks of the same code.

Agreed on this too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MikeZappa87 about to jump into another call (sorry! etoomanycalls), but will have a peek later

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative is to downgrade the netlink pkg version until this is resolved as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with netlinksafe, mostly we just need to know which package and where to go

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, cni/plugins isn't that big of a project, so I'm not too worried about having perfect naming conventions. safenetlink is fine by me 🤷.

@thaJeztah
Copy link
Contributor

I really wish the maintainers of the netlink pkg did a major version bump for this. It really had a large impact.

Yeah, we really need one; I know debian packages of Docker are (or were) broken because of that. (also have some PRs pending there to slightly lift CI to a better situation (vishvananda/netlink#1033, vishvananda/netlink#1048, vishvananda/netlink#1009)

@aboch @vishvananda any chance we could get help with that?

@MikeZappa87
Copy link
Contributor

I really wish the maintainers of the netlink pkg did a major version bump for this. It really had a large impact.

Yeah, we really need one; I know debian packages of Docker are (or were) broken because of that. (also have some PRs pending there to slightly lift CI to a better situation (vishvananda/netlink#1033, vishvananda/netlink#1048, vishvananda/netlink#1009)

@aboch @vishvananda any chance we could get help with that?

I reached out to Vish via email. I am hoping we can possibly undo some of the blast radius from the change.

// by netlink functions along with the EINTR. So, it's not possible to do
// anything but retry. After maxAttempts the EINTR will be returned to the
// caller.
package netlinksafe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with netlinksafe, mostly we just need to know which package and where to go

@squeed squeed merged commit b088cc3 into containernetworking:main Mar 31, 2025
6 checks passed
@squeed
Copy link
Member

squeed commented Mar 31, 2025

We want to cut a release -- we can worry about the code header if it's a problem :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

netlink walk calls may return EINTR (regression)

6 participants