jrife commented May 31, 2025

AllowedIPConfig

I recently added the WGALLOWEDIP_F_REMOVE_ME flag to WireGuard's Linux Netlink API. In the same way that WGPEER_F_REMOVE_ME lets a user remove a single peer from a WireGuard device's configuration, WGALLOWEDIP_F_REMOVE_ME lets a user remove a single IP from a peer's set of allowed IPs. This capability was subsequently ported to wireguard-go as well.

This PR adds support for this feature to wgctrl-go, allowing clients to incrementally remove allowed IPs on a peer like so:

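if err := c.ConfigureDevice("wg0", wgtypes.Config{
	Peers: []wgtypes.PeerConfig{
		{
			PublicKey: peerKey,
			AllowedIPs: []wgtypes.AllowedIPConfig{
				{
					// Remove only this prefix; the peer's other allowed IPs are untouched.
					IPNet:  ip,
					Remove: true,
				},
			},
		},
	},
}); err != nil {
	log.Fatalf("failed to remove allowed IP: %v", err)
}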

WithShim

The second part of this PR adds a WithShim option for clients. Direct allowed IP removal is a new feature, set to land in the next Linux release, so its availability is limited for now. Clients that want to use it must either know ahead of time whether WireGuard on their system supports it or probe for support at runtime. To ease the transition, clients can opt in with WithShim:

	c, err := wgctrl.New(wgctrl.WithShim)

WithShim wraps the internal clients in a shim client that probes whether direct IP removal is supported on the system. If it is not, the shim emulates removal by assigning the IPs to a dummy peer, which moves them off their current peer since each allowed IP can belong to only one peer, and then removing that dummy peer. Once the feature becomes commonplace, this option should no longer be necessary.

In my case, I plan to use WithShim to simplify Cilium's WireGuard orchestration logic, since it needs to work across a range of Linux kernel versions.

Testing

I expanded the integration tests with testRemoveManyIPs to exercise direct allowed IP removal and ran the suite on all of the platforms below.

$ WGCTRL_INTEGRATION=yesreallydoit go test .

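┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ OPERATING SYSTEM ┃ DRIVER       ┃ REMOVE IP SUPPORTED ┃ RESULT ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ FreeBSD 14.2     │ native       │ no                  │ PASS   │
├──────────────────┼──────────────┼─────────────────────┼────────┤
│ OpenBSD 7.7      │ native       │ no                  │ PASS¹  │
├──────────────────┼──────────────┼─────────────────────┼────────┤
│ Windows 11       │ native       │ no                  │ PASS²  │
├──────────────────┼──────────────┼─────────────────────┼────────┤
│ Linux            │ native       │ no                  │ PASS   │
├──────────────────┼──────────────┼─────────────────────┼────────┤
│ Linux            │ wireguard-go │ no                  │ PASS   │
├──────────────────┼──────────────┼─────────────────────┼────────┤
│ Linux            │ native       │ yes                 │ PASS   │
├──────────────────┼──────────────┼─────────────────────┼────────┤
│ Linux            │ wireguard-go │ yes                 │ PASS   │
└──────────────────┴──────────────┴─────────────────────┴────────┘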

To test platforms with native support, I compiled Linux from the bpf-next/master tree, which includes commit ba3d7b93dbe3 ("wireguard: allowedips: add WGALLOWEDIP_F_REMOVE_ME flag"), and wireguard-go from the head of master, which includes commit 256bcbd70d5b ("device: add support for removing allowedips individually").

On systems where direct IP removal is not supported, I also made sure that ConfigureDevice returns an error when Remove is used without the shim.

¹ OpenBSD skips this test case because its driver is read-only.
² Two assertions fail on Windows because of a missing protocol version, but testRemoveManyIPs passes.

jrife force-pushed the jrife/support-allowedips-removeme branch 4 times, most recently from 067de43 to 8d219c0 (May 31, 2025 03:48)
[1] adds the WGALLOWEDIP_F_REMOVE_ME flag to WireGuard's Netlink
API. In the same way that WGPEER_F_REMOVE_ME allows a user to remove
a single peer from a WireGuard device's configuration, this flag
allows a user to remove a single IP from a peer's set of allowed IPs.
This capability was subsequently ported to wireguard-go as well.

Add support for this feature to wgctrl-go, allowing clients to
incrementally remove allowed IPs on a peer like so:

wgtypes.Config{
	Peers: []wgtypes.PeerConfig{
		{
			PublicKey: peerKey,
			AllowedIPs: []wgtypes.AllowedIPConfig{
				{
					IPNet:  ip,
					Remove: true,
				},
			},
		},
	},
}

[1]: https://lore.kernel.org/netdev/[email protected]/

Signed-off-by: Jordan Rife <[email protected]>
jrife force-pushed the jrife/support-allowedips-removeme branch from 8d219c0 to 8e89697 (May 31, 2025 03:51)
Direct allowed IP removal is a new feature, so its availability is
limited for now. Clients who want to take advantage of this capability
would need to know ahead of time if WireGuard on their system supports
it or probe to see if they're running on a system that supports it. To
ease the transition, add the WithShim option for clients:

	c, err := wgctrl.New(wgctrl.WithShim)

WithShim wraps internal clients with a shim client that probes to see
if direct IP removal is supported on their system. If not, the shim
client emulates the effect by assigning IPs to a dummy peer then
removing that peer. At some point in the future, this option should no
longer be necessary once the feature becomes commonplace.

In my case, I plan to use WithShim to simplify Cilium's WireGuard
orchestration logic.

Signed-off-by: Jordan Rife <[email protected]>
jrife force-pushed the jrife/support-allowedips-removeme branch 3 times, most recently from 5efcced to 04b7167 (June 6, 2025 05:33)
jrife changed the title from "Support direct allowed IP removal (WIP)" to "Support direct allowed IP removal" on Jun 6, 2025
jrife marked this pull request as ready for review on June 6, 2025 05:48
jrife (Author) commented Jun 6, 2025

FYI @zx2c4, I mentioned this work on the mailing lists, but it looks like most wgctrl-go development happens on GitHub, so I'm tagging you here.

Add testRemoveManyIPs to the integration tests to exercise the direct
allowed IP removal capability and run the test suite on all platforms.

$ WGCTRL_INTEGRATION=yesreallydoit go test .

┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ OPERATING SYSTEM ┃ DRIVER       ┃ REMOVE IP SUPPORTED ┃ RESULT ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ FreeBSD 14.2     │ native       │ no                  │ PASS   │
├──────────────────┼──────────────┼─────────────────────┼────────┤
│ OpenBSD 7.7      │ native       │ no                  │ PASS*  │
├──────────────────┼──────────────┼─────────────────────┼────────┤
│ Windows 11       │ native       │ no                  │ PASS** │
├──────────────────┼──────────────┼─────────────────────┼────────┤
│ Linux            │ native       │ no                  │ PASS   │
├──────────────────┼──────────────┼─────────────────────┼────────┤
│ Linux            │ wireguard-go │ no                  │ PASS   │
├──────────────────┼──────────────┼─────────────────────┼────────┤
│ Linux            │ native       │ yes                 │ PASS   │
├──────────────────┼──────────────┼─────────────────────┼────────┤
│ Linux            │ wireguard-go │ yes                 │ PASS   │
└──────────────────┴──────────────┴─────────────────────┴────────┘

I compiled Linux from the bpf-next/master tree which includes
commit ba3d7b93dbe3 ("wireguard: allowedips: add WGALLOWEDIP_F_REMOVE_ME
flag") and wireguard-go from the head of master which includes commit
256bcbd70d5b ("device: add support for removing allowedips
individually") to test platforms with native support.

On systems where direct IP removal is not supported, I also made sure
that ConfigureDevice returns an error when Remove is used without the
shim.

*  OpenBSD skips this test case, since the driver is read-only.
** Two assertions fail in Windows due to missing protocol version, but
   testRemoveManyIPs passes.

Signed-off-by: Jordan Rife <[email protected]>
jrife force-pushed the jrife/support-allowedips-removeme branch from 04b7167 to 56a8249 (June 6, 2025 18:32)
jrife (Author) commented Jun 28, 2025

cc @mdlayher could you take a look at this when you get a chance? Thanks!

zx2c4 (Member) commented Jun 30, 2025

So the thing I don't like about this is that it's actually kind of well designed! Specifically, this wgctrl.New(wgctrl.WithShim) business. That's a very weird thing to complain about so let me explain my feelings a bit more.

In a year or two or three or four, most users won't really care about the lack of support on old kernels because new kernels will be more readily deployable. But right now, you do care, so you understandably want something now so that you can begin using this API. At the same time, "WithShim" is so generic and nice and neat. The API looks like a general purpose shim with a new general purpose options flag to enable it. But hopefully in five years that will be useless and we can remove the nice pretty API. Except nobody likes to remove nice pretty APIs.

Because this is only going to be useful for a limited amount of time, and the actual thing that it does is a bit of a clever hack, what if we just treat the whole thing as a hack?

For example, if this is to become always on (and why not?), then the flow looks like:

if aip.Flags != 0 && !backend.supportsAIPFlags() { ... do magic ... }

Where supportsAIPFlags() is something even dumber that runs uname(2) and caches the result in a boolean for subsequent usage. And then we decide based on the vulgar version number.

Yes, this is terrible! This is objectively worse in almost every way than what you've proposed in your PR. Trying, failing, and retrying -- "probing" for features -- takes care of older userspace implementations, backported kernelspace implementations, and so forth. (It also makes things kinda messier in the future if we wind up with two things to probe for.) But userspace implementations can be upgraded easily, and kernel implementations will eventually be updated. So why not do the dumbest and least intrusive thing?

Another solution would be to enable this with an option wgctrl.hacks.EnableAIPRemovalBackportHack() but why not just enable this all the time? If somebody is using the library and makes use of this new flag feature, chances are they want it to work.

And then in a year or two when the kernel feature is widespread enough, we can just remove the built-in hack, and folks who hit this and complain can simply update their kernels at this point, but don't need to update their code.

Does that make sense?
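A minimal sketch of the uname-based check proposed above, assuming the release string has already been read (for example via uname(2)) and using a placeholder minimum version rather than the actual release that ships the flag. It parses major.minor, compares against the threshold, and caches the answer with sync.Once. Note the RHEL-style release string: the parsed "4.18" says nothing about what was backported, which is the weakness raised in the reply below.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// parseRelease extracts major and minor numbers from a uname(2)-style release
// string such as "6.8.0-45-generic" or "4.18.0-553.el8_10.x86_64".
func parseRelease(release string) (major, minor int, ok bool) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, false
	}
	// The minor component may carry a suffix such as "16-rc1"; keep leading digits.
	minorStr := parts[1]
	if i := strings.IndexFunc(minorStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minorStr = minorStr[:i]
	}
	maj, err1 := strconv.Atoi(parts[0])
	mnr, err2 := strconv.Atoi(minorStr)
	if err1 != nil || err2 != nil {
		return 0, 0, false
	}
	return maj, mnr, true
}

// atLeast reports whether release is at or above wantMajor.wantMinor.
// Unparseable strings are treated as "too old".
func atLeast(release string, wantMajor, wantMinor int) bool {
	maj, mnr, ok := parseRelease(release)
	if !ok {
		return false
	}
	return maj > wantMajor || (maj == wantMajor && mnr >= wantMinor)
}

// versionGate caches a single version comparison. release is a hypothetical
// hook standing in for a uname(2) wrapper.
type versionGate struct {
	once         sync.Once
	supported    bool
	release      func() string
	major, minor int
}

func (g *versionGate) supportsAIPFlags() bool {
	g.once.Do(func() {
		g.supported = atLeast(g.release(), g.major, g.minor)
	})
	return g.supported
}

func main() {
	// Placeholder threshold for illustration only; not the verified release
	// that first shipped WGALLOWEDIP_F_REMOVE_ME.
	const placeholderMajor, placeholderMinor = 6, 16

	for _, r := range []string{"6.18.0", "6.8.0-45-generic", "4.18.0-553.el8_10.x86_64"} {
		fmt.Println(r, atLeast(r, placeholderMajor, placeholderMinor))
	}
}
```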

jrife (Author) commented Jun 30, 2025

> In a year or two or three or four, most users won't really care about the lack of support on old kernels because new kernels will be more readily deployable.

As someone who is still supporting platforms stuck on RHEL 4.18 kernels, I really hope so haha. Old kernels stick around a while, especially for enterprisey stuff.

> Where supportsAIPFlags() is something even dumber that runs uname(2) and caches the result in a boolean for subsequent usage. And then we decide based on the vulgar version number.

Unfortunately, you can't really trust the kernel version to tell you which features are available. A version check probably works for most platforms, but there are exceptions such as RHEL and derivatives like Rocky, where, for example, the "4.18" kernel is closer to a 5.10 kernel as far as features go. This can be true of Ubuntu to a lesser extent as well, although Ubuntu tends to track upstream kernels more closely, only backporting the odd bug fix here and there. This is an annoyance I've grappled with a lot, especially in Cilium.

A more feature-specific probe is simple enough, as evidenced by the code here, so I'd advise sticking with that approach to avoid the inevitable headaches that come with comparing kernel version strings. This is the only part I have a strong feeling about.

> At the same time, "WithShim" is so generic and nice and neat. The API looks like a general purpose shim with a new general purpose options flag to enable it. But hopefully in five years that will be useless and we can remove the nice pretty API. Except nobody likes to remove nice pretty APIs.

> Another solution would be to enable this with an option wgctrl.hacks.EnableAIPRemovalBackportHack() but why not just enable this all the time? If somebody is using the library and makes use of this new flag feature, chances are they want it to work.

I hear you on the API, and I have no strong opinions about it one way or another. I'm fine enabling the behavior by default and dropping the WithShim option. If we do that, I actually think the opposite knob might be better (wgctrl.hacks.DisableAIPRemovalBackportHack()). My worry is that there may be some arcane system or setup where the probing technique is too disruptive, or a use case where the shim behavior gets in the way and someone wants to disable it. Then again, if such a use case arises, somebody could always open an issue or PR here. I wouldn't use that option myself and am probably just being paranoid.

How would you feel about this plan?

  1. Drop the public API for enabling the shim/emulation behavior and instead make it the default behavior (no more WithShim).
  2. Leave out any user-facing controls for now (no wgctrl.hacks.EnableAIPRemovalBackportHack() or wgctrl.hacks.DisableAIPRemovalBackportHack()). If a problem arises, somebody can open an issue or PR here to add knobs if needed.
  3. Keep the feature-specific probing rather than doing something less reliable like uname.

It should be trivial to drop the shim layer later, by deleting a file and a few functions, without impacting users of the library.
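Under this plan (emulation always on, no user-facing knobs, probe-based detection), the configure path reduces to one branch. The sketch below uses simplified, hypothetical types and a fixed dummy key; it passes the request through when removal is supported and otherwise rewrites it into two requests: one that moves the IPs being removed onto a dummy peer, and one that removes that peer. Because the rewrite lives in a single function behind an unchanged call, deleting it later would not change the caller-facing API.

```go
package main

import "fmt"

// Simplified, hypothetical config types (not the wgtypes definitions).
type allowedIP struct {
	CIDR   string
	Remove bool
}

type peerConfig struct {
	PublicKey  string
	Remove     bool // remove the whole peer
	AllowedIPs []allowedIP
}

// hypotheticalDummyKey stands in for a freshly generated public key.
const hypotheticalDummyKey = "dummy"

// rewriteForOldBackend splits per-IP removals out of a request and turns them
// into two requests that older backends understand.
func rewriteForOldBackend(peers []peerConfig, dummyKey string) (first, second []peerConfig) {
	var removed []allowedIP
	for _, p := range peers {
		kept := p // copy; the caller's slice is not modified
		kept.AllowedIPs = nil
		for _, a := range p.AllowedIPs {
			if a.Remove {
				removed = append(removed, allowedIP{CIDR: a.CIDR})
			} else {
				kept.AllowedIPs = append(kept.AllowedIPs, a)
			}
		}
		first = append(first, kept)
	}
	if len(removed) == 0 {
		return first, nil
	}
	// Request 1 also assigns the IPs being removed to the dummy peer, moving them off their owners.
	first = append(first, peerConfig{PublicKey: dummyKey, AllowedIPs: removed})
	// Request 2 deletes the dummy peer, and the IPs with it.
	second = []peerConfig{{PublicKey: dummyKey, Remove: true}}
	return first, second
}

// configure is the always-on path: pass through if supported, otherwise rewrite.
func configure(peers []peerConfig, supported bool, send func([]peerConfig) error) error {
	if supported {
		return send(peers)
	}
	first, second := rewriteForOldBackend(peers, hypotheticalDummyKey)
	if err := send(first); err != nil {
		return err
	}
	if second != nil {
		return send(second)
	}
	return nil
}

func main() {
	req := []peerConfig{{
		PublicKey: "peerA",
		AllowedIPs: []allowedIP{
			{CIDR: "10.0.0.1/32", Remove: true},
			{CIDR: "10.0.0.3/32"},
		},
	}}
	send := func(p []peerConfig) error {
		fmt.Printf("%+v\n", p) // print each request that would go to the backend
		return nil
	}
	if err := configure(req, false, send); err != nil {
		fmt.Println("configure:", err)
	}
}
```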

jrife (Author) commented Jul 28, 2025

@zx2c4 Just checking back in! How do you feel about my counterproposal?

jrife (Author) commented Aug 11, 2025

@zx2c4 @mdlayher Any thoughts here? I'd love to close the loop on this, as it's coming up on a year since I embarked on the endeavor to support direct IP removal throughout the stack. The Remove flag would let me finally get rid of some hacks we're doing over in Cilium and simplify our WireGuard device configuration logic.

As an aside, I noticed this repo seems to be short on reviewers, so if you're looking for some extra help I'd be happy to chip in.

jrife (Author) commented Sep 22, 2025

bumping this thread @zx2c4 @mdlayher
