Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use thread locking instead of clone hack for netns #3402

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

cpuguy83
Copy link
Member

@cpuguy83 cpuguy83 commented Dec 14, 2022

This uses gonso to setup the network namespace which is a thin wrapper around setns(2) that works out the details of locking/releasing go threads, can manage ns state, etc.
Notably it does not require calling SYS_CLONE or setting up fork hooks for go (at least not for network namespaces).

Also worth noting, the old implementation does not work with gccgo.

This uses gonso to setup the network namespace which is a thin wrapper
around setns(2) that works out the details of locking/releasing go
threads, can manage ns state, etc.
Notably it does not require calling SYS_CLONE or setting up fork hooks
for go (at least not for network namespaces).

Signed-off-by: Brian Goff <[email protected]>
@cpuguy83 cpuguy83 marked this pull request as ready for review December 14, 2022 19:53
@tonistiigi tonistiigi added this to the v0.12.0 milestone Dec 15, 2022
@tonistiigi
Copy link
Member

Looks like this issue was already addressed in #3738 (this PR is actually older but looks like it got lost in review). The difference is still if we would want to use a library. Iiuc then Moby as well doesn't use that library atm. If the intention is to change that(and if using the library simplifies the code) we can make buildkit use it as well.

There is another PR still open about umask that uses the same pattern(without extra library).

@tonistiigi tonistiigi removed this from the v0.12.0 milestone Jun 28, 2023
@neersighted
Copy link
Member

FWIW I'd like to see us use this library in moby/moby; cc @corhere for thoughts on that.

@corhere
Copy link
Contributor

corhere commented Jul 13, 2023

While there is significant overlap between gonso and github.com/docker/docker/internal/unshare I think they have different strengths. Gonso provides a far superior toolset for managing the lifecycles of namespaces, and unlike internal/unshare actually supports creating user namespaces, but I think it's a little heavy for the use case I designed internal/unshare for: just running a function in an ephemeral throwaway namespace.

Compared to

unshare.Go(unix.CLONE_NEWNS, func() error { return nil }, nil)
// open current ns, unshare, setns current ns, close

gonso issues a lot more syscalls to do the same thing:

c, _ := gonso.Current(gonso.NS_MNT) // open current ns
defer c.Close() // close current ns
s, _ := c.Unshare(gonso.NS_MNT) // unshare, open new ns, setns current ns
s.Do(func() error { return nil }) // open current ns, setns new ns, setns current ns
s.Close() // close new ns

Also, gonso doesn't have an API to spawn a goroutine asynchronously. You'd have to call go s.Do() at the cost of an additional goroutine. This is admittedly a very minor point as it wouldn't take much effort to add a func (*gonso.Set) Go(setupfn func() error, fn func()) error method which mirrors unshare.Go.

I think the two packages are complementary. And maybe with a bit of work gonso could be made to be no costlier than unshare.Go when used for the same tasks, at which point I would gladly replace Moby's internal/unshare package with it.

@cpuguy83
Copy link
Member Author

cpuguy83 commented Jul 19, 2023

I think it depends on what being done.

I played around with adding a raw "just unshare these namespaces and run my function":

goos: linux
goarch: amd64
pkg: github.com/cpuguy83/gonso
cpu: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
BenchmarkSetDo-4                   24938             53406 ns/op            2047 B/op          7 allocs/op
BenchmarkUnshareAndDo/raw=false-4                   1464            981305 ns/op            1981 B/op          4 allocs/op
BenchmarkUnshareAndDo/raw=true-4                    1518            892166 ns/op               0 B/op          0 allocs/op
BenchmarkDo/restore=true/one_namespace-4          109070              9596 ns/op             488 B/op         11 allocs/op
BenchmarkDo/restore=true/two_namespaces-4          22126             50920 ns/op            2048 B/op          7 allocs/op
BenchmarkDo/restore=true/three_namespaces-4                18984             58766 ns/op            2048 B/op          7 allocs/op
BenchmarkDo/restore=true/four_namespaces-4                 19220             57435 ns/op            2048 B/op          7 allocs/op
BenchmarkDo/restore=true/five_namespaces-4                 18938             61277 ns/op            2048 B/op          7 allocs/op
BenchmarkDo/restore=false/one_namespace-4                  19984             59669 ns/op            2047 B/op          6 allocs/op
BenchmarkDo/restore=false/two_namespaces-4                 21476             52254 ns/op            2048 B/op          7 allocs/op
BenchmarkDo/restore=false/three_namespaces-4               21147             54989 ns/op            2048 B/op          7 allocs/op
BenchmarkDo/restore=false/four_namespaces-4                18476             59510 ns/op            2048 B/op          7 allocs/op
BenchmarkDo/restore=false/five_namespaces-4                20599             57372 ns/op            2048 B/op          7 allocs/op

BenchmarkUnshareAndDo is two different benchmarks:

  1. raw=false where a groutine is created internally and synchronized over a channel
  2. raw=true where the caller runs the unshare in a goroutine and does its own synchronization, which means the function doesn't need to allocate for a channel (or any other sync primitive).

These both should be compared with BenchmarkDo's restore=false where it is not restoring the thread to the previous state and instead just dropping it.

BenmarkDo significantly faster here because it is re-using the Set over and over.
So it seems Setns (even called up to 5 times in the benchamarks for setting 5 namespaces) is significantly faster than 1 call to unshare(CLONE_NEWNET)

Of note, with restore=true, it is still faster (which means 2x the setns per namespace)... of course the benchmark isn't actually making any changes to the namespaces so there's no thread state that needs to be reset except for the namespaces being used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants