Commit c3619ec

Add bubblewrap-in-kubernetes post

Change-Id: I4f65aa8f2fe7e52614c48de4cee5bd8a19250aad

Committed Dec 10, 2024 · 1 parent bd7fdb1

4 files changed, +372 -0 lines changed

@@ -0,0 +1,151 @@
This post explores how to create nested containers securely inside Kubernetes.
In the previous post, titled [Recursive namespaces to run containers inside a container][prev-post],
I showed how to create nested containers using a rootless container runtime like Podman.
In this post, I'll demonstrate how to run the same workload with [Kubernetes][k8s].

In two parts, I will present:

- How to run Kubernetes from source.
- The ProcMountType feature to work around the original issue.

## Context and problem statement

The goal is to deploy zuul-executor, a service that runs CI builds, securely inside Kubernetes,
without requiring a privileged security context.

The problem is that this service performs build isolation locally using [Bubblewrap][bwrap],
which is similar to running a container inside a container.

## Run Kubernetes locally

In this section, let's set up Kubernetes locally.
On a fresh Fedora 41 system, install the following requirements:

```ShellSession
$ sudo dnf install -y etcd crio crictl kubectl containernetworking-plugins
$ sudo systemctl start crio
```

Then, start Kubernetes using the *local-up-cluster* script as follows:

```ShellSession
$ mkdir -p ~/src/github.com/kubernetes; cd ~/src/github.com/kubernetes
$ git clone https://github.com/kubernetes/kubernetes/
$ cd kubernetes
$ sudo env CGROUP_DRIVER=systemd CONTAINER_RUNTIME=remote CONTAINER_RUNTIME_ENDPOINT='unix:///var/run/crio/crio.sock' \
    ./hack/local-up-cluster.sh
...
Local Kubernetes cluster is running. Press Ctrl-C to shut it down.
```

With the cluster running, let's try Bubblewrap in a pod using the following test resource:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-bwrap
spec:
  containers:
    - name: test
      image: quay.io/zuul-ci/zuul-executor
      command: ["/bin/sleep", "infinity"]
      securityContext:
        capabilities:
          add: ["SETFCAP"]
```

> As seen previously, we need *CAP_SETFCAP* to create the user namespace; otherwise, bwrap fails early with the following error:
>
> ```
> bwrap: setting up uid map: Operation not permitted
> ```

Apply the test resource with the following commands:

```ShellSession
$ export KUBECONFIG=/var/run/kubernetes/admin.kubeconfig
$ kubectl apply -f test-bwrap.yaml
$ kubectl exec test-bwrap -- bwrap --ro-bind /lib /lib --ro-bind /usr /usr --symlink /usr/lib64 /lib64 --proc /proc --dev /dev --tmpfs /tmp --unshare-all --new-session ps afx
bwrap: Can't mount proc on /newroot/proc: Operation not permitted
```

This produces the same error we encountered in the [previous post][prev-post]: the /proc filesystem is tainted in the pod, which prevents Bubblewrap from mounting a new procfs for the new PID namespace.

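One way to see what "tainted" means here is to list the mounts that sit on top of /proc inside the pod. The helper below is only a sketch: `proc_submounts` is a hypothetical name, and it simply filters mountinfo-formatted text, where the fifth field is the mount point. With the default proc mount, the container runtime typically covers some paths below /proc, so one would expect a non-empty list.

```shell
# Print the mount points located strictly below /proc.
# Reads mountinfo-formatted text on stdin (field 5 is the mount point).
proc_submounts() {
    awk '$5 ~ /^\/proc\// { print $5 }'
}

# Usage, assuming the test-bwrap pod from above is running:
#   kubectl exec test-bwrap -- cat /proc/self/mountinfo | proc_submounts
```

An empty result would suggest that nothing is stacked on /proc in that container.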
The next section introduces the *ProcMountType* feature to work around this issue.

## The ProcMountType feature

The *ProcMountType* feature can be enabled by adding the following environment variable to the *local-up-cluster* invocation: `FEATURE_GATES='UserNamespacesSupport=true,ProcMountType=true'`.
To make use of the new feature, we also need to activate *UserNamespacesSupport*, as explained in the [documentation](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#proc-access).

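For reference, the start command from the previous section with both gates added might look like the sketch below. The function name `start_cluster_with_procmount` is hypothetical; the other variables are copied unchanged from the earlier invocation, and the function is meant to be called from the Kubernetes source checkout.

```shell
# Start the local cluster with the two feature gates enabled.
# Hypothetical wrapper around the invocation shown earlier in this post.
start_cluster_with_procmount() {
    sudo env \
        FEATURE_GATES='UserNamespacesSupport=true,ProcMountType=true' \
        CGROUP_DRIVER=systemd \
        CONTAINER_RUNTIME=remote \
        CONTAINER_RUNTIME_ENDPOINT='unix:///var/run/crio/crio.sock' \
        ./hack/local-up-cluster.sh
}

# Usage, from the kubernetes checkout:
#   start_cluster_with_procmount
```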
With these features, we can update the resource as follows:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-bwrap
spec:
  hostUsers: false
  containers:
    - name: test
      image: quay.io/zuul-ci/zuul-executor
      command: ["/bin/sleep", "infinity"]
      securityContext:
        procMount: Unmasked
        capabilities:
          add: ["SETFCAP"]
```

… and recreate the pod using the following commands:

```
$ sudo crictl rm -af; kubectl delete -f ./test-bwrap.yaml && kubectl apply -f ./test-bwrap.yaml
pod/test-bwrap created
$ kubectl exec test-bwrap -- bwrap --ro-bind /lib /lib --ro-bind /usr /usr --symlink /usr/lib64 /lib64 --proc /proc --dev /dev --tmpfs /tmp --unshare-all --new-session ps afx
bwrap: Can't mount proc on /newroot/proc: Permission denied
```

This time we get a different error, *Permission denied*, which is caused by SELinux. Using *audit2allow*, we can see that the following policy needs to be installed:

```
module nestedcontainers 1.0;

require {
    type proc_t;
    type devpts_t;
    type container_t;
    class filesystem mount;
}

#============= container_t ==============
allow container_t devpts_t:filesystem mount;
allow container_t proc_t:filesystem mount;
```

… which lets us run Bubblewrap inside an unprivileged pod:

```ShellSession
$ sudo semodule -i nestedcontainers.pp
$ kubectl exec test-bwrap -- bwrap --ro-bind /lib /lib --ro-bind /usr /usr --symlink /usr/lib64 /lib64 --proc /proc --dev /dev --tmpfs /tmp --unshare-all --new-session ps afx
    PID TTY      STAT   TIME COMMAND
      1 ?        Ss     0:00 bwrap --ro-bind /lib /lib --ro-bind /usr /usr --symlink /usr/lib64 /lib64 --proc /proc --dev /dev --tmpfs /tmp --unshare-all --new-session --cap-add all --uid 0 ps afx
      2 ?        R      0:00 ps afx
```

Notice how the `sleep infinity` process is not visible in the ps output, confirming that we are indeed running in a nested container.

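The `nestedcontainers.pp` package installed above has to be built from the policy source first. A minimal sketch, assuming the policy text is saved as `nestedcontainers.te` (a file name the post does not give) and that the `checkmodule` and `semodule_package` tools are installed; `build_selinux_module` is a hypothetical helper name:

```shell
# Build an SELinux policy package (.pp) from a type enforcement file (.te).
# "$1" is the module name without extension.
build_selinux_module() {
    name="$1"
    # Compile the source into a binary, non-base module with MLS enabled.
    checkmodule -M -m -o "$name.mod" "$name.te" || return 1
    # Wrap the binary module into an installable policy package.
    semodule_package -o "$name.pp" -m "$name.mod"
}

# Usage, from the directory containing nestedcontainers.te:
#   build_selinux_module nestedcontainers
#   sudo semodule -i nestedcontainers.pp
```

Alternatively, `audit2allow -M nestedcontainers`, fed with the recorded denials, writes both the `.te` and the `.pp` file in one step.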
## Conclusion

This post demonstrates that we can run a container inside a container with Kubernetes thanks to the following settings:

- The SETFCAP capability to create the user namespace,
- The ProcMountType and UserNamespacesSupport feature gates to unmask the /proc filesystem, and
- An SELinux policy to enable mounting filesystems inside the new namespace.

[prev-post]: https://www.softwarefactory-project.io/recursive-namespaces-to-run-containers-inside-a-container.html
[k8s]: https://kubernetes.io/
[bwrap]: https://github.com/containers/bubblewrap
@@ -0,0 +1,18 @@
Secure Bubblewrap inside Kubernetes with ProcMount
##################################################

:date: 2024-12-09
:category: blog
:authors: tristanC

.. raw:: html

   <style type="text/css">

     .literal {
        border-radius: 6px;
        padding: 1px 1px;
        background-color: rgba(27,31,35,.05);
     }

   </style>
@@ -0,0 +1,11 @@
#! /usr/bin/env nix-shell
#! nix-shell -i bash -p pandoc
#! nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/4d2b37a84fad1091b9de401eb450aae66f1a741e.tar.gz

# Base name shared by the Markdown source, the RST header, and the output.
NAME="blog-bubblewrap-in-kubernetes-pod-with-procmount"

# Convert the Markdown post to RST, prepending the header file.
pandoc --include-in-header="./$NAME.rst" \
    -f gfm --reference-links \
    -t rst "./$NAME.md" -o "../website/content/$NAME.rst"

# Rewrite pandoc's "code" directives as "code-block".
sed -e 's|^.. code::|.. code-block::|' -i "../website/content/$NAME.rst"
@@ -0,0 +1,192 @@
Secure Bubblewrap inside Kubernetes with ProcMount
##################################################

:date: 2024-12-09
:category: blog
:authors: tristanC

.. raw:: html

   <style type="text/css">

     .literal {
        border-radius: 6px;
        padding: 1px 1px;
        background-color: rgba(27,31,35,.05);
     }

   </style>

This post explores how to create nested containers securely inside
Kubernetes. In the previous post, titled `Recursive namespaces to run
containers inside a container`_, I showed how to create nested containers
using a rootless container runtime like Podman. In this post, I'll
demonstrate how to run the same workload with `Kubernetes`_.

In two parts, I will present:

- How to run Kubernetes from source.
- The ProcMountType feature to work around the original issue.

Context and problem statement
=============================

The goal is to deploy zuul-executor, a service that runs CI builds,
securely inside Kubernetes, without requiring a privileged security
context.

The problem is that this service performs build isolation locally using
`Bubblewrap`_, which is similar to running a container inside a
container.

Run Kubernetes locally
======================

In this section, let's set up Kubernetes locally. On a fresh Fedora 41
system, install the following requirements:

.. code-block:: ShellSession

   $ sudo dnf install -y etcd crio crictl kubectl containernetworking-plugins
   $ sudo systemctl start crio

Then, start Kubernetes using the *local-up-cluster* script as follows:

.. code-block:: ShellSession

   $ mkdir -p ~/src/github.com/kubernetes; cd ~/src/github.com/kubernetes
   $ git clone https://github.com/kubernetes/kubernetes/
   $ cd kubernetes
   $ sudo env CGROUP_DRIVER=systemd CONTAINER_RUNTIME=remote CONTAINER_RUNTIME_ENDPOINT='unix:///var/run/crio/crio.sock' \
       ./hack/local-up-cluster.sh
   ...
   Local Kubernetes cluster is running. Press Ctrl-C to shut it down.

With the cluster running, let's try Bubblewrap in a pod using the
following test resource:

.. code-block:: yaml

   apiVersion: v1
   kind: Pod
   metadata:
     name: test-bwrap
   spec:
     containers:
       - name: test
         image: quay.io/zuul-ci/zuul-executor
         command: ["/bin/sleep", "infinity"]
         securityContext:
           capabilities:
             add: ["SETFCAP"]

..

   As seen previously, we need *CAP_SETFCAP* to create the user
   namespace; otherwise, bwrap fails early with the following error:

   ::

      bwrap: setting up uid map: Operation not permitted

Apply the test resource with the following commands:

.. code-block:: ShellSession

   $ export KUBECONFIG=/var/run/kubernetes/admin.kubeconfig
   $ kubectl apply -f test-bwrap.yaml
   $ kubectl exec test-bwrap -- bwrap --ro-bind /lib /lib --ro-bind /usr /usr --symlink /usr/lib64 /lib64 --proc /proc --dev /dev --tmpfs /tmp --unshare-all --new-session ps afx
   bwrap: Can't mount proc on /newroot/proc: Operation not permitted

This produces the same error we encountered in the `previous post`_: the
/proc filesystem is tainted in the pod, which prevents Bubblewrap from
mounting a new procfs for the new PID namespace.

The next section introduces the *ProcMountType* feature to work around
this issue.

The ProcMountType feature
=========================

The *ProcMountType* feature can be enabled by adding the following
environment variable to the *local-up-cluster* invocation:
``FEATURE_GATES='UserNamespacesSupport=true,ProcMountType=true'``. To
make use of the new feature, we also need to activate
*UserNamespacesSupport*, as explained in the `documentation`_.

With these features, we can update the resource as follows:

.. code-block:: yaml

   apiVersion: v1
   kind: Pod
   metadata:
     name: test-bwrap
   spec:
     hostUsers: false
     containers:
       - name: test
         image: quay.io/zuul-ci/zuul-executor
         command: ["/bin/sleep", "infinity"]
         securityContext:
           procMount: Unmasked
           capabilities:
             add: ["SETFCAP"]

… and recreate the pod using the following commands:

::

   $ sudo crictl rm -af; kubectl delete -f ./test-bwrap.yaml && kubectl apply -f ./test-bwrap.yaml
   pod/test-bwrap created
   $ kubectl exec test-bwrap -- bwrap --ro-bind /lib /lib --ro-bind /usr /usr --symlink /usr/lib64 /lib64 --proc /proc --dev /dev --tmpfs /tmp --unshare-all --new-session ps afx
   bwrap: Can't mount proc on /newroot/proc: Permission denied

This time we get a different error, *Permission denied*, which is caused
by SELinux. Using *audit2allow*, we can see that the following policy
needs to be installed:

::

   module nestedcontainers 1.0;

   require {
       type proc_t;
       type devpts_t;
       type container_t;
       class filesystem mount;
   }

   #============= container_t ==============
   allow container_t devpts_t:filesystem mount;
   allow container_t proc_t:filesystem mount;

… which lets us run Bubblewrap inside an unprivileged pod:

.. code-block:: ShellSession

   $ sudo semodule -i nestedcontainers.pp
   $ kubectl exec test-bwrap -- bwrap --ro-bind /lib /lib --ro-bind /usr /usr --symlink /usr/lib64 /lib64 --proc /proc --dev /dev --tmpfs /tmp --unshare-all --new-session ps afx
       PID TTY      STAT   TIME COMMAND
         1 ?        Ss     0:00 bwrap --ro-bind /lib /lib --ro-bind /usr /usr --symlink /usr/lib64 /lib64 --proc /proc --dev /dev --tmpfs /tmp --unshare-all --new-session --cap-add all --uid 0 ps afx
         2 ?        R      0:00 ps afx

Notice how the ``sleep infinity`` process is not visible in the ps
output, confirming that we are indeed running in a nested container.

Conclusion
==========

This post demonstrates that we can run a container inside a container
with Kubernetes thanks to the following settings:

- The SETFCAP capability to create the user namespace,
- The ProcMountType and UserNamespacesSupport feature gates to unmask
  the /proc filesystem, and
- An SELinux policy to enable mounting filesystems inside the new
  namespace.

.. _Recursive namespaces to run containers inside a container: https://www.softwarefactory-project.io/recursive-namespaces-to-run-containers-inside-a-container.html
.. _Kubernetes: https://kubernetes.io/
.. _Bubblewrap: https://github.com/containers/bubblewrap
.. _previous post: https://www.softwarefactory-project.io/recursive-namespaces-to-run-containers-inside-a-container.html
.. _documentation: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#proc-access
