From 3bd5f9f368b87ba3de268cd3cc39bc14ec34072a Mon Sep 17 00:00:00 2001
From: Xing Zhou
Date: Thu, 15 Dec 2016 05:57:51 +0000
Subject: [PATCH 1/2] Create no-new-privs support proposal doc.

Create no-new-privs support proposal doc under contributors/design-proposals.
---
 contributors/design-proposals/no-new-privs.md | 65 +++++++++++++++++++
 1 file changed, 65 insertions(+)
 create mode 100644 contributors/design-proposals/no-new-privs.md

diff --git a/contributors/design-proposals/no-new-privs.md b/contributors/design-proposals/no-new-privs.md
new file mode 100644
index 00000000000..bd6cac313a8
--- /dev/null
+++ b/contributors/design-proposals/no-new-privs.md
@@ -0,0 +1,65 @@
+#Support "no new privileges" in Kubernetes
+
+##Description
+
+In Linux, the `execve` system call can grant more privileges to a newly-created process than its parent process. Considering security issues, since Linux kernel v3.5, there is a new flag named `no_new_privs` added to prevent those new privileges from being granted to the processes.
+
+`no_new_privs` is inherited across `fork`, `clone` and `execve` and can not be unset. With `no_new_privs` set, `execve` promises not to grant the privilege to do anything that could not have been done without the `execve` call.
+
+For more details about `no_new_privs`, please check the Linux kernel document [here](https://www.kernel.org/doc/Documentation/prctl/no_new_privs.txt).
+
+Docker started to support `no_new_privs` option since 1.11. Here is the [link](https://github.com/docker/docker/issues/20329) of the ticket in Docker community to support `no_new_privs` option.
+
+We want to support the creation of containers with `no_new_privs` enabled in Kubernetes, which will make the Kubernetes cluster more safe. Here is the [link](https://github.com/kubernetes/kubernetes/issues/38417) of the ticket in Kubernetes community to track this proposal.
+
+
+##Current implementation
+
+###Support in Docker
+
+Since Docker 1.11, user can specify `--security-opt` to enable `no_new_privs` while creating containers, e.g. `docker run --security-opt=no-new-privileges busybox`
+
+For program client, Docker provides an object named `ContainerCreateConfig` defined in package `github.com/docker/engine-api/types` to config container creation parameters. In this object, there is a string array `HostConfig.SecurityOpt` to specify the security options. Client can utilize this field to specify the arguments for security options while creating new containers.
+
+###Support in OCI runtimes
+
+Since version 0.3.0 of the OCI runtime specification, a user can specify the `noNewPrivs` boolean flag in the configuration file.
+
+More details of OCI implementation can be checked [here](https://github.com/opencontainers/runtime-spec/pull/290).
+
+###SecurityContext in Kubernetes
+
+Kubernetes defines `SecurityContext` for `Container` and `PodSecurityContext` for `PodSpec`. `SecurityContext` objects define the related security options for Kubernetes containers, e.g. selinux options.
+
+While creating a container, kubelet parses the security context object and formats the security option strings for Docker. The security options strings will finally be inserted into `ContainerCreateConfig.HostConfig.SecurityOpt` and passed to Docker.
Different Kubernetes runtimes now are using different methods to parse and format the security option strings:
+* method `#getSecurityOpts` in `docker_mager_xxxx.go` for Docker runtime
+* method `#getContainerSecurityOpts` in `docker_container.go` for CRI
+
+
+##Proposal to support "no new privileges"
+
+To support "no new privileges" options in Kubernetes, it is proposed to make the following changes:
+
+###Changes of SecurityContext objects
+
+Add a new bool type field named `noNewPrivileges` to both `SecurityContext` definition and `PodSecurityContext` definition:
+* `noNewPrivileges=true` in `PodSecurityContext` means that all the containers in the pod should be run with `no-new-privileges` enabled. This should be a pod level control of `no-new-privileges` flag.
+* `noNewPrivileges` in `SecurityContext` is a container level control of `no-new-privileges` flag, and can override the pod level `noNewPrivileges` setting.
+
+By default, `noNewPrivileges` is `false`.
+
+The change of security context API objects requires the update of corresponding Kubernetes documents, need to submit another PR to track this.
+
+###Changes of docker runtime
+
+When parsing the new `SecurityContext` object, kubelet has to take care of `noNewPrivileges` field from security context objects. Once `noNewPrivileges` is `true`, kubelet needs to change `#getSecurityOpts` method in `docker_manager_xxx.go` to add `no-new-privileges` option to `ContainerCreateConfig.HostConfig.SecurityOpt`
+
+###Changes of CRI runtime
+
+When parsing the new `SecurityContext` object, kubelet has to take care of `noNewPrivileges` field from security context objects. Once `noNewPrivileges` is `true`, kubelet needs to change `#getContainerSecurityOpts` method in `docker_container.go` to add `no-new-privileges` option to `ContainerCreateConfig.HostConfig.SecurityOpt`
+
+###Changes of kubectl
+
+This is an additional proposal for kubectl.
To improve kubectl user experience, we can add a new flag for kubectl command named `--security-opt`. This flag allows user to create pod with security options configured when using `kubectl run` command. For example, if user issues command like `kubectl run busybox --image=busybox --security-opt=no-new-privileges -- top`, kubernetes shall create a pod with `noNewPrivileges` enabled.
+
+If the proposal of kubectl changes is accepted, the patch can also be submitted as a separate PR.

From efba5a6608f4cfaa3172a09bcbdef39d14c4912b Mon Sep 17 00:00:00 2001
From: Jess Frazelle
Date: Thu, 18 May 2017 21:24:46 -0400
Subject: [PATCH 2/2] update no new privs proposal

Signed-off-by: Jess Frazelle
---
 contributors/design-proposals/no-new-privs.md | 144 +++++++++++++-----
 1 file changed, 110 insertions(+), 34 deletions(-)

diff --git a/contributors/design-proposals/no-new-privs.md b/contributors/design-proposals/no-new-privs.md
index bd6cac313a8..f764e399f9f 100644
--- a/contributors/design-proposals/no-new-privs.md
+++ b/contributors/design-proposals/no-new-privs.md
@@ -1,65 +1,141 @@
-#Support "no new privileges" in Kubernetes
+# No New Privileges

-##Description
+- [Description](#description)
+  * [Interactions with other Linux primitives](#interactions-with-other-linux-primitives)
+- [Current Implementations](#current-implementations)
+  * [Support in Docker](#support-in-docker)
+  * [Support in rkt](#support-in-rkt)
+  * [Support in OCI runtimes](#support-in-oci-runtimes)
+- [Existing SecurityContext objects](#existing-securitycontext-objects)
+- [Changes of SecurityContext objects](#changes-of-securitycontext-objects)
+- [Pod Security Policy changes](#pod-security-policy-changes)

-In Linux, the `execve` system call can grant more privileges to a newly-created process than its parent process. Considering security issues, since Linux kernel v3.5, there is a new flag named `no_new_privs` added to prevent those new privileges from being granted to the processes.

-`no_new_privs` is inherited across `fork`, `clone` and `execve` and can not be unset. With `no_new_privs` set, `execve` promises not to grant the privilege to do anything that could not have been done without the `execve` call.
+## Description

-For more details about `no_new_privs`, please check the Linux kernel document [here](https://www.kernel.org/doc/Documentation/prctl/no_new_privs.txt).
+In Linux, the `execve` system call can grant more privileges to a newly-created
+process than its parent process. Considering security issues, since Linux kernel
+v3.5, there is a new flag named `no_new_privs` added to prevent those new
+privileges from being granted to the processes.

-Docker started to support `no_new_privs` option since 1.11. Here is the [link](https://github.com/docker/docker/issues/20329) of the ticket in Docker community to support `no_new_privs` option.
+[`no_new_privs`](https://www.kernel.org/doc/Documentation/prctl/no_new_privs.txt)
+is inherited across `fork`, `clone` and `execve` and cannot be unset. With
+`no_new_privs` set, `execve` promises not to grant the privilege to do anything
+that could not have been done without the `execve` call.

-We want to support the creation of containers with `no_new_privs` enabled in Kubernetes, which will make the Kubernetes cluster more safe. Here is the [link](https://github.com/kubernetes/kubernetes/issues/38417) of the ticket in Kubernetes community to track this proposal.
+For more details about `no_new_privs`, please check the
+[Linux kernel documentation](https://www.kernel.org/doc/Documentation/prctl/no_new_privs.txt).
+This is different from `NOSUID` in that `no_new_privs` can give permission to
+the container process to further restrict child processes with seccomp. This
+permission goes only one way: the container process cannot grant more
+permissions, only restrict them further.


-##Current implementation
+### Interactions with other Linux primitives

-###Support in Docker
+- suid binaries: will break when `no_new_privs` is enabled
+- seccomp2 as a non root user: requires `no_new_privs`
+- seccomp2 with dropped `CAP_SYS_ADMIN`: requires `no_new_privs`
+- ambient capabilities: requires `no_new_privs`
+- selinux transitions: bugs that were fixed are documented [here](https://github.com/moby/moby/issues/23981#issuecomment-233121969)

-Since Docker 1.11, user can specify `--security-opt` to enable `no_new_privs` while creating containers, e.g. `docker run --security-opt=no-new-privileges busybox`

-For program client, Docker provides an object named `ContainerCreateConfig` defined in package `github.com/docker/engine-api/types` to config container creation parameters. In this object, there is a string array `HostConfig.SecurityOpt` to specify the security options. Client can utilize this field to specify the arguments for security options while creating new containers.
+## Current Implementations

-###Support in OCI runtimes
+### Support in Docker

-Since version 0.3.0 of the OCI runtime specification, a user can specify the `noNewPrivs` boolean flag in the configuration file.
+Since Docker 1.11, a user can specify `--security-opt` to enable `no_new_privs`
+while creating containers, for example
+`docker run --security-opt=no-new-privileges busybox`.

-More details of OCI implementation can be checked [here](https://github.com/opencontainers/runtime-spec/pull/290).
+Docker provides, via its Go API, an object named `ContainerCreateConfig` to
+configure container creation parameters. In this object, there is a string
+array `HostConfig.SecurityOpt` to specify the security options. A client can
+use this field to specify the arguments for security options while
+creating new containers.

-###SecurityContext in Kubernetes
+This field did not scale well for the Docker client, so it's suggested that
+Kubernetes does not follow that design.

-Kubernetes defines `SecurityContext` for `Container` and `PodSecurityContext` for `PodSpec`. `SecurityContext` objects define the related security options for Kubernetes containers, e.g. selinux options.
+This is not on by default in Docker.

-While creating a container, kubelet parses the security context object and formats the security option strings for Docker. The security options strings will finally be inserted into `ContainerCreateConfig.HostConfig.SecurityOpt` and passed to Docker. Different Kubernetes runtimes now are using different methods to parse and format the security option strings:
-* method `#getSecurityOpts` in `docker_mager_xxxx.go` for Docker runtime
-* method `#getContainerSecurityOpts` in `docker_container.go` for CRI
+More details of the Docker implementation can be read
+[here](https://github.com/moby/moby/pull/20727) as well as the original
+discussion [here](https://github.com/moby/moby/issues/20329).

+### Support in rkt

-##Proposal to support "no new privileges"
+Since rkt v1.26.0, the `NoNewPrivileges` option has been enabled in rkt.

-To support "no new privileges" options in Kubernetes, it is proposed to make the following changes:
+More details of the rkt implementation can be read
+[here](https://github.com/rkt/rkt/pull/2677).

-###Changes of SecurityContext objects
+### Support in OCI runtimes

-Add a new bool type field named `noNewPrivileges` to both `SecurityContext` definition and `PodSecurityContext` definition:
-* `noNewPrivileges=true` in `PodSecurityContext` means that all the containers in the pod should be run with `no-new-privileges` enabled. This should be a pod level control of `no-new-privileges` flag.
-* `noNewPrivileges` in `SecurityContext` is a container level control of `no-new-privileges` flag, and can override the pod level `noNewPrivileges` setting.
+Since version 0.3.0 of the OCI runtime specification, a user can specify the
+`noNewPrivs` boolean flag in the configuration file.

-By default, `noNewPrivileges` is `false`.
+More details of the OCI implementation can be read
+[here](https://github.com/opencontainers/runtime-spec/pull/290).

-The change of security context API objects requires the update of corresponding Kubernetes documents, need to submit another PR to track this.
+## Existing SecurityContext objects

-###Changes of docker runtime
+Kubernetes defines `SecurityContext` for `Container` and `PodSecurityContext`
+for `PodSpec`. `SecurityContext` objects define the related security options
+for Kubernetes containers, e.g. selinux options.

-When parsing the new `SecurityContext` object, kubelet has to take care of `noNewPrivileges` field from security context objects. Once `noNewPrivileges` is `true`, kubelet needs to change `#getSecurityOpts` method in `docker_manager_xxx.go` to add `no-new-privileges` option to `ContainerCreateConfig.HostConfig.SecurityOpt`
+To support "no new privileges" options in Kubernetes, it is proposed to make
+the following changes:

-###Changes of CRI runtime
+## Changes of SecurityContext objects

-When parsing the new `SecurityContext` object, kubelet has to take care of `noNewPrivileges` field from security context objects. Once `noNewPrivileges` is `true`, kubelet needs to change `#getContainerSecurityOpts` method in `docker_container.go` to add `no-new-privileges` option to `ContainerCreateConfig.HostConfig.SecurityOpt`
+Add a new `*bool` type field named `allowPrivilegeEscalation` to the `SecurityContext`
+definition.

-###Changes of kubectl
+By default, i.e. when `allowPrivilegeEscalation=nil`, we will set `no_new_privs=true`
+with the following exceptions:

-This is an additional proposal for kubectl. To improve kubectl user experience, we can add a new flag for kubectl command named `--security-opt`. This flag allows user to create pod with security options configured when using `kubectl run` command.
For example, if user issues command like `kubectl run busybox --image=busybox --security-opt=no-new-privileges -- top`, kubernetes shall create a pod with `noNewPrivileges` enabled.
+- when a container is `privileged`
+- when `CAP_SYS_ADMIN` is added to a container
+- when a container is not run as root (uid `0`), to prevent breaking suid
+  binaries

-If the proposal of kubectl changes is accepted, the patch can also be submitted as a separate PR.
+The API will reject as invalid `privileged=true` and
+`allowPrivilegeEscalation=false`, as well as `capAdd=CAP_SYS_ADMIN` and
+`allowPrivilegeEscalation=false`.
+
+When `allowPrivilegeEscalation` is set to `false` it will enable `no_new_privs`
+for that container.
+
+`allowPrivilegeEscalation` in `SecurityContext` provides container level
+control of the `no_new_privs` flag and can override the default in both directions
+of the `allowPrivilegeEscalation` setting.
+
+This requires changes to the Docker, rkt, and CRI runtime integrations so that
+kubelet will add the specific `no_new_privs` option.
+
+## Pod Security Policy changes
+
+The default can be set via a new `*bool` type field named `defaultAllowPrivilegeEscalation`
+in a Pod Security Policy.
+This would allow users to set `defaultAllowPrivilegeEscalation=false`, overriding the
+default `nil` behavior of `no_new_privs=false` for containers
+whose uids are not 0.
+
+This would also keep the behavior of setting the security context as
+`allowPrivilegeEscalation=true`
+for privileged containers and those with `capAdd=CAP_SYS_ADMIN`.
+
+To recap, below is a table defining the default behavior at the pod security
+policy level and what can be set as a default with a pod security policy.
+
+| allowPrivilegeEscalation setting | uid = 0 or unset   | uid != 0           | privileged/CAP_SYS_ADMIN |
+|----------------------------------|--------------------|--------------------|--------------------------|
+| nil                              | no_new_privs=true  | no_new_privs=false | no_new_privs=false       |
+| false                            | no_new_privs=true  | no_new_privs=true  | no_new_privs=false       |
+| true                             | no_new_privs=false | no_new_privs=false | no_new_privs=false       |
+
+A new `bool` field named `allowPrivilegeEscalation` will be added to the Pod
+Security Policy as well to gate whether or not a user is allowed to set the
+security context to `allowPrivilegeEscalation=true`. This field will default to
+false.
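
Review note: the defaulting rules in the table above, together with the proposed API validation, can be sketched in Go roughly as follows. This is only an illustration under stated assumptions, not the kubelet or API server implementation: `containerSettings`, `validate`, and `noNewPrivs` are hypothetical names introduced here, the struct is a simplified stand-in for the real `SecurityContext`, and the Pod Security Policy default (`defaultAllowPrivilegeEscalation`) is not modeled.

```go
package main

import (
	"errors"
	"fmt"
)

// containerSettings is a hypothetical, simplified stand-in for the fields the
// table consults. It is not the real Kubernetes SecurityContext type.
type containerSettings struct {
	AllowPrivilegeEscalation *bool  // nil means the field was not set
	RunAsUser                *int64 // nil means the uid is unset
	Privileged               bool
	AddsSysAdmin             bool // CAP_SYS_ADMIN is among the added capabilities
}

// validate sketches the proposed API rule: an explicit
// allowPrivilegeEscalation=false is rejected for privileged containers and
// for containers that add CAP_SYS_ADMIN.
func validate(c containerSettings) error {
	if c.AllowPrivilegeEscalation == nil || *c.AllowPrivilegeEscalation {
		return nil
	}
	if c.Privileged {
		return errors.New("privileged=true conflicts with allowPrivilegeEscalation=false")
	}
	if c.AddsSysAdmin {
		return errors.New("capAdd=CAP_SYS_ADMIN conflicts with allowPrivilegeEscalation=false")
	}
	return nil
}

// noNewPrivs returns the no_new_privs value from the table for one container.
func noNewPrivs(c containerSettings) bool {
	// Last column: privileged or CAP_SYS_ADMIN containers never get the flag.
	if c.Privileged || c.AddsSysAdmin {
		return false
	}
	// Rows "false" and "true": an explicit setting decides for any uid.
	if c.AllowPrivilegeEscalation != nil {
		return !*c.AllowPrivilegeEscalation
	}
	// Row "nil": enabled for uid 0 or an unset uid, disabled for other uids.
	return c.RunAsUser == nil || *c.RunAsUser == 0
}

func main() {
	no := false
	uid := int64(1000)

	// Row "nil", uid unset.
	fmt.Println(noNewPrivs(containerSettings{}))
	// Row "nil", uid != 0.
	fmt.Println(noNewPrivs(containerSettings{RunAsUser: &uid}))
	// Row "false", uid != 0.
	fmt.Println(noNewPrivs(containerSettings{AllowPrivilegeEscalation: &no, RunAsUser: &uid}))
	// Combination the API is proposed to reject.
	fmt.Println(validate(containerSettings{AllowPrivilegeEscalation: &no, Privileged: true}))
}
```

Using a pointer for `AllowPrivilegeEscalation` mirrors the proposal's `*bool` choice: it keeps "not set" distinguishable from an explicit `false`, which is what lets the `nil` row behave differently from the other two.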