|
| 1 | +# Separate yurt-manager clients |
| 2 | + |
| 3 | +| title | authors | reviewers | creation-date | last-updated | status | |
| 4 | +| :-----------------------------: | --------- | --------- | ------------- | ------------ | ------ | |
| 5 | +| Separate yurt-manager clients | @luc99hen | @rambohe-ch | 2024-05-17 | | | |
| 6 | + |
| 7 | +<!-- TOC --> |
| 8 | +* [Separate yurt-manager clients](#Separate-yurt-manager-clients) |
| 9 | + * [Summary](#summary) |
| 10 | + * [Motivation](#motivation) |
| 11 | + * [Goals](#goals) |
| 12 | + * [Non-Goals/Future Work](#non-goals) |
| 13 | + * [Proposal](#proposal) |
| 14 | + * [Implementation History](#implementation-history) |
| 15 | +<!-- TOC --> |
| 16 | + |
| 17 | +## Summary |
| 18 | + |
| 19 | +Yurt-manager is an important component in cloud environment for OpenYurt which contains multiple controllers and webhooks. Currently, those controllers and webhooks share one client and one set of RBAC (yurt-manager-role/yurt-manager-role-binding/yurt-manager-sa) which grows bigger as we add more function into yurt-manager. This mechanism makes a controller has access it shouldn't has. Furthermore, sharing one client and user agent makes it difficult to find out the request is from which controller from the audit logs. |
| 20 | + |
| 21 | +This proposal aims to address this issue by separating clients for different controllers and webhooks. |
| 22 | + |
| 23 | +## Motivation |
| 24 | + |
| 25 | +1. Sharing RBAC among controller/webhooks violates the principle of least authority. |
| 26 | +2. Sharing user agent is hard to trace the request from audit logs. |
| 27 | + |
| 28 | +### Goals |
| 29 | + |
| 30 | +1. Separate RBAC for different controllers and webhooks and make each has its own identity in Kubernetes. |
| 31 | +2. Separate User agents for different controllers and webhooks. |
| 32 | +3. Restrict each controller/webhook to only the permissions it may use. |
| 33 | + |
| 34 | +### Non-Goals |
| 35 | + |
| 36 | +1. Divide Yurt-manager into multiple components. |
| 37 | + |
| 38 | +## Proposal |
| 39 | + |
| 40 | +### Design principals |
| 41 | + |
| 42 | +1. Compatible with currently used controller-runtime framework. Developers of yurt-manager will not notice the changes underhood. |
| 43 | +2. Follow the principle of least authority. One controller/webhook should not be granted permissions it will not use. |
| 44 | + |
| 45 | +### Critical Questions |
| 46 | + |
| 47 | +1. What's the granularity of division? |
| 48 | + |
| 49 | +The structure of Yurt-manager is relatively complex. Yurt-manager is made up of controllers and webhooks. Some controllers may also have several sub-controllers. Also, for RBAC division, yurt-manager itself needs some base permissions to manage the whole component. |
| 50 | + |
| 51 | +``` |
| 52 | +yurt-manager (base) |
| 53 | +- controllers |
| 54 | + - yurt-app-set |
| 55 | + - yurt-coordiator |
| 56 | + - yurt-coordinator-cert |
| 57 | + - pod-binding |
| 58 | + - ... |
| 59 | +- webhooks |
| 60 | + - yurt-app-set |
| 61 | + - ... |
| 62 | +``` |
| 63 | + |
| 64 | +In order to be compatible with the existing system, and also easy to manage permissions, we decide to divide the RBAC in this pattern |
| 65 | + |
| 66 | +``` |
| 67 | +base: used in the entire life-cycle of yurt-manager, such as controller initialization and webhook server set up |
| 68 | +yurt-app-set(controller/webhook): used for yurt-app-set controller and webhook |
| 69 | +yurt-coordinator-cert(controller): used for yurt-coordinator-cert |
| 70 | +pod-binding(controller): used for pod-binding |
| 71 | +... |
| 72 | +``` |
| 73 | + |
| 74 | +2. How to use different RBAC for different controller/webhooks? |
| 75 | + |
| 76 | +After different RBACs are prepared, we should make sure they are properly used by different controllers. We consider two feasible approaches here to achieve this goal. |
| 77 | + |
| 78 | +1) User impersonation |
| 79 | + |
| 80 | +Kubernets provides a [mechanism](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#user-impersonation) that one user can act as another user through impersonation headers. These let requests manually override the user info a request authenticates as. |
| 81 | + |
| 82 | +2) Token override |
| 83 | + |
| 84 | +We can override the client config' token with the prepared ServiceAccount token which has been bound to prescribed roles or cluster roles. |
| 85 | + |
| 86 | +Considering the additional complexity of the first approach, we choose the second one to override the token directly when building the client for each controller/webhook. |
| 87 | + |
| 88 | +### Solution Introduction |
| 89 | + |
| 90 | +The whole solution consists of two steps: |
| 91 | + |
| 92 | +1. Generate RBAC |
| 93 | + |
| 94 | +First, prepare RBAC artifacts including role/clusterrole, rolebindidng/clusterrolebinding and serviceaccount for controllers and webhooks in yurt-manager. The specific steps are as following: |
| 95 | + |
| 96 | +1) Use bash scripts to search through the pkg/yurt-manager folder and find the components which need independent RBAC. |
| 97 | +2) Generate clusterrole and roles for those components one by one with `controller-gen`. |
| 98 | +3) Use bash scripts to complement the corresponding rolebinding/clusterrolebinding and serviceaccount artifacts. |
| 99 | +4) Gather all generated materials into one yaml in charts/yurt-manager/templates/yurt-manager-auto-generated.yaml with `kustomize`. |
| 100 | + |
| 101 | +Those steps above are performed automatically when you run `make manifests` command. Basically, yurt-manager developers don't need to know these implementations underhood. |
| 102 | + |
| 103 | +2. Prepare client |
| 104 | + |
| 105 | +Currently, all controllers use a client from `manager.GetClient()` which is provided by controller runtime framework directly. However the framework [doesn't provide any interface](https://github.com/kubernetes-sigs/controller-runtime/issues/2822) to provide different clients by controllers. Therefore, we have to implement a client wrapper to to construct a new client for every component based on the basic client. |
| 106 | + |
| 107 | +```go |
| 108 | +func GetClientByControllerNameOrDie(mgr manager.Manager, controllerName, namespace string) client.Client { |
| 109 | + // if controllerName is empty, return the base client of manager |
| 110 | + if controllerName == "" { |
| 111 | + return mgr.GetClient() |
| 112 | + } |
| 113 | + |
| 114 | + clientStore.lock.Lock() |
| 115 | + defer clientStore.lock.Unlock() |
| 116 | + |
| 117 | + if cli, ok := clientStore.clientsByName[controllerName]; ok { |
| 118 | + return cli |
| 119 | + } |
| 120 | + |
| 121 | + // check if controller-specific ServiceAccount exist |
| 122 | + _, err := getOrCreateServiceAccount(mgr.GetClient(), namespace, controllerName) |
| 123 | + if err != nil { |
| 124 | + return nil |
| 125 | + } |
| 126 | + |
| 127 | + // get base config |
| 128 | + baseCfg := mgr.GetConfig() |
| 129 | + |
| 130 | + // rename cfg user-agent |
| 131 | + cfg := rest.CopyConfig(baseCfg) |
| 132 | + rest.AddUserAgent(cfg, controllerName) |
| 133 | + |
| 134 | + // add controller-specific token wrapper to cfg |
| 135 | + cachedTokenSource := transport.NewCachedTokenSource(&tokenSourceImpl{ |
| 136 | + namespace: namespace, |
| 137 | + serviceAccountName: controllerName, |
| 138 | + cli: mgr.GetClient(), |
| 139 | + expirationSeconds: defaultExpirationSeconds, |
| 140 | + leewayPercent: defaultLeewayPercent, |
| 141 | + }) |
| 142 | + cfg.Wrap(transport.ResettableTokenSourceWrapTransport(cachedTokenSource)) |
| 143 | + |
| 144 | + // construct client from cfg |
| 145 | + clientOptions := client.Options{ |
| 146 | + Scheme: mgr.GetScheme(), |
| 147 | + Mapper: mgr.GetRESTMapper(), |
| 148 | + // todo: this is just a default option, we should use mgr's cache options |
| 149 | + Cache: &client.CacheOptions{ |
| 150 | + Unstructured: false, |
| 151 | + Reader: mgr.GetCache(), |
| 152 | + }, |
| 153 | + } |
| 154 | + |
| 155 | + cli, err := client.New(cfg, clientOptions) |
| 156 | + if err != nil { |
| 157 | + panic(err) |
| 158 | + } |
| 159 | + clientStore.clientsByName[controllerName] = cli |
| 160 | + |
| 161 | + return cli |
| 162 | +} |
| 163 | + |
| 164 | +var ( |
| 165 | + // defaultExpirationSeconds defines the duration of a TokenRequest in seconds. |
| 166 | + defaultExpirationSeconds = int64(3600) |
| 167 | + // defaultLeewayPercent defines the percentage of expiration left before the client trigger a token rotation. |
| 168 | + // range[0, 100] |
| 169 | + defaultLeewayPercent = 20 |
| 170 | +) |
| 171 | + |
| 172 | +// migrate from kubernetes/staging/src/k8s.io/controller-manager/pkg/clientbuilder/client_builder_dynamic.go |
| 173 | +// change client to controller-runtime client |
| 174 | +type tokenSourceImpl struct { |
| 175 | + namespace string |
| 176 | + serviceAccountName string |
| 177 | + cli client.Client |
| 178 | + expirationSeconds int64 |
| 179 | + leewayPercent int |
| 180 | +} |
| 181 | + |
| 182 | +func (ts *tokenSourceImpl) Token() (*oauth2.Token, error) { |
| 183 | + retTokenRequest := &v1authenticationapi.TokenRequest{ |
| 184 | + Spec: v1authenticationapi.TokenRequestSpec{ |
| 185 | + ExpirationSeconds: utilpointer.Int64Ptr(ts.expirationSeconds), |
| 186 | + }, |
| 187 | + } |
| 188 | + |
| 189 | + backoff := wait.Backoff{ |
| 190 | + Duration: 500 * time.Millisecond, |
| 191 | + Factor: 2, // double the timeout for every failure |
| 192 | + Steps: 4, |
| 193 | + } |
| 194 | + if err := wait.ExponentialBackoff(backoff, func() (bool, error) { |
| 195 | + sa, inErr := getOrCreateServiceAccount(ts.cli, ts.namespace, ts.serviceAccountName) |
| 196 | + if inErr != nil { |
| 197 | + klog.Warningf("get or create service account failed: %v", inErr) |
| 198 | + return false, nil |
| 199 | + } |
| 200 | + |
| 201 | + if inErr = ts.cli.SubResource("token").Create(context.Background(), sa, retTokenRequest); inErr != nil { |
| 202 | + klog.Warningf("get token failed: %v", inErr) |
| 203 | + return false, nil |
| 204 | + } |
| 205 | + |
| 206 | + return true, nil |
| 207 | + }); err != nil { |
| 208 | + return nil, fmt.Errorf("failed to get token for %s/%s: %v", ts.namespace, ts.serviceAccountName, err) |
| 209 | + } |
| 210 | + |
| 211 | + if retTokenRequest.Spec.ExpirationSeconds == nil { |
| 212 | + return nil, fmt.Errorf("nil pointer of expiration in token request") |
| 213 | + } |
| 214 | + |
| 215 | + lifetime := retTokenRequest.Status.ExpirationTimestamp.Time.Sub(time.Now()) |
| 216 | + if lifetime < time.Minute*10 { |
| 217 | + // possible clock skew issue, pin to minimum token lifetime |
| 218 | + lifetime = time.Minute * 10 |
| 219 | + } |
| 220 | + |
| 221 | + leeway := time.Duration(int64(lifetime) * int64(ts.leewayPercent) / 100) |
| 222 | + expiry := time.Now().Add(lifetime).Add(-1 * leeway) |
| 223 | + |
| 224 | + return &oauth2.Token{ |
| 225 | + AccessToken: retTokenRequest.Status.Token, |
| 226 | + TokenType: "Bearer", |
| 227 | + Expiry: expiry, |
| 228 | + }, nil |
| 229 | +} |
| 230 | + |
| 231 | +func getOrCreateServiceAccount(cli client.Client, namespace, name string) (*v1.ServiceAccount, error) { |
| 232 | + sa := &v1.ServiceAccount{ |
| 233 | + ObjectMeta: metav1.ObjectMeta{ |
| 234 | + Name: name, |
| 235 | + Namespace: namespace, |
| 236 | + }, |
| 237 | + } |
| 238 | + err := cli.Get(context.TODO(), client.ObjectKey{Namespace: namespace, Name: name}, sa) |
| 239 | + if err == nil { |
| 240 | + return sa, nil |
| 241 | + } |
| 242 | + if !apierrors.IsNotFound(err) { |
| 243 | + return nil, err |
| 244 | + } |
| 245 | + |
| 246 | + // Create the namespace if we can't verify it exists. |
| 247 | + // Tolerate errors, since we don't know whether this component has namespace creation permissions. |
| 248 | + ns := &v1.Namespace{ |
| 249 | + ObjectMeta: metav1.ObjectMeta{ |
| 250 | + Name: namespace, |
| 251 | + }, |
| 252 | + } |
| 253 | + if err := cli.Get(context.TODO(), client.ObjectKey{}, ns); apierrors.IsNotFound(err) { |
| 254 | + if err = cli.Create(context.TODO(), ns); err != nil && !apierrors.IsAlreadyExists(err) { |
| 255 | + klog.Warningf("create non-exist namespace %s failed:%v", namespace, err) |
| 256 | + } |
| 257 | + } |
| 258 | + |
| 259 | + // Create the service account |
| 260 | + err = cli.Create(context.TODO(), sa) |
| 261 | + if apierrors.IsAlreadyExists(err) { |
| 262 | + // If we're racing to init and someone else already created it, re-fetch |
| 263 | + err = cli.Get(context.TODO(), client.ObjectKey{Namespace: namespace, Name: name}, sa) |
| 264 | + return sa, err |
| 265 | + } |
| 266 | + return sa, err |
| 267 | +} |
| 268 | + |
| 269 | +``` |
| 270 | + |
| 271 | +If you are developing a new controller/webhook in yurt-manager, you should use `GetClientByControllerNameOrDie(mgr, controllerName)` instead of `mgr.GetClient()` to apply your own RBAC. |
| 272 | + |
| 273 | +## Implementation History |
| 274 | + |
| 275 | +* [ ] 05/17/2024: Draft proposal created; |
0 commit comments