Skip to content

Commit e4740ba

Browse files
authored
Proposal: separate yurt-manager clients (#2051)
Co-authored-by: luchen <[email protected]>
1 parent 9e54b53 commit e4740ba

File tree

1 file changed

+275
-0
lines changed

1 file changed

+275
-0
lines changed
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,275 @@
1+
# Separate yurt-manager clients
2+
3+
| title | authors | reviewers | creation-date | last-updated | status |
4+
| :-----------------------------: | --------- | --------- | ------------- | ------------ | ------ |
5+
| Separate yurt-manager clients | @luc99hen | @rambohe-ch | 2024-05-17 | | |
6+
7+
<!-- TOC -->
8+
* [Separate yurt-manager clients](#Separate-yurt-manager-clients)
9+
* [Summary](#summary)
10+
* [Motivation](#motivation)
11+
* [Goals](#goals)
12+
* [Non-Goals/Future Work](#non-goals)
13+
* [Proposal](#proposal)
14+
* [Implementation History](#implementation-history)
15+
<!-- TOC -->
16+
17+
## Summary
18+
19+
Yurt-manager is an important component in cloud environment for OpenYurt which contains multiple controllers and webhooks. Currently, those controllers and webhooks share one client and one set of RBAC (yurt-manager-role/yurt-manager-role-binding/yurt-manager-sa) which grows bigger as we add more function into yurt-manager. This mechanism makes a controller has access it shouldn't has. Furthermore, sharing one client and user agent makes it difficult to find out the request is from which controller from the audit logs.
20+
21+
This proposal aims to address this issue by separating clients for different controllers and webhooks.
22+
23+
## Motivation
24+
25+
1. Sharing RBAC among controller/webhooks violates the principle of least authority.
26+
2. Sharing user agent is hard to trace the request from audit logs.
27+
28+
### Goals
29+
30+
1. Separate RBAC for different controllers and webhooks and make each has its own identity in Kubernetes.
31+
2. Separate User agents for different controllers and webhooks.
32+
3. Restrict each controller/webhook to only the permissions it may use.
33+
34+
### Non-Goals
35+
36+
1. Divide Yurt-manager into multiple components.
37+
38+
## Proposal
39+
40+
### Design principals
41+
42+
1. Compatible with currently used controller-runtime framework. Developers of yurt-manager will not notice the changes underhood.
43+
2. Follow the principle of least authority. One controller/webhook should not be granted permissions it will not use.
44+
45+
### Critical Questions
46+
47+
1. What's the granularity of division?
48+
49+
The structure of Yurt-manager is relatively complex. Yurt-manager is made up of controllers and webhooks. Some controllers may also have several sub-controllers. Also, for RBAC division, yurt-manager itself needs some base permissions to manage the whole component.
50+
51+
```
52+
yurt-manager (base)
53+
- controllers
54+
- yurt-app-set
55+
- yurt-coordiator
56+
- yurt-coordinator-cert
57+
- pod-binding
58+
- ...
59+
- webhooks
60+
- yurt-app-set
61+
- ...
62+
```
63+
64+
In order to be compatible with the existing system, and also easy to manage permissions, we decide to divide the RBAC in this pattern
65+
66+
```
67+
base: used in the entire life-cycle of yurt-manager, such as controller initialization and webhook server set up
68+
yurt-app-set(controller/webhook): used for yurt-app-set controller and webhook
69+
yurt-coordinator-cert(controller): used for yurt-coordinator-cert
70+
pod-binding(controller): used for pod-binding
71+
...
72+
```
73+
74+
2. How to use different RBAC for different controller/webhooks?
75+
76+
After different RBACs are prepared, we should make sure they are properly used by different controllers. We consider two feasible approaches here to achieve this goal.
77+
78+
1) User impersonation
79+
80+
Kubernets provides a [mechanism](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#user-impersonation) that one user can act as another user through impersonation headers. These let requests manually override the user info a request authenticates as.
81+
82+
2) Token override
83+
84+
We can override the client config' token with the prepared ServiceAccount token which has been bound to prescribed roles or cluster roles.
85+
86+
Considering the additional complexity of the first approach, we choose the second one to override the token directly when building the client for each controller/webhook.
87+
88+
### Solution Introduction
89+
90+
The whole solution consists of two steps:
91+
92+
1. Generate RBAC
93+
94+
First, prepare RBAC artifacts including role/clusterrole, rolebindidng/clusterrolebinding and serviceaccount for controllers and webhooks in yurt-manager. The specific steps are as following:
95+
96+
1) Use bash scripts to search through the pkg/yurt-manager folder and find the components which need independent RBAC.
97+
2) Generate clusterrole and roles for those components one by one with `controller-gen`.
98+
3) Use bash scripts to complement the corresponding rolebinding/clusterrolebinding and serviceaccount artifacts.
99+
4) Gather all generated materials into one yaml in charts/yurt-manager/templates/yurt-manager-auto-generated.yaml with `kustomize`.
100+
101+
Those steps above are performed automatically when you run `make manifests` command. Basically, yurt-manager developers don't need to know these implementations underhood.
102+
103+
2. Prepare client
104+
105+
Currently, all controllers use a client from `manager.GetClient()` which is provided by controller runtime framework directly. However the framework [doesn't provide any interface](https://github.com/kubernetes-sigs/controller-runtime/issues/2822) to provide different clients by controllers. Therefore, we have to implement a client wrapper to to construct a new client for every component based on the basic client.
106+
107+
```go
108+
func GetClientByControllerNameOrDie(mgr manager.Manager, controllerName, namespace string) client.Client {
109+
// if controllerName is empty, return the base client of manager
110+
if controllerName == "" {
111+
return mgr.GetClient()
112+
}
113+
114+
clientStore.lock.Lock()
115+
defer clientStore.lock.Unlock()
116+
117+
if cli, ok := clientStore.clientsByName[controllerName]; ok {
118+
return cli
119+
}
120+
121+
// check if controller-specific ServiceAccount exist
122+
_, err := getOrCreateServiceAccount(mgr.GetClient(), namespace, controllerName)
123+
if err != nil {
124+
return nil
125+
}
126+
127+
// get base config
128+
baseCfg := mgr.GetConfig()
129+
130+
// rename cfg user-agent
131+
cfg := rest.CopyConfig(baseCfg)
132+
rest.AddUserAgent(cfg, controllerName)
133+
134+
// add controller-specific token wrapper to cfg
135+
cachedTokenSource := transport.NewCachedTokenSource(&tokenSourceImpl{
136+
namespace: namespace,
137+
serviceAccountName: controllerName,
138+
cli: mgr.GetClient(),
139+
expirationSeconds: defaultExpirationSeconds,
140+
leewayPercent: defaultLeewayPercent,
141+
})
142+
cfg.Wrap(transport.ResettableTokenSourceWrapTransport(cachedTokenSource))
143+
144+
// construct client from cfg
145+
clientOptions := client.Options{
146+
Scheme: mgr.GetScheme(),
147+
Mapper: mgr.GetRESTMapper(),
148+
// todo: this is just a default option, we should use mgr's cache options
149+
Cache: &client.CacheOptions{
150+
Unstructured: false,
151+
Reader: mgr.GetCache(),
152+
},
153+
}
154+
155+
cli, err := client.New(cfg, clientOptions)
156+
if err != nil {
157+
panic(err)
158+
}
159+
clientStore.clientsByName[controllerName] = cli
160+
161+
return cli
162+
}
163+
164+
var (
165+
// defaultExpirationSeconds defines the duration of a TokenRequest in seconds.
166+
defaultExpirationSeconds = int64(3600)
167+
// defaultLeewayPercent defines the percentage of expiration left before the client trigger a token rotation.
168+
// range[0, 100]
169+
defaultLeewayPercent = 20
170+
)
171+
172+
// migrate from kubernetes/staging/src/k8s.io/controller-manager/pkg/clientbuilder/client_builder_dynamic.go
173+
// change client to controller-runtime client
174+
type tokenSourceImpl struct {
175+
namespace string
176+
serviceAccountName string
177+
cli client.Client
178+
expirationSeconds int64
179+
leewayPercent int
180+
}
181+
182+
func (ts *tokenSourceImpl) Token() (*oauth2.Token, error) {
183+
retTokenRequest := &v1authenticationapi.TokenRequest{
184+
Spec: v1authenticationapi.TokenRequestSpec{
185+
ExpirationSeconds: utilpointer.Int64Ptr(ts.expirationSeconds),
186+
},
187+
}
188+
189+
backoff := wait.Backoff{
190+
Duration: 500 * time.Millisecond,
191+
Factor: 2, // double the timeout for every failure
192+
Steps: 4,
193+
}
194+
if err := wait.ExponentialBackoff(backoff, func() (bool, error) {
195+
sa, inErr := getOrCreateServiceAccount(ts.cli, ts.namespace, ts.serviceAccountName)
196+
if inErr != nil {
197+
klog.Warningf("get or create service account failed: %v", inErr)
198+
return false, nil
199+
}
200+
201+
if inErr = ts.cli.SubResource("token").Create(context.Background(), sa, retTokenRequest); inErr != nil {
202+
klog.Warningf("get token failed: %v", inErr)
203+
return false, nil
204+
}
205+
206+
return true, nil
207+
}); err != nil {
208+
return nil, fmt.Errorf("failed to get token for %s/%s: %v", ts.namespace, ts.serviceAccountName, err)
209+
}
210+
211+
if retTokenRequest.Spec.ExpirationSeconds == nil {
212+
return nil, fmt.Errorf("nil pointer of expiration in token request")
213+
}
214+
215+
lifetime := retTokenRequest.Status.ExpirationTimestamp.Time.Sub(time.Now())
216+
if lifetime < time.Minute*10 {
217+
// possible clock skew issue, pin to minimum token lifetime
218+
lifetime = time.Minute * 10
219+
}
220+
221+
leeway := time.Duration(int64(lifetime) * int64(ts.leewayPercent) / 100)
222+
expiry := time.Now().Add(lifetime).Add(-1 * leeway)
223+
224+
return &oauth2.Token{
225+
AccessToken: retTokenRequest.Status.Token,
226+
TokenType: "Bearer",
227+
Expiry: expiry,
228+
}, nil
229+
}
230+
231+
func getOrCreateServiceAccount(cli client.Client, namespace, name string) (*v1.ServiceAccount, error) {
232+
sa := &v1.ServiceAccount{
233+
ObjectMeta: metav1.ObjectMeta{
234+
Name: name,
235+
Namespace: namespace,
236+
},
237+
}
238+
err := cli.Get(context.TODO(), client.ObjectKey{Namespace: namespace, Name: name}, sa)
239+
if err == nil {
240+
return sa, nil
241+
}
242+
if !apierrors.IsNotFound(err) {
243+
return nil, err
244+
}
245+
246+
// Create the namespace if we can't verify it exists.
247+
// Tolerate errors, since we don't know whether this component has namespace creation permissions.
248+
ns := &v1.Namespace{
249+
ObjectMeta: metav1.ObjectMeta{
250+
Name: namespace,
251+
},
252+
}
253+
if err := cli.Get(context.TODO(), client.ObjectKey{}, ns); apierrors.IsNotFound(err) {
254+
if err = cli.Create(context.TODO(), ns); err != nil && !apierrors.IsAlreadyExists(err) {
255+
klog.Warningf("create non-exist namespace %s failed:%v", namespace, err)
256+
}
257+
}
258+
259+
// Create the service account
260+
err = cli.Create(context.TODO(), sa)
261+
if apierrors.IsAlreadyExists(err) {
262+
// If we're racing to init and someone else already created it, re-fetch
263+
err = cli.Get(context.TODO(), client.ObjectKey{Namespace: namespace, Name: name}, sa)
264+
return sa, err
265+
}
266+
return sa, err
267+
}
268+
269+
```
270+
271+
If you are developing a new controller/webhook in yurt-manager, you should use `GetClientByControllerNameOrDie(mgr, controllerName)` instead of `mgr.GetClient()` to apply your own RBAC.
272+
273+
## Implementation History
274+
275+
* [ ] 05/17/2024: Draft proposal created;

0 commit comments

Comments
 (0)