
EVEREST-496 Multi namespace support #169

Merged 42 commits into main on Nov 23, 2023

Conversation

@gen1us2k commented Nov 16, 2023

CHANGE DESCRIPTION

Problem:
EVEREST-496

This is a small PoC for a cluster-wide operator that stores BackupStorage objects (the same applies to MonitoringConfig objects) in the percona-everest namespace and replicates the referenced secrets across the namespaces it manages.
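As a rough illustration of "cluster-wide but limited to managed namespaces", here is a minimal sketch of a controller-runtime manager restricted to a set of namespaces. It assumes controller-runtime v0.16+ and an illustrative WATCH_NAMESPACES env var; it is not necessarily how this PR wires it.

```go
package main

import (
	"os"
	"strings"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
)

func main() {
	// Illustrative env var: comma-separated namespaces the operator manages,
	// e.g. "percona-everest,ns-a,ns-b". Empty means cluster-wide.
	cacheOpts := cache.Options{}
	if raw := os.Getenv("WATCH_NAMESPACES"); raw != "" {
		cacheOpts.DefaultNamespaces = map[string]cache.Config{}
		for _, ns := range strings.Split(raw, ",") {
			cacheOpts.DefaultNamespaces[strings.TrimSpace(ns)] = cache.Config{}
		}
	}

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{Cache: cacheOpts})
	if err != nil {
		os.Exit(1)
	}

	// Controllers would be registered here before starting the manager.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		os.Exit(1)
	}
}
```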

Pros:

  1. The BE needs to manage only one BackupStorage or MonitoringConfig, with its referenced secret, in a single namespace
  2. The operator handles the state of the secret and updates it across all namespaces where it was created

Cons:

  1. Increased blast radius. When (not if) we release a broken version of the operator, DBaaS features will break for all database clusters across all namespaces. Running in namespace mode confines the impact to a single namespace, and that is the approach Percona recommends. Read more: https://www.percona.com/blog/percona-operators-deployment-explained-delving-into-cluster-wide-vs-namespace-scoped/
  2. Bloated service account permissions are needed to run the everest operator in cluster-wide mode, instead of one least-privilege SA per namespace. If the percona-everest namespace is compromised, an attacker gains access to all namespaces managed by the everest-operator.
  3. Additional load on the QA team, because they need to test two different modes of the everest operator, which adds an extra step to their workflow.
  4. The configuration becomes more fragile, because the CLI needs to set an env var with the percona-everest namespace for two components: the BE and the operator.
  5. If a secret is updated and the operator fails to reconcile it in some namespace, the end user remains unnotified; there is no messaging from the Everest side.
  6. There is no separation of concerns for the sensitive data, because everything is stored in a single namespace; I believe we should store it in different namespaces to follow security best practices.

There is also an additional request for the cluster-wide operator: use admission webhooks to move validation logic from the BE side to the operator. However, the scope of that change is huge, because we would need to:

  1. Implement multi-namespace mode for the operator without additional logic first, and ensure that everything works as expected
  2. Ensure that the everest operator can work with different versions of the upstream operators
  3. Implement reconcile loops for monitoring configs and backup storages
  4. Implement admission webhooks or any other required pieces

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?

Tests

  • Is an Integration test/test case added for the new feature/change?
  • Are unit tests added where appropriate?

@oksana-grishchenko (Contributor) left a comment

In general, the idea of targetNamespaces for the allowed list and status.namespaces for the actual list looks good. However, to keep the lists consistent, we would need to handle the deletion of:

  • the targetNamespaces
  • the actual namespaces in the cluster
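A minimal sketch of the fields this describes; the json tags and the status field name follow the comment above and are otherwise assumptions, not necessarily the final API (other BackupStorage fields omitted).

```go
package v1alpha1

// Illustrative shape only; other BackupStorage fields are omitted.
type BackupStorageSpec struct {
	// TargetNamespaces is the allowed list of namespaces that the referenced
	// credentials secret may be replicated into.
	TargetNamespaces []string `json:"targetNamespaces,omitempty"`
}

type BackupStorageStatus struct {
	// Namespaces is the actual list of namespaces the secret currently exists
	// in; the operator keeps it consistent with spec.targetNamespaces.
	Namespaces []string `json:"namespaces,omitempty"`
}
```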

@gen1us2k changed the title from "EVEREST-496 Playing with reconcilers" to "EVEREST-496 Multi namespace support" on Nov 20, 2023
@oksana-grishchenko (Contributor) left a comment

The implementation itself looks good; my concerns are resolved. Approving the PR to unblock it, in case the product considers the massive "Cons" section a set of limitations we can live with.

@recharte (Collaborator) left a comment

Overall the PR looks functionally correct; however, I think the approach taken is overly complex.

I argue that we don't need the BackupStorage controller at all. We already have an index and a watcher configured in the DBC controller that run the reconciliation loop every time a secret referenced by a BS triggers an event.
Therefore, we can use this same mechanism to create or update the end secret without needing to keep track of any namespace. In a sense, we can think of this secret as being linked to a specific DBC rather than to the generic BS.

Moreover, we already use this approach for PG, because the secret format needs to be different from the one specified in the BS, which is the same for PXC/PSMDB. See here and here.
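A rough, generic sketch of the index-plus-watch mechanism being referred to, assuming controller-runtime v0.15+ and the standard kubebuilder layout of the DBC reconciler; the index key, the secret-name extraction, the setup function name, and the import path of the API package are illustrative assumptions, not the repository's actual code.

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	everestv1alpha1 "github.com/percona/everest-operator/api/v1alpha1"
)

// Illustrative index key.
const dbcBySecretNameIndex = ".index.referencedSecretName"

// secretNamesFor is a placeholder for the repo-specific logic that collects the
// names of the secrets referenced by a DatabaseCluster (via its backup
// storages, monitoring config, etc.).
func secretNamesFor(_ *everestv1alpha1.DatabaseCluster) []string { return nil }

// dbClustersForSecret maps a secret event to reconcile requests for every
// DatabaseCluster in the same namespace that references that secret.
func (r *DatabaseClusterReconciler) dbClustersForSecret(ctx context.Context, obj client.Object) []reconcile.Request {
	var list everestv1alpha1.DatabaseClusterList
	if err := r.List(ctx, &list,
		client.InNamespace(obj.GetNamespace()),
		client.MatchingFields{dbcBySecretNameIndex: obj.GetName()},
	); err != nil {
		return nil
	}
	reqs := make([]reconcile.Request, 0, len(list.Items))
	for i := range list.Items {
		reqs = append(reqs, reconcile.Request{NamespacedName: client.ObjectKeyFromObject(&list.Items[i])})
	}
	return reqs
}

// Wiring: register the field index, then enqueue DBCs on secret events.
func (r *DatabaseClusterReconciler) setupSecretWatch(ctx context.Context, mgr ctrl.Manager) error {
	if err := mgr.GetFieldIndexer().IndexField(ctx, &everestv1alpha1.DatabaseCluster{}, dbcBySecretNameIndex,
		func(o client.Object) []string { return secretNamesFor(o.(*everestv1alpha1.DatabaseCluster)) },
	); err != nil {
		return err
	}
	return ctrl.NewControllerManagedBy(mgr).
		For(&everestv1alpha1.DatabaseCluster{}).
		Watches(&corev1.Secret{}, handler.EnqueueRequestsFromMapFunc(r.dbClustersForSecret)).
		Complete(r)
}
```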

@gen1us2k (Author) commented

(Quoting @recharte's suggestion above to drop the BackupStorage controller and reuse the DBC controller's secret index/watcher.)

Well, this was my first approach to solving the issue. It works fine for backupStorages or monitoringConfigs in the single-namespace scenario.

Since the idea is to create backupStorages or monitoringConfigs in the default namespace (percona-everest), these objects exist only in the default namespace. The databaseCluster controller, however, replicates secrets across namespaces (secrets are namespaced), and there is no way (at least that I know of) to bind such a replicated secret to the backupStorage or monitoringConfig. Since the logic for backup storages and monitoring configs is the same, I'll describe the flow using backup storage.

Let's take a look at the backup storage in the default namespace. To ensure that the secret gets deleted, we can use owner references, and we don't really need a controller for that: the backend or some other controller can create the secret, and once the backupStorage is deleted, the secret is deleted as well.
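A minimal sketch of that owner-reference wiring, assuming the standard controller-runtime helper; whoever creates the secret (the BE or a controller) would do roughly this. The secret name and data are illustrative.

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	everestv1alpha1 "github.com/percona/everest-operator/api/v1alpha1"
)

// createOwnedSecret creates the credentials secret in the default namespace and
// makes the BackupStorage its owner, so deleting the BackupStorage also
// garbage-collects the secret.
func createOwnedSecret(ctx context.Context, c client.Client, scheme *runtime.Scheme,
	bs *everestv1alpha1.BackupStorage, data map[string][]byte,
) error {
	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      bs.GetName(),      // illustrative: secret named after the BackupStorage
			Namespace: bs.GetNamespace(), // the default (percona-everest) namespace
		},
		Data: data,
	}
	if err := controllerutil.SetControllerReference(bs, secret, scheme); err != nil {
		return err
	}
	return c.Create(ctx, secret)
}
```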

Since secrets are namespaced, to enable multi-namespace support in cluster-wide mode we need to replicate them to the target namespace during the reconciliation of database clusters (this is the only place where we need to ensure that the secret exists). We can't use owner references here, because they are not designed for cross-namespace ownership, so we need a way to record where the secret was created. That's where labels come in: they link the secret in a different namespace to the existing backup storage.
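A sketch of what that replication step could look like inside the database cluster reconcile, with purely illustrative label keys (the real implementation may use different ones).

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// Illustrative label keys linking a replicated secret back to its BackupStorage.
const (
	backupStorageNameLabel      = "everest.percona.com/backup-storage-name"
	backupStorageNamespaceLabel = "everest.percona.com/backup-storage-namespace"
)

// replicateSecret copies src into targetNamespace and labels the copy so the
// operator can later find and clean it up. Owner references can't be used here
// because they don't work across namespaces.
func replicateSecret(ctx context.Context, c client.Client, src *corev1.Secret,
	bsName, bsNamespace, targetNamespace string,
) error {
	dst := &corev1.Secret{ObjectMeta: metav1.ObjectMeta{
		Name:      src.GetName(),
		Namespace: targetNamespace,
	}}
	_, err := controllerutil.CreateOrUpdate(ctx, c, dst, func() error {
		dst.Labels = map[string]string{
			backupStorageNameLabel:      bsName,
			backupStorageNamespaceLabel: bsNamespace,
		}
		dst.Data = src.Data
		return nil
	})
	return err
}
```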

After testing this with indexers and watchers: when the user tries to delete the secret from a different namespace, the reconciliation loop does not start, because no event is generated/passed to the operator after the deletion, and everything ends up in an inconsistent state.

Hence, we need an additional controller to ensure that after a backup storage is deleted, all of its secrets across all namespaces are cleaned up, and here finalizers give us an additional handle to block the deletion. We need to delete all secrets across all namespaces except the default one (there we rely on owner references, and deleting it ourselves might block because of cross-linking between entities).
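A sketch of that finalizer handling in a BackupStorage reconciler, assuming the reconciler embeds a controller-runtime client as in standard scaffolding; the finalizer name and the cleanup helper are illustrative placeholders.

```go
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	everestv1alpha1 "github.com/percona/everest-operator/api/v1alpha1"
)

// Illustrative finalizer name.
const secretsCleanupFinalizer = "everest.percona.com/cleanup-replicated-secrets"

// cleanupReplicatedSecrets is a placeholder for the repo-specific logic that
// deletes the labeled secrets in every namespace listed in the status, except
// the default one, where owner references handle the cleanup.
func (r *BackupStorageReconciler) cleanupReplicatedSecrets(_ context.Context, _ *everestv1alpha1.BackupStorage) error {
	return nil
}

func (r *BackupStorageReconciler) handleDeletion(ctx context.Context, bs *everestv1alpha1.BackupStorage) (ctrl.Result, error) {
	if bs.GetDeletionTimestamp().IsZero() {
		// Not being deleted: make sure the finalizer is present so deletion is
		// blocked until the replicated secrets are cleaned up.
		if controllerutil.AddFinalizer(bs, secretsCleanupFinalizer) {
			return ctrl.Result{}, r.Update(ctx, bs)
		}
		return ctrl.Result{}, nil
	}
	if controllerutil.ContainsFinalizer(bs, secretsCleanupFinalizer) {
		if err := r.cleanupReplicatedSecrets(ctx, bs); err != nil {
			return ctrl.Result{}, err
		}
		controllerutil.RemoveFinalizer(bs, secretsCleanupFinalizer)
		return ctrl.Result{}, r.Update(ctx, bs)
	}
	return ctrl.Result{}, nil
}
```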

Since the backupStorage controller is namespaced (we create backup storages only in the default namespace), we need a way to watch secrets and run reconciliation of the backupStorage entity on a secret change. For the default secret in the default namespace we have Owns for that, and it works fine. For all other namespaces, we need to listen to secrets and run the reconcile loop only for labeled secrets.
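A sketch of that controller wiring: Owns for the owned secret in the default namespace, plus a label-filtered watch with a mapping function for the replicated copies in other namespaces. The label keys mirror the illustrative ones in the replication sketch above; names and the predicate are assumptions, and it assumes controller-runtime v0.15+.

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	everestv1alpha1 "github.com/percona/everest-operator/api/v1alpha1"
)

// Same illustrative label keys as in the replication sketch above.
const (
	bsNameLabel      = "everest.percona.com/backup-storage-name"
	bsNamespaceLabel = "everest.percona.com/backup-storage-namespace"
)

// backupStorageForSecret maps a labeled, replicated secret back to its BackupStorage.
func (r *BackupStorageReconciler) backupStorageForSecret(_ context.Context, o client.Object) []reconcile.Request {
	name, ns := o.GetLabels()[bsNameLabel], o.GetLabels()[bsNamespaceLabel]
	if name == "" || ns == "" {
		return nil
	}
	return []reconcile.Request{{NamespacedName: types.NamespacedName{Name: name, Namespace: ns}}}
}

func (r *BackupStorageReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&everestv1alpha1.BackupStorage{}).
		// The secret in the default namespace is owned by the BackupStorage,
		// so Owns is enough to reconcile on its changes.
		Owns(&corev1.Secret{}).
		// Replicated secrets in other namespaces are not owned; watch them and
		// reconcile only the ones carrying the linking labels.
		Watches(
			&corev1.Secret{},
			handler.EnqueueRequestsFromMapFunc(r.backupStorageForSecret),
			builder.WithPredicates(predicate.NewPredicateFuncs(func(o client.Object) bool {
				_, ok := o.GetLabels()[bsNameLabel]
				return ok
			})),
		).
		Complete(r)
}
```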

I hope this explains the solution, after a couple of days of trying to implement it as simply as I could.

@gen1us2k marked this pull request as ready for review on November 22, 2023 16:15
@gen1us2k requested a review from a user on November 22, 2023 16:15
@recharte (Collaborator) left a comment

This is such an important change that we can't merge it without proper tests for it.

@recharte (Collaborator) left a comment

While reviewing I found a couple of tiny code-style improvements; these are just some nitpicks, so feel free to ignore them.

@gen1us2k requested a review from recharte on November 23, 2023 13:11
@gen1us2k enabled auto-merge (squash) on November 23, 2023 14:34
@recharte (Collaborator) left a comment

Thank you for adding the tests 🤗.
I think the only important case we're missing is validating that the TargetNamespaces feature works as intended, but that's part of the error cases, which we aren't testing anywhere, so I can't say this is mandatory for now.

@gen1us2k merged commit 6e8aea6 into main on Nov 23, 2023 (3 of 4 checks passed)
@gen1us2k deleted the EVEREST-496 branch on November 23, 2023 16:02