-
Notifications
You must be signed in to change notification settings - Fork 8
fix: move credential processing after cluster creation #161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Previously, ProcessCreds() was called before reconcilePatroniCoreCluster(), causing the operator to crash when trying to execute SQL on a non-existent database during initial bootstrap. This resulted in: - Nil pointer dereference at pkg/client/client.go:90 - "context deadline exceeded" errors during helm deployments - No PostgreSQL StatefulSets being created Now ProcessCreds() is called after the cluster is successfully created, allowing proper bootstrap of new PostgreSQL clusters. Also updated helmfile chart paths from ./charts/ to ./operator/charts/ to match the new repository structure after rebase. Fixes: Initial cluster bootstrap failure
1d1a1c4 to
6c392f8
Compare
Add comprehensive test to validate operator doesn't crash during cluster
bootstrap when credentials are changed before the database exists.
Test Scenario:
1. Starts with a running Kubernetes cluster and operator
2. Creates postgres-credentials-old secret (backup copy)
3. Patches postgres-credentials with new password
4. Forces operator reconciliation via CR annotation
5. Monitors for StatefulSet creation (proves cluster bootstrap succeeded)
6. Validates operator health (no crashes/restarts)
7. Checks operator logs for panic/nil pointer errors
This test would FAIL with the old code where ProcessCreds() was called
before reconcilePatroniCoreCluster(), causing nil pointer dereference
when trying to execute ALTER ROLE on non-existent database.
How to Run:
PGSSLMODE=disable INTERNAL_TLS_ENABLED=false \
robot -i check_operator_bootstrap tests/robot/check_installation/
Test Features:
- Idempotent: automatically cleans up postgres-credentials-old
- Robust: retries log retrieval if pod is in transitional state
- Clear output: BDD-style Given/When/Then structure with checkmarks
f4c88b0 to
bb5f799
Compare
|
Hi, how to reproduce this issue? If we do not change password before cluster reconciliation, we have to pass the whole reconcile cycle with newly set non actual postgres password from the secret. it looks dangerous and for some cases like major upgrade this probably won't work. |
|
The PR includes a test that fails without the code change, and the test succeeds with the code change. |
|
Yes, in tests with manual credential manipulation with non existing cluster, but what is the real case? How credential can be different in postgres-credentials and postgres-credentials-old during initial installation, if postgres-credentials-old created as a copy of postgres-credentials? |
In my case, I face this issue every time when trying to install pgskipper-operator to a clean k8s (OrbStack) |
Previously,
ProcessCreds()was called beforereconcilePatroniCoreCluster(), causing the operator to crash when trying to execute SQL on a non-existent database during initial bootstrap.This resulted in:
pkg/client/client.go:90Now
ProcessCreds()is called after the cluster is successfully created, allowing proper bootstrap of new PostgreSQL clusters.Fixes: Initial cluster bootstrap failure