vlsi (Contributor) commented Dec 23, 2025

Previously, ProcessCreds() was called before reconcilePatroniCoreCluster(), causing the operator to crash when trying to execute SQL on a non-existent database during initial bootstrap.

This resulted in:

  • Nil pointer dereference at pkg/client/client.go:90
  • "context deadline exceeded" errors during helm deployments
  • No PostgreSQL StatefulSets being created

Now ProcessCreds() is called after the cluster is successfully created, allowing proper bootstrap of new PostgreSQL clusters.

Fixes: Initial cluster bootstrap failure
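
The ordering change can be illustrated with a minimal, self-contained Go sketch. It is not the operator's code: the `reconciler` and `cluster` types, the lowercase `processCreds` method, and the error text are hypothetical stand-ins, and the old order is modeled as returning an error rather than crashing.

```go
package main

import (
	"errors"
	"fmt"
)

// cluster is a hypothetical stand-in for the PostgreSQL cluster state.
type cluster struct {
	created bool
}

// reconciler records the order in which its steps run.
type reconciler struct {
	cluster *cluster
	calls   []string
}

// reconcilePatroniCoreCluster pretends to create the cluster.
func (r *reconciler) reconcilePatroniCoreCluster() error {
	r.calls = append(r.calls, "reconcilePatroniCoreCluster")
	r.cluster.created = true
	return nil
}

// processCreds pretends to run SQL, which needs an existing database.
func (r *reconciler) processCreds() error {
	r.calls = append(r.calls, "ProcessCreds")
	if !r.cluster.created {
		return errors.New("database does not exist yet")
	}
	return nil
}

// reconcileOldOrder models the previous behavior: credentials first.
func (r *reconciler) reconcileOldOrder() error {
	if err := r.processCreds(); err != nil {
		return fmt.Errorf("process credentials: %w", err)
	}
	return r.reconcilePatroniCoreCluster()
}

// reconcileNewOrder models the fix: create the cluster, then process credentials.
func (r *reconciler) reconcileNewOrder() error {
	if err := r.reconcilePatroniCoreCluster(); err != nil {
		return fmt.Errorf("reconcile cluster: %w", err)
	}
	if err := r.processCreds(); err != nil {
		return fmt.Errorf("process credentials: %w", err)
	}
	return nil
}

func main() {
	oldRun := &reconciler{cluster: &cluster{}}
	fmt.Println("old order:", oldRun.reconcileOldOrder(), oldRun.calls)

	newRun := &reconciler{cluster: &cluster{}}
	fmt.Println("new order:", newRun.reconcileNewOrder(), newRun.calls)
}
```

In this sketch the old order fails on a fresh install because no database exists when the credential step runs, while the new order completes both steps.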

Also updated helmfile chart paths from ./charts/ to ./operator/charts/
to match the new repository structure after rebase.

vlsi force-pushed the cred_processing branch 2 times, most recently from 1d1a1c4 to 6c392f8 (December 24, 2025 10:58)

Add comprehensive test to validate operator doesn't crash during cluster
bootstrap when credentials are changed before the database exists.

Test Scenario:
1. Starts with a running Kubernetes cluster and operator
2. Creates postgres-credentials-old secret (backup copy)
3. Patches postgres-credentials with new password
4. Forces operator reconciliation via CR annotation
5. Monitors for StatefulSet creation (proves cluster bootstrap succeeded)
6. Validates operator health (no crashes/restarts)
7. Checks operator logs for panic/nil pointer errors

This test would FAIL with the old code, where ProcessCreds() was called
before reconcilePatroniCoreCluster(), causing a nil pointer dereference
when trying to execute ALTER ROLE on a non-existent database.

How to Run:
  PGSSLMODE=disable INTERNAL_TLS_ENABLED=false \
    robot -i check_operator_bootstrap tests/robot/check_installation/

Test Features:
- Idempotent: automatically cleans up postgres-credentials-old
- Robust: retries log retrieval if pod is in transitional state
- Clear output: BDD-style Given/When/Then structure with checkmarks

vlsi force-pushed the cred_processing branch 3 times, most recently from f4c88b0 to bb5f799 (December 24, 2025 13:14)

Tvion (Collaborator) commented Jan 8, 2026

Hi, how can this issue be reproduced?
The query-execution part of the credential change logic should not be triggered if the password wasn't changed.

https://github.com/Netcracker/qubership-credential-manager/blob/07d3af9e48271f9487b389e2dae2bf924bceca1d/pkg/manager/manager.go#L98

If we do not change the password before cluster reconciliation, we have to go through the whole reconcile cycle with a newly set postgres password from the secret that is not yet the actual one. That looks dangerous, and for some cases, such as a major upgrade, it probably won't work.

vlsi (Contributor, Author) commented Jan 8, 2026

The PR includes a test that fails without the code change and passes with it.

Tvion (Collaborator) commented Jan 12, 2026

Yes, in tests with manual credential manipulation against a non-existent cluster, but what is the real case?

How can the credentials differ between postgres-credentials and postgres-credentials-old during initial installation if postgres-credentials-old is created as a copy of postgres-credentials?
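
For illustration, the "only act when the password changed" expectation can be sketched in Go as a comparison of the two secrets' data. This is not the credential manager's code: the `passwordChanged` helper and the `password` key name are assumptions made for the example.

```go
package main

import (
	"bytes"
	"fmt"
)

// passwordChanged reports whether the assumed "password" key differs
// between the data of the old and the current secret.
func passwordChanged(oldData, currentData map[string][]byte) bool {
	return !bytes.Equal(oldData["password"], currentData["password"])
}

func main() {
	current := map[string][]byte{"password": []byte("s3cret")}
	// postgres-credentials-old created as a copy of postgres-credentials.
	old := map[string][]byte{"password": []byte("s3cret")}
	fmt.Println(passwordChanged(old, current)) // prints false

	// Only after the current secret is patched do the two differ.
	current["password"] = []byte("rotated")
	fmt.Println(passwordChanged(old, current)) // prints true
}
```

Under that assumption, a fresh install where the old secret is an exact copy would never reach the SQL step, which is the behavior the question above expects.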

vlsi (Contributor, Author) commented Jan 12, 2026

> but what is the real case?

In my case, I hit this issue every time I try to install pgskipper-operator on a clean Kubernetes cluster (OrbStack).
