Add Spring Boot sample application for testing failover scenarios #109

vlsi · 2025-10-09T18:08:28Z

The PR adds helmfile configuration which deploys pgskipper and a sample application to verify connectivity and failover behavior.

Here's what I get with macOS + OrbStack:

$ USE_LOCAL_IMAGES=true helmfile -e orbstack sync
...
hook[postsync] logs | Waiting for application pods to be ready...
hook[postsync] logs | pod/failover-test-postgresql-failover-test-7c8589c549-6pb45 condition met
hook[postsync] logs | pod/failover-test-postgresql-failover-test-7c8589c549-qhtjw condition met
hook[postsync] logs | Application is ready!
hook[postsync] logs |
hook[postsync] logs | =====================================
hook[postsync] logs | Deployment Complete!
hook[postsync] logs | =====================================
hook[postsync] logs |
hook[postsync] logs | PostgreSQL Cluster Status:
hook[postsync] logs |
hook[postsync] logs | Application Status:
hook[postsync] logs | NAME                                                      READY   STATUS    RESTARTS   AGE
hook[postsync] logs | failover-test-postgresql-failover-test-7c8589c549-6pb45   1/1     Running   0          34s
hook[postsync] logs | failover-test-postgresql-failover-test-7c8589c549-qhtjw   1/1     Running   0          34s
hook[postsync] logs |
hook[postsync] logs | Services:
hook[postsync] logs | NAME                   TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)                      AGE
hook[postsync] logs | monitoring-collector   ClusterIP   192.168.194.171   <none>        8000/TCP,9273/TCP            76s
hook[postsync] logs | patroni-headless       ClusterIP   None              <none>        5432/TCP,8008/TCP            61s
hook[postsync] logs | pg-patroni             ClusterIP   192.168.194.203   <none>        5432/TCP,8008/TCP,22/TCP     61s
hook[postsync] logs | pg-patroni-api         ClusterIP   192.168.194.234   <none>        5432/TCP,8008/TCP,22/TCP     31h
hook[postsync] logs | pg-patroni-ro          ClusterIP   192.168.194.178   <none>        5432/TCP,8008/TCP,22/TCP     31h
hook[postsync] logs | postgres-operator      ClusterIP   192.168.194.145   <none>        8383/TCP,8080/TCP,8443/TCP   31h
hook[postsync] logs |
hook[postsync] logs | Next steps:
hook[postsync] logs | 1. Check application logs: kubectl logs -f -n default -l app.kubernetes.io/name=postgresql-failover-test
hook[postsync] logs | 2. Test failover: ./scripts/trigger-failover.sh
hook[postsync] logs | 3. Monitor reconnection: ./scripts/test-reconnection.sh

Previously, ProcessCreds() was called before reconcilePatroniCoreCluster(), causing the operator to crash when trying to execute SQL on a non-existent database during initial bootstrap. This resulted in:

- Nil pointer dereference at pkg/client/client.go:90
- "context deadline exceeded" errors during helm deployments
- No PostgreSQL StatefulSets being created

Now ProcessCreds() is called after the cluster is successfully created, allowing proper bootstrap of new PostgreSQL clusters.

Also updated helmfile chart paths from ./charts/ to ./operator/charts/ to match the new repository structure after rebase.

Fixes: Initial cluster bootstrap failure
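A minimal sketch of the ordering fix, assuming a simplified reconciler. The method names mirror ProcessCreds() and reconcilePatroniCoreCluster() from the commit message, but `sqlClient`, the `order` field and the `ALTER ROLE postgres` statement are hypothetical placeholders, not the operator's real types. The sketch also makes `Exec` check for a nil receiver, so calling it too early returns an error instead of dereferencing nil.

```go
package main

import (
	"errors"
	"fmt"
)

// sqlClient is a hypothetical stand-in for the operator's database client.
// It stays nil until the Patroni cluster exists.
type sqlClient struct{ host string }

// Exec checks for a nil receiver instead of dereferencing it, so calling it
// before bootstrap yields an error rather than a nil pointer panic.
func (c *sqlClient) Exec(query string) error {
	if c == nil {
		return errors.New("no database connection yet: cluster not bootstrapped")
	}
	return nil // a real client would send query to c.host here
}

type reconciler struct {
	client *sqlClient
	order  []string // records call order for illustration
}

// Simplified placeholder: creating the cluster is what makes a client available.
func (r *reconciler) reconcilePatroniCoreCluster() error {
	r.order = append(r.order, "reconcilePatroniCoreCluster")
	r.client = &sqlClient{host: "pg-patroni"}
	return nil
}

// Simplified placeholder: credential sync needs a live database.
func (r *reconciler) processCreds() error {
	r.order = append(r.order, "processCreds")
	return r.client.Exec("ALTER ROLE postgres PASSWORD 'new-password'")
}

// Reconcile applies the fixed ordering: create the cluster first, then sync credentials.
func (r *reconciler) Reconcile() error {
	if err := r.reconcilePatroniCoreCluster(); err != nil {
		return fmt.Errorf("reconcile cluster: %w", err)
	}
	if err := r.processCreds(); err != nil {
		return fmt.Errorf("process credentials: %w", err)
	}
	return nil
}

func main() {
	early := &reconciler{}
	fmt.Println("old order:", early.processCreds())
	r := &reconciler{}
	fmt.Println("new order:", r.Reconcile(), r.order)
}
```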

Add a comprehensive test to validate that the operator doesn't crash during cluster bootstrap when credentials are changed before the database exists.

Test Scenario:
1. Starts with a running Kubernetes cluster and operator
2. Creates postgres-credentials-old secret (backup copy)
3. Patches postgres-credentials with new password
4. Forces operator reconciliation via CR annotation
5. Monitors for StatefulSet creation (proves cluster bootstrap succeeded)
6. Validates operator health (no crashes/restarts)
7. Checks operator logs for panic/nil pointer errors

This test would FAIL with the old code, where ProcessCreds() was called before reconcilePatroniCoreCluster(), causing a nil pointer dereference when trying to execute ALTER ROLE on a non-existent database.

How to Run:
PGSSLMODE=disable INTERNAL_TLS_ENABLED=false \
  robot -i check_operator_bootstrap tests/robot/check_installation/

Test Features:
- Idempotent: automatically cleans up postgres-credentials-old
- Robust: retries log retrieval if the pod is in a transitional state
- Clear output: BDD-style Given/When/Then structure with checkmarks
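The actual test is written in Robot Framework. The Go sketch below only illustrates the health check in steps 6 and 7: the operator passes if it has no restarts and no log line looks like a Go runtime panic. `findCrashLines` and `operatorHealthy` are hypothetical helpers. The regular expression is an assumption that matches the runtime's "panic: ..." and "nil pointer dereference" messages.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// crashPattern matches what the Go runtime prints on a crash, e.g.
// "panic: runtime error: invalid memory address or nil pointer dereference".
var crashPattern = regexp.MustCompile(`panic:|nil pointer dereference`)

// findCrashLines returns every log line that looks like an operator crash.
func findCrashLines(logs string) []string {
	var hits []string
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		if line := sc.Text(); crashPattern.MatchString(line) {
			hits = append(hits, line)
		}
	}
	return hits
}

// operatorHealthy combines step 6 (no restarts) and step 7 (no panics in the logs).
func operatorHealthy(restartCount int, logs string) bool {
	return restartCount == 0 && len(findCrashLines(logs)) == 0
}

func main() {
	logs := "INFO reconcile started\npanic: runtime error: invalid memory address or nil pointer dereference\n"
	fmt.Println("healthy:", operatorHealthy(0, logs))
}
```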

Replace LISTEN_ADDR (pod IP) with POD_DNS_NAME (DNS FQDN) for Patroni REST API and PostgreSQL connect addresses to enable stable addressing across pod restarts.

Changes:
- Add POD_NAME, HEADLESS_SERVICE, and POD_DNS_NAME environment variables to Patroni StatefulSet pods
- Create patroni-headless Service for DNS-based pod discovery
- Update patroni.config.yaml to use ${POD_DNS_NAME} for pod_ip, connect_address (PostgreSQL), and connect_address (REST API)
- Register patroni-headless service creation in reconciler

Reasons:
- Pod IPs are ephemeral and change on restarts, causing connection issues
- DNS names (pod-name.service.namespace.svc.cluster.local) are stable
- Improves reliability of Patroni DCS registration and cluster communication
- Aligns with Kubernetes best practices for StatefulSet networking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude
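A sketch of how the stable address is composed, following the pod-name.service.namespace.svc.cluster.local pattern above. It assumes the StatefulSet's serviceName (or the pod's subdomain) points at patroni-headless, which is what makes Kubernetes publish per-pod DNS records. The pod name, namespace and cluster domain in `main` are hypothetical. The PR does not show how ${POD_DNS_NAME} gets substituted into patroni.config.yaml, so `os.Expand` here is only an illustrative stand-in for that templating step.

```go
package main

import (
	"fmt"
	"os"
)

// podDNSName builds the per-pod FQDN published for StatefulSet pods behind a headless Service:
// <pod-name>.<headless-service>.<namespace>.svc.<cluster-domain>
func podDNSName(podName, headlessService, namespace, clusterDomain string) string {
	return fmt.Sprintf("%s.%s.%s.svc.%s", podName, headlessService, namespace, clusterDomain)
}

// renderConfig replaces ${VAR} placeholders using the given values
// (illustrative stand-in for whatever templating the image actually uses).
func renderConfig(tmpl string, vars map[string]string) string {
	return os.Expand(tmpl, func(key string) string { return vars[key] })
}

func main() {
	// Hypothetical pod name and namespace.
	fqdn := podDNSName("pg-patroni-node1-0", "patroni-headless", "postgres", "cluster.local")
	tmpl := "postgresql:\n  connect_address: ${POD_DNS_NAME}:5432\nrestapi:\n  connect_address: ${POD_DNS_NAME}:8008\n"
	fmt.Print(renderConfig(tmpl, map[string]string{"POD_DNS_NAME": fqdn}))
}
```

Because the name is derived from the pod name rather than its IP, it stays the same when a StatefulSet pod is recreated, even though the IP behind it changes.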

Log explicit messages when bootstrap.dcs.standby_cluster can't be parsed. Previously, a generic panic was logged, so it was hard to figure out the cause.
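The PR does not show the parsing code. A common source of a "generic panic" in Go is an unchecked type assertion, which fails with an "interface conversion" message that does not name the config key. The sketch below shows the comma-ok alternative that reports which part of bootstrap.dcs.standby_cluster is malformed. `standbyCluster` and `parseStandbyCluster` are hypothetical. `host` and `port` are real Patroni keys, while the 5432 fallback and the int-typed port are assumptions of this sketch.

```go
package main

import "fmt"

// standbyCluster is a hypothetical, trimmed-down view of bootstrap.dcs.standby_cluster.
type standbyCluster struct {
	Host string
	Port int
}

// parseStandbyCluster uses comma-ok type assertions, so malformed input yields an
// error naming the offending key instead of a bare "interface conversion" panic.
func parseStandbyCluster(raw interface{}) (*standbyCluster, error) {
	m, ok := raw.(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("bootstrap.dcs.standby_cluster: expected a mapping, got %T", raw)
	}
	host, ok := m["host"].(string)
	if !ok || host == "" {
		return nil, fmt.Errorf("bootstrap.dcs.standby_cluster.host: expected a non-empty string, got %T", m["host"])
	}
	sc := &standbyCluster{Host: host, Port: 5432} // this sketch falls back to 5432 when port is omitted
	if v, present := m["port"]; present {
		port, ok := v.(int) // assumes the decoder produced an int (encoding/json would yield float64)
		if !ok {
			return nil, fmt.Errorf("bootstrap.dcs.standby_cluster.port: expected an integer, got %T", v)
		}
		sc.Port = port
	}
	return sc, nil
}

func main() {
	for _, raw := range []interface{}{
		map[string]interface{}{"host": "primary.example", "port": 5433},
		map[string]interface{}{"host": "primary.example", "port": "5433"},
		"not-a-mapping",
	} {
		sc, err := parseStandbyCluster(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("ok: %+v\n", *sc)
	}
}
```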

vlsi force-pushed the sb_failover branch from 26a99fc to 2a9415f on October 9, 2025 18:13

mrMigles added the enhancement label Oct 13, 2025

vlsi and others added 9 commits December 23, 2025 10:56

fix: log explicit messages when bootstrap.dcs.standby_cluster can't parse

db7875a

test: add example with spring boot application failover

66e320a

fix: factor helmfile into several files

753d252

fix: helmfile

13d894a

fix: helmfile

2efc1a8

document helmfile destroy

e6db39a

vlsi force-pushed the sb_failover branch from 2a9415f to e6db39a on December 25, 2025 13:24
