@qjsoq qjsoq commented Oct 24, 2025

This pull request adds a new Helm chart, valkey-cluster, for deploying Valkey in cluster mode on Kubernetes.
The chart handles cluster setup with a StatefulSet and an initialization script that bootstraps the cluster and joins new or replacement pods.

What This PR Does:

  • Uses a StatefulSet: Provides stable network identifiers and ordered deployment, both of which a clustered database needs.
  • Introduces Dynamic Per-Pod PVCs: Uses volumeClaimTemplates to provision a separate PersistentVolumeClaim for each cluster node, so each node's data is persistent and isolated.
  • Automates Cluster Initialization: Adds an init-cluster.sh script (managed via cluster_config.yaml) that handles cluster bootstrapping, node discovery, and joining logic for new or replacement pods.
  • Adds Graceful Master Failover: Adds a default preStop lifecycle hook. When a master pod is scheduled for termination (e.g., during a rolling update), the hook first triggers a controlled failover to one of its replicas before the pod shuts down, minimizing downtime.
  • Adds Headless Service: Creates the headless service that pods in the StatefulSet use for stable network discovery.
  • Exposes Metrics Exporter: Adds a dedicated Kubernetes Service for the Prometheus metrics exporter sidecar, so monitoring systems can discover and scrape cluster metrics.
  • Adds a PodDisruptionBudget: Protects the cluster during voluntary disruptions.
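
As a rough illustration of the graceful failover item in the list above, the following shell sketch picks a failover target from CLUSTER NODES output. The helper name failover_target and the sample node IDs and addresses are hypothetical, and the chart's actual preStop hook may work differently.

```shell
#!/bin/sh
# Sketch only: failover_target is a hypothetical helper, not taken from the chart.
# Reads CLUSTER NODES output on stdin. If the local node ("myself") is a master,
# prints the host:port of its first healthy replica; otherwise prints nothing.
failover_target() {
  awk '
    # Remember the local node ID when it is flagged as a master.
    $3 ~ /myself/ && $3 ~ /master/ { me = $1 }
    # Replicas name their master in field 4 (masters show "-"); skip failing ones.
    $4 != "-" && $3 !~ /fail/ { master_of[$1] = $4; addr[$1] = $2; ids[n++] = $1 }
    END {
      if (me == "") exit
      for (i = 0; i < n; i++) {
        if (master_of[ids[i]] == me) {
          a = addr[ids[i]]
          sub(/@.*/, "", a)   # drop the "@cluster-bus-port" suffix
          print a
          exit
        }
      }
    }'
}

# Sample input with made-up node IDs and addresses.
sample_nodes='aaa 10.0.0.1:6379@16379 myself,master - 0 0 1 connected 0-5460
bbb 10.0.0.2:6379@16379 master - 0 0 2 connected 5461-10922
ddd 10.0.0.4:6379@16379 slave aaa 0 0 1 connected'

printf '%s\n' "$sample_nodes" | failover_target   # prints 10.0.0.4:6379
```

In a real hook the input would come from `valkey-cli cluster nodes`, and the failover would be requested by sending CLUSTER FAILOVER to the selected replica. A pod that is already a replica yields no target, so the hook could skip the failover step in that case.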

How the Auto-Clustering Works
The core of the cluster logic resides in the init-cluster.sh script, which runs as a background process in the main container:

Peer Discovery: Each pod uses the Kubernetes headless service to discover its peers.

Joining an Existing Cluster: If a new pod starts and discovers an existing healthy cluster, it joins with the CLUSTER MEET command. It then finds a master that has fewer replicas than expected and attaches itself to that master with CLUSTER REPLICATE.
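
A minimal sketch of the replica-placement step, assuming "deficit" means "the slot-owning master with the fewest healthy replicas". The helper name pick_master_with_fewest_replicas and the sample data are hypothetical; the selection rule in init-cluster.sh may differ.

```shell
#!/bin/sh
# Sketch only: hypothetical helper, not the chart's actual logic.
# Reads CLUSTER NODES output on stdin and prints the ID of the healthy,
# slot-owning master (other than the local node) with the fewest replicas.
# Ties go to the master listed first.
pick_master_with_fewest_replicas() {
  awk '
    # Candidate masters: healthy, not the local node, and owning slots
    # (slot ranges start at field 9).
    $3 ~ /master/ && $3 !~ /fail/ && $3 !~ /myself/ && NF >= 9 { masters[n++] = $1 }
    # Count healthy replicas per master ID (field 4; masters show "-").
    $4 != "-" && $3 !~ /fail/ { replicas[$4]++ }
    END {
      best = ""
      for (i = 0; i < n; i++) {
        c = replicas[masters[i]] + 0
        if (best == "" || c < best_count) { best = masters[i]; best_count = c }
      }
      if (best != "") print best
    }'
}

# Sample view from a freshly joined pod "fff" (made-up IDs and addresses).
joined_nodes='aaa 10.0.0.1:6379@16379 master - 0 0 1 connected 0-5460
bbb 10.0.0.2:6379@16379 master - 0 0 2 connected 5461-10922
ccc 10.0.0.3:6379@16379 master - 0 0 3 connected 10923-16383
ddd 10.0.0.4:6379@16379 slave aaa 0 0 1 connected
eee 10.0.0.5:6379@16379 slave ccc 0 0 3 connected
fff 10.0.0.6:6379@16379 myself,master - 0 0 0 connected'

printf '%s\n' "$joined_nodes" | pick_master_with_fewest_replicas   # prints bbb
```

The printed ID is what the new pod would pass to CLUSTER REPLICATE. Excluding the local node matters because a pod that has just joined through CLUSTER MEET appears as a master with no slots and no replicas.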

Initial Cluster Creation: If no cluster is found, the first pod (-0) takes on the role of the initiator. It waits for all other pods to become ready and then bootstraps the entire cluster using the valkey-cli --cluster create command with the appropriate number of replicas.
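
The bootstrap step might look roughly like the sketch below, which builds the peer list from StatefulSet DNS names and prints (rather than runs) the create command. The function names and the StatefulSet name, service name, namespace, and port in the example call are illustrative assumptions, not values taken from the chart.

```shell
#!/bin/sh
# Sketch only: all names and values here are hypothetical.

# Print one "host:port" per pod ordinal, using the stable DNS names a
# StatefulSet gets from its headless service:
#   <statefulset>-<ordinal>.<service>.<namespace>.svc.cluster.local
peer_addresses() {
  sts_name=$1; service=$2; namespace=$3; count=$4; port=$5
  i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s-%d.%s.%s.svc.cluster.local:%s\n' \
      "$sts_name" "$i" "$service" "$namespace" "$port"
    i=$((i + 1))
  done
}

# Print the bootstrap command for the given replicas-per-master value.
# Usage: create_command <replicas> <statefulset> <service> <namespace> <count> <port>
create_command() {
  replicas=$1; shift
  # Word splitting of the peer list is intentional: one argument per peer.
  echo valkey-cli --cluster create $(peer_addresses "$@") \
    --cluster-replicas "$replicas" --cluster-yes
}

# Example: a 6-pod cluster with one replica per master (3 masters, 3 replicas).
create_command 1 valkey valkey-headless default 6 6379
```

Printing the command keeps the sketch safe to run without a cluster. The initiator pod would execute the command only after every peer answers a health check such as PING.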

Resiliency: When a pod restarts, the script attempts to rejoin the cluster and, if necessary, makes the cluster forget the pod's stale failed entry.

The screenshots below show the chart after a successful deployment.
The deployment used the following values:

replicaCount: 6
metrics:
  enabled: true
dataStorage:
  enabled: true
  requestedSize: "8Gi"
  1. Cluster Nodes Output
[screenshot]

This shows the successful formation of a 6-node cluster (3 masters, 3 replicas).

  2. Running Pods and PVCs
[screenshots]

This confirms that the StatefulSet has successfully created all pods and their corresponding PersistentVolumeClaims.

  3. Services
[screenshot]

@qjsoq qjsoq changed the title from "Cluster enable branch" to "Add cluster mode support" Oct 24, 2025
@mk-raven mk-raven requested review from mk-raven and sgissi October 24, 2025 14:49
qjsoq added 14 commits October 24, 2025 17:49
Signed-off-by: Dmytro Artamonov <[email protected]>
@mk-raven mk-raven added the enhancement New feature or request label Oct 24, 2025
@qjsoq qjsoq force-pushed the cluster-enable-branch branch from 4691cd5 to e5d63e6 Compare October 24, 2025 14:49
@mk-raven mk-raven self-assigned this Oct 29, 2025