Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 18 additions & 5 deletions src/cluster_legacy.c
Original file line number Diff line number Diff line change
Expand Up @@ -4600,13 +4600,26 @@ void clusterSendPing(clusterLink *link, int type) {
* Since we have non-voting replicas that lower the probability of an entry
* to feature our node, we set the number of entries per packet as
* 10% of the total nodes we have. */
Comment on lines 4600 to 4602
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to update the comment?

wanted = floor(dictSize(server.cluster->nodes) / 10);
if (wanted < 3) wanted = 3;
if (wanted > freshnodes) wanted = freshnodes;
int overall = server.cluster_ping_message_gossip_max_count;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a it as a percentage? cluster_ping_message_gossip_max_perc

Also we are naming it to be max but will the number of nodes ever be less than that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was going to suggest the same. The default is a percentage so it seems appropriate to configure it as a percentage.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I like this. Will be easier for folks to deal with scale in/out situations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about supporting both options?

if (!overall) {
overall = floor(dictSize(server.cluster->nodes) / 10);
if (overall < 3) overall = 3;
}

/* Include all the nodes in PFAIL state, so that failure reports are
* faster to propagate to go from PFAIL to FAIL state. */
/* Prioritize pfail nodes over other nodes.
* Healthy nodes can communicate through direct ping/pong if required and failed node
* information would be broadcasted. */
int pfail_wanted = server.cluster->stats_pfail_nodes;
if (pfail_wanted >= overall) {
pfail_wanted = overall - 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we set the pfail_wanted = overall

why are we reserving one spot in overall for wanted?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I suggested that. Will update it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we foresee any regression we don't gossip healthy nodes at all? I am wondering in scenarios where PFAIL nodes are never actually marked as FAIL or healthy.

wanted = 1;
} else {
wanted = overall - pfail_wanted;
}

if (wanted > freshnodes) {
wanted = freshnodes;
}

/* Compute the maximum estlen to allocate our buffer. We'll fix the estlen
* later according to the number of gossip sections we really were able
Expand Down
3 changes: 3 additions & 0 deletions src/config.c
Original file line number Diff line number Diff line change
Expand Up @@ -3350,13 +3350,16 @@ standardConfig static_configs[] = {
createIntConfig("rdma-port", NULL, MODIFIABLE_CONFIG, 0, 65535, server.rdma_ctx_config.port, 0, INTEGER_CONFIG, NULL, updateRdmaPort),
createIntConfig("rdma-rx-size", NULL, IMMUTABLE_CONFIG, 64 * 1024, 16 * 1024 * 1024, server.rdma_ctx_config.rx_size, 1024 * 1024, INTEGER_CONFIG, NULL, NULL),
createIntConfig("rdma-completion-vector", NULL, IMMUTABLE_CONFIG, -1, 1024, server.rdma_ctx_config.completion_vector, -1, INTEGER_CONFIG, NULL, NULL),
createIntConfig("cluster-ping-message-gossip-max-count", NULL, MODIFIABLE_CONFIG, 0, 2000, server.cluster_ping_message_gossip_max_count, 0, INTEGER_CONFIG, NULL, NULL),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the max be a function of dictSize(server.cluster->nodes)? I mean it would be good to validate that we don't oversend the number of nodes in gossip.



/* Unsigned int configs */
createUIntConfig("maxclients", NULL, MODIFIABLE_CONFIG, 1, UINT_MAX, server.maxclients, 10000, INTEGER_CONFIG, NULL, updateMaxclients),
createUIntConfig("unixsocketperm", NULL, IMMUTABLE_CONFIG, 0, 0777, server.unix_ctx_config.perm, 0, OCTAL_CONFIG, NULL, NULL),
createUIntConfig("socket-mark-id", NULL, IMMUTABLE_CONFIG, 0, UINT_MAX, server.socket_mark_id, 0, INTEGER_CONFIG, NULL, NULL),
createUIntConfig("max-new-connections-per-cycle", NULL, MODIFIABLE_CONFIG, 1, 1000, server.max_new_conns_per_cycle, 10, INTEGER_CONFIG, NULL, NULL),
createUIntConfig("max-new-tls-connections-per-cycle", NULL, MODIFIABLE_CONFIG, 1, 1000, server.max_new_tls_conns_per_cycle, 1, INTEGER_CONFIG, NULL, NULL),

#ifdef LOG_REQ_RES
createUIntConfig("client-default-resp", NULL, IMMUTABLE_CONFIG | HIDDEN_CONFIG, 2, 3, server.client_default_resp, 2, INTEGER_CONFIG, NULL, NULL),
#endif
Expand Down
1 change: 1 addition & 0 deletions src/server.h
Original file line number Diff line number Diff line change
Expand Up @@ -2182,6 +2182,7 @@ struct valkeyServer {
int cluster_port; /* Set the cluster port for a node. */
mstime_t cluster_node_timeout; /* Cluster node timeout. */
mstime_t cluster_ping_interval; /* A debug configuration for setting how often cluster nodes send ping messages. */
int cluster_ping_message_gossip_max_count; /* A configuration for setting how many peer node information to be gossiped in ping/pong messages. */
char *cluster_configfile; /* Cluster auto-generated config file name. */
struct clusterState *cluster; /* State of the cluster */
int cluster_migration_barrier; /* Cluster replicas migration barrier. */
Expand Down
6 changes: 6 additions & 0 deletions valkey.conf
Original file line number Diff line number Diff line change
Expand Up @@ -2004,6 +2004,12 @@ aof-timestamp-enabled no
# In order to setup your cluster make sure to read the documentation
# available at https://valkey.io web site.

# Cluster nodes communicate with each other via PING/PONG message type. Each message carries a gossip section i.e.
# information about peer nodes. This configuration caps the maximum number of gossip information to carry in each message.
#
# Default value is set to 0 which implies the maximum count to be gossiped is set to 10% of all node count.
# cluster-ping-message-gossip-max-count 0

########################## CLUSTER DOCKER/NAT support ########################

# In certain deployments, cluster node's address discovery fails, because
Expand Down
Loading