Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: capping mechanism for relay and service connections #3184

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

darshankabariya
Copy link
Contributor

This PR prevents nodes from becoming isolated from the network by implementing a capping mechanism for service and relay peers. This ensures there is always room available for relay peers at any given time.

close #3163

Copy link

github-actions bot commented Nov 28, 2024

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:3184

Built from a379b02

@darshankabariya darshankabariya changed the title chore: Capping mechanism for relay and service Connections chore: capping mechanism for relay and service connections Nov 28, 2024
@darshankabariya darshankabariya marked this pull request as draft December 3, 2024 10:45
@darshankabariya darshankabariya marked this pull request as ready for review December 9, 2024 15:00
Copy link
Contributor

@SionoiS SionoiS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

waku/node/peer_manager/peer_manager.nim Outdated Show resolved Hide resolved
waku/node/peer_manager/peer_manager.nim Show resolved Hide resolved
Copy link
Contributor

@gabrielmer gabrielmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks so much!

Added just a couple questions/comments

@@ -993,6 +999,24 @@ proc new*(
# Leave by default 20% of connections for service peers
maxRelayPeersValue = maxConnections - (maxConnections div 5)

var maxServicePeersValue = 0
var d_low = 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't hardcode this and take it directly from WakuRelay

# relay has higher priority then service peers
else:
maxServicePeersValue =
max(maxConnections - maxRelayPeersValue, maxConnections div 5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the logic of this line? Why the maxConnections div 5?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My logic is to reserve 20% for service peers if the user does not provide maxService, as we discuss in nwaku pm we need to change whole logic

Copy link
Collaborator

@Ivansete-status Ivansete-status left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR doesn't compile snif snif :s
Re-thinking about this, I suggest doing the following:
We need to keep maxConnections config param because this needs to be passed to the nim-libp2p-Switch object.
On the other hand, I would only add an additional parameter, named protoConnWeights , a.k.a. proto-conn-weights, with the following logic:

If proto-conn-weights is not passed, then we apply the internal limits of 70% for relay, and the other 30% is evenly distributed for other protocols.

If proto-conn-weights=relay:80;services:20, we apply the internal limits of 80% for relay, and the other 20% is evenly distributed for other protocols.

We need to make sure the relay limit is bigger than DHigh ( we can create a
public Constant in waku_relay/protocol.nim )

wdyt @NagyZoltanPeter , @gabrielmer , @darshankabariya ?

@@ -993,6 +995,20 @@ proc new*(
# Leave by default 20% of connections for service peers
maxRelayPeersValue = maxConnections - (maxConnections div 5)

var maxServicePeersValue = 0
var d_low = 4
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to use the d_low value extracted from waku_relay. Perhaps defining a const in waku_relay and use that value

@darshankabariya darshankabariya removed the request for review from NagyZoltanPeter December 12, 2024 21:37
@darshankabariya darshankabariya marked this pull request as draft December 12, 2024 21:37
@darshankabariya
Copy link
Contributor Author

The PR doesn't compile snif snif

My bad, I pushed the code without a local build. I’ll update this PR until we finalize it.

@darshankabariya
Copy link
Contributor Author

darshankabariya commented Dec 12, 2024

The PR doesn't compile snif snif :s Re-thinking about this, I suggest doing the following: We need to keep maxConnections config param because this needs to be passed to the nim-libp2p-Switch object. On the other hand, I would only add an additional parameter, named protoConnWeights , a.k.a. proto-conn-weights, with the following logic:

If proto-conn-weights is not passed, then we apply the internal limits of 70% for relay, and the other 30% is evenly distributed for other protocols.

If proto-conn-weights=relay:80;services:20, we apply the internal limits of 80% for relay, and the other 20% is evenly distributed for other protocols.

We need to make sure the relay limit is bigger than DHigh ( we can create a public Constant in waku_relay/protocol.nim )

wdyt @NagyZoltanPeter , @gabrielmer , @darshankabariya ?

At the moment, we have parameters like maxConnections, maxRelayPeers, maxServicePeers, and another one called protoConnWeight. This can make things more confusing, as users need to ensure that maxConnections is never greater than the sum of maxRelayPeers and maxServicePeers. Additionally, aligning proto-conn-weights with maxRelayPeers and maxServicePeers adds another layer of complexity.

Instead, I believe users generally don’t think in terms of absolute numbers for services and relays. From a psychological perspective, they might find it more intuitive to think in percentages—e.g., 80% for services, 20% for relays.

My suggestion is to simplify this structure. We should keep maxConnections bcz ( represent hardware capacity ? right ) , which users are familiar with, and deprecate maxRelayPeers and maxServicePeers. Instead, we can ask users to specify the percentage they want to allocate for relays and services. This feels more straightforward and user-friendly. maxConnection, relayWeight and serviceWeight will be deadly combination. the user to easily convey their priorities.

If users don’t provide this percentage, we could default to a ratio, as Ivan suggested. I believe this would simplify the experience while maintaining flexibility.

@gabrielmer @NagyZoltanPeter @Ivansete-status @SionoiS

@SionoiS
Copy link
Contributor

SionoiS commented Dec 13, 2024

I like your idea! IMO maxConnection and relayToServiceRatio (float) would be simpler.

I feel we still need hardcoded minimum still so that you can't, by mistake brick your node.

@gabrielmer
Copy link
Contributor

Same opinion here, having 2 params, a maximum and a ratio to know how many to allocate for each should make things simple yet customizable :)

@darshankabariya
Copy link
Contributor Author

darshankabariya commented Dec 14, 2024

I like your idea! IMO maxConnection and relayToServiceRatio (float) would be simpler.

I feel we still need hardcoded minimum still so that you can't, by mistake brick your node.

Same opinion here, having 2 params, a maximum and a ratio to know how many to allocate for each should make things simple yet customizable :)

Thanks @gabrielmer @SionoiS for reply ! i also like suggestion.

If this approach looks good to you, I can begin the implementation. Just to confirm, we're only discussing the deprecation of the external configuration parameter, correct? I believe we still need to maxRelayPeer and maxServicePeer within the peerManager.

Regarding your thoughts on the ratio, I have an idea: instead of using a float, we could represent it as a string, like 0.33 becoming 40:60 or 0.43 as 30:70. it's just need calculator 😂for some ratio, Let me know what you guys think, and I’ll implement accordingly.

@darshankabariya darshankabariya marked this pull request as ready for review December 16, 2024 08:15
@gabrielmer
Copy link
Contributor

If this approach looks good to you, I can begin the implementation. Just to confirm, we're only discussing the deprecation of the external configuration parameter, correct? I believe we still need to maxRelayPeer and maxServicePeer within the peerManager.

Regarding your thoughts on the ratio, I have an idea: instead of using a float, we could represent it as a string, like 0.33 becoming 40:60 or 0.43 as 30:70. it's just need calculator 😂for some ratio, Let me know what you guys think, and I’ll implement accordingly.

Yes! That would only be for the external configuration, we would then translate it into maxRelayPeers and maxServicePeers when configuring the Peer Manager

Regarding the format, I'm pretty neutral about it - no strong opinions. Would like to hear what others think :))

@SionoiS
Copy link
Contributor

SionoiS commented Dec 16, 2024

Regarding your thoughts on the ratio, I have an idea: instead of using a float, we could represent it as a string, like 0.33 becoming 40:60 or 0.43 as 30:70. it's just need calculator 😂for some ratio, Let me know what you guys think, and I’ll implement accordingly.

If that's simpler go for it! No strong opinion here either.

@darshankabariya
Copy link
Contributor Author

Just leaving a comment to inform, this PR is ready to be merged. Thank you!

Copy link
Collaborator

@Ivansete-status Ivansete-status left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! I think we are getting even closer ;P
Kindly review my comments and let me know if something isn't clear or I'm missing anything
Thanks again 🙌 !


ok((float(relayRatio / total), float(serviceRatio / total)))
except Exception:
err("Invalid number format in ratio")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err("Invalid number format in ratio")
return err("Invalid number format in ratio: " & getCurrentExceptionMsg())

if total == 0:
return err("Total ratio cannot be zero")

ok((float(relayRatio / total), float(serviceRatio / total)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick comment: I think is better to always use the return keyword

Suggested change
ok((float(relayRatio / total), float(serviceRatio / total)))
return ok((float(relayRatio / total), float(serviceRatio / total)))

@@ -49,3 +49,24 @@ proc parseCorrectMsgSize*(input: string): uint64 =
let ret = parseMsgSize(input).valueOr:
return 0
return ret

proc parseRelayServiceRatio*(ratio: string): Result[(float, float), string] =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to have a comment to explain the parameter and the returning values

try:
let elements = ratio.split(":")
if elements.len != 2:
return err("Invalid format for relay ratio expected relay:service")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return err("Invalid format for relay ratio expected relay:service")
return err("Invalid format for relay ratio expected relay:service, found: " & ratio)

serviceRatio = parseInt(elements[1])

if relayRatio < 0 or serviceRatio < 0:
return err("Relay and service ratios must be non-negative integers")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return err("Relay and service ratios must be non-negative integers")
return err("Relay and service ratios must be non-negative integers. Found this: " & ratio)

Comment on lines 118 to 123
let relayServiceRatio = parseRelayServiceRatio(relayServiceRatio)
if relayServiceRatio.isErr:
warn "Invalid relay service ratio", ratio = relayServiceRatio
return

let (relayRatio, serviceRatio) = relayServiceRatio.get
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let relayServiceRatio = parseRelayServiceRatio(relayServiceRatio)
if relayServiceRatio.isErr:
warn "Invalid relay service ratio", ratio = relayServiceRatio
return
let (relayRatio, serviceRatio) = relayServiceRatio.get
let (relayRatio, serviceRatio) = parseRelayServiceRatio(relayServiceRatio).valueOr:
error "Invalid relay service ratio", ratio = relayServiceRatio, error = error
return ## we should stop the execution in this point but for further PR's

var relayPeers = int(ceil(float(maxConnections) * relayRatio))
var servicePeers = int(floor(float(maxConnections) * serviceRatio))

if maxRelayPeers.isSome() or maxServicePeers.isSome():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not needed this if statement?

Comment on lines 1011 to 1015
if relayPeers + servicePeers > maxConnections:
warn "Max number of connections can't be greater than maxRelayPeers + maxServicePeers",
maxConnections = maxConnections,
maxRelayPeers = relayPeers,
maxServicePeers = servicePeers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block shouldn't be needed because we should validate that on startup

maxRelayPeers = relayPeers,
maxServicePeers = servicePeers

let minRelayPeers = 8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value is correct nowadays:)
However, instead of using this hard-coded value, we should extract this value from the dHigh configured in WakuRelay.

Then, we should have something like:

Suggested change
let minRelayPeers = 8
let minRelayPeers = WakuRelay.getDHigh()

And, in waku_relay/protocol.nim, we should create the following simple proc that exposes the dHigh configured value:

proc getDHigh*(T: type WakuRelay): int =
  return GossipsubParameters.dHigh

Comment on lines 1018 to 1021
if relayPeers < minRelayPeers:
let newMaxConnections = int(float(minRelayPeers) / relayRatio)
warn "fullfill the minimum criteria you need to increase the max number of connections",
newMaxConnections = newMaxConnections
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this validation should be made earlier and, in case not configured properly, immediately stop the execution.

Copy link
Contributor Author

@darshankabariya darshankabariya Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this validation should be made earlier and, in case not configured properly, immediately stop the execution.

Thanks for all the comments! I've tried to address all of them all but instead of replying to each one individually (I’m just afraid it will scatter our discussion.), bcz I agree with most of your comments and have updated the PR accordingly.

However, regarding this suggestion:

I couldn't find a way to validate it earlier. While it's possible to add the validation in the parse function, doing so would require the maxConnection value within that function, which could overcomplicate the logic and violate the single responsibility principle.

Thank you very much !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

chore: Reserve % of max connections for relay
4 participants