Identify each node client with libp2p UserAgent #3

Closed
3 of 4 tasks
alrevuelta opened this issue Oct 26, 2022 · 14 comments

Comments

@alrevuelta
Collaborator

alrevuelta commented Oct 26, 2022

Background

Since we have multiple waku implementations that will eventually coexist in the same network, it really comes in handy to be able to identify the so-called "client diversity" (a term taken from the Ethereum community). This metric allows having an idea of the number of nodes of each type that the network contains: nwaku, js-waku, go-waku and waku-rs.

This metric can be used to estimate how diverse the network is, and can help in decision making:

  • What if just 1% of the nodes run nwaku, while being the reference implementation?
  • What if a critical bug is detected in a client run by 90% of the peers?
  • What if no one uses any client beyond nwaku?

Solution

In order to be able to estimate the diversity, it's suggested that each implementation advertises its client-id using the userAgent field from libp2p (e.g. .withAgentVersion(agentString) for nim-libp2p), where client-id is one of:

  • nwaku
  • js-waku
  • go-waku
  • waku-rs

Some nuances:

  • Since we want to be privacy-preserving and we don't want this information to be used against the node, just the client type is advertised, without its release, version, OS or any other sensitive information.
  • This follows the nimbus rationale (an ethereum consensus client)
  • A user of waku must be free to update this string easily if they don't want to display this information.
  • This will allow us to estimate the diversity of the network, but this data must be taken with a grain of salt.

Tracking issues

Acceptance

  • Advertised userAgent reflects the client (nwaku, js-waku, go-waku, waku-rs)
  • userAgent can be configured with a cli flag, in case someone doesn't want to reveal it.
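
A minimal sketch of the two acceptance points above, assuming a Python-style launcher purely for illustration (none of the Waku clients is claimed to work this way). The flag name `--agent-string` and the function `resolve_agent_string` are hypothetical; only the client-id values come from this issue.

```python
import argparse

# Default client identifier. The issue proposes one of:
# nwaku, js-waku, go-waku, waku-rs. No release, version or OS is included.
DEFAULT_AGENT = "nwaku"


def resolve_agent_string(argv):
    """Return the agent string to advertise, honouring an optional override."""
    parser = argparse.ArgumentParser(description="hypothetical node launcher")
    # "--agent-string" is a made-up flag name for this sketch.
    parser.add_argument("--agent-string", default=DEFAULT_AGENT)
    args = parser.parse_args(argv)
    return args.agent_string
```

With no arguments the default client-id is advertised; passing `--agent-string some-value` replaces it, which is the opt-out path described in the nuances above.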

cc @fryorcraken @richard-ramos @jm-clius @bernardoaraujor

@richard-ramos
Member

go-waku: waku-org/go-waku#348

@fryorcraken
Contributor

@kaiserd Any thoughts on this re privacy?

@alrevuelta we are explicitly excluding the version, is this correct?

Also note that this is only useful when connecting directly to a node.
Which means it may not be that useful to have an idea of the network topology.

However, it could be useful to investigate odd behaviour encountered when monitoring our own nodes.

@alrevuelta
Collaborator Author

@fryorcraken Yes, version and other sensitive information are not displayed, just whether it's nim/go/rust/js. It's also a parameter that can be easily overridden with a config flag, so if anyone is really concerned, they can easily change it.

This follows what nimbus is doing with their ethereum client, and it really helps the Ethereum network to know the so-called client diversity. I believe at some point this can come in very handy.

Agree that we only get this information after connecting to a node, but the networkmonitor will take care of it. See waku-org/nwaku#1290
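
As a rough illustration of what such a monitor could compute once it has connected to peers and read their agent strings, here is a small sketch. It is not the networkmonitor's actual code; `client_shares` and the "other" bucket are assumptions made for this example.

```python
from collections import Counter

# The four client-ids proposed in this issue.
KNOWN_CLIENTS = ("nwaku", "js-waku", "go-waku", "waku-rs")


def client_shares(agent_strings):
    """Map each label to its share of the observed peers.

    Anything that is not one of the four known client-ids (for example an
    overridden or empty string) is grouped under "other".
    """
    labels = [a if a in KNOWN_CLIENTS else "other" for a in agent_strings]
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(labels)
    return {label: count / total for label, count in counts.items()}
```

Since operators can override the string, the resulting shares are an estimate of what peers claim to run, not a measurement of what they actually run.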

As discussed in waku-org/nwaku#1010 exposing the version is a no go for some of the stakeholders, so it won't be displayed, but note that in the protocol we advertise some kind of version, which ofc can't be avoided, i.e. /vac/waku/relay/2.0.0-beta2. Less verbose than the release version but still. But yeah, this goes beyond this :)
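
To make the point about protocol IDs concrete, a tiny sketch that pulls the trailing version segment out of an ID shaped like the one above. The helper name `protocol_version` is hypothetical, and the assumption that the last path segment is the version is taken only from that single example.

```python
def protocol_version(protocol_id):
    """Return the last path segment of a protocol ID, treated as its version."""
    # Assumes the "/<org>/<name>/<protocol>/<version>" shape seen in the example.
    return protocol_id.rstrip("/").rsplit("/", 1)[-1]


print(protocol_version("/vac/waku/relay/2.0.0-beta2"))  # prints 2.0.0-beta2
```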

@kaiserd

kaiserd commented Oct 27, 2022

At first glance, I do not see any directly exploitable issues.
Still, this gives attackers additional information.
It is basically a trade-off between benefits for logging and anonymity (though the anonymity cost is low).

I'd not publish the user-agent, but I do not have strong arguments against it.
(I would be strongly against publishing the version, but following the discussion, this is already ruled out as an option.)
I'd rather err on the side of not publishing information even though it might not be too useful for attacks.

Potential implications I can think of right now:

  • Published user-agent info can reduce the anonymity set when trying to track nodes over sessions
    • right now this can be done using the PeerID anyways, but in the future...
  • Published user-agent info might reduce a bit of attacker uncertainty in mass deanonymization attacks
    • if the full graph is not known to the attacker, knowing that nodes have labels can help
    • might also help in graph learning attacks, where the attacker tries to infer the gossip-sub topology
  • One specific implementation might be easier to exploit, regardless of the specific version
    • also, the latest version, or a specific version might be weak, and knowing the user-agent narrows down the search space

@alrevuelta
Collaborator Author

@kaiserd Thanks for the input. imho the benefits of having this information totally outweigh the cons, as explained above in the Background section. The cost of anonymity is low and can be lowered to 0 since this is a flag that can be changed easily. If at some point we detect that no one is using the default nwaku/etc flag, we can remove it. But having this overview of what's in the network is really useful when scaling, and these metrics can help in making some strategic decisions.

@jm-clius

IMO, I think the tradeoff is acceptable here if (a) we exclude any version information and (b) the setting is easily overridable in config - both of which this issue and implementations adhere to.

@kaiserd

kaiserd commented Oct 27, 2022

The cost of anonymity is low and can be lowered to 0

the setting is easily overridable in config

This further reduces the anonymity set of nodes that do not override. (Just to be aware of, not arguing strongly against the proposal.)

@alrevuelta
Collaborator Author

This further reduces the anonymity set of nodes that do not override. (Just to be aware of, not arguing strongly against the proposal.)

Yep good point, but assuming that only a very small subset uses the flag.

@bernardoaraujor

waku-rs: bernardoaraujor/waku-rs#3

@fryorcraken
Contributor

I am not sure I see a benefit in setting a permanent, overridable flag here.

What if just 1% of the nodes run nwaku, while being the reference implementation?

  • and? What action/decision would you take? The only way to receive actionable feedback here is for
    • us using nwaku and dogfooding so we are aware of caveats/flaws
    • discussing with user base to understand choice/preferences

What if a critical bug is detected in a client run by 90% of the peers?

  • Are you saying that if a critical bug is found in a client not run by 90% of the peers then maybe we'll delay the resolution?

What if no one uses any client beyond nwaku?

Same as the first point.


User agent in Ethereum beacon node makes sense to help each different team measure their market share. Client diversity is important for Ethereum for sustainability and robustness. I don't think this is the case for Waku.

I would prefer if we expose an API to override the default user agent value while keeping the current user agent value as it is (and/or override it to waku for everyone).

Once the API is exposed, I would actually like to encourage the Status client team to set the status-web, status-desktop and status-mobile value when the software is built in debug/develop/dogfooding mode so we can easily investigate odd queries (e.g. Waku Store) and know which client they come from.

@alrevuelta
Collaborator Author

What if just 1% of the nodes run nwaku, while being the reference implementation?

and? What action/decision would you take? The only way to receive actionable feedback here is for
us using nwaku and dogfooding so we are aware of caveats/flaws
discussing with user base to understand choice/preferences

Isn't this a useful metric for knowing the state of the network? I'm not saying 1% is good or bad, but it can help us raise the issue in the community. Difficult to configure? Bad performance? Missing features? It's not about taking a specific decision, it's about awareness.

Are you saying that if a critical bug is found in a client not run by 90% of the peers then maybe we'll delay the resolution?

Of course, we won't delay the solution. But a bug is not fixed immediately, and while the bug is present in the network, knowing the share of clients it affects is useful imho. For this use case having the release v.x.y.z of each client in the userAgent would be really useful, and it's something that really helps the Ethereum network when forking. But it was agreed that the version is too much and it was discarded.

User agent in Ethereum beacon node makes sense to help each different team measure their market share. Client diversity is important for Ethereum for sustainability and robustness. I don't think this is the case for Waku.

As we scale and onboard new operators I think it's also the case for Waku. Beyond client diversity (Ethereum aims for 20% share of each) in our case perhaps we don't care about an equal share for each client, but having this kind of overview of what's in the network is useful. Same as knowing the number of peers, their location, etc.

I would prefer if we expose an API to override the default user agent value while keeping the current user agent value as it is (and/or override it to waku for everyone).

In the proposed solution, the user agent can be configured with a cli flag, but I would like to insist that the default value is different for each client. Anyone is free to override it with just one flag.

Once the API is exposed, I would actually like to encourage the Status client team to set the status-web, status-desktop and status-mobile value when the software is built in debug/develop/dogfooding mode so we can easily investigate odd queries (e.g. Waku Store) and know which client they come from.

Perhaps this goes against what was discussed in #1242. I think it's too specific. Regardless, I think it's out of scope for this and I would not try to enforce it, up to them.

@alrevuelta
Collaborator Author

Thanks everyone! Only waku-rs is left but don't see a lot of activity in the repo. Not sure if we should close this.

@jm-clius

Let's close, as the three main clients have been updated.

@bernardoaraujor

Thanks everyone! Only waku-rs is left but don't see a lot of activity in the repo. Not sure if we should close this.

sorry everyone! waku-rs is a side-project for me, and I haven't had the time to tackle this yet.

feel free to close it, I'll report here when I implement it.

6 participants