wRPC transport "upgrade" #136

Open
rvolosatovs opened this issue Jun 17, 2024 · 3 comments

Comments

@rvolosatovs
Member

rvolosatovs commented Jun 17, 2024

Currently, the NATS transport uses a "handshake" procedure to establish a two-way, point-to-point communication channel: it sets up two NATS inboxes, one per peer. This is done because there is no way to know whether the peer is reachable by any means other than the NATS broker. This works great for a single message exchange. However, in async scenarios, or when payloads are large and do not fit in a single message, it would be great to have a way to negotiate a more efficient communication channel after the initial exchange. In other words, we could conveniently do service discovery over NATS and then switch to a more efficient channel for the actual data transfer.
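
As a rough illustration only, the two-inbox handshake can be modeled with a toy in-memory broker. `Broker`, `new_inbox`, `handshake` and the subject names are hypothetical stand-ins; this is not the NATS client API and not wRPC's actual wire format.

```python
import itertools
from collections import defaultdict, deque


class Broker:
    """Toy stand-in for a NATS server: each subject maps to a FIFO queue."""

    def __init__(self):
        self.queues = defaultdict(deque)
        self._ids = itertools.count()

    def new_inbox(self) -> str:
        # NATS clients conventionally use "_INBOX.<unique>" subjects for replies.
        return f"_INBOX.{next(self._ids)}"

    def publish(self, subject, payload, reply=None):
        self.queues[subject].append((payload, reply))

    def next_msg(self, subject):
        return self.queues[subject].popleft()


def handshake(broker, service_subject):
    # Client: create its own inbox and send the request with it as the reply subject.
    client_inbox = broker.new_inbox()
    broker.publish(service_subject, b"params", reply=client_inbox)

    # Server: receive the request, create its own inbox and answer on the client's
    # inbox, advertising the server inbox as the reply subject for further messages.
    payload, reply_to = broker.next_msg(service_subject)
    server_inbox = broker.new_inbox()
    broker.publish(reply_to, b"ack:" + payload, reply=server_inbox)

    # Client: learn the server inbox; both peers can now address each other.
    response, peer_inbox = broker.next_msg(client_inbox)
    return client_inbox, peer_inbox, response
```

Every message in this model still goes through the broker, which is exactly the hop the upgrade below tries to remove.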

Keeping the "0-RTT"-esque semantics of the existing NATS transport, a NATS -> QUIC upgrade could work as follows:

  1. Client sends the (potentially truncated) parameter payload to $prefix.foo.bar. If the client is reachable on a particular UDP endpoint, it could send that endpoint in the "reply" header. (The "reply" header should probably be an ordered list, with the NATS inbox as the "fallback".)
  2. If the client's UDP endpoint is reachable by the server, the server could reply directly on that endpoint; if not, it would just fall back to NATS. Like the client, the server could now communicate the ordered list of endpoints it will listen on.
  3. Client continues the data transfer over the more efficient transport if the endpoint is reachable, and otherwise falls back to NATS. In the case of QUIC, it could also simply reuse the connection established by the server, if the client communicated a UDP endpoint to the server.
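
A minimal sketch of the ordered "reply" list with a NATS fallback, as described in the steps above. The `scheme://address` encoding, the `Endpoint` type and the `is_reachable` probe are all hypothetical; the issue does not specify a header format.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Endpoint:
    scheme: str   # hypothetical scheme names, e.g. "quic" or "nats"
    address: str  # "host:port" for QUIC, an inbox subject for NATS


def encode_reply_header(endpoints: Sequence[Endpoint]) -> str:
    # Ordered, comma-separated: most preferred first, NATS inbox last as fallback.
    return ",".join(f"{e.scheme}://{e.address}" for e in endpoints)


def decode_reply_header(value: str) -> list[Endpoint]:
    out = []
    for item in value.split(","):
        scheme, _, address = item.partition("://")
        out.append(Endpoint(scheme, address))
    return out


def choose_endpoint(endpoints: Sequence[Endpoint],
                    is_reachable: Callable[[Endpoint], bool]) -> Endpoint:
    # Try direct endpoints in order. The NATS inbox is always treated as reachable,
    # since both peers are already connected to the broker.
    for ep in endpoints:
        if ep.scheme == "nats" or is_reachable(ep):
            return ep
    raise ValueError("no usable endpoint (a NATS fallback should always be present)")
```

The server would run `choose_endpoint` on the client's list in step 2, and the client would do the same with the server's list in step 3.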

This way we could eliminate the middleman (NATS), allowing for efficient data transfer without sacrificing the ease of service discovery that NATS gives us.

This mechanism does not seem to be specific to NATS; it seems beneficial for any transport that relies on some kind of broker service in the middle.

@raskyld

raskyld commented Nov 30, 2024

Hey, could we also think about having a "QUIC-like over NATS" protocol as a fallback?

The thing is, UDP connectivity is just one part of the problem: how do you upgrade the connection while maintaining authentication and confidentiality? Those are managed by NATS: inter-server is encryptable and you can create a unique NKEY per client inside the cluster. Wouldn't you need to exchange some keypair over NATS before the upgrade to create a secure connection? In many setups you could end up without direct UDP connectivity at all (e.g. inter-server traffic may be tunneled).

On the other hand, a less efficient but easier approach would be to implement what we are missing from QUIC in our NATS transport layer:

  • Acknowledgements: Core NATS offers at-most-once QoS, so we need some form of acknowledgement at the transport level if we want wRPC over NATS to be more reliable.
  • Flow control (maybe we could use that to issue a message asking the publisher to slow down?)

Potentially, we could also implement timeouts and P2P termination. We could likely carry flow control, ACKs and other "control plane" messages either via framing or through a dedicated subject?
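
To make the acknowledgement idea concrete, here is a minimal stop-and-wait ARQ sketch over a simulated at-most-once channel. Stop-and-wait is just one simple choice (it also acts as crude flow control, with a window of one message); `LossyChannel`, `Receiver` and `send_reliably` are hypothetical, and a real design would more likely use a sliding window and real timers.

```python
import random


class LossyChannel:
    """Simulates Core NATS at-most-once delivery: each publish may be silently dropped."""

    def __init__(self, drop_rate: float, seed: int = 0):
        self.drop_rate = drop_rate
        self.rng = random.Random(seed)

    def deliver(self, msg) -> bool:
        return self.rng.random() >= self.drop_rate


class Receiver:
    def __init__(self):
        self.delivered = []  # payloads handed to the application, in order
        self.next_seq = 0

    def on_message(self, seq: int, payload: bytes) -> int:
        # Accept only the next expected chunk; retransmitted duplicates are ignored.
        if seq == self.next_seq:
            self.delivered.append(payload)
            self.next_seq += 1
        # Cumulative ACK: "everything below next_seq has been received".
        return self.next_seq


def send_reliably(chunks, data_ch, ack_ch, receiver, max_attempts=50):
    """Stop-and-wait: retransmit each chunk until its ACK makes it back."""
    for seq, payload in enumerate(chunks):
        for _ in range(max_attempts):
            if data_ch.deliver((seq, payload)):
                ack = receiver.on_message(seq, payload)
                if ack_ch.deliver(ack) and ack > seq:
                    break  # acknowledged, move on to the next chunk
            # No ACK before the (simulated) timeout: retransmit.
        else:
            raise TimeoutError(f"chunk {seq} not acknowledged")
    return receiver.delivered
```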

@rvolosatovs
Member Author

First, to make sure we're on the same page: this only applies in the context of a single invocation, i.e. a single function call; there's no "connection" concept at the wRPC protocol level.

> The thing is, UDP connectivity is just one part of the problem: how do you upgrade the connection while maintaining authentication and confidentiality? Those are managed by NATS: inter-server is encryptable and you can create a unique NKEY per client inside the cluster. Wouldn't you need to exchange some keypair over NATS before the upgrade to create a secure connection?

One way this could work is the following:

  1. Invoker sends the initial payload chunk (I'd expect this to be the whole payload in most use cases) over NATS. This message carries additional metadata, which will be used by the responder to send an encrypted message back:
    • Invoker's public address (whether that's an IP address or a DNS name or something else entirely depends on the protocol) - potentially there are multiple addresses
    • Invoker's public key or some derivation of it
    • (optional) nonce, which will be used to verify the authenticity of the response
  2. Responder sends an encrypted message (using the invoker's public key and nonce) to the address(es) specified by the invoker.
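
A minimal sketch of the metadata the invoker could attach in step 1, assuming it travels as NATS message headers. The `UpgradeOffer` type and the `Wrpc-Upgrade-*` header names are hypothetical, not part of wRPC. The encryption in step 2 (for example HPKE, RFC 9180) is deliberately left out, since Python's standard library has no public-key encryption.

```python
import base64
import json
import secrets
from dataclasses import dataclass, field


@dataclass
class UpgradeOffer:
    """Hypothetical metadata an invoker attaches to its initial NATS message."""
    addresses: list[str]  # ordered, protocol-specific (IP:port, DNS name, ...)
    public_key: bytes     # invoker's public key, or some derivation of it
    nonce: bytes = field(default_factory=lambda: secrets.token_bytes(16))

    def to_headers(self) -> dict[str, str]:
        # Header values are text, so binary fields are base64url-encoded.
        return {
            "Wrpc-Upgrade-Addrs": json.dumps(self.addresses),
            "Wrpc-Upgrade-Key": base64.urlsafe_b64encode(self.public_key).decode(),
            "Wrpc-Upgrade-Nonce": base64.urlsafe_b64encode(self.nonce).decode(),
        }

    @classmethod
    def from_headers(cls, headers: dict[str, str]) -> "UpgradeOffer":
        # Responder side: recover where to send the response and what to encrypt it to.
        return cls(
            addresses=json.loads(headers["Wrpc-Upgrade-Addrs"]),
            public_key=base64.urlsafe_b64decode(headers["Wrpc-Upgrade-Key"]),
            nonce=base64.urlsafe_b64decode(headers["Wrpc-Upgrade-Nonce"]),
        )
```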

Assuming that the NATS connection is secure and within the trust domain:

  • Confidentiality of the response is ensured by the fact that it is encrypted towards the invoker's public key
  • Authenticity of the response is ensured by the nonce (or some derivation of it) being present in the response; this, in turn, also prevents replay attacks
  • (optional) Response digest could be included to ensure integrity
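
The invoker-side checks above could look roughly like this once the response has been decrypted. `PendingInvocations` and `make_response` are hypothetical names. Note that the plain SHA-256 digest only detects corruption; authenticity rests on the nonce being known only to the responder, which holds only under the stated assumption that NATS is inside the trust domain.

```python
import hashlib
import hmac
import secrets


class PendingInvocations:
    """Hypothetical invoker-side bookkeeping for responses arriving outside NATS."""

    def __init__(self):
        self._outstanding: set[bytes] = set()

    def new_nonce(self) -> bytes:
        nonce = secrets.token_bytes(16)
        self._outstanding.add(nonce)
        return nonce

    def accept(self, nonce: bytes, body: bytes, digest: bytes) -> bytes:
        # Called after decrypting the response with the invoker's private key.
        if nonce not in self._outstanding:
            # Unknown or already consumed nonce: reject, which blocks replays.
            raise PermissionError("unexpected or replayed response")
        # Integrity only: detects corruption, not forgery.
        if not hmac.compare_digest(hashlib.sha256(nonce + body).digest(), digest):
            raise ValueError("digest mismatch: response corrupted")
        self._outstanding.discard(nonce)  # each nonce is single-use
        return body


def make_response(nonce: bytes, body: bytes) -> tuple[bytes, bytes, bytes]:
    # Responder side, before encryption: echo the nonce and attach a digest.
    return nonce, body, hashlib.sha256(nonce + body).digest()
```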

This approach is in no way specific to QUIC, UDP or anything else; in fact, it would probably work best with a direct TCP or even UDP (with some sort of ACK) connection.

I'd expect the overwhelming majority of calls to never actually need it. I'd expect most wRPC-over-NATS use cases to rely mostly on synchronous functionality, with payload sizes that fit within the NATS server's message payload limit. For such use cases, the complete payload would be sent in the original NATS publish message, and sending the response directly to the NATS inbox may actually be the optimal solution.

Perhaps one way to iterate on this would be to attempt to send the response over multiple communication channels simultaneously, e.g. send a response over NATS, but also encrypted over TCP and/or UDP. The invoker could then take care of deduplication.
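
One way to picture the "send everywhere, deduplicate at the invoker" idea is with asyncio: wait on every delivery path, keep the first copy of the response and discard the rest. The channels here are simulated with `asyncio.sleep`; `fake_channel` and `first_response` are hypothetical names, and deduplication is reduced to "first copy wins".

```python
import asyncio


async def fake_channel(name: str, delay: float, response: bytes):
    """Stand-in for one delivery path (NATS inbox, direct TCP, UDP, ...)."""
    await asyncio.sleep(delay)
    return name, response


async def first_response(channels):
    """Wait on all paths; return the first successful response, drop the rest."""
    tasks = [asyncio.create_task(ch) for ch in channels]
    try:
        for next_done in asyncio.as_completed(tasks):
            try:
                return await next_done
            except OSError:
                continue  # this path failed (e.g. unreachable); keep waiting
        raise ConnectionError("all channels failed")
    finally:
        for t in tasks:
            t.cancel()  # later duplicate copies are simply discarded
```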

Of course, we would need to gather some real-world benchmarks to validate that it's indeed a good idea.

In any case, this all likely only really matters for 2 use cases:

  1. async function invocation
  2. large payloads that do not fit within the NATS server payload limit

I'd expect only such cases to really benefit from switching to a "point-to-point" connection in the context of a single invocation.
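
The decision above boils down to a cheap check before each invocation. A minimal sketch, assuming the nats-server default `max_payload` of 1 MB (clients learn the actual value from the server's INFO message). `should_upgrade` and `messages_needed` are hypothetical helpers, and counting headers toward the limit is an assumption.

```python
# nats-server default; the real value is advertised in the server's INFO message.
NATS_DEFAULT_MAX_PAYLOAD = 1024 * 1024


def should_upgrade(payload_len: int, is_async: bool,
                   max_payload: int = NATS_DEFAULT_MAX_PAYLOAD,
                   header_len: int = 0) -> bool:
    """True only for the two cases above: async calls or oversized payloads."""
    # Small synchronous calls: one publish plus a reply on the NATS inbox is
    # likely already optimal, so stay on NATS.
    return is_async or payload_len + header_len > max_payload


def messages_needed(payload_len: int,
                    max_payload: int = NATS_DEFAULT_MAX_PAYLOAD) -> int:
    """How many NATS messages the payload would take without an upgrade."""
    return max(1, -(-payload_len // max_payload))  # ceiling division
```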

So with that, I'm not entirely sure I understand the "QUIC-like over NATS" suggestion. The goal here is to eliminate the middleman (NATS broker) for improved performance. If the invocation keeps relying on NATS for all messages, then I don't think we need to do anything here, since we're assuming NATS to be in trust domain for that purpose.

@raskyld

raskyld commented Dec 3, 2024

> So with that, I'm not entirely sure I understand the "QUIC-like over NATS" suggestion.

I was referring to Core NATS (not JetStream), which is not reliable per se (it is at-most-once). If you want strong guarantees, you likely need ACKs, retries and other reliability mechanisms on top of Core NATS.

I thought wRPC might send multiple messages over different subjects for streaming etc., so I was wondering how we could make that reliable over Core NATS.
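
For streamed data split across several Core NATS messages, the receiving side needs sequence numbers to notice dropped chunks. A minimal sketch of a reassembly buffer, assuming a hypothetical framing where every chunk carries a sequence number. It detects gaps (candidates for a NACK or retransmission request), ignores duplicates and also tolerates out-of-order arrival, e.g. after a retransmission.

```python
class Reassembler:
    """Hypothetical receive buffer for sequence-numbered stream chunks."""

    def __init__(self):
        self.next_seq = 0
        self.pending: dict[int, bytes] = {}
        self.out = bytearray()

    def push(self, seq: int, chunk: bytes) -> None:
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate, e.g. from a retransmission
        self.pending[seq] = chunk
        # Release every chunk that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.out += self.pending.pop(self.next_seq)
            self.next_seq += 1

    def missing(self) -> list[int]:
        # Sequence numbers to request again if the gap persists past a timeout.
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending))
                if s not in self.pending]
```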

But yeah, I am also questioning whether we should do that at the wRPC level: since it is supposed to be "transport-agnostic", shouldn't this upgrade mechanism be implemented by end users?
