feat(backend): multi-hop payments with static routing #3566
base: main
Conversation
Just checking in with some observations to follow up on the issues @sanducb mentioned in the call last week. These were:

- I ran the tenanted open payments flow, which always completed but sometimes showed these error logs:
- I ran the non-tenanted open payments flow and the first time saw the create quote command take ~10 seconds and then return an error. A subsequent attempt to create the quote worked, and the rest of the flow worked as well. Then I tried again, and the entire flow worked. I tore everything down, including the volumes, retried, and saw an error in Bruno on the grant request for incoming payment.
Changes proposed in this pull request
ILP packets are now classified by operation for telemetry: outgoing-payment, incoming-payment, routing or unknown (rate probes fall into this category).

Context
Closes #3444
Overall setup and routing logic
The setup creates 5 instances where instance A is peered with B, B is peered with C, C is peered with D and D is peered with E (please check the setup for exact instance names).
Payments should be successful from A -> B -> C -> D -> E by using the existing Bruno collection. Instances A and E were kept as cloud-nine and happy-life-bank in order to keep changes to the existing setup minimal.

At startup of a Rafiki instance, routes are loaded from the database and stored in the in-memory routing table. All subsequent peer updates also refresh the routing table. For backwards compatibility, if no routes exist then the direct peers' addresses and asset ids are used to populate the routing table.
A routing table entry has the following structure:
tenantId:destination | next hop | asset id

where:
- tenantId is the tenant id of the caller
- destination is the static ILP address of the payment receiver
- next hop is the peer id of the direct peer that will either route the packet or be its destination
- asset id is the asset id of the next-hop peer; this field is mandatory when adding or removing a route but not when querying for the next hop, as one may or may not be interested in which asset the peering relationship uses when forwarding the packet
- tenantId:destination is called prefix in the implementation and is the key of the table; longest prefix matching is done against this key

The routing logic is now also responsible for resolving the peering asymmetry issue described here in a multi-tenanted environment.
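To make the table layout and the lookup concrete, here is a minimal TypeScript sketch of a routing table keyed by tenantId:destination with longest prefix matching. The names (StaticRoutingTable, addRoute, nextHop) and the matching details are illustrative assumptions, not the actual Rafiki implementation.

```ts
interface RouteEntry {
  nextHopPeerId: string
  assetId: string
}

// Hypothetical in-memory routing table keyed by the `tenantId:destination` prefix.
class StaticRoutingTable {
  private routes = new Map<string, RouteEntry>()

  // Asset id is required when adding (or removing) a route.
  addRoute(
    tenantId: string,
    destination: string,
    nextHopPeerId: string,
    assetId: string
  ): void {
    this.routes.set(`${tenantId}:${destination}`, { nextHopPeerId, assetId })
  }

  removeRoute(tenantId: string, destination: string): void {
    this.routes.delete(`${tenantId}:${destination}`)
  }

  // Longest-prefix match against the `tenantId:destination` key.
  // Callers that only forward packets can ignore the returned asset id.
  nextHop(tenantId: string, destinationAddress: string): RouteEntry | undefined {
    const key = `${tenantId}:${destinationAddress}`
    let bestPrefixLength = -1
    let best: RouteEntry | undefined
    for (const [prefix, entry] of this.routes) {
      if (key.startsWith(prefix) && prefix.length > bestPrefixLength) {
        bestPrefixLength = prefix.length
        best = entry
      }
    }
    return best
  }
}

// Usage: tenant "t1" reaches ILP addresses under "test.instance-e" via direct peer "peer-b".
const table = new StaticRoutingTable()
table.addRoute('t1', 'test.instance-e', 'peer-b', 'asset-usd')
console.log(table.nextHop('t1', 'test.instance-e.alice')) // -> { nextHopPeerId: 'peer-b', assetId: 'asset-usd' }
```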
Telemetry
There are 2 key metrics added in this PR:
- ilp_prepare_packet_processing_ms: Measures the time it takes to process individual ILP prepare packets through the connector middleware. It is a histogram with a label that denotes the operation of the packet (outgoing_payment, incoming_payment, routing, or unknown, which includes rate probes). In the ILP metrics Grafana dashboard there are P50 and P95 percentile panels for tracking latency.
- ilp_payment_round_trip_ms: Measures the round-trip time for completing an ILP payment (on the sender side). This one is also a histogram, and the average round-trip time can be seen in the dashboard.
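As a rough sketch (not the actual Rafiki telemetry code), recording such a histogram with the OpenTelemetry JS API could look like the following; the metric name and operation label values match the PR description, while the meter name and the helper function are assumptions for illustration.

```ts
import { metrics } from '@opentelemetry/api'

// Hypothetical meter name; Rafiki's real instrumentation setup may differ.
const meter = metrics.getMeter('ilp-connector')

const prepareProcessingTime = meter.createHistogram('ilp_prepare_packet_processing_ms', {
  description: 'Time to process an ILP prepare packet through the connector middleware',
  unit: 'ms'
})

type PacketOperation = 'outgoing_payment' | 'incoming_payment' | 'routing' | 'unknown'

// Illustrative helper: call once a prepare packet has been handled by the middleware.
function recordPrepareProcessing(startedAt: number, operation: PacketOperation): void {
  prepareProcessingTime.record(Date.now() - startedAt, { operation })
}
```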
Local testing
Spin up and tear down the multi-tenanted multi-hop setup with 5 instances, along with telemetry, by using these commands:

```
pnpm localenv:compose:multitenancy:multihop:telemetry up
pnpm localenv:compose:multitenancy:multihop:telemetry down --volumes
```
Use the tenanted Open Payments Bruno collection as-is to test this flow.
Notes
I am not too comfortable with the current localenv test setup even though it works, because I think it should be separated completely from the multitenancy-only setup. I'm leaving this open for discussion so we can find the most ergonomic way to do it.