feat(backend): multi-hop payments with static routing #3566


Open: wants to merge 1 commit into base: main

sanducb
Contributor

@sanducb sanducb commented Jul 18, 2025

Changes proposed in this pull request

  • Adds multi-hop payment support through static routing with longest-prefix matching
  • Adds telemetry for ILP packet processing time per operation, i.e. outgoing-payment, incoming-payment, routing or unknown (rate probes fall into this category)
  • Adds telemetry for ILP payment round-trip time
  • Simplifies determining which tenant is the destination of a payment introduced in this PR by making it part of the routing logic.

Context

Closes #3444

Overall setup and routing logic

The setup creates 5 instances, where instance A is peered with B, B with C, C with D, and D with E (please check the setup for the exact instance names).
Payments should succeed along A -> B -> C -> D -> E when using the existing Bruno collection. Instances A and E were kept as cloud-nine and happy-life-bank to minimize changes to the existing setup.

When a Rafiki instance starts, routes are loaded from the database and stored in the in-memory routing table. All subsequent peer updates also refresh the routing table. For backwards compatibility, if no routes exist, the direct peers' addresses and asset ids are used to populate the routing table.
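
The startup behaviour described above could look roughly like the following sketch. It is illustrative only: the function name `routesForPeers` and the `Route` shape are hypothetical, the `Peer` fields mirror the ones visible in the logs further down this thread, and applying the backwards-compatibility fallback per peer is an assumption, not something the description confirms.

```typescript
// Hypothetical shapes; field names follow the peer objects seen in the logs below.
interface Peer {
  id: string
  tenantId: string
  assetId: string
  staticIlpAddress: string
  routes: string[]
}

interface Route {
  prefix: string // `${tenantId}:${destination}`
  nextHop: string // peer id of the direct peer
  assetId: string // asset id of the peering relationship
}

// Builds routing table entries from peers loaded at startup.
// Assumption: a peer without configured routes falls back to its own
// static ILP address, so existing direct-peer setups keep working.
function routesForPeers(peers: Peer[]): Route[] {
  const result: Route[] = []
  for (const peer of peers) {
    const destinations =
      peer.routes.length > 0 ? peer.routes : [peer.staticIlpAddress]
    // De-duplicate destinations so a repeated route adds only one entry.
    for (const destination of Array.from(new Set(destinations))) {
      result.push({
        prefix: `${peer.tenantId}:${destination}`,
        nextHop: peer.id,
        assetId: peer.assetId
      })
    }
  }
  return result
}
```

The same function could be re-run on every peer update to refresh the table, which matches the refresh behaviour described above.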

A routing table entry has the following structure:

| tenantId:destination | next hop | asset id |

where:

  • tenantId is the tenant id of the caller
  • destination is the static ILP address of the payment receiver.
  • next hop is the peer id of the direct peer that will either route or be the destination of the packet
  • asset id is the asset id of the next hop peer. This field is mandatory when adding or removing a route but optional when querying for the next hop, since the caller may or may not care which asset the peering relationship uses when forwarding the packet.

tenantId:destination is called the prefix in the implementation and is the key of the table. Longest-prefix matching is done against this key.
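
As a rough illustration of the table and lookup described above, here is a minimal sketch. The class and method names (`RoutingTable`, `addRoute`, `getNextHop`) are hypothetical and not taken from the PR, and matching on whole dot-separated address segments is an assumption about how prefixes are compared.

```typescript
// Hypothetical sketch; the PR's actual router service may differ.
interface RouteEntry {
  nextHop: string // peer id of the direct peer
  assetId: string // asset id of the peering relationship
}

class RoutingTable {
  // Keyed by `${tenantId}:${destinationPrefix}`.
  private routes = new Map<string, RouteEntry[]>()

  // The asset id is mandatory when adding a route.
  addRoute(tenantId: string, prefix: string, nextHop: string, assetId: string): void {
    const key = `${tenantId}:${prefix}`
    const entries = this.routes.get(key) ?? []
    if (!entries.some((e) => e.nextHop === nextHop && e.assetId === assetId)) {
      entries.push({ nextHop, assetId })
    }
    this.routes.set(key, entries)
  }

  // The asset id is optional when querying for the next hop.
  getNextHop(tenantId: string, destination: string, assetId?: string): string | undefined {
    const segments = destination.split('.')
    // Try the full address first, then drop one trailing segment at a time,
    // so the longest matching prefix wins.
    for (let len = segments.length; len > 0; len--) {
      const key = `${tenantId}:${segments.slice(0, len).join('.')}`
      const entries = this.routes.get(key)
      if (!entries) continue
      const match = entries.find((e) => assetId === undefined || e.assetId === assetId)
      if (match) return match.nextHop
    }
    return undefined
  }
}

// Example with made-up tenant, peer and asset ids.
const table = new RoutingTable()
table.addRoute('tenant-1', 'test', 'peer-b', 'asset-usd')
table.addRoute('tenant-1', 'test.happy-life-bank', 'peer-c', 'asset-usd')
```

In this sketch a lookup for `test.happy-life-bank.abc` under `tenant-1` prefers the more specific `test.happy-life-bank` entry over the catch-all `test` entry, and a lookup under a different tenant finds nothing because the tenant id is part of the key.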

The routing logic is now also responsible for resolving the peering asymmetry issue described here in a multi-tenanted environment.

Telemetry

There are 2 key metrics added in this PR:

  • ilp_prepare_packet_processing_ms: Measures the time it takes to process an individual ILP prepare packet through the connector middleware. It is a histogram with a label that denotes the operation of the packet (outgoing_payment, incoming_payment, routing, or unknown, which includes rate probes). The ILP metrics Grafana dashboard has P50 and P95 percentile panels for tracking latency.
  • ilp_payment_round_trip_ms: Measures the round-trip time for completing an ILP payment (on the sender side). It is also a histogram, and the average round-trip time is shown in the dashboard.
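
To make the shape of these metrics concrete, here is a self-contained sketch of timing an operation and recording it into a labelled histogram. Everything in it is illustrative: `InMemoryHistogram`, `timed`, and the label key `operation` are assumptions for this example, and the in-memory class stands in for whatever telemetry client the PR actually uses. The percentile method uses the nearest-rank definition, which may differ from how Grafana computes P50 and P95.

```typescript
type Operation = 'outgoing_payment' | 'incoming_payment' | 'routing' | 'unknown'

interface Histogram {
  record(valueMs: number, labels: Record<string, string>): void
}

// In-memory stand-in for a real telemetry histogram (assumption).
class InMemoryHistogram implements Histogram {
  private samples: { valueMs: number; labels: Record<string, string> }[] = []

  record(valueMs: number, labels: Record<string, string>): void {
    this.samples.push({ valueMs, labels })
  }

  // Nearest-rank percentile over the samples recorded for one operation.
  percentile(p: number, operation: Operation): number | undefined {
    const values = this.samples
      .filter((s) => s.labels.operation === operation)
      .map((s) => s.valueMs)
      .sort((a, b) => a - b)
    if (values.length === 0) return undefined
    const rank = Math.max(1, Math.ceil((p / 100) * values.length))
    return values[rank - 1]
  }
}

// Runs `fn`, then records its duration under the given operation label,
// whether it resolves or rejects.
async function timed<T>(
  histogram: Histogram,
  operation: Operation,
  fn: () => Promise<T>
): Promise<T> {
  const start = Date.now()
  try {
    return await fn()
  } finally {
    histogram.record(Date.now() - start, { operation })
  }
}
```

A connector middleware could wrap its downstream call in something like `timed(histogram, 'routing', () => next())`, where `next` is the usual Koa-style continuation.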

Local testing

Spin up and down the multi-tenanted multi-hop setup with 5 instances along with telemetry by using these commands:

pnpm localenv:compose:multitenancy:multihop:telemetry up

pnpm localenv:compose:multitenancy:multihop:telemetry down --volumes

Use the tenanted Open Payments Bruno collection as-is to test this flow.

Notes

I am not too comfortable with the current localenv test setup even though it works, because I think it should be separated completely from the multitenancy-only setup. I leave it up for discussion to find the most ergonomic way we can do this.


netlify bot commented Jul 18, 2025

Deploy Preview for brilliant-pasca-3e80ec canceled.

Name Link
🔨 Latest commit 57f4af2
🔍 Latest deploy log https://app.netlify.com/projects/brilliant-pasca-3e80ec/deploys/687a37721baa180007e577df

@github-actions github-actions bot added type: tests Testing related pkg: backend Changes in the backend package. pkg: frontend Changes in the frontend package. type: source Changes business logic pkg: mock-ase pkg: mock-account-service-lib labels Jul 18, 2025
@sanducb sanducb changed the title feat: implement multihop static routing feat(backend): multi-hop payments with static routing Jul 18, 2025

🚀 Performance Test Results

Test Configuration:

  • VUs: 4
  • Duration: 1m0s

Test Metrics:

  • Requests/s: 42.07
  • Iterations/s: 14.04
  • Failed Requests: 0.00% (0 of 2532)
📜 Logs

> [email protected] run-tests:testenv /home/runner/work/rafiki/rafiki/test/performance
> ./scripts/run-tests.sh -e test "-k" "-q" "--vus" "4" "--duration" "1m"

Cloud Nine GraphQL API is up: http://localhost:3101/graphql
Cloud Nine Wallet Address is up: http://localhost:3100/
Happy Life Bank Address is up: http://localhost:4100/
cloud-nine-wallet-test-backend already set
cloud-nine-wallet-test-auth already set
happy-life-bank-test-backend already set
happy-life-bank-test-auth already set
     data_received..................: 914 kB 15 kB/s
     data_sent......................: 1.9 MB 32 kB/s
     http_req_blocked...............: avg=6.07µs   min=2.33µs   med=5.27µs   max=444.26µs p(90)=6.19µs   p(95)=6.66µs  
     http_req_connecting............: avg=320ns    min=0s       med=0s       max=195.8µs  p(90)=0s       p(95)=0s      
     http_req_duration..............: avg=94.46ms  min=7.08ms   med=80.38ms  max=675.38ms p(90)=158.92ms p(95)=180.17ms
       { expected_response:true }...: avg=94.46ms  min=7.08ms   med=80.38ms  max=675.38ms p(90)=158.92ms p(95)=180.17ms
     http_req_failed................: 0.00%  ✓ 0         ✗ 2532
     http_req_receiving.............: avg=91.39µs  min=28.42µs  med=79.4µs   max=2.14ms   p(90)=117.62µs p(95)=149.68µs
     http_req_sending...............: avg=33.62µs  min=9.73µs   med=27.67µs  max=1.33ms   p(90)=38.19µs  p(95)=51.12µs 
     http_req_tls_handshaking.......: avg=0s       min=0s       med=0s       max=0s       p(90)=0s       p(95)=0s      
     http_req_waiting...............: avg=94.33ms  min=6.95ms   med=80.24ms  max=675.28ms p(90)=158.8ms  p(95)=180.07ms
     http_reqs......................: 2532   42.066285/s
     iteration_duration.............: avg=284.66ms min=180.41ms med=267.53ms max=1.25s    p(90)=358.81ms p(95)=391.41ms
     iterations.....................: 845    14.038709/s
     vus............................: 4      min=4       max=4 
     vus_max........................: 4      min=4       max=4 

@BlairCurrey
Contributor

Just checking in with some observations to follow up on the issues @sanducb mentioned in the call last week.

These were:

  • random socket hang-ups in some backends, requiring a container restart
  • the Open Payments flow in the Bruno example intermittently failing

I ran the tenanted open payments flow, which always completed but sometimes showed these error logs:

global-bank-backend-1          | {"level":20,"time":1753127207814,"pid":30,"hostname":"global-bank-backend","service":"RouterService","destination":"test.happy-life-bank.MJHzYR7_ogW2GlMTr5teAEO8KgPMf4I6cFW1oUJkmNdrteSvdOc75l6o_rmTa_A4Z8hl1-9z5j9OcCOz8DzEdQ2eGKZxOeA","prefix":"test","tenantId":"53f2d913-e98a-40b9-b270-372d0547f23e","selectedPeer":"8e8aaed3-761f-4050-8de2-7b094df64b4b","msg":"found next hop"}
global-bank-backend-1          | {"level":50,"time":1753127207816,"pid":30,"hostname":"global-bank-backend","service":"ConnectorService","module":"balance-middleware","transferOptions":{"sourceAccount":{"id":"8e8aaed3-761f-4050-8de2-7b094df64b4b","assetId":"d5002e16-bc22-46f1-bc3f-a0b9d3c60e96","maxPacketAmount":null,"staticIlpAddress":"test.intergalactic-bank","name":null,"createdAt":"2025-07-18T19:06:32.900Z","updatedAt":"2025-07-18T19:06:32.900Z","liquidityThreshold":"1000000","tenantId":"53f2d913-e98a-40b9-b270-372d0547f23e","routes":["test.intergalactic-bank","test.intergalactic-bank","test"],"http":{"outgoing":{"authToken":"global-to-intergalactic","endpoint":"http://intergalactic-bank-backend:3002"}},"asset":{"id":"d5002e16-bc22-46f1-bc3f-a0b9d3c60e96","ledger":1,"code":"USD","scale":2,"withdrawalThreshold":null,"createdAt":"2025-07-18T19:06:32.780Z","updatedAt":"2025-07-18T19:06:32.780Z","liquidityThreshold":"10000000","deletedAt":null,"tenantId":"53f2d913-e98a-40b9-b270-372d0547f23e"}},"destinationAccount":{"id":"8e8aaed3-761f-4050-8de2-7b094df64b4b","assetId":"d5002e16-bc22-46f1-bc3f-a0b9d3c60e96","maxPacketAmount":null,"staticIlpAddress":"test.intergalactic-bank","name":null,"createdAt":"2025-07-18T19:06:32.900Z","updatedAt":"2025-07-18T19:06:32.900Z","liquidityThreshold":"1000000","tenantId":"53f2d913-e98a-40b9-b270-372d0547f23e","routes":["test.intergalactic-bank","test.intergalactic-bank","test"],"http":{"outgoing":{"authToken":"[Redacted]","endpoint":"http://intergalactic-bank-backend:3002"}},"asset":{"id":"d5002e16-bc22-46f1-bc3f-a0b9d3c60e96","ledger":1,"code":"USD","scale":2,"withdrawalThreshold":null,"createdAt":"2025-07-18T19:06:32.780Z","updatedAt":"2025-07-18T19:06:32.780Z","liquidityThreshold":"10000000","deletedAt":null,"tenantId":"53f2d913-e98a-40b9-b270-372d0547f23e"}},"sourceAmount":"100","destinationAmount":"100","transferType":"TRANSFER","timeout":5},"transferError":"SameAccounts","msg":"Could not create transfer"}
global-bank-backend-1          | {"level":30,"time":1753127207816,"pid":30,"hostname":"global-bank-backend","service":"ConnectorService","err":{"type":"InternalServerError","message":"[object Object]","stack":"InternalServerError: [object Object]\n    at ctxThrow (/home/rafiki/node_modules/.pnpm/[email protected]/node_modules/koa/lib/context.js:97:11)\n    at createPendingTransfer (/home/rafiki/packages/backend/src/payment-method/ilp/connector/core/middleware/balance.ts:100:13)\n    at processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at /home/rafiki/packages/backend/src/payment-method/ilp/connector/core/middleware/balance.ts:115:19\n    at ildcp (/home/rafiki/packages/backend/src/payment-method/ilp/connector/core/middleware/ildcp.ts:19:7)\n    at /home/rafiki/packages/backend/src/payment-method/ilp/connector/core/middleware/throughput.ts:90:5\n    at /home/rafiki/packages/backend/src/payment-method/ilp/connector/core/middleware/rate-limit.ts:54:5\n    at /home/rafiki/packages/backend/src/payment-method/ilp/connector/core/middleware/max-packet-amount.ts:30:5\n    at account (/home/rafiki/packages/backend/src/payment-method/ilp/connector/core/middleware/account.ts:137:5)\n    at /home/rafiki/packages/backend/src/payment-method/ilp/connector/core/middleware/stream-address.ts:38:5","status":500,"statusCode":500,"expose":false},"msg":"Error thrown in incoming pipeline"}
global-bank-backend-1          | {"level":50,"time":1753127207817,"pid":30,"hostname":"global-bank-backend","service":"ConnectorService","err":{"type":"InternalServerError","message":"[object Object]","stack":"InternalServerError: [object Object]\n    at ctxThrow (/home/rafiki/node_modules/.pnpm/[email protected]/node_modules/koa/lib/context.js:97:11)\n    at createPendingTransfer (/home/rafiki/packages/backend/src/payment-method/ilp/connector/core/middleware/balance.ts:100:13)\n    at processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at /home/rafiki/packages/backend/src/payment-method/ilp/connector/core/middleware/balance.ts:115:19\n    at ildcp (/home/rafiki/packages/backend/src/payment-method/ilp/connector/core/middleware/ildcp.ts:19:7)\n    at /home/rafiki/packages/backend/src/payment-method/ilp/connector/core/middleware/throughput.ts:90:5\n    at /home/rafiki/packages/backend/src/payment-method/ilp/connector/core/middleware/rate-limit.ts:54:5\n    at /home/rafiki/packages/backend/src/payment-method/ilp/connector/core/middleware/max-packet-amount.ts:30:5\n    at account (/home/rafiki/packages/backend/src/payment-method/ilp/connector/core/middleware/account.ts:137:5)\n    at /home/rafiki/packages/backend/src/payment-method/ilp/connector/core/middleware/stream-address.ts:38:5","status":500,"statusCode":500,"expose":false},"msg":"unexpected internal error"}

I ran the non-tenanted open payments flow, and the first time the create quote request took ~10 seconds and then returned an Internal Server Error with these logs:

cloud-nine-backend-1           | {"level":50,"time":1753127354732,"pid":30,"hostname":"cloud-nine-wallet-backend","service":"QuoteService","err":{"type":"PaymentMethodHandlerError","message":"Received error during ILP quoting","stack":"PaymentMethodHandlerError: Received error during ILP quoting\n    at getQuote (/home/rafiki/packages/backend/src/payment-method/ilp/service.ts:155:13)\n    at runNextTicks (node:internal/process/task_queues:60:5)\n    at processTimers (node:internal/timers:516:9)\n    at createQuote (/home/rafiki/packages/backend/src/open_payments/quote/service.ts:208:15)\n    at createQuote (/home/rafiki/packages/backend/src/open_payments/quote/routes.ts:98:22)\n    at getWalletAddressForSubresource (/home/rafiki/packages/backend/src/open_payments/wallet_address/middleware.ts:121:3)\n    at httpsigMiddleware (/home/rafiki/packages/backend/src/open_payments/auth/middleware.ts:264:3)\n    at /home/rafiki/packages/backend/src/open_payments/auth/middleware.ts:165:5\n    at getWalletAddressUrlFromRequestBody (/home/rafiki/packages/backend/src/open_payments/wallet_address/middleware.ts:19:3)\n    at /home/rafiki/node_modules/.pnpm/@[email protected]/node_modules/@interledger/openapi/dist/middleware.js:27:9","name":"PaymentMethodHandlerError","description":"RateProbeFailed","retryable":true},"msg":"error creating a quote"}
cloud-nine-backend-1           | 
cloud-nine-backend-1           |   InternalServerError: Internal Server Error
cloud-nine-backend-1           |       at Object.throw (/home/rafiki/node_modules/.pnpm/[email protected]/node_modules/koa/lib/context.js:97:11)
cloud-nine-backend-1           |       at openPaymentsServerErrorMiddleware (/home/rafiki/packages/backend/src/open_payments/route-errors.ts:105:14)
cloud-nine-backend-1           |       at processTicksAndRejections (node:internal/process/task_queues:95:5)
cloud-nine-backend-1           |       at bodyParser (/home/rafiki/node_modules/.pnpm/[email protected]/node_modules/koa-bodyparser/index.js:78:5)
cloud-nine-backend-1           |       at cors (/home/rafiki/node_modules/.pnpm/@[email protected]/node_modules/@koa/cors/index.js:109:16)
cloud-nine-backend-1           | 
cloud-nine-backend-1           | {"level":50,"time":1753127354735,"pid":30,"hostname":"cloud-nine-wallet-backend","method":"POST","path":"/438fa74a-fa7d-4317-9ced-dde32ece1787/quotes","err":{"type":"PaymentMethodHandlerError","message":"Received error during ILP quoting","stack":"PaymentMethodHandlerError: Received error during ILP quoting\n    at getQuote (/home/rafiki/packages/backend/src/payment-method/ilp/service.ts:155:13)\n    at runNextTicks (node:internal/process/task_queues:60:5)\n    at processTimers (node:internal/timers:516:9)\n    at createQuote (/home/rafiki/packages/backend/src/open_payments/quote/service.ts:208:15)\n    at createQuote (/home/rafiki/packages/backend/src/open_payments/quote/routes.ts:98:22)\n    at getWalletAddressForSubresource (/home/rafiki/packages/backend/src/open_payments/wallet_address/middleware.ts:121:3)\n    at httpsigMiddleware (/home/rafiki/packages/backend/src/open_payments/auth/middleware.ts:264:3)\n    at /home/rafiki/packages/backend/src/open_payments/auth/middleware.ts:165:5\n    at getWalletAddressUrlFromRequestBody (/home/rafiki/packages/backend/src/open_payments/wallet_address/middleware.ts:19:3)\n    at /home/rafiki/node_modules/.pnpm/@[email protected]/node_modules/@interledger/openapi/dist/middleware.js:27:9","name":"PaymentMethodHandlerError","description":"RateProbeFailed","retryable":true},"msg":"Received unhandled error in Open Payments request"}

Then a successive attempt to create the quote worked, as did the rest of the flow. I tried again, and the entire flow worked. I tore everything down, including the volumes, retried, and saw an error in Bruno on the grant request for the incoming payment (Error invoking remote method 'send-http-request': Error: socket hang up), although I don't see any stopped containers. I then tore down (removing volumes) and rebuilt several times, and now see the socket hang up errors for the get wallet address requests too, with one or more mock ASEs down.

Successfully merging this pull request may close these issues.

Static routing implementation in connector
2 participants