
Development::Performance::Measurements (draft)


Welcome to the cardano-rosetta-java wiki!

Here we will collect scalability test measurements for each endpoint and, in the future, SLAs (Service Level Agreements).


Stability Testing Approach and SLA Definition

Our goal is to determine how many concurrent users each API endpoint can support while meeting our defined latency SLA: response times under 1 second at p95 and p99. The lowest concurrency at which an endpoint breaks the SLA (p95/p99 ≥ 1s) becomes our initial concurrency SLA limit for that endpoint. This gives us a clear performance baseline and highlights immediate bottlenecks.
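
To make the check repeatable, the same thresholds can be encoded in the test configuration itself. Below is a minimal sketch using Artillery's ensure plugin (syntax as documented for Artillery v2; verify against the version in use), which fails the run whenever p95 or p99 goes above 1000 ms:

config:
  plugins:
    ensure: {}          # enable SLA checks for this run
  ensure:
    thresholds:
      - http.response_time.p95: 1000   # fail the run if p95 exceeds 1000 ms
      - http.response_time.p99: 1000   # fail the run if p99 exceeds 1000 ms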

Refined Approach

We now propose using a structured, concurrency-focused approach aligned with industry best practices:

  • Stepwise Incremental Concurrency: We run phased tests that gradually increase the number of concurrent virtual users, controlled through Artillery's arrivalRate and maxVusers settings.
  • Warm-up Phases (60 sec): To allow the system to stabilize at each concurrency level.
  • Measurement Phases (3 min): To collect stable, meaningful metrics, specifically focusing on p95 and p99 latency.

Practical Implementation Example

An example Artillery test that measures latency at specific concurrency levels:

config:
  target: 'http://127.0.0.1:8082'
  http:
    timeout: 240
    defaults:
      headers:
        Content-Type: application/json

  phases:
    # arrivalRate is set high on purpose: finished virtual users are replaced
    # immediately, while maxVusers caps the actual concurrency at each level.

    # ---- Concurrency Level: 1 User ----
    - duration: 60
      arrivalRate: 1
      rampTo: 100
      maxVusers: 1
      name: 'Warm-up: Concurrency 1'
    - duration: 180
      arrivalRate: 100
      maxVusers: 1
      name: 'Measure: Concurrency stable at 1'

    # Concurrency = 2
    - duration: 60
      arrivalRate: 100
      rampTo: 200
      maxVusers: 2
      name: 'Warm-up: 2 concurrent users'
    - duration: 180
      arrivalRate: 200
      maxVusers: 2
      name: 'Measure: Concurrency stable at 2'

    # Concurrency = 4
    - duration: 60
      arrivalRate: 200
      rampTo: 400
      maxVusers: 4
      name: 'Warm-up: 4 concurrent users'
    - duration: 180
      arrivalRate: 400
      maxVusers: 4
      name: 'Measure: Concurrency stable at 4'

    # Concurrency = 8
    - duration: 60
      arrivalRate: 400
      rampTo: 800
      maxVusers: 8
      name: 'Warm-up: 8 concurrent users'
    - duration: 180
      arrivalRate: 800
      maxVusers: 8
      name: 'Measure: Concurrency stable at 8'

    # Concurrency = 12
    - duration: 60
      arrivalRate: 800
      rampTo: 1200
      maxVusers: 12
      name: 'Warm-up: 12 concurrent users'
    - duration: 180
      arrivalRate: 1200
      maxVusers: 12
      name: 'Measure: Concurrency stable at 12'

...
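
The phases above only define the load shape; a runnable script also needs a scenarios section describing what each virtual user requests. Below is a minimal sketch for the /block endpoint, assuming a Rosetta-style POST body (the network and block values are illustrative placeholders):

scenarios:
  - name: 'Query /block'
    flow:
      - post:
          url: '/block'
          json:
            network_identifier:
              blockchain: 'cardano'
              network: 'mainnet'
            block_identifier:
              index: 10000000   # placeholder block height

Swapping the url and json body lets the same phase structure be reused for other endpoints (e.g. /account/balance or /search/transactions).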

Interpreting the Results

Running the Artillery test with this structured scenario yields clear data like the following:

| Endpoint | Users (concurrency) | p95 response time (ms) | Meets SLA (p95 < 1s)? |
|----------|---------------------|------------------------|-----------------------|
| /block   | 10                  | 200                    | ✅ Yes                |
| /block   | 20                  | 400                    | ✅ Yes                |
| /block   | 50                  | 1200                   | ❌ No                 |

Conclusion: Endpoint /block supports ~20 concurrent users at SLA p95 < 1s, but fails at 50.

Repeat this for all endpoints to find the bottlenecks and document the results clearly:

| Endpoint              | Max Concurrency (p95 < 1 sec) |
|-----------------------|-------------------------------|
| /block/transaction    | 10                            |
| /block                | 20                            |
| /search/transactions  | 5                             |
| /account/balance      | 50                            |

This concise table makes it immediately obvious which endpoint is the limiting factor (in this example, /search/transactions at just 5 concurrent users), enabling focused optimization.

Below is an example performance metrics table showing max concurrency and response times per endpoint and release:

| ID | Release | Endpoint              | Max Concurrency | p95 Response Time (ms) | p99 Response Time (ms) |
|----|---------|-----------------------|-----------------|------------------------|------------------------|
| 1  | 1.2.4   | /block/transaction    | 25              | 950                    | 1100                   |
| 2  | 1.2.4   | /block                | 50              | 800                    | 950                    |
| 3  | 1.2.4   | /search/transactions  | 15              | 950                    | 1200                   |