Development::Performance::Measurements (draft)
Here we will publish scalability test measurements for each endpoint and, in the future, SLAs (Service Level Agreements).
Our goal is to determine how many concurrent users each API endpoint can support while meeting our defined latency SLA: response times under 1 second at p95 and p99. The endpoint with the lowest concurrency at which the SLA fails (p95/p99 ≥ 1s) becomes our initial concurrency SLA limit. This gives us a clear performance baseline and highlights immediate bottlenecks.
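This latency budget can also be encoded directly in the test configuration so that a run fails automatically when the SLA is breached; a minimal sketch, assuming Artillery v2's `ensure` plugin (the syntax differs in older versions):

```yaml
config:
  plugins:
    ensure: {}              # enable the ensure plugin (Artillery v2)
  ensure:
    thresholds:
      - http.response_time.p95: 1000   # fail the run if p95 exceeds 1000 ms
      - http.response_time.p99: 1000   # fail the run if p99 exceeds 1000 ms
```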
We now propose using a structured, concurrency-focused approach aligned with industry best practices:
- Stepwise Incremental Concurrency: phased tests that gradually increase the number of concurrent virtual users, defined through Artillery's existing `arrivalRate` + `maxVusers` settings.
- Warm-up Phases (60 sec): allow the system to stabilize at each concurrency level.
- Measurement Phases (3 min): collect stable, meaningful metrics, focusing specifically on p95 and p99 latency.
An example Artillery test configuration for measuring latency at specific concurrency levels:
```yaml
config:
  target: 'http://127.0.0.1:8082'
  http:
    timeout: 240
  defaults:
    headers:
      Content-Type: application/json
  phases:
    # ---- Concurrency Level: 1 User ----
    - duration: 60
      arrivalRate: 1
      rampTo: 100
      maxVusers: 1
      name: 'Warm-up: Concurrency 1'
    - duration: 180
      arrivalRate: 100
      maxVusers: 1
      name: 'Measure: Concurrency stable at 1'
    # Concurrency = 2
    - duration: 60
      arrivalRate: 100
      rampTo: 200
      maxVusers: 2
      name: 'Warm-up: 2 concurrent users'
    - duration: 180
      arrivalRate: 200
      maxVusers: 2
      name: 'Measure: Concurrency stable at 2'
    # Concurrency = 4
    - duration: 60
      arrivalRate: 200
      rampTo: 400
      maxVusers: 4
      name: 'Warm-up: 4 concurrent users'
    - duration: 180
      arrivalRate: 400
      maxVusers: 4
      name: 'Measure: Concurrency stable at 4'
    # Concurrency = 8
    - duration: 60
      arrivalRate: 400
      rampTo: 800
      maxVusers: 8
      name: 'Warm-up: 8 concurrent users'
    - duration: 180
      arrivalRate: 800
      maxVusers: 8
      name: 'Measure: Concurrency stable at 8'
    # Concurrency = 12
    - duration: 60
      arrivalRate: 800
      rampTo: 1200
      maxVusers: 12
      name: 'Warm-up: 12 concurrent users'
    - duration: 180
      arrivalRate: 1200
      maxVusers: 12
      name: 'Measure: Concurrency stable at 12'
    # ...
```
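The phases above only shape the load; the actual requests go in a `scenarios` section. A minimal sketch for the `/block` endpoint, with illustrative network and block identifiers (adjust them to the network under test):

```yaml
scenarios:
  - name: 'Rosetta /block'
    flow:
      - post:
          url: '/block'
          json:
            network_identifier:
              blockchain: 'cardano'
              network: 'mainnet'     # illustrative value
            block_identifier:
              index: 1000000         # illustrative block height
```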
Running the Artillery test with this structured scenario yields clear data like this:
| Endpoint | Users (concurrency) | p95 response time (ms) | Meets SLA (p95 < 1s)? |
|---|---|---|---|
| /block | 10 | ✅ 200ms | ✅ Yes |
| /block | 20 | ✅ 400ms | ✅ Yes |
| /block | 50 | ❌ ≥ 1000ms | ❌ No |
Conclusion: the /block endpoint supports ~20 concurrent users within the SLA (p95 < 1s) but fails at 50.
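Note that Artillery aggregates metrics per reporting interval and for the whole run rather than per named phase, so the p95/p99 for a given concurrency step has to be read from the intervals that fall within that phase (or each step can be run as a separate script). Writing a JSON report with `artillery run --output report.json <script>` preserves the raw numbers for this; the file name is illustrative.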
Repeat this for all endpoints to find the bottlenecks and document the results clearly:
| Endpoint | Max Concurrency (p95 < 1 sec) |
|---|---|
| /block/transaction | 10 |
| /block | 20 |
| /search/transactions | 5 |
| /account/balance | 50 |
This concise table makes it immediately obvious which endpoint is the limiting factor (in this example, /search/transactions at just 5 concurrent users), enabling focused optimization.
Below is an example of a performance metrics table showing max concurrency and response times for different endpoints:
| ID | Release | Endpoint | Max Concurrency | p95 Response Time (ms) | p99 Response Time (ms) |
|---|---|---|---|---|---|
| 1 | 1.2.4 | /block/transaction | 25 | 950 | 1100 |
| 2 | 1.2.4 | /block | 50 | 800 | 950 |
| 3 | 1.2.4 | /search/transactions | 15 | 950 | 1200 |