codeq ships two complementary perf harnesses:
- k6 HTTP scenarios in
loadtest/k6/— exercise the public HTTP API end-to-end (TCP, JSON, Gin router). These match the scenarios defined in Issue #30 and are the closest analogue to a production smoke test. - In-process Go benchmarks in
internal/bench/— drive the producer + worker streams directly against an embeddedpkg/app.Application(loopback gRPC, no Docker). These are the canonical numbers cited throughout the docs.
Use k6 to check API-shaped regressions (request latency, error budget, JSON encoding cost). Use the Go benchmarks to check raw queue throughput and to compare branches before a release.
graph LR
subgraph k6[k6 scenarios]
K6S[loadtest/k6/*.js]
end
subgraph Compose[docker compose local-dev]
HTTP[codeq HTTP :8080]
PB[Pebble embedded]
end
subgraph GoBench[internal/bench]
PROF[TestProfile_FullCycle]
PSAT[TestSaturation_ProducerStreamPath]
WSAT[TestSaturation_StreamPath]
end
subgraph InProc[in-process server]
APP[pkg/app.Application]
GRPC[gRPC stream loopback]
PEB[Pebble shards]
end
K6S -->|HTTP REST| HTTP
HTTP --> KV
PROF --> APP
PSAT --> APP
WSAT --> APP
APP --> GRPC
GRPC --> PEB
The scripts live in loadtest/k6/ and share a small helper
library under loadtest/k6/lib/:
| Script | Workload | Key env vars |
|---|---|---|
01_sustained_throughput.js |
1000 tasks/s for 1h (default) | RATE, DURATION, WORKER_VUS, CLAIM_P99_MS |
02_burst_10k_10s.js |
10k tasks in 10s, then drain | RATE, BURST_DURATION, DRAIN_DURATION, WORKER_VUS |
03_many_workers.js |
100+ concurrent claimers | WORKER_VUS, DURATION, PRODUCER_RATE |
04_prefill_queue.js |
Fills the queue with 100k+ pending, no workers | TASKS, VUS |
05_mixed_priorities.js |
50% high, 30% medium, 20% low | RATE, DURATION, WORKER_VUS |
06_delayed_tasks.js |
50% with delaySeconds |
RATE, DURATION, DELAY_PCT, MIN_DELAY_SECONDS, MAX_DELAY_SECONDS |
Start a local stack:
COMPOSE="docker compose -f deploy/docker-compose/local-dev/compose.yaml -f deploy/docker-compose/local-dev/compose.override.yaml"
$COMPOSE up -dOptional observability stack (Prometheus + Grafana on the obs
profile):
$COMPOSE --profile obs up -dRun a scenario:
$COMPOSE --profile loadtest run --rm k6 run /scripts/01_sustained_throughput.jsScale knobs via env:
RATE=1000 DURATION=1h WORKER_VUS=300 \
$COMPOSE --profile loadtest run --rm k6 run /scripts/01_sustained_throughput.js| Var | Default | Notes |
|---|---|---|
CODEQ_BASE_URL |
http://localhost:8080 (script), http://codeq:8080 (Compose) |
Override with -e to docker compose run |
CODEQ_PRODUCER_TOKEN |
dev-token |
Matches deploy/docker-compose/local-dev/compose.yaml |
CODEQ_WORKER_TOKEN |
dev-token |
Same |
CODEQ_COMMANDS |
GENERATE_MASTER |
Comma-separated; targets these commands |
CLAIM_P99_MS |
100 |
Threshold for 01_sustained_throughput.js |
- P99 claim latency below
CLAIM_P99_MS(default 100ms) in01_sustained_throughput.js. - Queue depth visible via
/v1/codeq/admin/queues/:commandand thecodeq_queue_depth{command,queue}Prometheus metric.
These benchmarks boot a real pkg/app.Application with Pebble in a
t.TempDir(), wire producer + worker gRPC clients on loopback, and
report tasks/s for a fixed window. They run without Docker, k6, or any
external broker. This is where the throughput numbers cited in
_STYLE.md § Comparativos
come from.
| File | Test | What it measures |
|---|---|---|
profile_full_cycle_test.go |
TestProfile_FullCycle |
Full create → claim → complete cycle, writes pprof to /tmp/codeq-profiles/ |
producer_stream_vs_rest_test.go |
TestProducerThroughput_StreamBatchPath |
Producer-only batched stream creates/s |
producer_stream_vs_rest_test.go |
TestProducerThroughput_RESTPath |
Producer-only REST baseline for comparison |
worker_stream_saturation_test.go |
TestSaturation_StreamPath |
Worker concurrency sweep (1 → 512) |
producer_stream_saturation_test.go |
TestSaturation_ProducerStreamPath |
Producer concurrency sweep |
worker_stream_vs_rest_bench_test.go |
TestThroughput_StreamPath / _RESTPath |
Side-by-side worker stream vs REST |
cluster_throughput_test.go |
TestClusterThroughput_StairStep / _VsSingleNode |
Multi-node consistent-hash cluster |
http_bench_test.go |
BenchmarkHTTP_CreateClaimComplete |
Legacy miniredis baseline (go test -bench) |
The Pebble-backed harnesses read three env vars at boot. Combine them to reproduce the numbers in docs/30-performance-baselines.md:
PHASE8_SHARDS— number of intra-process Pebble shards (default0, meaning single shard). Set to1,2,4,6, or8to reproduce the shard sweep.4is the recommended single-node default.PHASE6_BATCH— workerBatchSize(default0, single-task claim).32batches claims and ACKs, which is where the 83k full-cycle and 23k worker-saturation numbers come from.PHASE6_PROD_BATCH— producer-side batch size (default0).8is the configuration used for the 83k full-cycle baseline.
PHASE8_SHARDS=4 PHASE6_BATCH=32 PHASE6_PROD_BATCH=8 \
go test -v -run='^TestProfile_FullCycle$' -count=1 -timeout=180s ./internal/bench/...This is the harness cited in
_STYLE.md § Voice and tone.
The same command writes CPU, alloc, block, and mutex profiles to
/tmp/codeq-profiles/:
go tool pprof -top -cum /tmp/codeq-profiles/cpu.pb.gz
go tool pprof -alloc_space -top -cum /tmp/codeq-profiles/alloc.pb.gzWorker saturation sweep (1 → 512 concurrency, batched):
PHASE6_BATCH=32 \
go test -v -run='^TestSaturation_StreamPath$' -count=1 -timeout=300s ./internal/bench/...Producer-only batched stream:
PHASE8_SHARDS=4 PHASE6_PROD_BATCH=8 \
go test -v -run='^TestProducerThroughput_StreamBatchPath$' -count=1 -timeout=120s ./internal/bench/...Legacy in-process HTTP cycle (miniredis-backed, fast regression check):
go test ./internal/bench -bench BenchmarkHTTP_CreateClaimComplete -benchtime=30sNote: most
Test*benchmarks skip under-short. Always omit-shortand use-count=1so the Go test cache does not serve a stale run.
Compare runs against the canonical table in docs/30-performance-baselines.md and the catalog at _STYLE.md § Numbers must come from measurement. The reference box used in those tables is a 12-core Linux host (kernel 5.15, WSL2-compatible), Go 1.25.0, local Pebble, loopback gRPC, no fsync.
What to look for:
- k6:
http_req_durationp95/p99,checksrate, custom thresholds in each script. Cross-check with/metricscounters (codeq_task_created_total,codeq_task_claimed_total,codeq_task_completed_total). - Go bench: each saturation step logs
rate=<N> tasks/s— plot the curve, look for the plateau. If the plateau is below baseline by more than 5%, capture the pprof and bisect. - Memory and GC:
BenchmarkGCPressure_*ingc_pressure_bench_test.goreports allocs/op. A regression there usually shows up as a throughput drop inTestProfile_FullCycleunder sustained load.
Warning: Pebble harnesses write to
t.TempDir()and clean up on success. Akill -9mid-run leaves a directory under/tmp— reclaim withrm -rf /tmp/TestProfile_FullCycle*if your CI runner is disk-bound.
- Load testing guide — narrative version of this page with screenshots.
- Performance baselines — historical numbers and CI thresholds.
- Performance tuning — knobs that move the bench numbers.
- _STYLE.md § Numbers must come from measurement — citation format for any new throughput claim.