# OTel Observability Performance Overhead Benchmarking

This directory contains a comprehensive benchmarking suite that measures the performance overhead of the OpenTelemetry (OTel) observability implementation in go-redis.

## 📋 Overview

The benchmarking suite performs a **3-way comparison** to measure:

1. **Baseline Performance** - Upstream master without any OTel code
2. **Dormant Code Overhead** - Current branch with OTel code present but disabled
3. **Active Metrics Overhead** - Current branch with OTel metrics enabled

## 🎯 Goals

### Primary Goals
- **Prove zero overhead when disabled**: The no-op pattern should add negligible overhead (<1%) when metrics are disabled (see the sketch after this list)
- **Measure active overhead**: Quantify the performance cost when metrics are actively collected
- **Validate production readiness**: Ensure the overhead is acceptable for production use
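
To make the first goal concrete, here is a minimal Go sketch of the kind of nil-guard no-op pattern that goal refers to; `metricsHook`, `client`, and `process` are illustrative names, not the actual types in this branch.

```go
package redisbench

import "time"

// metricsHook stands in for the OTel instrumentation interface;
// its shape here is an assumption for illustration only.
type metricsHook interface {
	recordOperation(cmd string, d time.Duration)
}

type client struct {
	metrics metricsHook // left nil when metrics are disabled
}

func (c *client) process(cmd string) {
	if c.metrics == nil {
		// Disabled path: one predictable branch, no timing work, no allocations.
		// ... execute the Redis command and return ...
		return
	}
	start := time.Now()
	// ... execute the Redis command ...
	c.metrics.recordOperation(cmd, time.Since(start))
}
```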

### Success Criteria
- **Disabled vs Master**: ~0% overhead (within statistical noise)
- **Enabled vs Disabled**: under 5-10% overhead for most operations
- **Memory allocations**: minimal increase in allocations per operation
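
To put these targets in perspective: at the ~156µs per Ping operation shown in the example results below, a 5% overhead amounts to roughly 8µs per call.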

## 📁 Files

### Core Files
- **`benchmark_overhead_test.go`** - Go benchmark suite with table-driven tests
- **`compare_perf.sh`** - Automated comparison script
- **`BENCHMARK_OVERHEAD.md`** - This documentation

### Generated Files (after running benchmarks)
- **`benchmark_results_*/`** - Results directory with timestamp
  - `current_branch.txt` - Raw results from the current branch
  - `upstream_master.txt` - Raw results from upstream/master
  - `otel_enabled.txt` - Extracted enabled results
  - `otel_disabled.txt` - Extracted disabled results
  - `comparison_*.txt` - benchstat comparison reports
  - `README.md` - Summary of the benchmark run

## 🚀 Quick Start

### Prerequisites

1. **Redis server running**:
   ```bash
   docker run -d -p 6379:6379 redis:latest
   ```

2. **benchstat installed** (the script auto-installs it if missing):
   ```bash
   go install golang.org/x/perf/cmd/benchstat@latest
   ```

### Running the Full Comparison

```bash
# Run with default settings (5 iterations, 10s per benchmark)
./compare_perf.sh

# Run with custom settings
BENCHMARK_COUNT=10 BENCHMARK_TIME=30s ./compare_perf.sh

# Run specific benchmarks only
BENCHMARK_FILTER="BenchmarkOTelOverhead/.*Ping" ./compare_perf.sh
```

### Running Individual Benchmarks

```bash
# Run all OTel overhead benchmarks
go test -bench=BenchmarkOTelOverhead -benchmem -benchtime=10s -count=5

# Run specific scenario
go test -bench=BenchmarkOTelOverhead/OTel_Enabled -benchmem -benchtime=10s -count=5

# Run specific operation
go test -bench=BenchmarkOTelOverhead/OTel_Enabled/Ping -benchmem -benchtime=10s -count=5

# Run connection pool benchmarks
go test -bench=BenchmarkOTelOverhead_ConnectionPool -benchmem -benchtime=10s -count=5
```

## 📊 Understanding the Results

### benchstat Output Format

```
name     old time/op    new time/op    delta
Ping-8   156µs ± 2%     158µs ± 3%     +1.28%  (p=0.008 n=5+5)

name     old alloc/op   new alloc/op   delta
Ping-8   112B ± 0%      112B ± 0%      ~       (all equal)

name     old allocs/op  new allocs/op  delta
Ping-8   4.00 ± 0%      4.00 ± 0%      ~       (all equal)
```

### Interpreting Results

- **`~`** - No statistically significant difference (excellent for disabled mode!)
- **`+X%`** - Slower by X% (overhead)
- **`-X%`** - Faster by X% (unlikely; usually measurement variance)
- **`p-value`** - Statistical significance (p < 0.05 means the difference is real)
- **`n=X+Y`** - Number of samples used on each side of the comparison
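
For example, the `Ping-8` line above reads as: the new configuration is 1.28% slower, the difference is statistically significant (p=0.008 < 0.05), and five samples from each side were compared (n=5+5).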

### What to Look For

#### Comparison 1: Master vs Disabled
```
✅ GOOD: ~0% difference, p > 0.05 (no significant difference)
❌ BAD:  >1% overhead with p < 0.05 (dormant code has measurable cost)
```

#### Comparison 2: Disabled vs Enabled
```
✅ GOOD:       <5% overhead for simple operations (Ping, Get, Set)
✅ ACCEPTABLE: <10% overhead for complex operations (Pipeline)
⚠️ REVIEW:     >10% overhead (may need optimization)
```

#### Comparison 3: Master vs Enabled
```
✅ GOOD:   Total overhead <10% for production workloads
⚠️ REVIEW: >15% overhead (consider whether the value of the metrics justifies the cost)
```

## 🔬 Benchmark Coverage

### Operations Tested

1. **Ping** - Simplest operation; measures baseline overhead
2. **Set** - Write operation with key generation
3. **Get** - Read operation with cache hits
4. **SetGet_Mixed** - Realistic workload (70% reads, 30% writes)
5. **Pipeline** - Batch operations (10 commands per pipeline)

### Scenarios Tested

1. **OTel_Enabled** - Full metrics collection
2. **OTel_Disabled** - Code present but disabled
3. **No_OTel** - Baseline from upstream/master

### Concurrency

All benchmarks use `b.RunParallel()` to simulate real-world concurrent access patterns, as in the sketch below.
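
For orientation, here is a minimal sketch of the table-driven, parallel benchmark shape described above; `newClient` is a hypothetical stand-in for the scenario setup, and `benchmark_overhead_test.go` remains the source of truth.

```go
package redis_test

import (
	"context"
	"testing"

	"github.com/redis/go-redis/v9"
)

// newClient is a hypothetical helper: in the real suite, scenario-specific
// setup (enabling or disabling OTel metrics) would happen here.
func newClient(scenario string) *redis.Client {
	return redis.NewClient(&redis.Options{Addr: "localhost:6379"})
}

func BenchmarkOTelOverheadSketch(b *testing.B) {
	ctx := context.Background()
	for _, scenario := range []string{"OTel_Enabled", "OTel_Disabled"} {
		b.Run(scenario, func(b *testing.B) {
			rdb := newClient(scenario)
			defer rdb.Close()

			b.Run("Ping", func(b *testing.B) {
				b.ReportAllocs()
				b.RunParallel(func(pb *testing.PB) {
					for pb.Next() {
						if err := rdb.Ping(ctx).Err(); err != nil {
							b.Error(err)
							return
						}
					}
				})
			})
		})
	}
}
```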

## 🛠️ Customization

### Environment Variables

```bash
# Number of benchmark iterations (default: 5)
BENCHMARK_COUNT=10 ./compare_perf.sh

# Time per benchmark (default: 10s)
BENCHMARK_TIME=30s ./compare_perf.sh

# Filter benchmarks by name (default: BenchmarkOTelOverhead)
BENCHMARK_FILTER="BenchmarkOTelOverhead/.*Ping" ./compare_perf.sh

# Upstream remote name (default: upstream)
UPSTREAM_REMOTE=origin ./compare_perf.sh

# Upstream branch name (default: master)
UPSTREAM_BRANCH=main ./compare_perf.sh
```

### Combining Options

```bash
# Run 10 iterations of 30s each, only Ping benchmarks
BENCHMARK_COUNT=10 \
BENCHMARK_TIME=30s \
BENCHMARK_FILTER="BenchmarkOTelOverhead/.*Ping" \
./compare_perf.sh
```

## 📈 Example Results

### Expected Results (Hypothetical)

#### Comparison 1: Master vs Disabled (Dormant Code Overhead)
```
name                                 old time/op    new time/op    delta
OTelOverhead/OTel_Disabled/Ping-8    156µs ± 2%     157µs ± 3%     ~  (p=0.234 n=5+5)
OTelOverhead/OTel_Disabled/Set-8     189µs ± 1%     190µs ± 2%     ~  (p=0.421 n=5+5)
OTelOverhead/OTel_Disabled/Get-8     145µs ± 2%     146µs ± 1%     ~  (p=0.548 n=5+5)

name                                 old alloc/op   new alloc/op   delta
OTelOverhead/OTel_Disabled/Ping-8    112B ± 0%      112B ± 0%      ~  (all equal)
```
**✅ Result: No measurable overhead when disabled**

#### Comparison 2: Disabled vs Enabled (Active Metrics Overhead)
```
name                                old time/op    new time/op    delta
OTelOverhead/OTel_Enabled/Ping-8    157µs ± 3%     164µs ± 2%     +4.46%   (p=0.008 n=5+5)
OTelOverhead/OTel_Enabled/Set-8     190µs ± 2%     199µs ± 3%     +4.74%   (p=0.016 n=5+5)
OTelOverhead/OTel_Enabled/Get-8     146µs ± 1%     153µs ± 2%     +4.79%   (p=0.008 n=5+5)

name                                old alloc/op   new alloc/op   delta
OTelOverhead/OTel_Enabled/Ping-8    112B ± 0%      128B ± 0%      +14.29%  (p=0.000 n=5+5)
```
**✅ Result: ~5% latency overhead, acceptable for production**

## 🔍 Troubleshooting

### Redis Not Running
```
❌ Redis is not running on localhost:6379
💡 Start Redis with: docker run -d -p 6379:6379 redis:latest
```

### benchstat Not Found
The script auto-installs benchstat. If that fails, install it manually:
```bash
go install golang.org/x/perf/cmd/benchstat@latest
export PATH=$PATH:$(go env GOPATH)/bin
```

### Benchmark Timeout
Increase the timeout:
```bash
# In compare_perf.sh, modify the timeout flag:
go test -bench=... -timeout=60m ...
```

### High Variance in Results
- Ensure the system is not under load
- Increase `BENCHMARK_COUNT` for more samples
- Increase `BENCHMARK_TIME` for longer runs
- Close other applications

## 📝 Best Practices

### Before Running Benchmarks

1. **Close unnecessary applications** to reduce system noise
2. **Ensure stable system load** (no background tasks)
3. **Use consistent Redis configuration** (same version, same settings)
4. **Run multiple iterations** (at least 5) for statistical significance

### Interpreting Results

1. **Focus on p-values** - Only trust differences with p < 0.05
2. **Look at trends** - Consistent overhead across operations is more meaningful
3. **Consider absolute values** - 10% of 1µs is less concerning than 10% of 1ms
4. **Check allocations** - Memory overhead can be as important as latency

### Reporting Results

When sharing benchmark results:
1. Include system information (CPU, RAM, OS)
2. Include the Redis version and configuration
3. Include the full benchstat output
4. Note any anomalies or special conditions
5. Include multiple runs to show consistency

## 🎓 Advanced Usage

### Profiling

```bash
# CPU profile
go test -bench=BenchmarkOTelOverhead/OTel_Enabled/Ping \
  -cpuprofile=cpu.prof -benchtime=30s

# Memory profile
go test -bench=BenchmarkOTelOverhead/OTel_Enabled/Ping \
  -memprofile=mem.prof -benchtime=30s

# Analyze profiles
go tool pprof cpu.prof
go tool pprof mem.prof
```
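
Inside the interactive pprof prompt, `top` lists the functions consuming the most CPU or memory, and `list <function>` shows annotated source for a specific function.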

### Comparing Specific Commits

```bash
# Benchmark commit A
git checkout commit-a
go test -bench=BenchmarkOTelOverhead -benchmem -count=5 > commit-a.txt

# Benchmark commit B
git checkout commit-b
go test -bench=BenchmarkOTelOverhead -benchmem -count=5 > commit-b.txt

# Compare
benchstat commit-a.txt commit-b.txt
```

## 📚 References

- [Go Benchmarking Guide](https://pkg.go.dev/testing#hdr-Benchmarks)
- [benchstat Documentation](https://pkg.go.dev/golang.org/x/perf/cmd/benchstat)
- [OpenTelemetry Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/)
- [go-redis Documentation](https://redis.uptrace.dev/)

## 🤝 Contributing

When adding new benchmarks (see the skeleton below):
1. Follow the existing naming convention
2. Use `b.RunParallel()` for realistic concurrency
3. Use `b.ReportAllocs()` to track memory
4. Add documentation to this file
5. Update the comparison script if needed
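
As an illustration only (the operation name and key are placeholders, and the imports match the sketch in the Benchmark Coverage section), a new benchmark following these conventions might look like:

```go
// BenchmarkOTelOverhead_MyOperation is a placeholder name that follows the
// suite's naming convention; swap in the operation actually under test.
func BenchmarkOTelOverhead_MyOperation(b *testing.B) {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	defer rdb.Close()

	b.ReportAllocs()
	b.ResetTimer() // exclude client setup from the measurement
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			if err := rdb.Set(ctx, "bench:key", "value", 0).Err(); err != nil {
				b.Error(err)
				return
			}
		}
	})
}
```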

## 📄 License

Same as go-redis (BSD 2-Clause License)