This project is a C++23-based OpenBLAS performance testing framework designed for evaluating and analyzing the performance of BLAS (Basic Linear Algebra Subprograms) functions across different levels (Level 1, 2, 3).
Core Features:
- Benchmark performance of BLAS Level 1, 2, and 3 functions
- Support custom problem sizes, thread counts, and test iterations
- Include warmup runs and cache flush for accurate measurements
- Output results in Markdown or CSV format
- Built with the C++23 standard and the Xmake build system (Ninja recommended)
- Currently supports the CBLAS interface on CPU; Fortran and XPU (heterogeneous) BLAS testing are planned
Environment: A pure command-line application that primarily supports Linux (e.g., Ubuntu 24.04).
- Test performance of BLAS functions at different levels (L1, L2, L3)
- Select a subset of functions per level via `config.toml`
- Support single-threaded and multi-threaded execution
- Test multiple problem sizes
- Calculate and output min/avg/max execution time (ms) and GFLOPS
- Cache flush option for cold-cache performance measurement
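
The cache flush option listed above can be approximated portably by streaming through a buffer larger than the last-level cache before a timed run. The following is a minimal sketch of that idea, not the framework's actual implementation; the name `flush_cache` and the 64 MiB default buffer size are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: push benchmark data out of the CPU caches by writing
// and reading a buffer assumed to be larger than the last-level cache.
// Returns a checksum so the traversal cannot be optimized away.
inline double flush_cache(std::size_t bytes = 64u * 1024u * 1024u) {
    assert(bytes >= sizeof(double));
    std::vector<double> buffer(bytes / sizeof(double), 1.0);
    double sum = 0.0;
    for (std::size_t i = 0; i < buffer.size(); ++i) {
        buffer[i] += 1.0;  // write pass dirties every cache line
        sum += buffer[i];  // read the value back
    }
    volatile double sink = sum;  // volatile store keeps the loop observable
    return sink;
}
```

Calling `flush_cache()` between timed repetitions gives a cold-cache measurement, while skipping it measures the warm-cache case. The buffer size should be adjusted to exceed the last-level cache of the machine under test.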

Command-line options:
- Problem Size:
  - Level 1 (Vectors): e.g., `10^4`, `10^7` elements. Use `--level1 <num1>`
  - Level 2 (Matrix-Vector): e.g., `128x128`, `1024x1024` matrices. Use `--level2 <num1,num2>`
  - Level 3 (Matrix-Matrix): e.g., `(128,128,128)`, `(4096,4096,4096)`. Use `--level3 <num1,num2,num3>`
- Thread Configuration: `-t,--threads <num>` (default: 1 thread, overrides `OPENBLAS_NUM_THREADS`)
- Iterations: `-c,--cycle <num>` test repetitions for averaging
- Warmup Runs: `-w,--warmup <num>` (default: 3)
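
The roles of the warmup and iteration options can be illustrated with a small timing loop: run the kernel a few times untimed, then time each repetition and keep the minimum, average, and maximum. This is a sketch under assumptions, not the code in `benchmark.cpp`; `TimingStats` and `measure` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <limits>

// Hypothetical result type: wall-clock statistics in milliseconds.
struct TimingStats {
    double min_ms;
    double avg_ms;
    double max_ms;
};

// Runs `kernel` `warmup` times untimed, then `cycles` times timed.
// The parameters mirror --warmup and --cycle; the names are illustrative only.
inline TimingStats measure(const std::function<void()>& kernel, int warmup, int cycles) {
    assert(warmup >= 0 && cycles > 0);
    for (int i = 0; i < warmup; ++i) kernel();  // warmup results are discarded

    using clock = std::chrono::steady_clock;
    double min_ms = std::numeric_limits<double>::max();
    double max_ms = 0.0;
    double total_ms = 0.0;
    for (int i = 0; i < cycles; ++i) {
        const auto start = clock::now();
        kernel();
        const auto stop = clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        min_ms = std::min(min_ms, ms);
        max_ms = std::max(max_ms, ms);
        total_ms += ms;
    }
    return {min_ms, total_ms / cycles, max_ms};
}
```

With the documented default of 3 warmup runs and, say, 5 cycles, `measure(kernel, 3, 5)` invokes the kernel 8 times and reports statistics over the last 5. `std::chrono::steady_clock` is used because it is monotonic, so clock adjustments cannot produce negative durations.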

Output:
- stdout: Markdown-formatted results by default
- File Output: Save to a file via `-o,--output <filename>`
- Output Format: Select the format with `-f,--format <markdown|csv>`
Performance Table Columns:
| Column | Description |
|---|---|
| Function | BLAS function name (e.g., ddot, dgemm) |
| Config | Test parameters (e.g., M=1024,N=1024,K=1024) |
| Threads | Thread count used |
| Min Time (ms) | Minimum execution time (milliseconds) |
| Avg Time (ms) | Average execution time (milliseconds) |
| Max Time (ms) | Maximum execution time (milliseconds) |
| GFLOPS | Performance metric based on FLOPs/time |
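
To make the two output formats concrete, here is a sketch that renders one result with the columns above as either a Markdown table row or a CSV line. The struct, the three-decimal precision, and the CSV quoting are assumptions for illustration; the README does not specify the exact layout the framework emits.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical record holding the columns of the performance table.
struct ResultRow {
    std::string function;  // e.g. "dgemm"
    std::string config;    // e.g. "M=1024,N=1024,K=1024"
    int threads;
    double min_ms, avg_ms, max_ms;
    double gflops;
};

enum class Format { Markdown, Csv };

// Formats a number with three decimals (an arbitrary choice for this sketch).
inline std::string fixed3(double v) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.3f", v);
    return buf;
}

inline std::string format_row(const ResultRow& r, Format fmt) {
    assert(r.threads > 0);
    const bool md = (fmt == Format::Markdown);
    const std::string sep = md ? " | " : ",";
    // The config string itself contains commas, so CSV output quotes it.
    const std::string config = md ? r.config : "\"" + r.config + "\"";
    std::string line = md ? "| " : "";
    line += r.function + sep + config + sep + std::to_string(r.threads) + sep +
            fixed3(r.min_ms) + sep + fixed3(r.avg_ms) + sep + fixed3(r.max_ms) +
            sep + fixed3(r.gflops);
    if (md) line += " |";
    return line;
}
```

Quoting the `Config` field matters for CSV because a value such as `M=1024,N=1024,K=1024` would otherwise be split into three columns by a spreadsheet or parser.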

Dependencies:
- OpenBLAS: High-performance BLAS implementation
  ```bash
  # Ubuntu/Debian
  sudo apt install libopenblas-dev
  ```
- CLI11: Command-line parsing (header-only)
- toml++: TOML configuration parsing (header-only)
- spdlog: Logging library (header-only mode)

```bash
# Clone with submodules
git clone --recursive https://github.com/Degeneracy-Evil/blas_benchmark.git
# Or, if already cloned, initialize submodules
git submodule update --init --recursive
```

The project uses the Xmake build system:
```bash
# Install Xmake (see https://xmake.io/#/guide/installation)
# Configure (Release mode recommended)
xmake f -m release
# Build
xmake
# Clean build artifacts
xmake clean
# Clean all (including cache)
xmake cleanall
```

Usage:
```bash
# Show help
xmake run cblas_benchmark --help
# Basic run
xmake run cblas_benchmark --level3 1024,1024,1024
# Full example
xmake run cblas_benchmark -t 8 -c 10 --level3 2048,2048,2048 -f csv -o results.csv
# Show system info only
xmake run cblas_benchmark -s
```

Benchmarking tips:
- System State: Ensure low system load and a stable CPU frequency (consider `cpupower` performance mode)
- Warmup: The framework includes warmup runs. For strict tests, pre-run the full test set
- Size Selection: Cover sizes ranging from L1 cache to main memory (e.g., 4096x4096x4096 for Level 3 peak performance)
- Threads: Test single-threaded (`--threads 1`) and multi-threaded (e.g., `--threads $(nproc)`) runs
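
When choosing sizes that span L1 cache to main memory, it helps to estimate how many bytes each test touches. The sketch below does that arithmetic for double-precision operands, assuming dense unpadded storage; the function names are hypothetical, and the cache size passed to `fits_in_cache` has to come from the target machine (for example from `lscpu`).

```cpp
#include <cassert>
#include <cstddef>

static_assert(sizeof(double) == 8, "sketch assumes 8-byte doubles");

// Approximate bytes touched per benchmark (dense storage, no padding).
constexpr std::size_t level1_bytes(std::size_t n) {
    return 2 * n * sizeof(double);  // vectors x and y (dscal touches only x)
}
constexpr std::size_t level2_bytes(std::size_t m, std::size_t n) {
    return (m * n + m + n) * sizeof(double);  // matrix A plus vectors x and y
}
constexpr std::size_t level3_bytes(std::size_t m, std::size_t n, std::size_t k) {
    return (m * k + k * n + m * n) * sizeof(double);  // matrices A, B, and C
}

// True when the working set fits in a cache of `cache_bytes`.
inline bool fits_in_cache(std::size_t working_set, std::size_t cache_bytes) {
    assert(cache_bytes > 0);
    return working_set <= cache_bytes;
}
```

Under these assumptions a 128x128x128 `dgemm` touches 384 KiB, which fits in a typical L2 cache, while 4096x4096x4096 touches 384 MiB and is served mostly from main memory.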

FLOPS formulas:

Level 1:
| Function | FLOPS Formula |
|---|---|
| ddot | |
| daxpy | |
| dscal | |

Level 2:
| Function | FLOPS Formula |
|---|---|
| dgemv | |

Level 3:
| Function | FLOPS Formula |
|---|---|
| dgemm | |

GFLOPS Calculation:
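
The results table describes GFLOPS as FLOPs divided by time, which for a time in milliseconds is `FLOPs / (time_ms * 1e6)`. The sketch below combines that definition with the conventional leading-order operation counts for these routines (`2n` for `ddot` and `daxpy`, `n` for `dscal`, `2mn` for `dgemv`, `2mnk` for `dgemm`). These counts are an assumption: the framework may count lower-order terms such as the `alpha` and `beta` scaling differently.

```cpp
#include <cassert>

// Conventional leading-order FLOP counts (assumed, see the text above).
constexpr double flops_ddot(double n) { return 2.0 * n; }   // n multiplies + n adds
constexpr double flops_daxpy(double n) { return 2.0 * n; }  // n multiplies + n adds
constexpr double flops_dscal(double n) { return n; }        // n multiplies
constexpr double flops_dgemv(double m, double n) { return 2.0 * m * n; }
constexpr double flops_dgemm(double m, double n, double k) { return 2.0 * m * n * k; }

// GFLOPS = FLOPs / (seconds * 1e9). Time is taken in milliseconds to match
// the results table, so the divisor becomes time_ms * 1e6.
inline double gflops(double flops, double time_ms) {
    assert(time_ms > 0.0);
    return flops / (time_ms * 1.0e6);
}
```

For example, a 1024x1024x1024 `dgemm` performs about 2.15e9 floating-point operations under this count, so an average time of 100 ms corresponds to roughly 21.5 GFLOPS.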

```toml
# config.toml
# BLAS Benchmark Configuration File

[functions]
level1 = ["cblas_ddot", "cblas_daxpy", "cblas_dscal"]
level2 = ["cblas_dgemv"]
level3 = ["cblas_dgemm"]

[weights.level1]
cblas_ddot = 1.0
cblas_daxpy = 1.0
cblas_dscal = 0.5

[weights.level2]
cblas_dgemv = 1.5

[weights.level3]
cblas_dgemm = 2.0

[defaults]
threads = 1
warmup = 3
cycles = 5
flush_cache = true
level1_size = 1000000
level2_m = 1024
level2_n = 1024
level3_m = 1024
level3_n = 1024
level3_k = 1024
```

Project structure:
```
blas_benchmark/
├── config.toml                 # Default config
├── LICENSE
├── README.md
├── README.zh-CN.md
├── PROGRESS.md                 # Development progress
├── AGENTS.md                   # AI agent guide
├── src
│   ├── main.cpp                # Entry point + CLI
│   ├── benchmark/
│   │   ├── benchmark.cpp       # Core benchmarking
│   │   ├── benchmark.h
│   │   ├── blas_functions.cpp  # BLAS wrapper + benchmarks
│   │   └── blas_functions.h
│   ├── config/
│   │   ├── config_parser.cpp   # TOML parsing
│   │   └── config_parser.h
│   └── utils/
│       ├── system_info.cpp     # System info collection
│       ├── system_info.h
│       ├── timer.cpp           # High-precision timer
│       └── timer.h
├── thirdparty/                 # Git submodules
│   ├── CLI11/                  # Command-line parsing
│   ├── tomlplusplus/           # TOML parsing
│   └── spdlog/                 # Logging
└── xmake.lua                   # Build config
```

Compiler requirements:
- C++23 support required
- Clang 18+ recommended (tested)
- GCC 13+ should work but is untested

A personal project for learning OpenBLAS, C++23, Git, Xmake, toml++, CLI11, spdlog, and Ninja.