This benchmark script tests the performance of Storybook CLI initialization across different configurations. The project is built with TypeScript and uses a modular service-based architecture.
The benchmark measures Storybook CLI initialization performance across multiple dimensions:
- Storybook versions (configurable, default: `10.0.7` and `canary`)
- Package managers: yarn, yarn2 (berry), npm, bun
- Feature combinations: a11y, a11y+test, a11y+docs, a11y+test+docs
- Cache configurations: with/without package manager cache, with/without Playwright cache
- Multiple iterations per configuration for statistical analysis
- Dynamic Version Selection: Choose from default versions or add custom Storybook versions
- Flexible Test Configuration: Run all tests or configure specific combinations
- Resume Functionality: Continue from where you left off if the script is interrupted
- Rerun Failed Tests: Automatically detect and rerun failed tests, replacing entries step-by-step as they succeed
- Smoke Tests: Optional verification that Storybook builds successfully after initialization
- Statistical Analysis: Calculates mean, median, min, max, and standard deviation across multiple iterations
- CSV Export: Results saved in CSV format optimized for Notion and Google Sheets
- Performance Summary: Generate markdown reports comparing versions with percentage differences
- Graceful Shutdown: Saves partial results on interruption (Ctrl+C)
- TypeScript: Fully typed codebase with type safety
- Biome Linting: Code quality and formatting with Biome
1. Install dependencies (only for the benchmark script):

   ```bash
   cd benchmark
   npm install
   # or pnpm install
   # or bun install
   ```

2. Make sure the React project dependencies are NOT installed (the benchmark script will handle this).
Run the benchmark:

```bash
cd benchmark
npm run benchmark
# or
bun benchmark.ts
```

The script will guide you through several prompts:
1. Version Selection:
   - Choose from default versions (`10.0.7`, `canary`)
   - Or add custom versions (e.g., `latest`, `0.0.0-pr-32717-sha-47ba2989`)
2. Test Mode Selection:
   - Run all tests: Executes all possible combinations
   - Configure specific combinations: Select specific versions, package managers, features, and cache options
   - Rerun failed tests only: Automatically detects failed tests from the CSV and reruns them (only appears if failed tests exist)
3. Iterations: Specify how many times each test configuration should run (default: 1)
4. Smoke Tests: Choose whether to run `storybook -- --smoke-test` after each successful init (runs outside the performance measurement)
5. Resume Options (if existing results found):
   - Continue from last test: Skip already completed tests
   - Rerun failed tests only: Rerun only tests that previously failed
   - Rerun failed tests and continue: Rerun failed tests and continue with incomplete ones
   - Start from scratch: Overwrite existing results
When rerunning failed tests:
- Failed entries remain in the CSV initially
- As each test completes successfully, its entry is replaced immediately in the CSV
- If a rerun still fails, the original failed entry is kept
- Other successful entries are preserved
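The replacement logic boils down to swapping a row by its test ID only when the rerun succeeded. A minimal sketch, assuming a hypothetical `ResultRow` shape rather than the actual `csvService.ts` types:

```ts
// Hypothetical row shape; the real CSV columns are listed later in this document.
interface ResultRow {
  testId: string;
  success: "yes" | "no" | "partial";
  // ...remaining CSV columns
}

// Replace a previously failed row as soon as its rerun succeeds; if the rerun
// also failed, keep the original entry. All other rows are left untouched.
function replaceIfSucceeded(rows: ResultRow[], rerun: ResultRow): ResultRow[] {
  if (rerun.success === "no") {
    return rows; // keep the original failed entry
  }
  return rows.map((row) => (row.testId === rerun.testId ? rerun : row));
}
```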
The benchmark can run up to 128 test configurations (if all options are selected):
- Versions: Configurable (default: 2)
- Package Managers: yarn, yarn2, npm, bun (4)
- Features:
- a11y
- a11y + test
- a11y + docs
- a11y + test + docs (4)
- Cache Options: with cache, without cache (2)
- Playwright Cache: with cache, without cache (2) - only applies when the `test` feature is selected

Total: Versions × 4 × 4 × 2 × 2 = up to 64 tests per version (128 with the two default versions)
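Enumerating this matrix is a cartesian product over the dimensions above. A minimal sketch of how the combinations could be generated (illustrative names, not necessarily the actual `configService.ts` API):

```ts
const packageManagers = ["yarn", "yarn2", "npm", "bun"] as const;
const featureSets = ["a11y", "a11y+test", "a11y+docs", "a11y+test+docs"] as const;

interface TestConfig {
  version: string;
  packageManager: (typeof packageManagers)[number];
  features: (typeof featureSets)[number];
  withCache: boolean;
  withPlaywrightCache: boolean | null; // null when the test feature is not selected
}

// Cartesian product of all dimensions; the Playwright cache option only varies
// for feature sets that include the test feature.
function generateConfigs(versions: string[]): TestConfig[] {
  const configs: TestConfig[] = [];
  for (const version of versions) {
    for (const packageManager of packageManagers) {
      for (const features of featureSets) {
        for (const withCache of [false, true]) {
          const playwrightOptions = features.includes("test") ? [false, true] : [null];
          for (const withPlaywrightCache of playwrightOptions) {
            configs.push({ version, packageManager, features, withCache, withPlaywrightCache });
          }
        }
      }
    }
  }
  return configs;
}
```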
Results are saved to `benchmark-results.csv` in the project root with the following columns:
- Test ID: Unique identifier (e.g., `test-0001`)
- Version: Storybook version tested
- Package Manager: yarn/yarn2/npm/bun
- Features: Feature combination name
- With Cache: yes/no
- With Playwright Cache: yes/no (or `-` if the test feature is not selected)
- Iterations: Number of times the test ran
- Success Count: Number of successful iterations
- Duration Mean (s): Average duration across iterations
- Duration Median (s): Median duration
- Duration Min (s): Minimum duration
- Duration Max (s): Maximum duration
- Duration StdDev (s): Standard deviation
- Success: yes/no/partial
- Error: Error message if any
- Smoke Test Status: passed/failed/partial/- (if smoke tests enabled)
- Smoke Test Pass Count: Number of successful smoke tests
- Smoke Test Fail Count: Number of failed smoke tests
This format is optimized for importing into Notion or Google Sheets.
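The duration columns are plain descriptive statistics over the per-iteration timings. A minimal sketch of the calculations (the real `statisticsService.ts` may differ in details):

```ts
// Descriptive statistics over the per-iteration durations (in seconds).
function calculateStats(durations: number[]) {
  const sorted = [...durations].sort((a, b) => a - b);
  const mean = durations.reduce((sum, d) => sum + d, 0) / durations.length;
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
  const variance = durations.reduce((sum, d) => sum + (d - mean) ** 2, 0) / durations.length;
  return {
    mean,
    median,
    min: sorted[0],
    max: sorted[sorted.length - 1],
    stdDev: Math.sqrt(variance),
  };
}
```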
After running benchmarks, you can generate a markdown summary report that compares performance across different Storybook versions.
```bash
cd benchmark
npm run summary
# or
bun generate-summary.ts
```

The script accepts optional arguments:

```bash
bun generate-summary.ts [csv-file] [output-file]
```

- csv-file: Path to the CSV results file (defaults to `../benchmark-results.csv`)
- output-file: Path for the generated markdown file (defaults to `benchmark-summary.md`)
The generated markdown report includes:
1. Configuration-Based Comparisons: Groups results by:
   - Package manager (yarn, yarn2, npm, bun)
   - Feature combinations (a11y+docs, a11y+test+docs, etc.)
   - Cache settings (with/without cache, with/without Playwright cache)

2. Version Comparison Tables: For each configuration group, shows:
   - Duration metrics (mean, median, min, max, standard deviation)
   - Percentage difference vs the baseline version (see the calculation sketch after the example below)
   - Visual indicators:
     - 🟢 = faster (negative percentage)
     - 🔴 = slower (positive percentage)
     - ⚪ = same (0%)

3. Overall Statistics: Aggregated performance metrics:
   - Average duration by version across all configurations
   - Comparison vs the fastest version
   - Number of configurations and tests per version

Example output for one configuration group:
### bun - a11y+docs (without cache)
| Version | Mean (s) | Median (s) | vs Baseline |
|---------|---------|------------|-------------|
| **10.0.7** (baseline) | 14.80 | 14.80 | - |
| 0.0.0-pr-32717-sha-47ba2989 | 13.13 | 13.13 | 🟢 -11.28% |
| 10.1.0-alpha.10 | 15.13 | 15.13 | 🔴 +2.23% |

The summary helps identify:
- Performance improvements or regressions between versions
- Best-performing versions for specific configurations
- Overall performance trends across the test matrix
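The "vs Baseline" column is the relative difference of a version's duration against the baseline version's. A minimal sketch of the calculation and indicator, using illustrative names rather than the actual `generate-summary.ts` internals:

```ts
// Relative difference of a version's mean duration vs the baseline's mean,
// rendered with the same indicators as the comparison tables.
function compareToBaseline(baselineMean: number, versionMean: number): string {
  const diff = ((versionMean - baselineMean) / baselineMean) * 100;
  const indicator = diff < 0 ? "🟢" : diff > 0 ? "🔴" : "⚪";
  const sign = diff > 0 ? "+" : "";
  return `${indicator} ${sign}${diff.toFixed(2)}%`;
}

// compareToBaseline(14.80, 13.13) → "🟢 -11.28%" (matches the example table above)
```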
The script automatically manages caches:
Package manager caches:
- Without cache: Removes cache directories before every iteration
  - npm: `~/.npm/_npx`, `~/.npm/_cacache`
  - yarn/yarn2: `~/.yarn/cache`, `~/.yarn/berry/cache`
  - bun: `~/.bun/install/cache`
- With cache: Uses existing caches (populated during the first iteration)
Playwright cache (only managed when the `test` feature is selected):
- Without cache: Removes `~/Library/Caches/ms-playwright` before every iteration
- With cache: Uses the existing Playwright cache
- Not applicable: Shows `-` in the CSV when the test feature is not selected
Tests always run without cache first, then with cache to ensure caches are properly populated.
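Clearing a cache amounts to recursively removing its directory before the iteration starts. A minimal sketch, assuming a hypothetical helper rather than the real `cacheService.ts` API:

```ts
import { rm } from "node:fs/promises";
import { homedir } from "node:os";
import { join } from "node:path";

// Illustrative mapping of package manager to cache directories (relative to $HOME).
const cacheDirs: Record<string, string[]> = {
  npm: [".npm/_npx", ".npm/_cacache"],
  yarn: [".yarn/cache", ".yarn/berry/cache"],
  bun: [".bun/install/cache"],
};

// Remove the caches for one package manager before a "without cache" iteration.
async function clearCaches(packageManager: string): Promise<void> {
  for (const dir of cacheDirs[packageManager] ?? []) {
    await rm(join(homedir(), dir), { recursive: true, force: true });
  }
}
```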
```
benchmark/
├── benchmark.ts              # Main entry point
├── config.ts                 # Configuration constants
├── tsconfig.json             # TypeScript configuration
├── biome.json                # Biome linting configuration
├── package.json              # Dependencies and scripts
├── types/
│   └── index.ts              # TypeScript type definitions
└── services/
    ├── cacheService.ts       # Cache management
    ├── configService.ts      # Test configuration generation
    ├── csvService.ts         # CSV read/write/parse operations
    ├── fileSystemService.ts  # File operations
    ├── loggerService.ts      # Console logging
    ├── processService.ts     # Child process management
    ├── promptService.ts      # User interaction prompts
    ├── statisticsService.ts  # Statistical calculations
    └── testRunnerService.ts  # Benchmark execution logic
```
- `npm run benchmark` - Run the benchmark
- `npm run summary` - Generate markdown performance summary from CSV results
- `npm run lint` - Check for linting issues
- `npm run lint:fix` - Auto-fix linting issues
- `npm run format` - Format code with Biome
The project uses:
- TypeScript for type safety
- Biome for linting and formatting
- Service-based architecture for modularity and testability
The script uses different commands based on package manager:
- bun: `bunx storybook@<version> init --flags`
- yarn2 (berry): `yarn dlx storybook@<version> init --flags`
- yarn (classic): `npx storybook@<version> init --flags`
- pnpm: `pnpm create storybook@<version> --flags`
- npm: `npm create storybook@<version> --flags`
Note: The `--` separator is used for `npm create` and `pnpm create` commands to properly pass flags to the underlying script.
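A minimal sketch of how the command could be chosen per package manager (illustrative only; the actual `testRunnerService.ts` may assemble the command and flags differently):

```ts
// Choose the init command for a given package manager and Storybook version.
// npm and pnpm use a `--` separator to pass flags to the underlying script.
function buildInitCommand(packageManager: string, version: string, flags: string): string {
  switch (packageManager) {
    case "bun":
      return `bunx storybook@${version} init ${flags}`;
    case "yarn2":
      return `yarn dlx storybook@${version} init ${flags}`;
    case "yarn":
      return `npx storybook@${version} init ${flags}`;
    case "pnpm":
      return `pnpm create storybook@${version} -- ${flags}`;
    default: // npm
      return `npm create storybook@${version} -- ${flags}`;
  }
}
```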
When testing with yarn2, the script automatically:
- Runs `yarn set version berry`
- Creates `.yarnrc.yml` with `nodeLinker: node-modules`
- Runs `yarn install --mode=update-lockfile`
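A minimal sketch of that setup sequence, assuming commands are run through `execSync` for brevity rather than the actual `processService.ts` helpers:

```ts
import { execSync } from "node:child_process";
import { writeFileSync } from "node:fs";
import { join } from "node:path";

// Prepare a test project for yarn berry with node_modules linking.
function setupYarnBerry(projectDir: string): void {
  execSync("yarn set version berry", { cwd: projectDir, stdio: "inherit" });
  writeFileSync(join(projectDir, ".yarnrc.yml"), "nodeLinker: node-modules\n");
  execSync("yarn install --mode=update-lockfile", { cwd: projectDir, stdio: "inherit" });
}
```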
- Child processes are tracked and can be killed gracefully
- On interruption (Ctrl+C), all child processes are terminated
- Partial results are saved automatically
- `CI=true` is set during all benchmark runs
- Temporary cache directories are used for "without cache" tests
- Package manager-specific cache environment variables are configured
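A minimal sketch of the graceful-shutdown idea, with illustrative names (`savePartialResults` is a hypothetical placeholder, not the actual implementation):

```ts
import type { ChildProcess } from "node:child_process";

// Every spawned child process is registered here so it can be terminated later.
const runningChildren = new Set<ChildProcess>();

// Hypothetical placeholder: flush whatever results are already in memory to the CSV.
function savePartialResults(): void {
  /* write in-memory rows to benchmark-results.csv */
}

process.on("SIGINT", () => {
  for (const child of runningChildren) {
    child.kill("SIGTERM"); // terminate tracked children gracefully
  }
  savePartialResults();
  process.exit(130); // conventional exit code for SIGINT
});
```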
- Node.js 18+ (for running the benchmark script)
- TypeScript 5.0+
- The package managers you want to test (yarn, npm, bun) should be installed
- Sufficient disk space for temporary test projects in `benchmark-runs/`
- Each test creates a fresh copy of the React project in the `benchmark-runs/` directory
- Tests run sequentially (not in parallel) to ensure accurate timing
- The script cleans up test projects after each run
- Progress is displayed during execution with detailed statistics
- Smoke tests run outside the performance measurement time
- Failed test entries are replaced step-by-step as reruns succeed
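Putting these notes together, a single iteration could look roughly like the following sketch (illustrative only; the real `testRunnerService.ts` handles processes, logging, and smoke tests differently):

```ts
import { execSync } from "node:child_process";
import { cp, rm } from "node:fs/promises";

// One benchmark iteration: copy the template React project into benchmark-runs/,
// time the init command, then clean up. Returns the duration in seconds.
async function runIteration(templateDir: string, runDir: string, initCommand: string): Promise<number> {
  await cp(templateDir, runDir, { recursive: true });
  const start = performance.now();
  execSync(initCommand, { cwd: runDir, stdio: "inherit", env: { ...process.env, CI: "true" } });
  const seconds = (performance.now() - start) / 1000;
  await rm(runDir, { recursive: true, force: true });
  return seconds;
}
```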