- PID: 22035
- Log file:
stage2_training.log - Started: December 14, 2025 (6:59 PM)
- Expected completion: ~24 hours
- Expected output: ~3,000 examples
✓ E-commerce: users (10K), orders (50K)
✓ Social network: posts (20K), comments (100K)
✓ HR system: employees (5K) - self-referencing
✓ Analytics: events (200K) - time-series
✓ Content: articles (10K), tags (100), junction (30K)
Total: 9 tables, 395K+ rows, diverse optimization patterns
# Emphasizes common 2-50x speedups over rare 1000x+ edge cases
reward = 0.3 + (log10(speedup) / 3.5)
Speedup Reward
1.5x → 0.25
2x → 0.45 ✓ Common (basic indexing)
5x → 0.60 ✓ Common (composite indexes)
10x → 0.70 ✓ Common (JOIN optimizations)
50x → 0.85 ✓ Rare (complex optimizations)
1000x+ → 0.95 Very rare (pathological cases)- Old: 0.5 (only accepted 10x+ speedups)
- New: 0.25 (accepts 1.5x+ speedups)
- Rationale: Real-world SQL optimizations are typically 2-20x, not 1000x+
- "Junior developer" persona
- Realistic business use cases
- "First solution that comes to mind" mindset
- Natural unoptimized patterns (not artificially slow)
See TRAINING_STRATEGY.md for full research notes.
Key insight: Train on realistic optimizations (2-50x) that developers encounter daily, not just pathological edge cases (1000x+).
| Component | Value | Notes |
|---|---|---|
| Iterations | 800 | ~3,000 examples total |
| Candidates/iteration | 5 | 4,000 total candidates |
| Reward threshold | 0.25 | Accepts 2x+ speedups |
| Database | Multi-schema | 5 domains, 9 tables |
| Seed data | 10 examples | Across 5 schemas |
| Diversity | 100% | All unique slow queries |
| Model | gpt-4o-mini | Cost: ~$0.80 |
ps -p $(cat stage2_training.pid)tail -f stage2_training.loggrep "Iteration.*Summary" stage2_training.log | tail -5ls -lht data/stage2/checkpoint_*.json | head -1| Iterations | Examples | Diversity | ETA |
|---|---|---|---|
| 100 | ~375 | 98%+ | ~3 hours |
| 200 | ~750 | 98%+ | ~6 hours |
| 400 | ~1,500 | 99%+ | ~12 hours |
| 800 | ~3,000 | 99%+ | ~24 hours |
Success rate: Expected 40-60% (more selective than before)
PYTHONPATH=. python3 scripts/augment_schemas.pyOutput: ~12,000 examples
Stage 1 (467) + Stage 2 (~3,000) = ~3,500 examples After augmentation: ~14,000 examples
PYTHONPATH=. python3 scripts/export_for_finetuning.py✅ Success rate: 100% (6/6 candidates)
✅ Diversity: 100% (16/16 unique queries)
✅ Avg reward: 0.96
✅ Optimization types: indexing (7), join (5), projection (1), combinations (3)
Speedups observed: 63x, 384x, 260x, 159x, 1.1Mx, 313x
Mix of realistic (63x, 159x) and pathological (1.1Mx) - exactly what we want!
| Aspect | Before | After |
|---|---|---|
| Schemas | 1 (e-commerce) | 5 (diverse domains) |
| Reward threshold | 0.5 (saturated at 10x) | 0.25 (realistic 2x+) |
| Reward function | speedup/10, capped at 1.0 | log-scaled, 2x-1Mx range |
| Success rate | 80-100% (too easy) | 40-60% (selective) |
| Avg reward | 1.00 (saturated) | 0.70-0.95 (spectrum) |
| Query quality | Artificially slow | Junior dev mistakes |
✅ Multi-schema database generated ✅ Seed data covers all 5 schemas ✅ Reward function tested (0.25-1.00 range) ✅ Test run successful (100% success, 100% diversity) ✅ Training started (PID: 22035)
Status: TRAINING IN PROGRESS 🚀
Monitor with: tail -f stage2_training.log