Commit c8a8214
spec: Add generator pipelines & data processing CLI tools specification
Add a comprehensive specification for six data engineering CLI tools designed
to stress-test depyler's transpilation of generator expressions and lazy evaluation:
**Tools Portfolio (6)**:
1. csv_filter - Filter large CSV files using generator expressions
2. log_analyzer - Parse logs with yield and itertools.groupby
3. data_dedup - Find duplicates with stateful generators
4. json_to_csv - Convert JSONL to CSV with nested generators
5. data_aggregator - SQL-like GROUP BY with itertools.groupby
6. stream_merger - K-way merge with heapq and multiple files
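
As an illustration of the kind of code the last tool exercises, here is a minimal sketch of a lazy k-way merge built on the standard library's `heapq.merge`. The function name `merge_sorted_streams` and the sample inputs are assumptions made for this example; the specification's actual implementation may differ.

```python
import heapq
from typing import Iterable, Iterator


def merge_sorted_streams(*streams: Iterable[str]) -> Iterator[str]:
    """Lazily merge already-sorted streams into one sorted stream.

    heapq.merge keeps only one pending item per input stream, so memory
    grows with the number of streams rather than with their length.
    """
    yield from heapq.merge(*streams)


# Hypothetical inputs standing in for lines read from three sorted files.
merged = list(
    merge_sorted_streams(iter(["a", "d"]), iter(["b", "e"]), iter(["c"]))
)
```

Each input must already be sorted; `heapq.merge` does not sort its inputs, it only interleaves them.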
**Key Focus Areas**:
- Generator expressions: (x for x in data)
- Generator functions: yield statements
- itertools module: groupby, chain, islice
- Streaming file I/O: line-by-line processing
- Memory efficiency: validate O(1) streaming memory against O(n) in-memory loading
- CSV/JSON/log processing at scale
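
The focus areas above can be combined in a few lines. The following is a sketch only: the column names, the threshold, and the helper `read_rows` are invented for illustration, and an in-memory `io.StringIO` stands in for a large file.

```python
import csv
import io
from itertools import islice


def read_rows(handle):
    """Generator function: yield one CSV row (as a dict) at a time."""
    yield from csv.DictReader(handle)


# Hypothetical sample data; a real run would pass an open file handle.
sample = io.StringIO("name,score\nada,91\nbob,47\ncy,78\n")

# Generator expression: nothing is read until the result is consumed.
passing = (row["name"] for row in read_rows(sample) if int(row["score"]) >= 50)

# islice pulls only as many rows as needed from the lazy pipeline.
first_two = list(islice(passing, 2))
```

Because every stage is lazy, only one row is held in memory at a time regardless of the input size.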
**Testing Methodology**:
- Extreme TDD with 100% coverage
- Property-based tests with hypothesis
- Memory efficiency validation (peak RSS < 100 MB for 1 GB files)
- Scientific benchmarking (reproducible measurements)
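
A memory check of this kind could be sketched as follows. Note the substitution: the methodology above measures peak RSS, whereas this example uses the standard library's `tracemalloc`, which tracks Python-level allocations only. The helper `peak_bytes` and the workload are assumptions for illustration.

```python
import tracemalloc


def peak_bytes(fn):
    """Run fn and return the peak traced Python allocation in bytes."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Streaming: the generator expression yields one value at a time.
stream_peak = peak_bytes(lambda: sum(x for x in range(200_000)))

# Materialized: the list comprehension builds all values before summing.
list_peak = peak_bytes(lambda: sum([x for x in range(200_000)]))
```

Comparing the two peaks gives a quick signal of whether a pipeline stays streaming, though it is not a replacement for process-level RSS measurement.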
**Benchmarking Strategy**:
- Test data: 100 rows (correctness) → 1M rows (performance) → 10M rows (stress)
- Metrics: execution time, peak RSS, I/O operations
- Comparison: Python vs Rust (expected ≥2x speedup)
- Renacer integration: syscall-level memory profiling
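
One way to turn the timing comparison into a repeatable check is sketched below. The helper names, the repeat count, and the use of the median are assumptions, not the specification's procedure; in practice the timed callables would launch the Python and Rust builds of a tool.

```python
import statistics
import time


def time_runs(fn, repeats=5):
    """Return wall-clock durations (seconds) for repeated calls to fn."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def speedup(baseline_times, candidate_times):
    """Ratio of median baseline time to median candidate time."""
    return statistics.median(baseline_times) / statistics.median(candidate_times)


def meets_target(baseline_times, candidate_times, target=2.0):
    """True when the candidate is at least `target` times faster."""
    return speedup(baseline_times, candidate_times) >= target
```

Using the median rather than the mean makes the comparison less sensitive to a single slow run.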
**Rough Edges Discovery**:
- Expected to find ≥10 generator-specific transpilation challenges
- Document yield, groupby, heapq transpilation issues
- Create reproducible test cases for each rough edge
- Prioritize by impact (% of tools blocked)
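
Prioritizing by impact could look like the sketch below. The `RoughEdge` type and the example entries are entirely hypothetical and do not represent findings; only the tool names and the count of six tools come from the portfolio above.

```python
from dataclasses import dataclass

TOTAL_TOOLS = 6  # size of the tools portfolio


@dataclass(frozen=True)
class RoughEdge:
    name: str
    blocked_tools: frozenset

    @property
    def impact_pct(self) -> float:
        """Percentage of the portfolio blocked by this rough edge."""
        return 100 * len(self.blocked_tools) / TOTAL_TOOLS


def prioritize(edges):
    """Order rough edges from highest to lowest impact."""
    return sorted(edges, key=lambda edge: edge.impact_pct, reverse=True)


# Invented example entries, for illustration only.
edges = [
    RoughEdge("heapq", frozenset({"stream_merger"})),
    RoughEdge("yield", frozenset({"log_analyzer", "data_dedup", "json_to_csv"})),
    RoughEdge("groupby", frozenset({"log_analyzer", "data_aggregator"})),
]
ranked = prioritize(edges)
```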
**Success Criteria**:
- ≥4/6 tools transpile successfully (67%)
- Rust ≥2x faster on working tools
- Both Python and Rust maintain O(1) streaming memory
- Complete rough edges documentation for depyler improvements
Full specification: 800+ lines covering implementation roadmap,
quality gates, and scientific validation methodology.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
1 parent 2a39a55, commit c8a8214
1 file changed (+1574, -0) in docs/specifications