diff --git a/CLAUDE.md b/CLAUDE.md
index 91d7f55..ef49104 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -21,19 +21,30 @@
 | Module | Responsibility |
 |---|---|
 | `src/lib.rs` | Library re-exports, public `parse()` entry point |
-| `src/error.rs` | `ParseError`, `MatchedPairError` via thiserror |
-| `src/token.rs` | `TokenType` enum, `Token` struct |
-| `src/lexer.rs` | Hand-written context-sensitive tokenizer |
-| `src/lexer_word.rs` | Word/expansion parsing (split for complexity) |
-| `src/lexer_matched.rs` | Matched pair parsing for nested constructs |
-| `src/ast.rs` | `Node` enum with all 50+ variants |
-| `src/sexp.rs` | S-expression output via Display trait |
-| `src/parser.rs` | Recursive descent parser (top-level) |
-| `src/parser_compound.rs` | Compound commands: if/while/for/case/select/coproc |
-| `src/parser_arith.rs` | Arithmetic expression parser |
-| `src/parser_cond.rs` | Conditional expression parser `[[ ]]` |
+| `src/ast.rs` | `Node` struct, `NodeKind` enum with 50+ variants, `Span` |
+| `src/token.rs` | `TokenType` enum (62 variants), `Token` struct |
+| `src/error.rs` | `RableError` (Parse / MatchedPair) via thiserror |
+| `src/context.rs` | Parsing context and state management |
+| `src/lexer/` | Hand-written context-sensitive tokenizer |
+| `src/lexer/quotes.rs` | Quote and escape handling |
+| `src/lexer/heredoc.rs` | Here-document processing |
+| `src/lexer/words.rs` | Word boundary detection |
+| `src/lexer/word_builder.rs` | Word assembly with segments |
+| `src/lexer/expansions.rs` | Parameter, command, arithmetic expansion parsing |
+| `src/lexer/operators.rs` | Operator recognition |
+| `src/parser/` | Recursive descent parser (top-level) |
+| `src/parser/compound.rs` | Compound commands: if/while/for/case/select/coproc |
+| `src/parser/conditional.rs` | Conditional expression parser `[[ ]]` |
+| `src/parser/helpers.rs` | Common parsing utilities |
+| `src/parser/word_parts.rs` | Word segment processing |
+| `src/sexp/` | S-expression output via Display trait |
+| `src/sexp/word.rs` | Word segment formatting |
+| `src/sexp/ansi_c.rs` | ANSI-C quoting escapes |
+| `src/format/` | Canonical bash reformatter (command substitution content) |
 | `src/python.rs` | PyO3 bindings (feature-gated) |
-| `tests/` | Integration tests using Parable's .tests format |
+| `tests/integration.rs` | Integration test runner for .tests files |
+| `tests/parable/` | Parable test corpus (36 files, 1,604 tests) |
+| `tests/oracle/` | bash-oracle compatibility tests (11 files) |
 
 ## Code Limits
 
@@ -75,3 +86,5 @@ bash source code
 ```
 
 Run with `cargo test`. The test runner reads `.tests` files, parses input, compares S-expression output.
+
+Oracle tests in `tests/oracle/` provide additional coverage from bash-oracle fuzzing. Run with `just test-oracle`.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 4590244..4fdf83c 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -9,13 +9,19 @@ Thank you for your interest in contributing to Rable! This project is a Rust rei
 git clone https://github.com/mpecan/rable.git
 cd rable
 
-# Run all checks
+# Run all checks (format, lint, test)
 just check
 
 # Full setup including Python environment
 just setup
 ```
 
+### Requirements
+
+- **Rust 1.93+** — pinned in `rust-toolchain.toml`, installed automatically by rustup
+- **Python 3.12+** — needed for Python bindings and fuzzing tools
+- **[just](https://github.com/casey/just)** — task runner (`cargo install just` or `brew install just`)
+
 ## Development Workflow
 
 1. **Make your changes**
@@ -33,16 +39,16 @@ All PRs must:
 - Pass all unit tests
 - Not introduce `unwrap()`, `expect()`, `panic!()`, or `todo!()` calls
 
-## Code Style
+### Code Limits
 
-| Limit | Value |
-|---|---|
-| Line width | 100 chars |
-| Function length | 60 lines max |
-| Cognitive complexity | 15 max |
-| Function arguments | 5 max |
+| Limit | Value | Enforced by |
+|---|---|---|
+| Line width | 100 chars | `.rustfmt.toml` |
+| Function length | 60 lines max | `clippy.toml` |
+| Cognitive complexity | 15 max | `clippy.toml` |
+| Function arguments | 5 max | `clippy.toml` |
 
-These are enforced by `clippy.toml` and `.rustfmt.toml`.
+These are enforced by clippy and rustfmt — `just check` will catch violations.
 
 ## Adding Support for New Bash Syntax
 
@@ -61,23 +67,58 @@ These are enforced by `clippy.toml` and `.rustfmt.toml`.
 4. **Run the full suite**: `just test-parable` must show 1604/1604 (or more, if you added tests)
 
+### Running a specific test file
+
+To iterate quickly on a particular area:
+
+```bash
+just test-file 12_command_substitution # run a single test file
+```
+
+The `RABLE_TEST` environment variable filters which test file to run.
+
 ## Project Structure
 
 ```
 src/
-  lib.rs          Public API
-  ast.rs          AST node types
-  token.rs        Token types
-  error.rs        Error types
-  lexer/          Context-sensitive tokenizer
-  parser/         Recursive descent parser
-  sexp/           S-expression output
-  format/         Canonical bash reformatter
-  python.rs       PyO3 bindings (feature-gated)
+  lib.rs                    Public API: parse() entry point, re-exports
+  ast.rs                    AST node types (NodeKind enum with 50+ variants)
+  token.rs                  Token types and lexer output
+  error.rs                  Error types (RableError: Parse, MatchedPair)
+  context.rs                Parsing context and state management
+  lexer/                    Context-sensitive tokenizer
+    mod.rs                  Main lexer loop
+    quotes.rs               Quote and escape handling
+    heredoc.rs              Here-document processing
+    words.rs                Word boundary detection
+    word_builder.rs         Word assembly with segments
+    expansions.rs           Parameter/command/arithmetic expansion parsing
+    operators.rs            Operator recognition
+    tests.rs                Lexer unit tests
+  parser/                   Recursive descent parser
+    mod.rs                  Top-level parsing
+    compound.rs             if/while/for/case/select/coproc
+    conditional.rs          [[ ]] expression parsing
+    helpers.rs              Common parsing utilities
+    word_parts.rs           Word segment processing
+    tests.rs                Parser unit tests
+  sexp/                     S-expression output
+    mod.rs                  Main formatter
+    word.rs                 Word segment formatting
+    ansi_c.rs               ANSI-C quoting escapes
+  format/                   Canonical bash reformatter
+    mod.rs                  Used for command substitution content
+  python.rs                 PyO3 bindings (feature-gated under "python")
 tests/
-  integration.rs  Parable compatibility test runner
-  parable/        Test corpus from the Parable project
-  benchmark.py    Performance benchmark
+  integration.rs            Test runner for .tests files
+  parable/                  Parable test corpus (36 files, 1,604 tests)
+  oracle/                   bash-oracle compatibility tests (11 files)
+  run_tests.py              Python test harness for Parable compatibility
+  benchmark.py              Performance comparison vs Parable
+  fuzz.py                   Differential fuzzer (mutate/generate/minimize modes)
+  generate_oracle_tests.py  Generate oracle tests from bash-oracle
+examples/
+  basic.rs                  Basic usage example
 ```
 
 ## Python Bindings
 
@@ -85,11 +126,41 @@ tests/
 The Python bindings are feature-gated under `python` and built with [maturin](https://www.maturin.rs/):
 
 ```bash
+just venv        # create virtual environment (one-time)
 just develop     # build and install in development mode
 just test-python # run Parable's own test runner against Rable
 just benchmark   # compare performance
 ```
 
+## Fuzzing
+
+Rable includes a differential fuzzer that compares output against Parable to catch edge-case divergences:
+
+```bash
+just setup                 # one-time Python environment setup
+just fuzz-mutate 50000     # mutate existing test inputs (default: 10,000)
+just fuzz-generate 10000   # generate random bash fragments (default: 5,000)
+just fuzz-minimize 'input' # minimize a failing input
+```
+
+If you find a divergence, you can generate oracle tests from it:
+
+```bash
+just fuzz-generate-tests # regenerate oracle test files (requires bash-oracle)
+just test-oracle         # run oracle test suite
+```
+
+## CI
+
+CI runs on every push to `main` and on pull requests. It includes:
+
+- **Lint** — format check + clippy
+- **Test** — full Rust test suite + oracle compatibility report
+- **Python** — build PyO3 bindings, run Python tests, Parable compatibility
+- **Benchmark** — performance comparison vs Parable (PRs only)
+
+Run `just ci` locally to replicate what CI does.
+
 ## Questions?
 
 Open an issue on GitHub. We're happy to help!
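The `RABLE_TEST` filter described in the CONTRIBUTING.md changes could work roughly as in the minimal sketch below. It is an illustration under stated assumptions, not the code in `tests/integration.rs`: it assumes the variable holds a file stem such as `12_command_substitution`, that an unset or empty variable means "run every file", and the helper names `select_test_files` and `filter_from_env` are hypothetical.

```rust
use std::path::Path;

/// Hypothetical helper: keep only the `.tests` files a runner should execute.
/// `only` is the requested file stem (for example `12_command_substitution`),
/// or `None` to run the whole corpus.
pub fn select_test_files<'a>(files: &[&'a str], only: Option<&str>) -> Vec<&'a str> {
    files
        .iter()
        .copied()
        .filter(|path| match only {
            // No filter requested: keep every file.
            None => true,
            // Compare against the file name without directory or `.tests` extension.
            Some(stem) => Path::new(path).file_stem().and_then(|s| s.to_str()) == Some(stem),
        })
        .collect()
}

/// Hypothetical helper: read the filter, treating an empty value like an unset one.
pub fn filter_from_env() -> Option<String> {
    std::env::var("RABLE_TEST").ok().filter(|v| !v.is_empty())
}
```

Under these assumptions a runner would call `select_test_files(&files, filter_from_env().as_deref())`; matching on the stem rather than a substring keeps `12_command_substitution` from accidentally selecting a similarly named file.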
diff --git a/README.md b/README.md
index 5eb2d29..b4c92f2 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,11 @@
 # Rable
 
+[![CI](https://github.com/mpecan/rable/actions/workflows/ci.yml/badge.svg)](https://github.com/mpecan/rable/actions/workflows/ci.yml)
+[![crates.io](https://img.shields.io/crates/v/rable.svg)](https://crates.io/crates/rable)
+[![docs.rs](https://docs.rs/rable/badge.svg)](https://docs.rs/rable)
+[![PyPI](https://img.shields.io/pypi/v/rable.svg)](https://pypi.org/project/rable/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+
 **A complete GNU Bash 5.3-compatible parser, written in Rust.**
 
 Rable is a from-scratch reimplementation of [Parable](https://github.com/ldayton/Parable) — the excellent Python-based bash parser by [@ldayton](https://github.com/ldayton). It produces identical S-expression output and provides a drop-in replacement Python API via [PyO3](https://pyo3.rs).
@@ -25,6 +31,8 @@ Rable exists because Parable showed the way. Thank you.
 | Parable test compatibility | **1,604 / 1,604 (100%)** |
 | Test files at 100% | **36 / 36** |
 | S-expression output | Identical to Parable |
+| Minimum Rust version | 1.93 |
+| Python version | 3.12+ |
 
 ## Performance
 
@@ -67,17 +75,44 @@ just develop # rebuild after code changes
 ### Rust
 
 ```rust
-use rable::parse;
+use rable::{parse, NodeKind};
 
 fn main() {
+    // Parse bash source into AST nodes
     let nodes = parse("echo hello | grep h", false).unwrap();
     for node in &nodes {
         println!("{node}");
     }
     // Output: (pipe (command (word "echo") (word "hello")) (command (word "grep") (word "h")))
+
+    // Inspect the AST via pattern matching
+    if let NodeKind::Pipeline { commands, .. } = &nodes[0].kind {
+        println!("Pipeline with {} stages", commands.len());
+    }
+
+    // Enable extended glob patterns (@(), ?(), *(), +(), !())
+    let nodes = parse("echo @(foo|bar)", true).unwrap();
+    println!("{}", nodes[0]);
 }
 ```
 
+**Error handling:**
+
+```rust
+match rable::parse("if", false) {
+    Ok(nodes) => { /* use nodes */ }
+    Err(e) => {
+        eprintln!("line {}, pos {}: {}", e.line(), e.pos(), e.message());
+    }
+}
+```
+
+See [`examples/basic.rs`](examples/basic.rs) for a more complete example, or run it with:
+
+```bash
+cargo run --example basic
+```
+
 ### Python
 
 ```python
@@ -109,26 +144,101 @@ from parable import parse, ParseError, MatchedPairError
 from rable import parse, ParseError, MatchedPairError
 ```
 
+## API Reference
+
+### `parse(source, extglob) -> Vec<Node>`
+
+The main entry point. Parses a bash source string into a list of top-level AST nodes.
+
+- **`source`** — the bash source code to parse
+- **`extglob`** — set to `true` to enable extended glob patterns (`@()`, `?()`, `*()`, `+()`, `!()`)
+- **Returns** — `Vec<Node>`, where each top-level command separated by newlines is a separate node
+- **Errors** — `RableError::Parse` for syntax errors, `RableError::MatchedPair` for unclosed delimiters
+
+### AST Types
+
+The AST is built from `Node` structs, each containing a `NodeKind` variant and a source `Span`:
+
+```rust
+use rable::{Node, NodeKind, Span};
+```
+
+**Key `NodeKind` variants:**
+
+| Category | Variants |
+|---|---|
+| **Basic** | `Word`, `Command`, `Pipeline`, `List` |
+| **Compound** | `If`, `While`, `Until`, `For`, `ForArith`, `Select`, `Case`, `Function`, `Subshell`, `BraceGroup`, `Coproc` |
+| **Redirections** | `Redirect`, `HereDoc` |
+| **Expansions** | `ParamExpansion`, `ParamLength`, `ParamIndirect`, `CommandSubstitution`, `ProcessSubstitution`, `ArithmeticExpansion`, `AnsiCQuote`, `LocaleString` |
+| **Arithmetic** | `ArithmeticCommand`, `ArithNumber`, `ArithVar`, `ArithBinaryOp`, `ArithUnaryOp`, `ArithTernary`, `ArithAssign`, and more |
+| **Conditionals** | `ConditionalExpr`, `UnaryTest`, `BinaryTest`, `CondAnd`, `CondOr`, `CondNot` |
+| **Other** | `Negation`, `Time`, `Array`, `Comment`, `Empty` |
+
+Every node implements `Display`, producing S-expression output identical to Parable.
+
+### Error Types
+
+```rust
+use rable::{RableError, Result};
+```
+
+Both error variants provide `.line()`, `.pos()`, and `.message()` accessors:
+
+- **`RableError::Parse`** — syntax error (e.g., unexpected token, missing keyword)
+- **`RableError::MatchedPair`** — unclosed delimiter (parenthesis, brace, bracket, or quote)
+
+## Architecture
+
+Rable is a hand-written recursive descent parser with a context-sensitive lexer:
+
+```
+Source string
+  -> Lexer (context-sensitive tokenizer)
+  -> Parser (recursive descent)
+  -> AST (Node tree)
+  -> S-expression output (via Display)
+```
+
+| Module | Responsibility |
+|---|---|
+| `lexer/` | Context-sensitive tokenizer with heredoc, quote, and expansion handling |
+| `parser/` | Recursive descent parser for all bash constructs |
+| `ast.rs` | 50+ AST node types covering the full bash grammar |
+| `token.rs` | Token types and lexer output |
+| `error.rs` | Error types with line/position information |
+| `context.rs` | Parsing context and state management |
+| `sexp/` | S-expression output with word segment processing |
+| `format/` | Canonical bash reformatter (used for command substitution content) |
+| `python.rs` | PyO3 bindings (feature-gated under `python`) |
+
+### Design principles
+
+1. **Compatibility is correctness** — output matches Parable's S-expressions exactly
+2. **If it is not tested, it is not shipped** — 1,604 integration tests + unit tests
+3. **Simplicity is king** — solve problems with least complexity
+4. **Correctness over speed** — match bash-oracle behavior, optimize later
+
 ## Development
 
 ### Prerequisites
 
-- Rust 1.93+ (see `rust-toolchain.toml`)
+- Rust 1.93+ (pinned in `rust-toolchain.toml`)
 - Python 3.12+ (for Python bindings)
 - [just](https://github.com/casey/just) (task runner)
 
 ### Quick start
 
 ```bash
+git clone https://github.com/mpecan/rable.git
+cd rable
 just                # format, lint, test
-just check          # same as above
-just test-parable   # run full Parable compatibility suite
-just setup          # set up Python environment + benchmarks
-just benchmark      # compare performance vs Parable
 ```
 
 ### Available commands
 
+**Core development:**
+
 | Command | Description |
 |---|---|
 | `just` | Format, lint, and test (default) |
@@ -136,61 +246,81 @@ just benchmark # compare performance vs Parable
 | `just clippy` | Run clippy with strict settings |
 | `just test` | Run all Rust tests |
 | `just test-parable` | Run Parable compatibility suite |
-| `just test-file NAME` | Run a specific test file |
-| `just setup` | Full Python environment setup |
-| `just develop` | Build and install Python bindings |
-| `just test-python` | Run Parable's test runner with Rable |
-| `just benchmark` | Performance benchmark vs Parable |
+| `just test-file NAME` | Run a specific test file (e.g., `just test-file 12_command_substitution`) |
+| `just check` | Same as `just` — format, lint, test |
 | `just ci` | Run exactly what CI runs |
-| `just clean` | Clean build artifacts |
 
-## Architecture
+**Python bindings:**
 
-Rable is a hand-written recursive descent parser with a context-sensitive lexer:
+| Command | Description |
+|---|---|
+| `just setup` | Full setup: venv + bindings + Parable for comparison |
+| `just venv` | Create Python virtual environment with maturin |
+| `just develop` | Build and install Python bindings in dev mode |
+| `just test-python` | Run Parable's test runner against Rable's Python bindings |
+| `just benchmark` | Performance benchmark vs Parable |
+| `just wheel` | Build a release Python wheel |
 
-| Module | Responsibility |
+**Fuzzing and oracle testing:**
+
+| Command | Description |
 |---|---|
-| `lexer/` | Context-sensitive tokenizer with heredoc, quote, and expansion handling |
-| `parser/` | Recursive descent parser for all bash constructs |
-| `ast.rs` | 50+ AST node types covering the full bash grammar |
-| `sexp/` | S-expression output with word segment processing |
-| `format/` | Canonical bash reformatter (used for command substitution content) |
-| `python.rs` | PyO3 bindings (feature-gated) |
+| `just fuzz-mutate [N]` | Differential fuzzer: mutate existing test inputs (default 10,000 iterations) |
+| `just fuzz-generate [N]` | Differential fuzzer: generate random bash fragments (default 5,000) |
+| `just fuzz-minimize INPUT` | Minimize a failing fuzzer input to its smallest form |
+| `just fuzz-generate-tests` | Regenerate oracle test cases from bash-oracle fuzzing |
+| `just test-oracle` | Run the bash-oracle compatibility test suite |
+| `just build-oracle` | Build bash-oracle from source (requires autotools) |
 
-### Design principles
+**Cleanup:**
 
-1. **Compatibility is correctness** — output matches Parable's S-expressions exactly
-2. **If it is not tested, it is not shipped** — 1,604 integration tests + unit tests
-3. **Simplicity is king** — solve problems with least complexity
-4. **Correctness over speed** — match bash-oracle behavior, optimize later
+| Command | Description |
+|---|---|
+| `just clean` | Clean build artifacts and venv |
 
-## Contributing
+## Testing
 
-Contributions are welcome! Please ensure:
+### Test corpus
 
-1. **All tests pass**: `just check` must succeed
-2. **Parable compatibility**: `just test-parable` must show 1604/1604
-3. **Code quality**: No clippy warnings (`just clippy`)
-4. **Formatting**: Code is formatted (`just fmt`)
+Tests live in `tests/parable/` using Parable's `.tests` format:
 
-### Adding new features
+```
+=== test name
+bash source code
+---
+(expected s-expression)
+---
+```
 
-If you're adding support for new bash syntax:
+There are 36 test files covering words, commands, pipelines, lists, redirects, compound statements, loops, functions, expansions, arithmetic, here-documents, process substitution, conditionals, arrays, and more.
 
-1. Add test cases to the appropriate `.tests` file in `tests/parable/`
-2. Implement the lexer/parser changes
-3. Verify S-expression output matches what Parable would produce
-4. Run `just test-parable` to confirm no regressions
+### Oracle tests
 
-### Code limits
+Additional tests in `tests/oracle/` are generated from `bash-oracle` differential fuzzing. These provide extra coverage beyond Parable's test suite and are run separately:
 
-| Limit | Value |
-|---|---|
-| Line width | 100 chars |
-| Function length | 60 lines |
-| Cognitive complexity | 15 |
-| Function arguments | 5 |
-| Clippy | `deny(unwrap_used, expect_used, panic, todo)` |
+```bash
+just test-oracle
+```
+
+### Differential fuzzing
+
+The fuzzer (`tests/fuzz.py`) compares Rable's output against Parable on randomly generated or mutated bash inputs, catching edge-case divergences:
+
+```bash
+just setup               # one-time: set up Python environment
+just fuzz-mutate 50000   # mutate existing test inputs
+just fuzz-generate 10000 # generate random bash fragments
+```
+
+## Contributing
+
+Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for the full guide. The short version:
+
+1. **All tests pass**: `just check` must succeed
+2. **Parable compatibility**: `just test-parable` must show 1604/1604
+3. **Code quality**: No clippy warnings (`just clippy`)
+4. **Formatting**: Code is formatted (`just fmt`)
+5. **Commit style**: [Conventional Commits](https://www.conventionalcommits.org/) (`feat`, `fix`, `refactor`, `test`, `docs`, `chore`)
 
 ## License
 
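The `.tests` layout documented in the README changes (`=== name`, bash input, `---`, expected S-expression, `---`) is simple enough to read with a few lines of Rust. The sketch below is an illustration under assumptions, not Rable's actual `tests/integration.rs`: `TestCase`, `parse_tests_file`, and `failing_cases` are hypothetical names, it assumes a lone `---` line never appears inside a test's bash input, and it takes the parser as a closure returning the S-expression string so it does not depend on any particular crate API.

```rust
/// One case from a `.tests` file (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct TestCase {
    pub name: String,
    pub input: String,
    pub expected: String,
}

/// Split `.tests` content into cases, assuming the layout
/// `=== name`, input lines, `---`, expected lines, `---`.
pub fn parse_tests_file(content: &str) -> Vec<TestCase> {
    let mut cases = Vec::new();
    let mut lines = content.lines();
    while let Some(line) = lines.next() {
        // Skip blank lines and anything else until a case header.
        let Some(name) = line.strip_prefix("===") else {
            continue;
        };
        // `take_while` also consumes the `---` separator that stops it.
        let input: Vec<&str> = lines.by_ref().take_while(|l| *l != "---").collect();
        let expected: Vec<&str> = lines.by_ref().take_while(|l| *l != "---").collect();
        cases.push(TestCase {
            name: name.trim().to_string(),
            input: input.join("\n"),
            expected: expected.join("\n"),
        });
    }
    cases
}

/// Return the names of cases whose rendered output differs from the expectation.
/// `render` stands in for "parse the input and produce an S-expression string".
pub fn failing_cases<F>(cases: &[TestCase], render: F) -> Vec<&str>
where
    F: Fn(&str) -> String,
{
    cases
        .iter()
        // Surrounding whitespace is ignored on both sides (an assumption).
        .filter(|c| render(c.input.as_str()).trim() != c.expected.trim())
        .map(|c| c.name.as_str())
        .collect()
}
```

In a real runner the closure could wrap `rable::parse` and format the returned nodes with `Display`; how Rable's own runner joins several top-level nodes into one string is not shown in this diff, so that step is deliberately left to the caller here.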