lang-bmb/bmb-test

bmb-test

Tests are how humans trust AI.

The human verification layer for BMB — because contracts speak machine, but tests speak human.


The Problem

BMB solves safety through contracts:

fn sort(arr: &[i32]) -> Vec<i32>
  post forall i: 0..ret.len()-1. ret[i] <= ret[i+1]
  post forall x in arr. count(ret, x) == count(arr, x)

The compiler proves this is safe. But can you verify it's correct?

  • Can you read forall i: 0..ret.len()-1 in 3 seconds?
  • Do you trust AI-generated contracts you can't understand?
  • How do you know the AI understood what you wanted?

If humans can't verify the code, humans can't trust the code.


The Solution

Tests speak human:

#[example]
fn sort_behavior() {
    assert(sort(&[3, 1, 2]) == [1, 2, 3]);
    assert(sort(&[]) == []);
    assert(sort(&[5, 5, 5]) == [5, 5, 5]);
}

Now you know what the code does. In 3 seconds.


Philosophy

Traditional Testing vs BMB Testing

Role                  Traditional   BMB
Find bugs             Tests         Contracts (compile-time)
Prove safety          Tests         Contracts (Z3 verified)
Check bounds          Tests         Contracts (pre idx < len)
Null safety           Tests         Contracts (T? + pre)
Human understanding   -             Tests ← new role

In BMB, tests don't find bugs. Tests build trust.

Two Languages for Two Audiences

┌─────────────────────────────────────────────────────────┐
│                                                         │
│   Machine Language              Human Language          │
│                                                         │
│   post forall i: 0..n-1.       [3,1,2] → [1,2,3]       │
│     arr[i] <= arr[i+1]         [1] → [1]               │
│                                 [] → []                 │
│                                                         │
│   ↓                             ↓                       │
│   Z3 Solver verifies           Human brain verifies    │
│                                                         │
└─────────────────────────────────────────────────────────┘

Both must agree. Both must pass. Then you can trust.
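The same contrast can be sketched in plain Python. This is purely illustrative (BMB's contracts are proven by Z3 at compile time, whereas everything here runs at test time, and `sort_items`/`is_sorted` are made-up names):

```python
def sort_items(arr):
    return sorted(arr)

# Machine-style check: a universally quantified property over indices.
# Z3 proves this for ALL inputs; here we can only evaluate it on one.
def is_sorted(xs):
    return all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))

assert is_sorted(sort_items([3, 1, 2]))

# Human-style check: concrete input/output pairs a reader
# can verify at a glance.
assert sort_items([3, 1, 2]) == [1, 2, 3]
assert sort_items([]) == []
```

The quantified predicate and the concrete pairs assert the same thing; only the audience differs.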


Test Types

1. #[example] — The Core

Human-readable input/output pairs. The primary trust interface.

#[example]
fn binary_search_behavior() {
    let arr = [1, 3, 5, 7, 9];
    
    // Found
    assert(binary_search(&arr, 5) == Some(2));
    assert(binary_search(&arr, 1) == Some(0));
    
    // Not found
    assert(binary_search(&arr, 4) == None);
    assert(binary_search(&arr, 0) == None);
    
    // Edge cases
    assert(binary_search(&[], 1) == None);
    assert(binary_search(&[1], 1) == Some(0));
}

Rule: If a human can't verify the example in 3 seconds, it's too complex.

2. #[scenario] — Business Logic

Natural-language-like flows for domain logic:

#[scenario]
fn user_checkout_flow() {
    let cart = Cart::new();
    
    // Add items
    cart.add("Apple", quantity: 3, price: 1000);
    cart.add("Banana", quantity: 2, price: 500);
    assert(cart.total() == 4000);
    
    // Apply discount
    cart.apply_coupon("SAVE10");  // 10% off
    assert(cart.total() == 3600);
    
    // Checkout
    let order = cart.checkout(user: "alice");
    assert(order.status == "confirmed");
    assert(cart.is_empty());
}

3. #[equivalence] — Reference Implementation

Compare AI-generated code against a simple, obviously correct reference implementation:

#[equivalence]
fn sort_matches_reference() {
    // Simple exchange sort: slow but obviously correct
    fn reference_sort(arr: &[i32]) -> Vec<i32> {
        let mut v = arr.to_vec();
        for i in 0..v.len() {
            for j in i+1..v.len() {
                if v[i] > v[j] { 
                    swap(&mut v[i], &mut v[j]); 
                }
            }
        }
        return v;
    }
    
    // AI implementation must match reference
    assert_equivalent(
        optimized_sort,
        reference_sort,
        inputs: random_arrays(100)
    );
}
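The reference-equivalence idea is portable beyond BMB. A minimal Python sketch, assuming a stand-in `optimized_sort` for the implementation under test and a fixed seed in place of BMB's `random_arrays(100)`:

```python
import random

def reference_sort(arr):
    # Exchange sort: slow but obviously correct.
    v = list(arr)
    for i in range(len(v)):
        for j in range(i + 1, len(v)):
            if v[i] > v[j]:
                v[i], v[j] = v[j], v[i]
    return v

def optimized_sort(arr):
    # Stand-in for the AI-generated implementation under test.
    return sorted(arr)

# Compare both implementations on 100 random inputs.
random.seed(0)
for _ in range(100):
    arr = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
    assert optimized_sort(arr) == reference_sort(arr)
```

The reference is deliberately naive; its job is to be readable, not fast.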

4. #[contract_audit] — Contract Readability

Verify that contracts actually express human intent:

#[contract_audit]
fn sort_contract_makes_sense() {
    // The contract says "sorted output"
    // Does it really mean what we think?
    
    // This should pass
    assert_contract_accepts(sort, 
        input: [3, 1, 2], 
        output: [1, 2, 3]
    );
    
    // This should fail
    assert_contract_rejects(sort,
        input: [3, 1, 2],
        output: [1, 2]  // Missing element!
    );
    
    // This should also fail
    assert_contract_rejects(sort,
        input: [3, 1, 2],
        output: [3, 2, 1]  // Wrong order!
    );
}

5. #[golden] — Snapshot Testing

For complex outputs that are hard to specify but easy to verify visually:

#[golden("snapshots/parse_json.golden")]
fn json_parser_output() {
    let input = r#"{"name": "alice", "age": 30}"#;
    return format_ast(parse_json(input));
}
// snapshots/parse_json.golden
Object {
  "name": String("alice"),
  "age": Number(30)
}

Human reviews the golden file once. Future changes diff against it.
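In languages without built-in snapshot support, the same pattern is a few lines. A hedged Python sketch (the `snapshots/` path and `check_golden` helper are illustrative, not part of bmb-test):

```python
from pathlib import Path

def check_golden(name, actual, update=False):
    """Compare `actual` against a stored snapshot; write it on first run."""
    path = Path("snapshots") / f"{name}.golden"
    if update or not path.exists():
        path.parent.mkdir(exist_ok=True)
        path.write_text(actual)  # human reviews this file once
        return
    # Later runs diff against the approved snapshot.
    assert actual == path.read_text(), f"snapshot mismatch: {path}"
```

The `update` flag mirrors the usual "accept new snapshot" workflow: a human inspects the diff, then re-records deliberately.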


Workflow

┌─────────────────────────────────────────────────────────┐
│                  STEP 1: Human writes tests             │
│                     (what I want)                       │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  #[example]                                             │
│  fn sort_behavior() {                                   │
│      assert(sort(&[3,1,2]) == [1,2,3]);                │
│  }                                                      │
│                                                         │
└───────────────────────────┬─────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│                  STEP 2: AI generates code              │
│                (implementation + contracts)             │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  fn sort(arr: &[i32]) -> Vec<i32>                      │
│    post forall i: 0..ret.len()-1. ret[i] <= ret[i+1]  │
│  { /* AI implementation */ }                           │
│                                                         │
└───────────────────────────┬─────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│                  STEP 3: Dual verification              │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌─────────────────┐    ┌─────────────────┐            │
│  │  bmb verify     │    │  bmb test       │            │
│  │  (contracts)    │    │  (examples)     │            │
│  │                 │    │                 │            │
│  │  ✓ Z3 proof    │    │  ✓ Human intent │            │
│  │    complete     │    │    satisfied    │            │
│  └────────┬────────┘    └────────┬────────┘            │
│           │                      │                      │
│           └──────────┬───────────┘                      │
│                      ▼                                  │
│              BOTH PASS = TRUST                          │
│                                                         │
└─────────────────────────────────────────────────────────┘

Commands

# Run all example tests
bmb test

# Run specific test file
bmb test tests/sort_examples.bmb

# Run with coverage report (which examples cover which functions)
bmb test --coverage

# Verify examples align with contracts
bmb test --audit

# Generate missing examples from contracts (AI-assisted)
bmb test --generate-examples

# Watch mode
bmb test --watch

Coverage: Example Coverage, Not Line Coverage

Traditional coverage asks: "Which lines were executed?"
BMB coverage asks: "Which behaviors are demonstrated?"

$ bmb test --coverage

sort:
  ✓ empty array
  ✓ single element
  ✓ already sorted
  ✓ reverse sorted
  ✓ duplicates
  ✗ large array (>1000 elements)  # missing example
  
binary_search:
  ✓ found at start
  ✓ found at middle
  ✓ found at end
  ✓ not found (too small)
  ✓ not found (too large)
  ✗ not found (between elements)  # missing example
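Behavior coverage can be approximated in any test framework by tagging each example with the behavior it demonstrates and diffing against a required list. A toy Python sketch (the labels, `demonstrates` decorator, and registry are all hypothetical):

```python
# Behaviors the team decided every sort implementation must demonstrate.
REQUIRED = {"empty", "single", "sorted", "reverse", "duplicates"}
covered = set()

def demonstrates(label):
    """Register the behavior label an example test covers."""
    def wrap(fn):
        covered.add(label)
        return fn
    return wrap

@demonstrates("empty")
def sort_empty():
    assert sorted([]) == []

@demonstrates("duplicates")
def sort_duplicates():
    assert sorted([5, 5, 1]) == [1, 5, 5]

# Report behaviors with no demonstrating example.
missing = REQUIRED - covered
print("missing behaviors:", sorted(missing))
```

A report like the one above falls out of the set difference; line coverage never enters the picture.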

Contract Audit Report

$ bmb test --audit

sort:
  Contract: post forall i: 0..ret.len()-1. ret[i] <= ret[i+1]
  
  ✓ Examples demonstrate: ascending order
  ✗ Missing demonstration: preserves all elements
    Suggestion: add example with duplicates
    
  Contract: post ret.len() == arr.len()
  
  ✓ Examples demonstrate: length preservation

Example Generation (AI-Assisted)

When you have contracts but no examples:

$ bmb test --generate-examples sort

Generated examples for sort:

#[example]
fn sort_generated() {
    // Basic case
    assert(sort(&[3, 1, 2]) == [1, 2, 3]);
    
    // Empty
    assert(sort(&[]) == []);
    
    // Single element
    assert(sort(&[42]) == [42]);
    
    // Already sorted
    assert(sort(&[1, 2, 3]) == [1, 2, 3]);
    
    // Duplicates
    assert(sort(&[2, 1, 2]) == [1, 2, 2]);
}

Review and save? [y/n]

Human reviews, possibly edits, then saves. AI proposes, human approves.


Best Practices

DO: Write Examples First

// 1. Human writes examples (the specification)
#[example]
fn factorial_behavior() {
    assert(factorial(0) == 1);
    assert(factorial(1) == 1);
    assert(factorial(5) == 120);
}

// 2. AI generates implementation + contracts
// 3. Both verify

DO: Keep Examples Simple

// ✓ Good - verifiable in 3 seconds
assert(sort(&[3, 1, 2]) == [1, 2, 3]);

// ✗ Bad - requires mental computation
assert(sort(&[9,2,7,4,1,8,3,6,5]) == [1,2,3,4,5,6,7,8,9]);

DO: Cover Edge Cases

#[example]
fn divide_examples() {
    // Normal
    assert(divide(10, 2) == 5);
    
    // Edge: zero numerator
    assert(divide(0, 5) == 0);
    
    // Edge: result is zero
    assert(divide(3, 5) == 0);  // integer division
    
    // Note: divide(x, 0) is prevented by contract
    // No example needed - it cannot happen
}

DON'T: Test What Contracts Guarantee

// ✗ Unnecessary - contract already guarantees this
#[test]
fn sort_returns_sorted() {
    let result = sort(&[3, 1, 2]);
    for i in 0..result.len()-1 {
        assert(result[i] <= result[i+1]);  // This IS the contract!
    }
}

// ✓ Useful - demonstrates behavior to humans
#[example]
fn sort_examples() {
    assert(sort(&[3, 1, 2]) == [1, 2, 3]);
}

Integration

With bmb-mcp (Chatter)

AI uses tests as specification when generating code:

Human: "Implement sort"
AI: [reads #[example] tests] → understands intent
AI: [generates code + contracts]
AI: [runs bmb test] → verifies against human intent

With CI (action-bmb)

- uses: lang-bmb/action-bmb@v1
  with:
    command: test --audit
    fail-if-missing-examples: true

With IDE (vscode-bmb)

  • Inline example results
  • "Generate example" code action
  • Contract ↔ Example alignment warnings

FAQ

Why not property-based testing?

Properties are powerful but unreadable:

// Property - correct but hard to verify
#[property]
fn sort_is_permutation(arr: Vec<i32>) {
    assert(is_permutation(sort(&arr), arr));
}

// Example - instantly verifiable
#[example]
fn sort_preserves_elements() {
    assert(sort(&[3, 1, 2]) == [1, 2, 3]);  // all elements present
}

Properties are for machines. Examples are for humans.

Why not mutation testing?

Mutation testing asks: "Would tests catch this bug?"

In BMB, contracts catch bugs. Tests build trust.

What about integration tests?

Integration tests remain valuable — they verify module interactions that contracts can't express:

#[integration]
fn api_returns_sorted_users() {
    let response = http_get("/api/users?sort=name");
    assert(response.status == 200);
    assert(is_sorted(response.body.users, by: "name"));
}

License

MIT


Note: Self-Hosting

bmb-test is written in BMB itself. This is intentional:

  • Dogfooding: Proves BMB is mature enough for real tools
  • Consistency: The entire BMB ecosystem uses one language
  • Demonstration: bmb-test source code serves as BMB examples

bmb-test tests itself using #[example] tests.


Contracts prove safety. Tests prove intent.
Both must pass before you trust.
