Tests are how humans trust AI.
The human verification layer for BMB — because contracts speak machine, but tests speak human.
BMB solves safety through contracts:
fn sort(arr: &[i32]) -> Vec<i32>
post forall i: 0..ret.len()-1. ret[i] <= ret[i+1]
post forall x in arr. count(ret, x) == count(arr, x)
The compiler proves this is safe. But can you verify it's correct?
- Can you read `forall i: 0..ret.len()-1` in 3 seconds?
- Do you trust AI-generated contracts you can't understand?
- How do you know the AI understood what you wanted?
If humans can't verify the code, humans can't trust the code.
Tests speak human:
#[example]
fn sort_behavior() {
assert(sort(&[3, 1, 2]) == [1, 2, 3]);
assert(sort(&[]) == []);
assert(sort(&[5, 5, 5]) == [5, 5, 5]);
}
Now you know what the code does. In 3 seconds.
| Role | Traditional | BMB |
|---|---|---|
| Find bugs | Tests | Contracts (compile-time) |
| Prove safety | Tests | Contracts (Z3 verified) |
| Check bounds | Tests | Contracts (pre idx < len) |
| Null safety | Tests | Contracts (T? + pre) |
| Human understanding | — | Tests ← new role |
In BMB, tests don't find bugs. Tests build trust.
┌─────────────────────────────────────────────────────────┐
│ │
│ Machine Language Human Language │
│ │
│ post forall i: 0..n-1. [3,1,2] → [1,2,3] │
│ arr[i] <= arr[i+1] [1] → [1] │
│ [] → [] │
│ │
│ ↓ ↓ │
│ Z3 Solver verifies Human brain verifies │
│ │
└─────────────────────────────────────────────────────────┘
Both must agree. Both must pass. Then you can trust.
Human-readable input/output pairs. The primary trust interface.
#[example]
fn binary_search_behavior() {
let arr = [1, 3, 5, 7, 9];
// Found
assert(binary_search(&arr, 5) == Some(2));
assert(binary_search(&arr, 1) == Some(0));
// Not found
assert(binary_search(&arr, 4) == None);
assert(binary_search(&arr, 0) == None);
// Edge cases
assert(binary_search(&[], 1) == None);
assert(binary_search(&[1], 1) == Some(0));
}
Rule: If a human can't verify the example in 3 seconds, it's too complex.
Natural language-like flows for domain logic:
#[scenario]
fn user_checkout_flow() {
let cart = Cart::new();
// Add items
cart.add("Apple", quantity: 3, price: 1000);
cart.add("Banana", quantity: 2, price: 500);
assert(cart.total() == 4000);
// Apply discount
cart.apply_coupon("SAVE10"); // 10% off
assert(cart.total() == 3600);
// Checkout
let order = cart.checkout(user: "alice");
assert(order.status == "confirmed");
assert(cart.is_empty());
}
Compare AI-generated code against a simple, obviously-correct reference:
#[equivalence]
fn sort_matches_reference() {
// Naive quadratic sort - slow but obviously correct
fn reference_sort(arr: &[i32]) -> Vec<i32> {
let mut v = arr.to_vec();
for i in 0..v.len() {
for j in i+1..v.len() {
if v[i] > v[j] {
swap(&mut v[i], &mut v[j]);
}
}
}
return v;
}
// AI implementation must match reference
assert_equivalent(
optimized_sort,
reference_sort,
inputs: random_arrays(100)
);
}
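The same equivalence check can be sketched outside BMB. Below is a minimal Python version, assuming nothing about BMB's `assert_equivalent` beyond what the section shows; `optimized_sort` here is just a stand-in for the AI-generated implementation:

```python
import random

def reference_sort(arr):
    # Naive quadratic sort - slow but obviously correct.
    v = list(arr)
    for i in range(len(v)):
        for j in range(i + 1, len(v)):
            if v[i] > v[j]:
                v[i], v[j] = v[j], v[i]
    return v

def optimized_sort(arr):
    # Stand-in for the AI-generated implementation under test.
    return sorted(arr)

def assert_equivalent(f, g, inputs):
    # Both implementations must agree on every sampled input.
    for arr in inputs:
        assert f(arr) == g(arr), f"mismatch on {arr}"

random.seed(0)
random_arrays = [
    [random.randint(-100, 100) for _ in range(random.randint(0, 20))]
    for _ in range(100)
]
assert_equivalent(optimized_sort, reference_sort, random_arrays)
```

The reference is deliberately the dumbest correct algorithm available: its job is to be trivially auditable, not fast.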
Verify that contracts actually express human intent:
#[contract_audit]
fn sort_contract_makes_sense() {
// The contract says "sorted output"
// Does it really mean what we think?
// This should pass
assert_contract_accepts(sort,
input: [3, 1, 2],
output: [1, 2, 3]
);
// This should fail
assert_contract_rejects(sort,
input: [3, 1, 2],
output: [1, 2] // Missing element!
);
// This should also fail
assert_contract_rejects(sort,
input: [3, 1, 2],
output: [3, 2, 1] // Wrong order!
);
}
For complex outputs that are hard to specify but easy to verify visually:
#[golden("snapshots/parse_json.golden")]
fn json_parser_output() {
let input = r#"{"name": "alice", "age": 30}"#;
return format_ast(parse_json(input));
}
// snapshots/parse_json.golden
Object {
"name": String("alice"),
"age": Number(30)
}
Human reviews the golden file once. Future changes diff against it.
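The mechanic behind golden files can be sketched in plain Python (nothing BMB-specific; `check_golden` and the file contents are hypothetical): the first run records the output for one-time human review, later runs diff against it.

```python
import os
import tempfile

def check_golden(path, actual):
    # First run: record the output so a human can review it once.
    if not os.path.exists(path):
        with open(path, "w") as f:
            f.write(actual)
        return "recorded"
    # Later runs: any drift from the reviewed snapshot is a failure.
    with open(path) as f:
        expected = f.read()
    assert actual == expected, f"output differs from {path}"
    return "matched"

# Usage: first call records the snapshot, second call diffs against it.
golden = os.path.join(tempfile.mkdtemp(), "parse_json.golden")
out = 'Object { "name": String("alice"), "age": Number(30) }'
first = check_golden(golden, out)
second = check_golden(golden, out)
```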
┌─────────────────────────────────────────────────────────┐
│ STEP 1: Human writes tests │
│ (what I want) │
├─────────────────────────────────────────────────────────┤
│ │
│ #[example] │
│ fn sort_behavior() { │
│ assert(sort(&[3,1,2]) == [1,2,3]); │
│ } │
│ │
└───────────────────────────┬─────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ STEP 2: AI generates code │
│ (implementation + contracts) │
├─────────────────────────────────────────────────────────┤
│ │
│ fn sort(arr: &[i32]) -> Vec<i32> │
│ post forall i: 0..ret.len()-1. ret[i] <= ret[i+1] │
│ { /* AI implementation */ } │
│ │
└───────────────────────────┬─────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ STEP 3: Dual verification │
├─────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ bmb verify │ │ bmb test │ │
│ │ (contracts) │ │ (examples) │ │
│ │ │ │ │ │
│ │ ✓ Z3 proof │ │ ✓ Human intent │ │
│ │ complete │ │ satisfied │ │
│ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │
│ └──────────┬───────────┘ │
│ ▼ │
│ BOTH PASS = TRUST │
│ │
└─────────────────────────────────────────────────────────┘
# Run all example tests
bmb test
# Run specific test file
bmb test tests/sort_examples.bmb
# Run with coverage report (which examples cover which functions)
bmb test --coverage
# Verify examples align with contracts
bmb test --audit
# Generate missing examples from contracts (AI-assisted)
bmb test --generate-examples
# Watch mode
bmb test --watch
Traditional coverage asks: "Which lines were executed?" BMB coverage asks: "Which behaviors are demonstrated?"
$ bmb test --coverage
sort:
✓ empty array
✓ single element
✓ already sorted
✓ reverse sorted
✓ duplicates
✗ large array (>1000 elements) # missing example
binary_search:
✓ found at start
✓ found at middle
✓ found at end
✓ not found (too small)
✓ not found (too large)
✗ not found (between elements) # missing example
$ bmb test --audit
sort:
Contract: post forall i: 0..ret.len()-1. ret[i] <= ret[i+1]
✓ Examples demonstrate: ascending order
✗ Missing demonstration: preserves all elements
Suggestion: add example with duplicates
Contract: post ret.len() == arr.len()
✓ Examples demonstrate: length preservation
When you have contracts but no examples:
$ bmb test --generate-examples sort
Generated examples for sort:
#[example]
fn sort_generated() {
// Basic case
assert(sort(&[3, 1, 2]) == [1, 2, 3]);
// Empty
assert(sort(&[]) == []);
// Single element
assert(sort(&[42]) == [42]);
// Already sorted
assert(sort(&[1, 2, 3]) == [1, 2, 3]);
// Duplicates
assert(sort(&[2, 1, 2]) == [1, 2, 2]);
}
Review and save? [y/n]
Human reviews, possibly edits, then saves. AI proposes, human approves.
// 1. Human writes examples (the specification)
#[example]
fn factorial_behavior() {
assert(factorial(0) == 1);
assert(factorial(1) == 1);
assert(factorial(5) == 120);
}
// 2. AI generates implementation + contracts
// 3. Both verify
// ✓ Good - verifiable in 3 seconds
assert(sort(&[3, 1, 2]) == [1, 2, 3]);
// ✗ Bad - requires mental computation
assert(sort(&[9,2,7,4,1,8,3,6,5]) == [1,2,3,4,5,6,7,8,9]);
#[example]
fn divide_examples() {
// Normal
assert(divide(10, 2) == 5);
// Edge: zero numerator
assert(divide(0, 5) == 0);
// Edge: result is zero
assert(divide(3, 5) == 0); // integer division
// Note: divide(x, 0) is prevented by contract
// No example needed - it cannot happen
}
// ✗ Unnecessary - contract already guarantees this
#[test]
fn sort_returns_sorted() {
let result = sort(&[3, 1, 2]);
for i in 0..result.len()-1 {
assert(result[i] <= result[i+1]); // This IS the contract!
}
}
// ✓ Useful - demonstrates behavior to humans
#[example]
fn sort_examples() {
assert(sort(&[3, 1, 2]) == [1, 2, 3]);
}
AI uses tests as specification when generating code:
Human: "Implement sort"
AI: [reads #[example] tests] → understands intent
AI: [generates code + contracts]
AI: [runs bmb test] → verifies against human intent
- uses: lang-bmb/action-bmb@v1
with:
command: test --audit
fail-if-missing-examples: true
- Inline example results
- "Generate example" code action
- Contract ↔ Example alignment warnings
Properties are powerful but unreadable:
// Property - correct but hard to verify
#[property]
fn sort_is_permutation(arr: Vec<i32>) {
assert(is_permutation(sort(&arr), arr));
}
// Example - instantly verifiable
#[example]
fn sort_preserves_elements() {
assert(sort(&[3, 1, 2]) == [1, 2, 3]); // all elements present
}
Properties are for machines. Examples are for humans.
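The tradeoff is easy to demonstrate with a hand-rolled property loop in Python (illustrative only; `sort` and `is_permutation` are hypothetical stand-ins): the property covers every sampled input but a reader cannot "see" it pass, while the example is a single case verified at a glance.

```python
import random

def sort(arr):
    return sorted(arr)

def is_permutation(a, b):
    # Same multiset of elements, regardless of order.
    return sorted(a) == sorted(b)

# Property: checked over many random inputs, but opaque to a reader.
random.seed(1)
for _ in range(100):
    arr = [random.randint(-10, 10) for _ in range(random.randint(0, 15))]
    assert is_permutation(sort(arr), arr)

# Example: one concrete case a human verifies instantly.
assert sort([3, 1, 2]) == [1, 2, 3]
```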
Mutation testing asks: "Would tests catch this bug?"
In BMB, contracts catch bugs. Tests build trust.
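For contrast, this is the question conventional mutation testing answers, sketched in Python (all names hypothetical): a mutant flips one comparison in an insertion sort, and the human-readable example test from earlier is exactly what kills it.

```python
def sort_ok(arr):
    # Correct insertion sort: insert each element in ascending position.
    v = []
    for x in arr:
        i = 0
        while i < len(v) and v[i] <= x:
            i += 1
        v.insert(i, x)
    return v

def sort_mutant(arr):
    # Mutant: <= flipped to >=, producing descending order.
    v = []
    for x in arr:
        i = 0
        while i < len(v) and v[i] >= x:
            i += 1
        v.insert(i, x)
    return v

def example_test(sort):
    # The 3-second example doubles as a mutant killer.
    return sort([3, 1, 2]) == [1, 2, 3]

assert example_test(sort_ok)          # original passes
assert not example_test(sort_mutant)  # mutant is caught
```

In BMB the contract `post ret[i] <= ret[i+1]` would reject this mutant at compile time, which is why mutation testing carries less weight there.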
Integration tests remain valuable — they verify module interactions that contracts can't express:
#[integration]
fn api_returns_sorted_users() {
let response = http_get("/api/users?sort=name");
assert(response.status == 200);
assert(is_sorted(response.body.users, by: "name"));
}
License: MIT
bmb-test is written in BMB itself. This is intentional:
- Dogfooding: Proves BMB is mature enough for real tools
- Consistency: The entire BMB ecosystem uses one language
- Demonstration: bmb-test source code serves as BMB examples
bmb-test tests itself using #[example] tests.
Contracts prove safety. Tests prove intent.
Both must pass before you trust.