feat(lifted): FRI #16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft: adr1anh wants to merge 70 commits into adr1anh/main from adr1anh/lifted-fri
Conversation
Introduces a StatefulSponge trait that enables incremental absorb/squeeze operations on sponge state. The trait provides default implementations that process input in chunks of RATE with zero-padding.
- Add StatefulSponge trait with absorb() and squeeze() methods
- Implement for PaddingFreeSponge, maintaining backward compatibility
- Absorb uses overwrite mode with automatic zero-padding for partial chunks
- Squeeze extracts the first OUT elements from the state
This enables stateful hashing for use cases like Merkle tree row hashing, where large inputs need to be processed incrementally.
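As a rough illustration of the absorb/squeeze semantics described above, here is a minimal self-contained sketch; the trait shape, names, and bounds are assumptions for illustration and do not reproduce the p3-symmetric API.

```rust
/// Sketch of an incrementally absorbing sponge. Assumes RATE <= WIDTH and
/// OUT <= WIDTH; `permute` stands in for a cryptographic permutation.
pub trait StatefulSpongeSketch<F: Copy + Default, const WIDTH: usize, const RATE: usize> {
    fn permute(&self, state: &mut [F; WIDTH]);

    /// Absorb input in chunks of RATE, overwriting the rate portion of the
    /// state and zero-padding the final partial chunk before permuting.
    fn absorb(&self, state: &mut [F; WIDTH], input: impl IntoIterator<Item = F>) {
        let mut buf = [F::default(); RATE];
        let mut len = 0;
        for x in input {
            buf[len] = x;
            len += 1;
            if len == RATE {
                state[..RATE].copy_from_slice(&buf);
                self.permute(state);
                len = 0;
            }
        }
        if len > 0 {
            buf[len..].fill(F::default()); // zero-pad the partial chunk
            state[..RATE].copy_from_slice(&buf);
            self.permute(state);
        }
    }

    /// Squeeze the first OUT elements of the state.
    fn squeeze<const OUT: usize>(&self, state: &[F; WIDTH]) -> [F; OUT] {
        core::array::from_fn(|i| state[i])
    }
}
```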
…tion
- Add public `build_matrix_digest_layers` API with comprehensive docs
- Optimize `compress_uniform` by checking `next_len < P::WIDTH` to eliminate an unnecessary scalar fallback (powers of 2 guarantee no remainder)
- Add detailed documentation for the pack/unpack helper functions
- Make internal functions private, expose only the top-level API
- Change `build_uniform_leaves` to take `&[M]` with runtime sort verification
- Fix clippy warnings: remove redundant clones, use iterator patterns
- Add compile-time assertions to StatefulSponge (RATE < WIDTH, OUT < WIDTH)
- Simplify zero-padding with the `fill()` method
Introduce a new `UniformMerkleTree` for power-of-2 height matrices using incremental hashing via StatefulSponge.
Key design decisions:
- **Removed W and PW type parameters**: Since StatefulSponge operates on a single field type F (unlike CryptographicHasher, which can hash F→W), we use F for both matrix elements and digests. This eliminates two type parameters and removes the F→W conversion overhead.
- **Struct**: `UniformMerkleTree<F, M, DIGEST_ELEMS>` stores leaves and digest layers, matching MerkleTree's structure but with simplified generics.
- **Constructor `new()`**: Inlines leaf digest building and layer compression. Verifies matrices are sorted (shortest→tallest) with power-of-2 heights. Uses a state-upsampling algorithm in which sponge states are duplicated as matrices grow in height, ensuring uniform hash-state evolution.
- **Method `root()`**: Returns `Hash<F, F, DIGEST_ELEMS>` containing the root digest.
Benefits over separate helper functions:
- 2 fewer type parameters on the struct (F, M vs F, W, M)
- 1 fewer type parameter in `new()` (P vs P, PW)
- No F→W conversion needed
- Clearer semantics matching StatefulSponge's single-type operation
…rehensive tests
State upsampling bug fix:
- Hoisted the state upsampling logic outside the scalar/packed branch in build_uniform_leaves
- Now applies to both paths when height > active_height
- Fixes a bug where small heights (< P::WIDTH) would incorrectly use default states instead of inheriting parent sponge states
- Updated the reference implementation with the same fix
Test improvements:
- Added small_heights_regression test for heights below P::WIDTH
- Added random_matrices_match_concatenated_reference with 5 test scenarios:
  * Various height progressions: [1,2,4,8], [2,4,8,16], etc.
  * Duplicate heights to test [1,1,2,4,8], [4,8,8,16]
  * Longer sequences up to height 32
  * Random matrix data with varying widths (1-5 columns)
  * Compares against the reference: pad → upsample → concatenate → compute leaves
- Added build_reference_matrix and reference_leaves_from_single_matrix helpers
LMCS module:
- Added MerkleTreeLmcs struct for lifted mixed-matrix commitments
- Implements the Mmcs trait using UniformMerkleTree
- Exported from lib.rs
All tests passing ✅
…roofs
Implements an alternative leaf construction for uniform Merkle trees where matrices are "lifted" to the target height by cycling through rows (modulo arithmetic) rather than upsampling (duplicating rows).
For a matrix of height h lifted to final height H, row index r reads from row r % h. This contrasts with upsampling, where rows are duplicated: [a,b] → [a,a,b,b] vs lifting, where rows cycle: [a,b] → [a,b,a,b].
Key mathematical result: bit-reversed uniform leaves equal lifted leaves. This equivalence allows flexible commitment strategies: you can bit-reverse inputs and use either algorithm to achieve the same final commitment structure.
Implementation:
- build_lifted_leaves: New function with scalar/packed paths for the lifted strategy
- Helper functions: lift_matrix (cycling), upsample_matrix (duplicating)
- reference_lifted_leaves: Reference implementation for correctness testing
Tests added:
- lifted_leaves_match_reference: Basic correctness test
- lifted_small_heights_regression: Edge cases with heights < P::WIDTH
- lifted_random_matrices_match_reference: 3 scenarios with random data
- lifted_uniform_leaves_bit_reverse_equivalence: Proves bitrev(uniform) = lifted
- lifted_upsampled_bit_reverse_equivalence: Proves bitrev(upsample) = lift
All tests passing ✅
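To make the two index mappings and the claimed bit-reversal equivalence concrete, here is a small self-contained sketch over row indices only; the helper names and the specific heights are illustrative assumptions, not the crate's code.

```rust
// Assumes h and big_h are powers of two with 2 <= h <= big_h.
fn cyclic_index(r: usize, h: usize) -> usize {
    r % h // rows cycle: [a, b] -> [a, b, a, b]
}

fn upsampled_index(r: usize, h: usize, big_h: usize) -> usize {
    r / (big_h / h) // rows duplicate: [a, b] -> [a, a, b, b]
}

/// Reverse the lowest `bits` bits of `i` (assumes bits >= 1).
fn bit_reverse(i: usize, bits: u32) -> usize {
    i.reverse_bits() >> (usize::BITS - bits)
}

fn main() {
    for (h, big_h) in [(2usize, 8usize), (4, 8), (4, 16)] {
        let (k, n) = (h.trailing_zeros(), big_h.trailing_zeros());
        for r in 0..big_h {
            // Reading the upsampled lift at r and bit-reversing the source row
            // equals reading the cyclic lift at the bit-reversed index.
            assert_eq!(
                bit_reverse(upsampled_index(r, h, big_h), k),
                cyclic_index(bit_reverse(r, n), h)
            );
        }
    }
}
```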
…ic testing
Renames and restructures the leaf-building functions to clarify the two strategies:
- build_uniform_leaves → build_leaves_upsampled
- build_lifted_leaves → build_leaves_cyclic
- Corresponding reference function renames
Introduces a BuildMode enum to parametrize tests over both strategies, enabling systematic verification that both upsampling and cyclic modes satisfy the same correctness properties.
Code improvements:
- Move const generic parameters after trait bounds (more idiomatic)
- Replace if/else with early continue in scalar paths for clarity
- Change the dead_code attribute to cfg_attr(not(test), allow(dead_code))
- Improve inline documentation and comments
- Remove the unused core::iter::zip import
Test improvements:
- Consolidate duplicate tests into unified versions that test both modes
- Enhance build_reference_matrix to support both upsampling and cyclic modes
- All tests now systematically verify both strategies behave correctly
No functional changes: purely a refactoring for maintainability and clarity.
Test infrastructure changes:
- Replace the BuildMode enum with a Mode struct using function pointers, for better ergonomics and to eliminate method dispatch overhead
- Introduce MatrixSpec to declaratively specify test matrix parameters (height, width, padding) for more systematic test coverage
- Add matrix_scenarios() providing comprehensive test cases covering:
  * Small heights (< PACK_WIDTH)
  * Boundary cases around PACK_WIDTH and RATE
  * Matrices with/without padding requirements
  * Various height combinations and duplicates
- Replace field_matrix with random_matrix for better test coverage
- Add a pad_matrix helper to test padding edge cases
Reference implementation simplification:
- Reimplement reference_leaves_upsampled using lift_matrix + upsample
- Reimplement reference_leaves_cyclic using only lift_matrix
- Dramatically simplifies the correctness verification logic
Test consolidation:
- Merge random_matrices_match_reference to use 3 samples per scenario for statistical confidence while maintaining reasonable test time
- Remove the redundant bit_reverse_equivalence_single_matrix test (covered by the main bit-reversal test)
- Remove the reference_matrix_comparison test (redundant with the main tests)
Code structure:
- Restore if-else in build_leaves_upsampled for consistency with the build_leaves_cyclic structure
- Add a Sponge type alias to reduce verbosity
- Add build_upsampled/build_cyclic wrapper functions
Result: More comprehensive test coverage with cleaner, more maintainable code.
This commit introduces matrix lifting abstractions and refactors the uniform
Merkle tree to use them, improving code clarity and mathematical precision.
Changes:
- Add matrix/src/lifted.rs with CyclicLiftIndexMap and UpsampledLiftIndexMap
implementing row-index mappings for virtual matrix height extension
- Add LiftableMatrix trait providing lift_cyclic() and lift_upsampled() methods
- Refactor merkle-tree/src/uniform.rs to extract absorb_matrix() helper function,
reducing code duplication between cyclic and upsampled leaf builders
- Improve documentation to focus on mathematical semantics ("what") rather than
implementation details ("how")
- Simplify build_leaves_upsampled and build_leaves_cyclic by delegating matrix
absorption to shared helper
- Add validate_matrix_heights() helper for consistent precondition checking
- Update tests to use new LiftableMatrix trait and verify equivalences
The lifted matrix views enable explicit representation of the virtual matrix
transformations underlying the uniform Merkle tree's state upsampling strategy.
This commit eliminates explicit materialization of lifted matrix views in favor of on-demand index computation, significantly improving memory efficiency and code clarity in the LMCS (Lifted Mixed-Matrix Commitment Scheme) implementation.
**Key Changes:**
1. **Introduce `LiftDimensions` abstraction** (merkle-tree/src/uniform.rs):
- Centralizes all lifting-related validation and index-mapping logic
- Validates power-of-two heights, non-decreasing order, and divisibility
- Provides `map_idxs_upsampled()` for on-demand global-to-local index mapping
- Provides `padded_widths<RATE>()` for computing RATE-aligned row widths
- Cached in `UniformMerkleTree` as optional metadata (skipped in serialization)
2. **Eliminate explicit lifted view materialization** (merkle-tree/src/lmcs.rs):
- Previously: materialized `UpsampledLiftedMatrixView` for every matrix
- Now: store the original matrices and compute lifted indices during opening
- Reduces memory footprint and simplifies data flow
- `ProverData` is now just `UniformMerkleTree<F, M, DIGEST_ELEMS>` directly
3. **Refactor `open_batch` to use on-demand mapping**:
- Uses `LiftDimensions::map_idxs_upsampled(index)` to compute per-matrix indices
- Opens rows from the original matrices at the computed indices
- Pads to RATE-aligned widths on demand
4. **Consolidate validation logic**:
- Replace the ad-hoc `validate_matrix_heights` with `LiftDimensions::new()`
- All lifting invariants are validated in one place with descriptive errors
- Used in both `build_leaves_upsampled` and `build_leaves_cyclic`
5. **Simplify the verifier implementation**:
- Uses `LiftDimensions` for validation in `verify_batch`
- More functional style with `fold` for sponge state accumulation
- Validates padded widths instead of mixing concerns
6. **Make lift index maps public** (matrix/src/lifted.rs):
- Expose `CyclicLiftIndexMap::new` and `UpsampledLiftIndexMap::new`
- Use the idiomatic `is_multiple_of` instead of a modulo check
- Enables direct construction for advanced use cases
**Benefits:**
- **Memory efficiency**: No materialization of O(H×W) lifted views
- **Code clarity**: Single source of truth for lifting semantics
- **Performance**: On-demand computation only for opened indices
- **Maintainability**: Centralized validation reduces duplication
- **API simplicity**: Prover data is just the tree, no wrapper struct
**Mathematical Invariant Preserved:**
The upsampled index mapping r ↦ ⌊r × h/H⌋ (where H is the max height and h is the matrix height) is now computed lazily via `map_idxs_upsampled` instead of being baked into materialized views. This is semantically equivalent but more efficient.
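A minimal sketch of that per-matrix index mapping, assuming power-of-two heights that divide the maximum height; the function name mirrors the one above, but the signature and body are illustrative guesses, not the crate's implementation.

```rust
/// For a global row index `r` of the lifted tree and a list of matrix
/// heights, compute which original row of each matrix backs that index.
/// Since h divides big_h, ⌊r·h/big_h⌋ equals r >> log2(big_h/h).
fn map_idxs_upsampled(r: usize, heights: &[usize], big_h: usize) -> Vec<usize> {
    heights
        .iter()
        .map(|&h| {
            debug_assert!(h.is_power_of_two() && big_h % h == 0);
            r >> (big_h / h).trailing_zeros()
        })
        .collect()
}

// Example: map_idxs_upsampled(5, &[2, 4, 8], 8) == vec![1, 2, 5].
```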
This commit adds ergonomic support for using matrix references as matrices
and begins restructuring the LMCS implementation into a dedicated module.
**Key Changes:**
1. **Add blanket `impl Matrix<T> for &'a M` in matrix/src/lib.rs** (+206 lines):
- Enables using `&matrix` wherever `M: Matrix<T>` is expected
- All trait methods forward to the underlying matrix implementation
- Critical for composing lightweight views (lifted, strided, etc.) without moves
- Zero-cost abstraction: references incur no runtime overhead
**Ergonomic benefit:**
```rust
let view = (&matrix).lift_cyclic(16); // No need to clone or move
```
2. **Add tests for reference-based lifting** (matrix/src/lifted.rs):
- `lift_cyclic_view_from_ref`: Verifies cyclic lifting works with `&matrix`
- `lift_upsampled_view_from_ref`: Verifies upsampled lifting works with `&matrix`
- Tests both trait-based (`(&m).lift_*()`) and direct constructor usage
- Confirms reference semantics don't break row iteration or indexing
3. **Remove merkle-tree/src/lmcs.rs** (-225 lines):
- Deleted single-file LMCS implementation
- Preparation for restructuring into dedicated lifted module
4. **Scaffold new lifted module structure** (merkle-tree/src/lifted/):
- Add empty placeholder files: dimensions.rs, merkle_tree.rs, mod.rs, utils.rs
- Sets up organizational structure for refactored LMCS implementation
- Will contain split-out components from the monolithic lmcs.rs
**Motivation:**
The `impl Matrix for &M` blanket impl is foundational for ergonomic view composition.
Without it, users must either:
- Clone matrices (expensive for large data)
- Move matrices (loses access to original)
- Manually wrap in reference types (verbose, error-prone)
With this impl, lifted views and other transformations can be created from references
naturally, enabling patterns like:
```rust
fn process_lifted<M: Matrix<F>>(m: M, height: usize) { ... }
process_lifted(&my_matrix.lift_upsampled(32), 32); // Just works!
```
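As a hedged sketch of the forwarding pattern, written against a deliberately tiny hypothetical trait (the real p3-matrix `Matrix` trait has far more methods, but each one forwards the same way):

```rust
// Hypothetical minimal trait, only to illustrate the blanket forwarding impl.
pub trait Matrix<T> {
    fn width(&self) -> usize;
    fn height(&self) -> usize;
    fn get(&self, row: usize, col: usize) -> T;
}

// Blanket impl: a shared reference to any matrix is itself a matrix, so views
// can borrow matrices instead of cloning or moving them.
impl<'a, T, M: Matrix<T>> Matrix<T> for &'a M {
    fn width(&self) -> usize {
        (**self).width()
    }
    fn height(&self) -> usize {
        (**self).height()
    }
    fn get(&self, row: usize, col: usize) -> T {
        (**self).get(row, col)
    }
}
```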
**Status:**
This is preparatory work. The new lifted/ module files are empty scaffolding;
subsequent commits will migrate the LMCS implementation into this structure.
…enum
This commit eliminates the matrix lifting view abstraction in favor of a simpler,
more direct approach using an explicit Lifting strategy enum with on-demand index
computation. This significantly reduces code complexity while maintaining identical
mathematical semantics.
**Motivation:**
The previous approach using `LiftableMatrix` trait and view types (CyclicLiftIndexMap,
UpsampledLiftIndexMap) was architecturally over-engineered for the LMCS use case:
- Lifted views were only used transiently during tree construction
- The view abstraction added ~330 lines of generic infrastructure
- Views were never composed or reused beyond LMCS
- The indirection made the code harder to understand
The new approach is more direct: store the lifting strategy as an enum, compute
row indices on-demand using simple bit operations.
**Key Changes:**
1. **Remove matrix/src/lifted.rs** (-330 lines):
- Deleted LiftableMatrix trait
- Deleted CyclicLiftIndexMap and UpsampledLiftIndexMap view types
- Deleted RowIndexMappedView infrastructure
- Removed lifted module from matrix crate
- Restored `#![no_std]` in matrix/src/lib.rs
2. **Introduce Lifting enum** (merkle-tree/src/lmcs/merkle_tree.rs):
```rust
pub enum Lifting {
    Upsample, // r ↦ floor(r / (H/h)) via bit shift
    Cyclic,   // r ↦ r mod h via mask
}
```
- Encodes the two canonical lifting strategies
- `map_index(index, height, max_height)` computes row mapping on-demand
- Implements the same mathematics as the deleted views but more directly
- Comprehensive documentation explaining the mathematical semantics
3. **Rename UniformMerkleTree → LiftedMerkleTree**:
- Better reflects the core concept (lifted matrix commitment)
- "Uniform" was ambiguous (uniform as in balanced? as in lifted uniformly?)
- "Lifted" directly describes the mathematical operation
4. **Add LiftedMerkleTree::rows() method**:
- Computes lifted and padded rows on-demand for a given index
- Returns `Vec<Vec<F>>` of RATE-aligned rows ready for sponge absorption
- Replaces the previous pattern of materializing lifted views then iterating
5. **Simplify MerkleTreeLmcs**:
- Takes `Lifting` parameter in constructor
- Passes lifting strategy to tree construction
- Opening logic: `let opened_rows = tree.rows(index, RATE);` (one line!)
- Previous approach required materializing lifted views per opening
6. **Add pad_rows utility** (merkle-tree/src/lmcs/utils.rs):
- Helper for zero-padding rows to multiples of RATE
- Centralizes the padding logic previously inline
7. **Update test infrastructure**:
- Add `lift_matrix` helper that materializes lifted matrices for testing
- Used to verify equivalence between incremental and materialized approaches
**Mathematical Equivalence:**
The index mapping semantics are preserved exactly:
- **Upsample**: `r ↦ ⌊r / (H/h)⌋ = r >> log₂(H/h)`
- **Cyclic**: `r ↦ r mod h = r & (h-1)`
Where `h` is the matrix height, `H` is the max height, `r ∈ [0, H)`.
**Benefits:**
1. **Simplicity**: -205 net lines, eliminated ~330 lines of view infrastructure
2. **Directness**: Clear algorithmic expression without abstraction layers
3. **Performance**: Identical (same bit operations, no materialization)
4. **Maintainability**: Easier to understand lifting logic inline
5. **Modularity**: matrix crate no longer coupled to merkle-tree concerns
**Trade-offs:**
- Lost generality: lifting views could theoretically be reused elsewhere
- Lost composability: views were composable Matrix implementations
- In practice: neither was ever used beyond LMCS, so the loss is theoretical
**Example Usage:**
```rust
// Old: materialize lifted views
let lifted: Vec<_> = matrices.iter()
    .map(|m| m.lift_upsampled(final_height))
    .collect();
let row: Vec<_> = lifted[i].row(index).unwrap().collect();
// New: on-demand computation
let row_index = lifting.map_index(index, matrix.height(), final_height);
let row = matrix.row(row_index).unwrap();
```
This commit refactors the stateful sponge trait hierarchy and introduces
independent type parameters for matrix field elements versus digest field
elements, enabling more flexible hash function configurations.
**Core Architectural Changes:**
1. **Separate matrix field type (F) from digest type (D)**:
- `LiftedMerkleTree<F, M, DIGEST_ELEMS>` → `LiftedMerkleTree<F, D, M, DIGEST_ELEMS>`
- `MerkleTreeLmcs<P, H, C, WIDTH, RATE, DIGEST>` → `MerkleTreeLmcs<PF, PD, H, C, WIDTH, DIGEST>`
- Matrices store elements of type `F`, tree stores digests of type `D`
- Enables committing to matrices over one field while hashing into another
**Use case:** Matrix elements might be BabyBear (31-bit) while digests use a
different field optimized for the hash function's native operations.
2. **Refactor sponge trait hierarchy** (symmetric/src/sponge.rs):
**Before:**
```rust
trait StatefulSponge<T, const WIDTH: usize, const RATE: usize> {
    fn absorb<I>(&self, state: &mut [T; WIDTH], input: I);
    fn squeeze<const OUT: usize>(&self, state: &[T; WIDTH]) -> [T; OUT];
}
```
**After:**
```rust
trait StatefulHasher<Item, State, Out> {
    fn absorb_into<I>(&self, state: &mut State, input: I);
    fn squeeze(&self, state: &State) -> Out;
}

struct StatefulSponge<P, const WIDTH: usize, const OUT: usize, const RATE: usize> {
    pub p: P,
}
```
**Changes:**
- Trait renamed: `StatefulSponge` → `StatefulHasher`
- Method renamed: `absorb` → `absorb_into` (clearer intent)
- Remove generic `squeeze<const OUT: usize>` → non-generic `squeeze`
- RATE moved from trait to implementation (`StatefulSponge` struct)
- State and output types now explicit type parameters (not const generics)
- Concrete `StatefulSponge` struct implements `StatefulHasher`
3. **Remove RATE as pervasive const generic**:
- RATE now encapsulated in sponge implementation
- API consumers no longer need to thread RATE through type signatures
- `MerkleTreeLmcs<P, H, C, WIDTH, RATE, DIGEST>` → `MerkleTreeLmcs<PF, PD, H, C, WIDTH, DIGEST>`
- Simplifies type signatures throughout the codebase
4. **Eliminate explicit padding logic**:
- Removed `pad_rows` utility function (merkle-tree/src/lmcs/utils.rs)
- `LiftedMerkleTree::rows(index, padding_multiple)` → `rows(index)`
- Padding now entirely handled by sponge's `absorb_into` implementation
- Verifier no longer checks padded widths, only original widths
- Cleaner separation: padding is a hashing concern, not a tree concern
**Benefits:**
1. **Flexibility**: Can use different fields for matrices vs digests
2. **Simplicity**: RATE no longer pollutes every type signature
3. **Encapsulation**: Padding logic lives in sponge where it belongs
4. **Generality**: `StatefulHasher` trait is more abstract and reusable
5. **Clarity**: Method names (`absorb_into`, non-generic `squeeze`) are clearer
**Type Parameter Naming:**
- `PF`: Packed Field (for matrix elements)
- `PD`: Packed Digest (for hash outputs)
- `F`: Scalar field element type (matrix elements)
- `D`: Scalar digest element type (hash outputs)
**Mathematical Equivalence:**
The padding behavior is mathematically identical - the sponge's `absorb_into`
still zero-pads incomplete chunks to RATE before permuting. The change is purely
about where padding responsibility lives in the architecture.
**Migration Guide:**
```rust
// Old API
let sponge: PaddingFreeSponge<P, WIDTH, RATE, DIGEST> = ...;
sponge.absorb(&mut state, input);
let out = sponge.squeeze::<DIGEST>(&state);
// New API
let sponge = StatefulSponge::<P, WIDTH, DIGEST, RATE> { p: perm };
sponge.absorb_into(&mut state, input);
let out = sponge.squeeze(&state);
```
**Updated Modules:**
- symmetric/src/sponge.rs: Core trait refactoring
- merkle-tree/src/lmcs/mod.rs: Updated MMCS impl with new types
- merkle-tree/src/lmcs/merkle_tree.rs: Added D type param throughout
- merkle-tree/src/lmcs/utils.rs: Removed pad_rows utility
- merkle-tree/src/lmcs/test_helpers.rs: Updated test infrastructure
- merkle-tree/benches/lifted.rs: Updated benchmarks
…le hashing
This commit refactors the stateful hashing interface to enable using any
stateless hasher (e.g., Keccak, Blake3) as a stateful hasher in LMCS and
other commitment schemes, while maintaining backward compatibility with
field-native sponges.
**Motivation:**
Previously, only field-native sponges (Poseidon2) could be used with LMCS
because the StatefulHasher trait was tightly coupled to sponge construction.
This prevented using standard hash functions like Keccak-256 or Blake3, which:
- Are not field-native (operate on bytes/words)
- Don't have a natural "state" concept (stateless hash functions)
- Have different padding semantics
The new design enables:
1. Using Keccak/Blake3/SHA256 as stateful hashers via chaining adapter
2. Explicit padding width declaration per hasher type
3. Cleaner separation between sponge-specific and generic stateful APIs
**Core Architectural Changes:**
1. **New symmetric/src/stateful.rs module** (+249 lines):
Introduces `StatefulHasher<Item, State, Out>` trait:
```rust
pub trait StatefulHasher<Item, State, Out> {
    const PADDING_WIDTH: usize = 1;
    fn absorb_into<I>(&self, state: &mut State, input: I);
    fn squeeze(&self, state: &State) -> Out;
}
```
**PADDING_WIDTH semantic:**
- Declares the absorption granularity in `Item` units
- Field-native sponges: `PADDING_WIDTH = RATE` (in field elements)
- Byte-based hashers: `PADDING_WIDTH = 1` (one field element = one item)
- Used by LMCS to determine row padding behavior
Introduces `ChainedStateful<H, I, N>` adapter:
```rust
pub struct ChainedStateful<H, I, N> {
    pub h: H, // underlying stateless hasher
}
```
**Chaining rule:** `state <- H(state || encode(input))`
- State is the digest itself: `[u8; N]`, `[u32; N]`, or `[u64; N]`
- Empty input is a no-op (preserves state)
- Each absorption updates state by hashing concatenation
**Implementations provided:**
- `StatefulHasher<F, [u8; N], [u8; N]>` - Field elements → bytes
- `StatefulHasher<F, [u32; N], [u32; N]>` - Field elements → u32 words
- `StatefulHasher<F, [u64; N], [u64; N]>` - Field elements → u64 words
- Packed variants for SIMD: per-lane chaining for `PackedValue` types
2. **Refactor symmetric/src/sponge.rs** (-63 net lines):
**Removed:**
- `StatefulHasher` trait definition (moved to stateful.rs)
- `StatefulSponge` struct (no longer needed)
**Changed:**
- `PaddingFreeSponge` now directly implements `StatefulHasher`
- Added `const PADDING_WIDTH: usize = RATE;`
- Changed `T: Copy` → `T: Clone` for more flexibility
- squeeze now uses `clone()` instead of `try_into()`
3. **Update LMCS to use dynamic padding width** (merkle-tree/src/lmcs/):
**Before:** Hard-coded RATE padding
```rust
let padded_width = width.next_multiple_of(RATE);
```
**After:** Dynamic padding based on hasher's PADDING_WIDTH
```rust
let pad = <H as StatefulHasher<F, _, _>>::PADDING_WIDTH;
let padded_width = if pad > 1 {
    width.next_multiple_of(pad)
} else {
    width
};
```
Changes in `mod.rs`:
- `open_batch`: pad rows to hasher's PADDING_WIDTH before returning
- `verify_batch`: validate padded widths using hasher's PADDING_WIDTH
- Documentation updated to reference "hasher's padding width"
Changes in `merkle_tree.rs`:
- Documentation updated throughout
4. **New test suite** (keccak/tests/chained_stateful.rs, +128 lines):
Tests `ChainedStateful` adapter with Keccak:
- `chained_stateful_keccak_u8_matches_manual`: Validates Keccak256 chaining
- `chained_stateful_keccak_u64_matches_manual`: Validates KeccakF sponge chaining
- `serializing_hasher_matches_inner_*`: Validates SerializingHasher compatibility
Tests verify:
- Empty input segments are no-ops
- Multiple absorption calls chain correctly
- Adapter output matches manual chaining implementation
5. **Update benchmarks and tests**:
- Replace `StatefulSponge` with `PaddingFreeSponge`
- Update constructor calls to use `::new()` method
- Add p3-baby-bear dev dependency to keccak for tests
**Usage Examples:**
Using Keccak-256 as a stateful hasher:
```rust
use p3_keccak::Keccak256Hash;
use p3_symmetric::{ChainedStateful, StatefulHasher};
let keccak = ChainedStateful::<Keccak256Hash, u8, 32>::new(Keccak256Hash {});
let mut state = [0u8; 32];
// Absorb multiple segments
keccak.absorb_into(&mut state, field_elements1);
keccak.absorb_into(&mut state, field_elements2);
// Squeeze final digest
let digest = keccak.squeeze(&state);
```
Using with LMCS (now supports both Keccak and Poseidon2):
```rust
// Option 1: Field-native Poseidon2 (PADDING_WIDTH = RATE)
let sponge = PaddingFreeSponge::<Poseidon2, WIDTH, RATE, DIGEST>::new(perm);
let lmcs = MerkleTreeLmcs::new(sponge, compress, lifting);
// Option 2: Keccak-256 via adapter (PADDING_WIDTH = 1)
let keccak = ChainedStateful::<Keccak256Hash, u8, 32>::new(Keccak256Hash {});
let lmcs = MerkleTreeLmcs::new(keccak, compress, lifting);
```
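Internally, each `absorb_into` on the adapter applies the chaining rule `state <- H(state || encode(input))` described earlier. A minimal sketch of that rule against a generic byte hasher (illustrative only; the real adapter also handles u32/u64 states, field-element encoding, and packed per-lane inputs):

```rust
/// Chaining-mode absorption: replace the state with the hash of
/// (current state || encoded input). Empty input is a no-op.
fn absorb_chained<const N: usize>(
    hash: impl Fn(&[u8]) -> [u8; N], // stand-in for a stateless byte hasher
    state: &mut [u8; N],
    input: &[u8],
) {
    if input.is_empty() {
        return; // preserve the state unchanged
    }
    let mut buf = Vec::with_capacity(N + input.len());
    buf.extend_from_slice(state);
    buf.extend_from_slice(input);
    *state = hash(&buf);
}
```

With Keccak-256 as `hash`, absorbing two segments m1 and m2 starting from the initial state yields H(H(state || m1) || m2), which is the manual chaining the new tests compare against.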
**Benefits:**
1. **Flexibility**: Use any hash function (Keccak, Blake3, SHA256) with LMCS
2. **Explicit semantics**: PADDING_WIDTH makes padding behavior explicit
3. **Zero overhead**: ChainedStateful is a zero-cost abstraction
4. **Backward compatible**: Existing Poseidon2 code unchanged
5. **Composability**: Can mix different hashers in same codebase
**Design Rationale:**
**Why chaining adapter?**
Stateless hashers have no notion of "state evolution". The chaining pattern
`state <- H(state || input)` is a standard technique to create stateful behavior
from stateless hashers, commonly used in HMAC, KDFs, and authenticated encryption.
**Why PADDING_WIDTH?**
Different hashers have different absorption granularities:
- Sponges: absorb in chunks of RATE field elements
- Byte hashers: absorb arbitrary-length byte streams (no padding needed)
Making this explicit allows LMCS to pad rows correctly per hasher type.
**Why per-lane chaining for packed types?**
SIMD operations require independent state per lane. Per-lane chaining maintains
the semantic that each lane processes its own independent input stream.
**Files Changed:**
- symmetric/src/lib.rs: Export new stateful module
- symmetric/src/stateful.rs: New module with StatefulHasher + ChainedStateful
- symmetric/src/sponge.rs: Refactor to implement StatefulHasher directly
- merkle-tree/src/lmcs/mod.rs: Use dynamic PADDING_WIDTH
- merkle-tree/src/lmcs/merkle_tree.rs: Update documentation
- merkle-tree/benches/lifted.rs: Update to use PaddingFreeSponge
- keccak/Cargo.toml: Add p3-baby-bear dev dependency
- keccak/tests/chained_stateful.rs: New comprehensive test suite
…module
- Extract ChainingHasher into symmetric/src/chaining_hasher.rs for better organization
- Rename ChainedStateful -> ChainingHasher throughout the codebase for clarity
- Update API: ChainedStateful::new() -> ChainingHasher::new()
- Simplify merkle-tree benchmarks by removing macro-based code generation
- Use deterministic Poseidon2 initialization in benchmarks with default_babybear_poseidon2_24()
- Create static hasher instances (K_HASH, S_HASH) for reusability in the lmcs_hashes benchmarks
- Update test names: chained_stateful_* -> chaining_*
- Require StatefulHasher to be Clone, enabling Copy-able LMCS configurations
- Remove redundant Default bounds on PackedValue types (PF, PD) throughout LMCS
- Derive Copy for MerkleTreeLmcs when its components support it
- Remove the unused generic D from LiftedMerkleTree::_phantom field
- Change root() to require D: Copy instead of D: Clone for efficiency
- Clean up redundant trait bounds already implied by PackedValue or StatefulHasher
This simplification reduces boilerplate while maintaining the same guarantees, since PackedValue already implies Copy and the other necessary traits.
Replace `.into_iter().chain(...)` with the standalone `chain(...)` function throughout. The standalone chain function automatically handles the IntoIterator conversion, making the code more concise and idiomatic.
API changes:
- Change the matrices_groups parameter from &[Vec<M>] to &[Vec<&M>] in Precomputation::new() and compute_deep_quotient()
- Add an eval_deep() function for single-point DEEP quotient evaluation
- Fix derive_coeffs_from_challenge to correctly skip padding gaps
Test improvements:
- Replace manual Horner evaluation with NaiveDft and interpolate_coset
- Rename check_evals_match_horner to check_evals_match_interpolate
- Remove the custom horner_ext and horner_base helper functions
- Simplify deep_quotient_end_to_end using eval_deep for verification
Introduce type abstractions and refactor the DEEP quotient API for cleaner code organization.
## New Types
**MatrixGroupEvals<T>** (fri/mod.rs):
- Newtype wrapper for `Vec<Vec<T>>` representing `[matrix][col]` evals
- Methods: `new()`, `iter()` (yields `&[T]` slices), `flatten()` (yields `&T`)
- Used throughout deep.rs and verifier.rs for type clarity
**VirtualPoly** (fri/verifier.rs):
- Represents the batched polynomial for FRI verification
- Fields:
  - `commitments: Vec<(Commitment, Vec<Dimensions>)>` - grouped commit/dims
  - `reduced_openings: Vec<(EF, EF)>` - (point, reduced_eval) pairs
  - `challenge`, `padding` - Horner reduction parameters
- Methods:
  - `new()`: Validates the structure, computes reduced evals via Horner
  - `eval()`: Verifies Merkle openings, computes the DEEP quotient at an index
## API Changes
**deep.rs**:
- Replace `reduce_evals()` with `reduce_with_powers()` using Horner's method
- Change the `eval_deep()` signature: `(reduced_evals, opening_points, ...)` → `(reduced_openings, ...)` where reduced_openings is `&[(EF, EF)]`
- Update `Precomputation::evals()` to return `&[MatrixGroupEvals<EF>]`
- Use `dot_product()` instead of the custom `reduce_evals()` in compute_deep_quotient
## Code Quality
- Add exhaustive inline comments to the verifier.rs functions
- Add docstrings for all VirtualPoly struct fields
- Update tests to use the new API
- Remove the unused PrimeCharacteristicRing import from fold.rs
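A rough sketch of the `MatrixGroupEvals` newtype as described above (illustrative; the actual definition may differ in bounds, derives, and method names):

```rust
/// Evaluations stored as evals[matrix][column].
pub struct MatrixGroupEvals<T>(Vec<Vec<T>>);

impl<T> MatrixGroupEvals<T> {
    pub fn new(evals: Vec<Vec<T>>) -> Self {
        Self(evals)
    }

    /// Iterate per matrix, yielding that matrix's column evaluations as a slice.
    pub fn iter(&self) -> impl Iterator<Item = &[T]> {
        self.0.iter().map(Vec::as_slice)
    }

    /// Iterate over all evaluations in order, flattening the matrix grouping.
    pub fn flatten(&self) -> impl Iterator<Item = &T> {
        self.0.iter().flatten()
    }
}
```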
…entation
Extract DEEP quotient functionality from `fri/` into new `deep/` module with
four files: `mod.rs`, `interpolate.rs`, `prover.rs`, and `verifier.rs`.
## Module Organization
- `deep/mod.rs`: Module-level documentation explaining DEEP quotient construction,
two-challenge soundness (k·m → k+m collision terms), and lifting semantics
- `deep/interpolate.rs`: Barycentric interpolation with full mathematical derivations
including weight folding for lifted polynomials
- `deep/prover.rs`: `DeepPoly` for computing Q(X) over LDE domain with SIMD-optimized
accumulation across matrices of varying heights
- `deep/verifier.rs`: `DeepOracle` for verifying openings and reconstructing Q(X)
at query points
## Key Design Decisions Documented
**Uniform opening points**: All columns share the same opening points {zⱼ}, enabling
factorization into f_reduced(X) = Σᵢ αⁱ·fᵢ(X) for O(1) reductions per query.
**Verifier's lifted perspective**: From the verifier's view, all polynomials appear
on the same domain. The prover evaluates fᵢ(zʳ) for degree-d polynomials, which
equals fᵢ'(z) where fᵢ'(X) = fᵢ(Xʳ) is the lifted polynomial.
**Alignment (not padding)**: Coefficient indices align to multiples of the hasher's
rate, equivalent to virtually appending zeros without materializing them.
**Weight folding**: Barycentric weights for lifted domains combine via
w_{gH,2i}(z) + w_{gH,2i+1}(z) = 2·w_{(gH)²,i}(z²), with the factor of 2 canceling
in the scaling factor s_{(gH)²}(z²) = 2·s_{gH}(z).
**Horner duality**: `derive_coeffs_from_challenge` produces reversed coefficients
[αⁿ⁻¹, ..., α, 1] so `reduce_with_powers` can use left-to-right Horner evaluation.
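To illustrate the duality: left-to-right Horner over the values v₀, ..., vₙ₋₁ weights them by exactly the reversed powers [αⁿ⁻¹, ..., α, 1]. A tiny sketch using plain integers in place of field elements (illustrative, not the crate's `reduce_with_powers`):

```rust
/// Horner accumulation: ((0·α + v0)·α + v1)·α + ... + v_{n-1}
/// = α^{n-1}·v0 + ... + α·v_{n-2} + v_{n-1}.
fn reduce_with_powers_sketch(values: &[u64], alpha: u64) -> u64 {
    values
        .iter()
        .fold(0u64, |acc, &v| acc.wrapping_mul(alpha).wrapping_add(v))
}

fn main() {
    let (values, alpha) = ([3u64, 5, 7], 10u64);
    // 3·α² + 5·α + 7 = 357 for α = 10.
    assert_eq!(reduce_with_powers_sketch(&values, alpha), 357);
}
```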
## Files Changed
- New: `lifted/src/deep/` module (4 files, ~1070 lines)
- Removed: `lifted/src/fri/deep.rs`, `prover.rs`, `verifier.rs` (~900 lines)
- Modified: `lifted/src/fri/mod.rs` (removed MatrixGroupEvals, now in deep/)
- Modified: `lifted/src/lib.rs` (export deep module)
Rewrite the deep_quotient benchmark to use the refactored DEEP module API. The benchmark now uses MerkleTreeLmcs for commitment and tests both BabyBear and Goldilocks fields with realistic multi-group matrix specs.
Changes:
- Replace the old Precomputation API with SinglePointQuotient and DeepPoly
- Add generic benchmark support for both BabyBear (degree-4 ext) and Goldilocks (degree-2 ext) fields
- Use a multi-group commitment structure where each group is a separate commitment with matrices of varying heights
- Benchmark two phases separately:
  - batch_eval_lifted: barycentric evaluation with weight folding
  - deep_poly_new: DEEP quotient polynomial construction
- Remove the end-to-end benchmark (the commitment phase is now setup-only)
- Export the deep submodules publicly for benchmark access
Benchmark results (Apple M1 Max):
- babybear/batch_eval_lifted: ~185ms
- babybear/deep_poly_new: ~984ms
- goldilocks/batch_eval_lifted: ~363ms
- goldilocks/deep_poly_new: ~2.25s
Also fixes:
- clippy::needless_range_loop in p3-field (pre-existing)
- clippy::type_complexity warnings with #[allow] annotations
- An unused import in the fri_fold_row benchmark
Introduce packed/SIMD support for FRI folding to accelerate the fold operation by processing multiple rows in parallel using SIMD lanes.
PackedValue trait additions:
- `pack_columns<N>`: Pack columns from WIDTH rows into N packed values
- `unpack_columns<N>`: Inverse operation, unpack N packed values into rows
PackedFieldExtension trait additions:
- `pack_ext_columns<N>`: Pack extension field columns with proper coefficient handling for BinomialExtensionField
FriFold trait refactoring:
- Make the trait generic over ARITY only (remove the field type parameter)
- `fold_evals` now accepts generic packed types (PF, EF, PEF), enabling both scalar and SIMD evaluation with the same implementation
- Add `fold_matrix_packed` for SIMD-optimized folding that processes WIDTH rows at a time using packed arithmetic
- Update `ifft4` to work with both scalar and packed types
Other changes:
- Add a `commit` method to DeepPoly for committing the deep polynomial
- Move `validate_heights` from merkle_tree/utils to lifted/src/utils
- Replace `pack_arrays`/`unpack_array_into` with the new PackedValue methods
- Update benchmarks to compare scalar vs packed performance
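For orientation, the arity-2 fold is the standard FRI even-odd fold: the two evaluations of f at ±x combine into one evaluation of the folded polynomial at x². A scalar sketch with f64 standing in for field elements, purely to keep it self-contained (the crate's exact normalization and packed implementation may differ):

```rust
/// Given f(x) and f(-x), produce the folded evaluation at x².
fn fold_pair(f_x: f64, f_neg_x: f64, x: f64, beta: f64) -> f64 {
    let f_even = (f_x + f_neg_x) / 2.0; // evaluation of the even part at x²
    let f_odd = (f_x - f_neg_x) / (2.0 * x); // evaluation of the odd part at x²
    f_even + beta * f_odd
}

fn main() {
    // f(X) = 1 + 2X + 3X² + 4X³, so f_even(Y) = 1 + 3Y and f_odd(Y) = 2 + 4Y.
    let f = |x: f64| 1.0 + 2.0 * x + 3.0 * x * x + 4.0 * x * x * x;
    let (x, beta) = (2.0, 5.0);
    // Expect f_even(4) + β·f_odd(4) = 13 + 5·18 = 103.
    assert_eq!(fold_pair(f(x), f(-x), x, beta), 103.0);
}
```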
Implement the FRI commit phase for polynomial commitment:
- Add CommitPhaseProof and CommitPhaseData structures
- Implement the commit_phase prover with iterative folding
- Implement verify_query with two-phase verification:
  1. Verify all Merkle openings
  2. Process the opened rows and verify folding consistency
Key optimizations:
- Precompute s_inv values once and update them each round by selecting every arity-th element and raising it to the power arity
- The verifier reuses g_inv across rounds (raised to the power arity each step)
- Avoids redundant generator and inverse computations
Also includes:
- Merkle commit benchmarks comparing Poseidon2 vs Keccak
- A shared benchmark utilities module
- A DEEP prover folded() accessor method
- Minor fixes to the FRI fold benchmark
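A sketch of the s_inv update described above, with plain integers standing in for the inverse domain points (illustrative only; arity is assumed to be 2 or 4):

```rust
/// Next round's s_inv values: s_inv_next[k] = s_inv[k * arity]^arity,
/// i.e. keep every arity-th entry and raise it to the arity.
fn next_round_s_inv(s_inv: &[u64], arity: usize) -> Vec<u64> {
    s_inv
        .iter()
        .step_by(arity)
        .map(|&s| s.wrapping_pow(arity as u32))
        .collect()
}
```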
…ching
Implement a full polynomial commitment scheme combining DEEP (Domain Extending for Eliminating Pretenders) quotient construction with FRI for efficient polynomial evaluation proofs. The implementation supports matrices of varying heights through a "lifting" strategy that virtually extends shorter matrices without data movement.
## FRI Protocol Implementation
Add a complete FRI prover and verifier with support for arity-2 and arity-4 folding:
- **Commit Phase (prover.rs)**: Iteratively fold polynomials using challenge-based combination until reaching the target degree. Key optimization: precompute s_inv values and update them efficiently between rounds using the s_inv[k*arity]^arity formula instead of recomputing from scratch.
- **Query Phase (verifier.rs)**: Verify Merkle openings and folding consistency at randomly sampled indices. The final polynomial is sent in coefficient form for direct evaluation.
- **Folding Abstraction (fold.rs)**: Generic FriFold<ARITY> trait with concrete FriFold2 (even-odd decomposition) and FriFold4 (size-4 inverse FFT) implementations. Both support SIMD-optimized matrix folding via ExtensionPacking for horizontal parallelism.
## DEEP Quotient Module
Refactor the DEEP quotient into dedicated prover/verifier components:
- **DeepPoly (prover)**: Constructs the DEEP quotient Q(X) that batches multiple polynomial evaluation claims. Uses two challenges (α for columns, β for points) for improved soundness over a single-challenge design.
- **DeepOracle (verifier)**: Point-query oracle that reconstructs expected DEEP quotient values from claimed evaluations and Merkle openings. Uses Horner evaluation with alignment-aware gap skipping.
- **SinglePointQuotient (interpolate.rs)**: Precomputes 1/(z - x_i) for all domain points via Montgomery's batch inversion. Enables O(n) barycentric evaluation with the weight-folding optimization for lifted polynomials.
- **MatrixGroupEvals**: New type organizing evaluations as evals[matrix_idx][column_idx] to preserve the structure needed for batched reduction.
## PCS Module (New)
Add high-level PCS orchestration combining DEEP and FRI:
- **open()**: Compute evaluations via SinglePointQuotient, construct DeepPoly, run the FRI commit phase, sample query indices, and generate QueryProofs with input-matrix and FRI-round openings.
- **verify()**: Reconstruct DeepOracle, replay the challenger for FRI betas, verify Merkle openings and FRI folding at each query index.
- **Proof structure**: Contains claimed evaluations, the FRI commit proof (intermediate commitments + final polynomial), and per-query proofs.
- **Comprehensive error types**: PcsError enum with variants for input MMCS errors, FRI MMCS errors, DEEP quotient mismatches, FRI folding errors, and final-polynomial mismatches.
## Key Design Decisions
1. **Uniform opening points**: All columns share the same opening points, enabling the f_reduced(X) = Σ αⁱ·fᵢ(X) factorization. The verifier computes one inner product per query instead of per-column-per-point.
2. **Lifting strategy**: Polynomials of degree d on domain D embed into a larger domain D* via f(X) ↦ f(X^r). In bit-reversed order, lifted values repeat r times consecutively, enabling virtual upsampling without data movement.
3. **Coefficient alignment**: Each matrix's coefficient range is padded to the alignment multiple (for the hasher rate). The verifier skips the implicit zeros via challenge^gap multiplication.
4. **Challenger integration**: Both prover and verifier receive a mutable challenger reference and sample challenges internally, ensuring cryptographic soundness through Fiat-Shamir.
## Benchmarks
Update benchmarks for the new API:
- **deep_quotient.rs**: Test batch_eval_lifted and DeepPoly::new with configurable matrix groups. Updated to pass a challenger reference instead of explicit challenges.
- **fri_fold_row.rs**: Test FRI folding throughput for both scalar and packed (SIMD) implementations across arities 2 and 4.
## Files Changed
- **New**: lifted/src/fri/prover.rs, lifted/src/fri/verifier.rs
- **New**: lifted/src/pcs/mod.rs, lifted/src/pcs/proof.rs
- **Deleted**: lifted/src/fri/commit.rs (refactored into prover/verifier)
- **Modified**: lifted/src/fri/fold.rs, lifted/src/fri/mod.rs
- **Modified**: lifted/src/deep/*.rs (prover/verifier/interpolate/mod)
- **Modified**: lifted/src/merkle_tree/*.rs
- **Modified**: lifted/src/utils.rs, lifted/src/lib.rs
- **Modified**: lifted/benches/deep_quotient.rs, fri_fold_row.rs
Major improvements to the lifted module.
**Error handling with thiserror:**
- Add the thiserror dependency for derive-based error types
- Convert PcsError and LmcsError to use thiserror
- Create a FriError enum for FRI verification failures
- Create a DeepError enum for DEEP verifier construction failures
- Convert all assertions in verification paths to Result-based errors
**New wrapper types:**
- Add OpeningClaim<EF> for the verifier's (point, evaluations) pairs
- Add QuotientOpening<F, EF> for the prover's quotient data
- Implement IntoIterator, Index, len(), is_empty() for MatrixGroupEvals
**API improvements:**
- Rename FRI Params to FriParams for consistency
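As an illustration of the derive-based pattern (the commit adds thiserror as a dependency), here is a hedged sketch with made-up variant names, not the crate's actual FriError definition:

```rust
use thiserror::Error;

/// Illustrative error enum; verification paths return
/// Result<(), FriErrorSketch> instead of asserting.
#[derive(Debug, Error)]
pub enum FriErrorSketch {
    #[error("proof structure does not match the FRI parameters")]
    InvalidProofStructure,
    #[error("Merkle opening failed in commit-phase round {round}")]
    CommitPhaseOpening { round: usize },
    #[error("folded evaluation does not match the final polynomial")]
    FinalPolyMismatch,
}
```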
- Add a PcsConfig struct grouping FriParams and alignment
- Remove the NUM_POINTS const generic from the open/verify signatures
- Use &[EF] slices instead of &[EF; N] arrays for eval_points
- Standardize zip patterns to use core::iter::zip(a, b)
…ull paths
- Make deep, fri, and utils modules public in lib.rs
- Remove glob re-exports, requiring callers to use explicit paths
- Update all benchmarks to use fully qualified imports:
- p3_lifted::merkle_tree::{Lifting, MerkleTreeLmcs, ...}
- p3_lifted::deep::{QuotientOpening, SinglePointQuotient}
- p3_lifted::utils::bit_reversed_coset_points
- Tighten field visibility with pub(crate) where appropriate
- Move FriError to fri/mod.rs, DeepError stays in verifier
- Refactor FRI fold: rename fold_matrix to fold_matrix_scalar,
add dispatcher that selects scalar/packed based on SIMD width
Remove unused trait implementations (Index, IntoIterator) and rename methods for clarity:
- iter() → iter_matrices()
- flatten() → iter_evals()
- len() → num_matrices()
Also remove the trivial constructors for OpeningClaim and QuotientOpening, using struct literals instead.
Reduce error type complexity by ~48% (31 → 16 variants + 1 unit struct).
Error type changes:
- DeepError: Convert from a 4-variant enum to a unit struct (empty openings is a valid trivial case; structure mismatch is the only error)
- FriError: Consolidate WrongCommitmentCount, WrongBetaCount, WrongOpeningCount, WrongFinalPolyLen into InvalidProofStructure
- LmcsError: Remove verbose fields (expected/actual), convert commit-time validation errors to panics (EmptyBatch, NonPowerOfTwoHeight, etc.)
- PcsError: Remove the dead DeepQuotientMismatch variant
API improvements:
- Remove the QuotientOpening struct; pass quotients and evals as parallel slices to DeepPoly::new, avoiding clones
- Simplify LmcsError variants by removing redundant index/count fields
- Convert commit-time checks to panics (prover bugs, not verification failures)
- Create src/tests.rs with shared type aliases (F, EF, P, BaseLmcs, FriMmcs, Challenger) and builder functions (base_lmcs, fri_mmcs, challenger, components)
- Move integration tests from inline modules to separate tests.rs files for the fri, deep, pcs, and merkle_tree modules
- Update unit tests in fold.rs, prover.rs, interpolate.rs, and lifted_tree.rs to import from crate::tests
- Delete merkle_tree/test_helpers.rs (consolidated into the root tests.rs)
- Remove ~80 lines of duplicated test boilerplate
Refactor the benchmark suite from a monolithic "run all combinations" setup to a feature-flag based architecture where users select the field and hash per run.
Feature flags added:
- bench-babybear / bench-goldilocks: mutually exclusive field selection
- bench-poseidon2 / bench-keccak: mutually exclusive hash selection
Benchmark changes:
- Add pcs.rs: workspace vs lifted PCS comparison (arity2, arity4)
- Add ext_mmcs benchmarks in merkle_commit.rs (arity2, arity4)
- Rename fri_fold_row.rs -> fri_fold.rs
- Remove lifted.rs and lmcs_hashes.rs (consolidated into the feature system)
- Remove SHA256 support, verify_merkle, and the standalone deep_poly_new benchmark
Configuration updates:
- Poseidon2 BabyBear: width=24, rate=16 (was 16, 8)
- Poseidon2 Goldilocks: width=12, rate=8
- PCS log_final_degree=8
Run examples:
  cargo bench -p p3-lifted --features bench-babybear,bench-poseidon2
  cargo bench -p p3-lifted --features bench-goldilocks,bench-keccak \
    --bench merkle_commit --bench fri_fold
Remove the `Lifting` enum and `build_leaf_states_cyclic` function, hardcoding nearest-neighbor upsampling as the sole lifting strategy. This simplifies the API by removing the lifting parameter from `MerkleTreeLmcs::new()`.
Add comprehensive module-level documentation explaining:
- Why upsampling is used for bit-reversed polynomial evaluations
- The mathematical property: upsampling bit-reversed evals of f(X) yields bit-reversed evals of f'(X) = f(X^r) over a larger domain
- Opening semantics for the lifted polynomials
- The equivalence between upsampling bit-reversed data and cyclically repeating canonically-ordered data
Restructure the PCS benchmark to use RELATIVE_SPECS from bench_utils for realistic multi-trace scenarios simulating STARK workloads:
- Group 0: Main trace columns at varying heights
- Group 1: Auxiliary/permutation columns
- Group 2: Quotient polynomial chunks
Opening point configuration:
- Workspace PCS: groups 0-1 at [z1, z2], group 2 at [z1] only
- Lifted PCS: all groups opened at both points [z1, z2]
This provides a more accurate comparison of the two PCS implementations under realistic multi-group commitment scenarios.
Add an `extract_lane` method to extract an extension field element at a specific SIMD lane, the inverse of broadcasting a scalar to all lanes.
Use this to simplify:
- `to_ext_slice` and `to_ext_iter` in PackedFieldExtension (removing an intermediate Vec allocation)
- Quotient unpacking in the uni-stark and batch-stark provers
- `columnwise_dot_product` unpacking in matrix
Also simplify the `columnwise_dot_product` broadcast using `.into()` via the Algebra trait, and remove the now-redundant specialized `to_ext_iter` from PackedBinomialExtensionField.
- Add `map` and `map_array` for element-wise transformations
- Add zero-cost slice transmute methods (`slice_as_arrays`, `arrays_as_slice`, etc.)
- Implement `Index` and `IndexMut` for direct element access
- Document all methods with SAFETY comments for the unsafe transmutes
…uation
Add `columnwise_dot_product_batched<EF, N>` to compute Mᵀ · [v₀, ..., vₙ₋₁] for N weight vectors simultaneously. Each matrix row is loaded once instead of N times, reducing memory bandwidth for tall, narrow matrices.
Benchmark results (BabyBear, EF = degree-4 extension, parallel, M2 Pro):

| Size      | unbatched | batched<1> | Δ    | unbatched×2 | batched<2> | Δ    |
|-----------|-----------|------------|------|-------------|------------|------|
| 2^16×128  | 0.98ms    | 0.93ms     | ~0%  | 1.94ms      | 1.79ms     | -8%  |
| 2^16×512  | 3.75ms    | 3.70ms     | ~0%  | 7.55ms      | 6.60ms     | -13% |
| 2^16×4096 | 27.9ms    | 30.0ms     | +7%  | 56.9ms      | 59.6ms     | +5%  |
| 2^18×128  | 3.53ms    | 3.64ms     | +3%  | 7.09ms      | 6.44ms     | -9%  |
| 2^18×512  | 12.8ms    | 13.1ms     | +2%  | 26.7ms      | 23.9ms     | -11% |
| 2^18×4096 | 104ms     | 103ms      | ~0%  | 208ms       | 209ms      | ~0%  |
| 2^20×128  | 13.4ms    | 13.3ms     | ~0%  | 26.3ms      | 25.0ms     | -5%  |
| 2^20×512  | 51.5ms    | 51.8ms     | ~0%  | 102ms       | 94.6ms     | -8%  |

Key findings:
- batched<1> overhead: minimal (0-3%), negligible in practice
- batched<2> benefit: 5-13% faster on narrow matrices (128-512 cols)
- Wide matrices (4096 cols): no benefit, bandwidth already saturated
Also refactors columnwise_dot_product to use the new PackedFieldExtension methods (extract_lane, into()) for cleaner unpacking.
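The idea behind the batched variant, sketched with plain f64 in place of field/extension elements and without SIMD (illustrative only): each matrix row is read once and accumulated into all N outputs.

```rust
/// out[j][c] = Σ_r weights[j][r] · rows[r][c], i.e. Mᵀ·vⱼ for j in 0..N,
/// computed in a single pass over the matrix rows.
fn columnwise_dot_product_batched<const N: usize>(
    rows: &[Vec<f64>],       // matrix as rows, all of length `width`
    weights: &[Vec<f64>; N], // one weight per row, for each of the N vectors
    width: usize,
) -> [Vec<f64>; N] {
    let mut out: [Vec<f64>; N] = core::array::from_fn(|_| vec![0.0; width]);
    for (r, row) in rows.iter().enumerate() {
        for j in 0..N {
            let w = weights[j][r];
            for (acc, &m) in out[j].iter_mut().zip(row) {
                *acc += w * m;
            }
        }
    }
    out
}
```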
…ation
Introduces `MultiPointQuotient<F, EF, N>` and `DeepPoly::new_batched` as an optimized alternative to multiple `SinglePointQuotient` calls. The batched approach shares a single batch inversion across N evaluation points and uses `columnwise_dot_product_batched` for better cache utilization.
Benchmarks on Apple M2 Pro show consistent improvements.
BabyBear (N=1: single_point vs multi_point_1):
- 65K: 3.67ms → 3.54ms (3.6% faster)
- 262K: 8.73ms → 8.52ms (2.4% faster)
- 1M: 27.9ms → 26.6ms (4.7% faster)
BabyBear (N=2: single_point_x2 vs multi_point_2):
- 65K: 5.02ms → 3.91ms (22% faster)
- 262K: 11.4ms → 9.62ms (15% faster)
- 1M: 33.5ms → 30.7ms (8% faster)
Goldilocks (N=1: single_point vs multi_point_1):
- 65K: 5.08ms → 4.96ms (2.4% faster)
- 262K: 13.5ms → 13.2ms (2.1% faster)
- 1M: 46.4ms → 44.6ms (3.9% faster)
Goldilocks (N=2: single_point_x2 vs multi_point_2):
- 65K: 6.48ms → 5.39ms (17% faster)
- 262K: 16.4ms → 14.9ms (9% faster)
- 1M: 53.6ms → 50.9ms (5% faster)
The multi-point approach is strictly better: never slower, and 5-22% faster for the typical N=2 case used in DEEP openings.
Remove SinglePointQuotient and rename MultiPointQuotient to PointQuotients, making N a const generic parameter for the PCS. This consolidates on the batched evaluation approach, which benchmarks show is consistently faster.
Changes:
- Remove SinglePointQuotient (consolidated into PointQuotients<1>)
- Rename MultiPointQuotient → PointQuotients
- Remove DeepPoly::new, rename DeepPoly::new_batched → DeepPoly::new
- Make N a const generic parameter in pcs::open and pcs::verify
- Add a MatrixGroupEvals::map helper for transpose operations
- Simplify transpose code in pcs/mod.rs and deep/tests.rs
- Update benchmarks to remove the SinglePointQuotient variants
Net result: -249 lines (134 insertions, 383 deletions)
## Benchmark Results (Apple M2 Pro)
### Batched vs Serial Evaluation (DEEP Quotient, 1M elements, parallel)
The batched approach (now the only approach) is faster than calling SinglePointQuotient twice:

| Field      | multi_point_2 | single_point_x2 | Improvement |
|------------|---------------|-----------------|-------------|
| BabyBear   | 30.6ms        | 33.5ms          | 8.7% faster |
| Goldilocks | 51.1ms        | 53.5ms          | 4.5% faster |

### Post-Refactoring Comparison (PCS Open, 1M elements)
All results within measurement noise (~2-5%), no regression:

| Config                     | After    | Base     | Change |
|----------------------------|----------|----------|--------|
| BabyBear/parallel/arity2   | 485.1ms  | 495.8ms  | -2%    |
| BabyBear/parallel/arity4   | 265.2ms  | 271.4ms  | -2%    |
| BabyBear/single/arity2     | 3.5s     | 3.5s     | ~0%    |
| BabyBear/single/arity4     | 1738.6ms | 1747.4ms | -1%    |
| Goldilocks/parallel/arity2 | 1240.6ms | 1240.1ms | ~0%    |
| Goldilocks/parallel/arity4 | 576.6ms  | 572.4ms  | +1%    |
| Goldilocks/single/arity2   | 9.7s     | 10.1s    | -4%    |
| Goldilocks/single/arity4   | 4.3s     | 4.5s     | -4%    |

### Arity Comparison (1M elements)

| Config              | Workspace | Arity2 | Arity4 | Arity4 Speedup |
|---------------------|-----------|--------|--------|----------------|
| BabyBear/parallel   | 482ms     | 485ms  | 265ms  | 1.8x           |
| BabyBear/single     | 3.6s      | 3.5s   | 1.74s  | 2.1x           |
| Goldilocks/parallel | 1.22s     | 1.24s  | 577ms  | 2.1x           |
| Goldilocks/single   | 10.3s     | 9.7s   | 4.3s   | 2.4x           |

Key findings:
1. Workspace ≈ Arity2 (within noise): no overhead from lifting
2. Arity4 is consistently ~2x faster (halves FRI rounds)
3. Single-threaded sees larger arity4 gains (2.1-2.4x vs 1.8-2.1x)
4. Goldilocks benefits slightly more from arity4 (64-bit arithmetic)
Introduce `DeepChallenges` and `FriChallenges` structs that enforce correct
transcript ordering by taking the data to be observed as input parameters.
This makes transcript mutations explicit at call sites rather than hidden
inside constructors.
New types:
- `DeepChallenges<EF>` { alpha, beta }
- `sample(evals, challenger, alignment)` observes evaluations with alignment
padding, then samples α (column batching) and β (point batching)
- `FriChallenges<EF>` { betas, query_indices }
- `sample(proof, params, log_domain_size, challenger)` observes commitments
and final polynomial, then samples folding betas and query indices
API changes:
- `DeepPoly::new()` now takes `&DeepChallenges<EF>` instead of `&mut Challenger`
- `DeepOracle::new()` now takes `&DeepChallenges<EF>` instead of `&mut Challenger`
Utilities:
- Add `alignment_padding(len, alignment)` helper in utils.rs
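The `alignment_padding` helper presumably computes the number of zero elements needed to round a length up to a multiple of the alignment; a plausible one-liner (an assumption, not the actual code):

```rust
/// Padding needed so that `len + padding` is a multiple of `alignment`.
fn alignment_padding(len: usize, alignment: usize) -> usize {
    len.next_multiple_of(alignment) - len
}
```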
The PCS flow is now explicit about when transcript state changes:
```rust
// Prover:
let deep_challenges = DeepChallenges::sample(&evals, challenger, alignment);
let deep_poly = DeepPoly::new(..., &deep_challenges, ...);
// Verifier:
let deep_challenges = DeepChallenges::sample(&proof.evals, challenger, alignment);
let deep_oracle = DeepOracle::new(..., &deep_challenges, ...);
let fri_challenges = FriChallenges::sample(&proof.fri_commit_proof, ...);
```
… vs MMCS
This refactoring addresses a fundamental design tension in the symmetric crate: `MerkleTreeLmcs` requires `StatefulHasher` for proper sponge absorption semantics with padding, while `MerkleTreeMmcs` requires `CryptographicHasher` for its overwrite-mode hashing. Previously, `PaddingFreeSponge` implemented both traits, conflating two distinct use cases.
## Symmetric Crate Changes
### New `StatefulSponge` (implements `StatefulHasher` only)
A proper stateful sponge wrapper around cryptographic permutations that:
- Maintains external state that evolves with each absorption
- Applies zero-padding to the rate portion when the input is exhausted
- Exposes `PADDING_WIDTH` for callers to align input if desired
- Does NOT implement `CryptographicHasher` (intentional separation)
### New `SerializingStatefulSponge`
An adapter for binary-native hashers (like Keccak) that:
- Wraps a `StatefulHasher<u64, ...>` (e.g., `StatefulSponge<KeccakF>`)
- Serializes field elements to bytes/u32/u64 before absorption
- Preserves proper sponge semantics (unlike chaining mode)
- Supports both scalar `F` and parallel `[F; M]` inputs
### `ChainingHasher` consolidated into `stateful.rs`
Moved from a separate module; implements `StatefulHasher` via chaining mode:
- Computes `H(state || serialize(input))` for each absorption
- Used for streaming hashers like Blake3 without native sponge semantics
- Supports scalar and parallel inputs with u8/u32/u64 digest types
### `PaddingFreeSponge` simplified
- Removed the `StatefulHasher` implementation
- Now only implements `CryptographicHasher`
- Clear role: overwrite-mode sponge for MMCS compatibility
### New `testing` module
Extracted `MockPermutation` and `MockHasher` from scattered test modules:
- Deterministic sum-and-fill behavior for traceable testing
- Supports u8/u32/u64 in scalar and parallel `[T; M]` modes
- Reduces test code duplication across modules
### `StatefulHasher` trait enhancement
- Added a `default_state(&self) -> State` method for initializing state
## Lifted Crate Changes
### Hash configuration restructuring
Each hash variant now uses appropriate hasher types.
**Poseidon2** (field-native permutation):
- LMCS: `StatefulSponge<Poseidon2, WIDTH, RATE, DIGEST>`
- MMCS: `PaddingFreeSponge<Poseidon2, WIDTH, RATE, DIGEST>`
- Separate `lmcs_components()` and `mmcs_components()` functions
**Keccak** (binary permutation):
- LMCS: `SerializingStatefulSponge<StatefulSponge<KeccakF, 25, 17, 4>>`
- Serializes field elements to a u64 stream before sponge absorption
- Compression via `TruncatedPermutation<KeccakF>`
**Blake3** (new, streaming hasher):
- LMCS: `ChainingHasher<Blake3>` with `CompressionFunctionFromHasher`
- Scalar-only (Blake3 lacks a vectorized implementation)
- `PackedLmcs` aliased to `ScalarLmcs` for API compatibility
### Feature flag updates
- Added a `bench-blake3` feature for Blake3 benchmarks
- Updated mutual exclusion: poseidon2/keccak/blake3 are exclusive
- `merkle_commit` and `fri_fold` work with all three hash variants
- `deep_quotient`, `lmcs_vs_mmcs`, `pcs` require Poseidon2 (DuplexChallenger)
## Design Rationale
The separation ensures:
1. LMCS gets proper stateful absorption with configurable padding semantics
2. MMCS continues using the proven overwrite-mode sponge
3. Each hasher family (field-native, binary, streaming) has an appropriate adapter
4. No trait-bound conflicts from a single type implementing incompatible interfaces