
Descend Arena Integration #36

Draft
StoeckOverflow wants to merge 57 commits into main from feature/arena-integration

Switch AST + Typechecker to arena-backed data structures

Our parser and typechecker moved a lot of data around (Boxes, Vecs, Rc-like pointers), creating churn for the borrow checker and unnecessary heap traffic. This PR introduces an arena-backed AST and refactors large parts of the type checker to work with &'a references and BumpVec<'a> instead of owning containers. The goals:

  • Mass allocate / mass free: allocate the whole program in one or a few bump arenas and free them all at once.
  • Lower allocation overhead and reduce pointer chasing.
  • Make lifetimes explicit (compiler-enforced shape of the program graph).

What’s in this PR

1) New arena AST and conversion from heap AST

  • Added a mirror arena AST (arena_ast::...) with &'a fields and BumpVec<'a> collections.
  • Arena nodes have a clone_in(&Bump) -> Self utility, which differs from derive(Clone) as follows:
    derive(Clone) preserves allocations and lifetimes. For fields tied to an arena (e.g., &'a T or BumpVec<'a, T>), Clone either copies the reference or allocates in the same arena. It cannot migrate data to a different arena.
    clone_in(&arena) rebinds to a (new) arena. The clone_in implementations re-allocate all arena-owned pieces in the given arena and recursively convert children, producing a value whose references and BumpVec<'a, _> contents now live in the target arena.
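
The difference between derive(Clone) and clone_in can be illustrated with a small std-only sketch. Everything in it is an assumption made for illustration: the Arena type is a toy stand-in that simply leaks boxes (it is not bumpalo's Bump), and Ty is a hypothetical two-variant node, not the PR's arena_ast type.

```rust
// Toy stand-in for an arena (assumption: NOT bumpalo::Bump). It leaks every
// value so that the example only needs the standard library.
struct Arena;

impl Arena {
    fn alloc<'s, T: 's>(&'s self, value: T) -> &'s T {
        Box::leak(Box::new(value))
    }
}

// Hypothetical arena node: children are plain references into an arena.
#[derive(Clone, Debug, PartialEq)]
enum Ty<'a> {
    I32,
    Array(&'a Ty<'a>, usize),
}

impl<'a> Ty<'a> {
    // Deep copy: every child is re-allocated in `target`, so the result
    // carries the target arena's lifetime 'b instead of 'a.
    fn clone_in<'b>(&self, target: &'b Arena) -> Ty<'b> {
        match self {
            Ty::I32 => Ty::I32,
            Ty::Array(elem, n) => Ty::Array(target.alloc(elem.clone_in(target)), *n),
        }
    }
}

fn main() {
    let src = Arena;
    let dst = Arena;

    let arr = Ty::Array(src.alloc(Ty::I32), 4);

    // derive(Clone) is shallow: the clone points at the same child.
    let shallow = arr.clone();
    // clone_in is deep: the child is a fresh allocation in `dst`.
    let deep = arr.clone_in(&dst);

    if let (Ty::Array(a, _), Ty::Array(b, _), Ty::Array(c, _)) = (&arr, &shallow, &deep) {
        assert!(std::ptr::eq(*a, *b));
        assert!(!std::ptr::eq(*a, *c));
    }
    // Structurally, both copies are equal to the original.
    assert_eq!(arr, deep);
}
```

The signature fn clone_in<'b>(&self, target: &'b Arena) -> Ty<'b> is the point of the sketch: the output lifetime comes from the target arena, which a derived Clone cannot express.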

Parser integration

pub fn parse<'a>(arena: &'a Bump, source: &'a SourceCode<'a>)
  -> Result<ArenaCompilUnit<'a>, ErrorReported> 
{
    let parser = Parser::new(source);
    let heap_items = parser.parse().map_err(|err| err.emit())?;

    // 1) heap -> arena
    let mut arena_items = bumpalo::collections::Vec::new_in(arena);
    let mut struct_refs: Vec<&'a ArenaStructDecl<'a>> = Vec::new();

    for heap_item in heap_items {
        match heap_item.into_arena(arena) {
            ArenaItem::StructDecl(sd_ref) => {
                struct_refs.push(sd_ref);
                arena_items.push(ArenaItem::StructDecl(sd_ref));
            }
            other => arena_items.push(other),
        }
    }

    // 2) mutate fun defs safely (build owned, write back a ref)
    for item in arena_items.iter_mut() {
        if let ArenaItem::FunDef(fun_def_ref) = item {
            let mut owned = (**fun_def_ref).clone();
            replace_arg_kinded_idents(&mut owned, arena);
            replace_exec_idents_with_specific_execs(arena, &mut owned);
            *fun_def_ref = arena.alloc(owned);
        }
    }

    // 3) resolve struct idents using collected &'a decls
    for item in arena_items.iter_mut() {
        replace_struct_idents_with_specific_struct_dtys(arena, &struct_refs, item);
    }

    Ok(ArenaCompilUnit::new(arena_items, source))
}

This preserves the existing parser (which doesn’t support direct arena allocation) and converts the parsed tree into the arena.

2) Visitors and “mutating” updates in an arena world

We can’t mutably borrow fields behind shared references. So the VisitMut-style passes follow a clone–edit–reassign pattern:

// example for DataTyKind::ArrayShape
DataTyKind::ArrayShape(elem_ref, n_ref) => {
    let mut elem = (**elem_ref).clone();
    visitor.visit_dty(arena, &mut elem);
    *elem_ref = arena.alloc(elem);

    let mut n = (*n_ref).clone();
    visitor.visit_nat(arena, &mut n);
    *n_ref = n;
}

This keeps the API “mutating” from the outside (callers pass &mut), but internally:

  1. Clone the value (owned, editable),
  2. Run the visitor,
  3. Allocate the edited value in the arena,
  4. Write the arena reference back.
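
As a self-contained illustration of the four steps, here is a minimal sketch of one such pass. The names are hypothetical (Arena is a leaking toy stand-in for a bump arena, DataTy is a reduced three-variant type, and resolve_idents is an invented pass); only the clone, visit, allocate, write-back shape mirrors the description above.

```rust
// Toy stand-in for an arena (assumption: NOT bumpalo::Bump); it leaks values.
struct Arena;

impl Arena {
    fn alloc<'s, T: 's>(&'s self, value: T) -> &'s T {
        Box::leak(Box::new(value))
    }
}

// Reduced, hypothetical data type node.
#[derive(Clone, Debug, PartialEq)]
enum DataTy<'a> {
    Ident(&'a str),
    Scalar,
    Array(&'a DataTy<'a>, usize),
}

// Hypothetical pass: replace every identifier with a scalar type.
// Callers pass `&mut`, but children behind `&'a` are never mutated in place.
fn resolve_idents<'a>(arena: &'a Arena, ty: &mut DataTy<'a>) {
    if matches!(ty, DataTy::Ident(_)) {
        *ty = DataTy::Scalar;
        return;
    }
    if let DataTy::Array(elem_ref, _) = ty {
        let mut elem = (**elem_ref).clone(); // 1. clone into an owned value
        resolve_idents(arena, &mut elem);    // 2. run the visitor on the clone
        *elem_ref = arena.alloc(elem);       // 3. + 4. allocate and write back
    }
}

fn main() {
    let arena = Arena;
    let inner = arena.alloc(DataTy::Ident("t"));
    let mut ty = DataTy::Array(inner, 8);

    resolve_idents(&arena, &mut ty);

    // The parent now points at a rewritten child ...
    assert_eq!(ty, DataTy::Array(&DataTy::Scalar, 8));
    // ... while the original child allocation is untouched.
    assert_eq!(*inner, DataTy::Ident("t"));
}
```

One consequence worth keeping in mind: the old child stays allocated until the arena is dropped, so each rewrite trades some arena memory for not needing mutable access through shared references.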

This same technique is applied throughout: data types, exec expressions, views, etc.

3) Type checking refactor (subty, unify, pre_decl, pl_expr, infer_kinded_args)

  • Replaced Vec/Box with BumpVec<'a>/&'a T consistently.

  • Added clone_in methods for types like ExecExpr, ExecExprKind, ExecPathElem, View, ArgKinded, FunDef, etc.

  • Reworked lifetime-heavy substitution and unification:

    • When substituting inside collections/maps, we often build an owned clone, run substitution, and then reassign a reference (arena.alloc(...)).
    • For binders like bind_to, we allocate a term_ref in the arena and substitute into all map values with that reference (should avoid self-referential/lifetime pitfalls).
  • PlaceExpr.ty moved from Option<&'a Ty<'a>> to OnceCell<&'a Ty<'a>>, because we now compute types lazily while walking the AST and want a write-once slot that can be filled through a shared reference.

pl_expr highlights

  • ty_check_and_passed_mems_prvs (and helpers) now take a &PlaceExpr by default; we only require &mut PlaceExpr when we truly need to rewrite that node. The computed type is stored via the node’s OnceCell (interior mutability), and we return the passed memories and provenances. This should avoid borrow conflicts and unnecessary AST mutation.

  • ty_check_view_pl_expr takes &PlaceExpr and &View immutably, clones the View into the arena (editable v_tmp), performs inference and substitution on the clone, and does not write back to the original AST. We only need the resulting type plus (mems, prvs), which are returned.
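
The write-once type slot behind a shared reference can be sketched with std::cell::OnceCell (stable since Rust 1.70). The PlaceExpr and Ty definitions below are reduced, hypothetical stand-ins, and ty_check is an invented helper; the PR's real signatures also return memories and provenances, which are omitted here.

```rust
use std::cell::OnceCell;

#[derive(Debug, PartialEq)]
enum Ty {
    I32,
    F64,
}

// Hypothetical place-expression node with a write-once type slot.
struct PlaceExpr<'a> {
    name: &'a str,
    ty: OnceCell<&'a Ty>,
}

// Takes `&PlaceExpr`, yet records the computed type via interior mutability.
// If a type was already stored, the stored one is returned unchanged.
fn ty_check<'a>(pl: &PlaceExpr<'a>, computed: &'a Ty) -> &'a Ty {
    *pl.ty.get_or_init(|| computed)
}

fn main() {
    let i32_ty = Ty::I32;
    let f64_ty = Ty::F64;
    let pl = PlaceExpr { name: "x", ty: OnceCell::new() };

    // Nothing stored before type checking.
    assert!(pl.ty.get().is_none());

    // First check fills the slot through a shared reference.
    assert_eq!(ty_check(&pl, &i32_ty), &Ty::I32);

    // A second write is rejected; the first type stays in place.
    assert!(pl.ty.set(&f64_ty).is_err());
    assert_eq!(ty_check(&pl, &f64_ty), &Ty::I32);

    println!("{}: {:?}", pl.name, pl.ty.get());
}
```

Compared with Option<&'a Ty<'a>>, this removes the need for &mut access just to store a type, at the cost of not being able to overwrite a stored type later.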

infer_kinded_args

  • For consistency, returns BumpVec<'a, ArgKinded<'a>> when called from arena contexts (added a helper returning bump vectors). This avoids mixing alloc::Vec with BumpVec.

API changes

  • PlaceExpr.ty is now OnceCell<&'a Ty<'a>>.
  • Many functions now take &'a Bump and/or return BumpVec<'a, _> instead of Vec.
  • Widespread adoption of clone_in instead of plain clone() when a value must live in the arena.
  • All constrain/substitute/ty_check_* signatures now accept arena and work with references (no Boxes).

Testing changes

  • Tests now create a Bump and construct types via DataTy::new(&arena, ...).
  • When a function expects references (&'a DataTy<'a> etc.), we allocate arena.alloc(...).
  • Some tests needed temporary arena-allocated clones to satisfy borrow checker rules while running constrain/substitute.

Example adapted test:

#[test]
fn scalar<'a>() -> UnifyResult<'a, ()> {
    let arena = Bump::new();

    let mut i32_ty = DataTy::new(&arena, DataTyKind::Scalar(ScalarTy::I32));
    let mut t = DataTy::new(&arena, DataTyKind::Ident(Ident::new_impli(&arena, "t")));

    // Constrain works on &'a DataTy<'a>
    let lhs = arena.alloc(i32_ty.clone_in(&arena));
    let rhs = arena.alloc(t.clone_in(&arena));
    let (subst, _prv) = constrain(lhs, rhs, &arena).unwrap();

    substitute(&subst, &mut i32_ty, &arena);
    substitute(&subst, &mut t, &arena);

    assert_eq!(i32_ty, t);
    Ok(())
}

Open items / follow-ups

  1. ty_check/mod.rs borrows and mutability

    • There is still some lifetime/borrow friction around GlobalCtx holding &mut CompilUnit while errors also have to be emitted (err.emit(compil_unit.source)).
    • Solution idea: scope the mutable borrow so that gl_ctx is dropped before emitting. Alternatively, make emit take a snapshot or &Source instead of &mut CompilUnit.
  2. GlobalCtx decl storage safety

    • Ensure that no references are stored into compil_unit.items. Replace with arena.alloc_str(name) + clone_in into the arena (already done in most places; a final pass should remove the remaining cases).
  3. More clone_in coverage

    • Added clone_in for many core types (e.g. ExecExprKind, ExecPathElem, View, ArgKinded, FunDef). The remaining node kinds should follow for consistency.
  4. Benchmarks

    • Once stabilized, run a before/after benchmark to quantify allocation reductions and CPU impact.
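
For open item 1, the "scope the mutable borrow" idea might look roughly like the sketch below. All types here are simplified, hypothetical stand-ins (the real GlobalCtx, CompilUnit, and error types carry much more), and emit is assumed to need only &Source.

```rust
struct Source {
    text: String,
}

struct CompilUnit {
    source: Source,
    items: Vec<String>,
}

// Hypothetical context that holds the unit mutably while checking.
struct GlobalCtx<'c> {
    unit: &'c mut CompilUnit,
}

struct TyError {
    msg: String,
}

impl TyError {
    // Assumption: emitting only needs read access to the source.
    fn emit(&self, source: &Source) -> String {
        format!("error: {} (in `{}`)", self.msg, source.text)
    }
}

// Stand-in for the actual checking work; it mutates the unit and fails.
fn check(gl_ctx: &mut GlobalCtx<'_>) -> Result<(), TyError> {
    gl_ctx.unit.items.push("checked".to_string());
    Err(TyError { msg: "mismatched types".to_string() })
}

fn ty_check(unit: &mut CompilUnit) -> Result<(), String> {
    // The mutable borrow of `unit` lives only inside this block.
    let result = {
        let mut gl_ctx = GlobalCtx { unit: &mut *unit };
        check(&mut gl_ctx)
    }; // gl_ctx is dropped here, releasing the borrow

    // Now `unit.source` can be borrowed immutably for emitting.
    result.map_err(|err| err.emit(&unit.source))
}

fn main() {
    let mut unit = CompilUnit {
        source: Source { text: "fn main".to_string() },
        items: Vec::new(),
    };
    let err = ty_check(&mut unit).unwrap_err();
    assert_eq!(err, "error: mismatched types (in `fn main`)");
    assert_eq!(unit.items, vec!["checked".to_string()]);
}
```

The important property is that the error value owns its data and does not borrow from gl_ctx, so it can outlive the block and be emitted afterwards.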

TL;DR

  • Introduces an arena-backed AST with clone_in utilities and into_arena conversions.
  • Refactors typechecker modules (subtyping, unification, view checking, etc.) to use &'a refs and BumpVec<'a>.
  • Moves PlaceExpr.ty to OnceCell<&'a Ty<'a>>.
  • Parser now returns an arena AST; later we can allocate in the arena directly if needed.
  • Leaves a few scoped follow-ups (global ctx decl safety, borrow scoping in mod.rs, API unification).
