Skip to content

regalloc2 makes too many memory allocations #87

Open
@Amanieu

Description

@Amanieu

I used heaptrack to analyze the memory allocation patterns of my compiler which uses regalloc2. I used a large benchmark (LLVM library) which has 1257955 basic blocks in 104913 functions. Out of a total of 46780667 allocations, 46779011 of them come from regalloc2 and only 1656 come from other parts of the compiler.

My compiler uses a similar structure 1 to Cranelift where a Context structure allows memory allocations to be reused across compilation units. Specifically, this allows various Vecs and HashMaps to grow to a sufficient size only once while being reused many times for compiling multiple functions.

regalloc2 unfortunately doesn't follow this design:

  • A fresh Env structure is allocated and freed for each function. No memory allocations are preserved.
  • Heavy use of temporary SmallVecs and HashSets. Heaptrack reports that 19865290 allocations (42% of the total) are "temporary". These are allocations which are immediately followed by a free, with no other allocations in between.
  • Heavy use of BTreeMap which needs to allocate many nodes separately and cannot reuse memory unlike HashMap and Vec.

Heaptrack results

Here are the major allocation hotspots shown by heaptrack:

In try_to_allocate_bundle_to_reg:

  • 16158324 temporary allocations from the FxHashSet.
  • 9677435 allocations from inserting liveranges into PRegData::allocations (LiveRangeSet backed by a BTreeMap).

In merge_bundles:

  • 4478881 allocations from pushing a VReg to SpillSet::vregs (SmallVec).
  • 1895443 allocations from appending LiveRangeListEntrys to LiveBundle::ranges (SmallVec).

In insert_use_into_liverange:

  • 2578957 allocations from pushing a Use to LiveRange::uses (SmallVec).

In add_liverange_to_vreg:

  • 1432853 allocations from pushing a LiveRangeListEntry to VRegData::ranges (SmallVec).

In create_pregs_and_vregs:

  • 948582 allocations from pushing to Env::vregs and Env::allocs and Env::inst_alloc_offsets. (Vec)

In Env::new:

  • 1154043 allocations from initializing the various Vecs with Vec::with_capacity.

Performance impact

Benchmarking shows that approximately 10% of the time is spent in memory allocation functions (malloc, realloc, free). Additionally, another 10% of the time is spent in memcpy/memmove which could be due to vector reallocation.

Compiling with multiple threads can cause additional contention on the global allocator. This was noticed on musl targets where the global allocator doesn't have thread-local caches: compiler performance on 6 threads (on a 6-core machine) was reduced to that of a single thread whereas other allocators allow almost-perfect performance scaling.

Proposed solution

I believe the best way to improve regalloc2 would be to use a design similar to Cranelift with a Context structure that can preserve memory allocation across multiple invocations of the register allocator.

  • Temporary SmallVecs and HashSets should be replaced with a single instance in Context.
  • The many BTreeMaps should be replaced with cranelift-bforest which maintains a pool of nodes that can be reused.
  • The many SmallVecs should be replaced with EntityList from cranelift-entity and a ListPool in Context.

Since this would require a dependency on the cranelift-entity and cranelift-bforest crates, this is also a good opportunity to resolve #62 and switch to using proper indexed types internally.

Footnotes

  1. In fact I am using the cranelift-entity crate directly, it's very well written.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions