## Description
I used heaptrack to analyze the memory allocation patterns of my compiler which uses regalloc2. I used a large benchmark (LLVM library) which has 1257955 basic blocks in 104913 functions. Out of a total of 46780667 allocations, 46779011 of them come from regalloc2 and only 1656 come from other parts of the compiler.
My compiler uses a similar structure[^1] to Cranelift, where a `Context` structure allows memory allocations to be reused across compilation units. Specifically, this allows various `Vec`s and `HashMap`s to grow to a sufficient size only once while being reused many times for compiling multiple functions.
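The reuse pattern described above can be sketched in a few lines. This is a hypothetical illustration (the `Context` and `compile` names here are illustrative, not Cranelift's or regalloc2's actual API): scratch buffers live in a long-lived struct and are cleared, not dropped, between functions, so their heap capacity carries over.

```rust
// Hypothetical sketch of allocation reuse across compilation units.
struct Context {
    scratch: Vec<u32>,
}

impl Context {
    fn new() -> Self {
        Context { scratch: Vec::new() }
    }

    // "Compile" one function using the reusable scratch buffer.
    fn compile(&mut self, input: &[u32]) -> u32 {
        self.scratch.clear(); // keeps capacity: no free + malloc on reuse
        self.scratch.extend_from_slice(input);
        self.scratch.iter().sum()
    }
}

fn main() {
    let mut ctx = Context::new();
    assert_eq!(ctx.compile(&[1, 2, 3]), 6);
    let cap = ctx.scratch.capacity();
    // The second call reuses the first call's allocation.
    assert_eq!(ctx.compile(&[4, 5]), 9);
    assert!(ctx.scratch.capacity() >= cap);
}
```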
regalloc2 unfortunately doesn't follow this design:

- A fresh `Env` structure is allocated and freed for each function. No memory allocations are preserved.
- Heavy use of temporary `SmallVec`s and `HashSet`s. Heaptrack reports that 19865290 allocations (42% of the total) are "temporary": allocations which are immediately followed by a free, with no other allocations in between.
- Heavy use of `BTreeMap`, which needs to allocate many nodes separately and cannot reuse memory, unlike `HashMap` and `Vec`.
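The `BTreeMap` point can be demonstrated with std directly: `Vec::clear` retains the backing allocation, while `BTreeMap::clear` drops every node, so refilling the map has to allocate all of its nodes again.

```rust
use std::collections::BTreeMap;

fn main() {
    // Vec keeps its heap allocation across clear(), so it can be reused...
    let mut v: Vec<u32> = (0..1000).collect();
    let cap = v.capacity();
    v.clear();
    assert!(v.capacity() >= cap); // capacity retained, no reallocation needed

    // ...but BTreeMap drops all of its nodes on clear(); refilling it must
    // allocate every node again.
    let mut m: BTreeMap<u32, u32> = (0..1000).map(|i| (i, i)).collect();
    m.clear();
    assert!(m.is_empty());
    m.extend((0..1000).map(|i| (i, i))); // reallocates nodes
    assert_eq!(m.len(), 1000);
}
```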
## Heaptrack results
Here are the major allocation hotspots shown by heaptrack:
In `try_to_allocate_bundle_to_reg`:

- 16158324 temporary allocations from the `FxHashSet`.
- 9677435 allocations from inserting liveranges into `PRegData::allocations` (a `LiveRangeSet` backed by a `BTreeMap`).
In `merge_bundles`:

- 4478881 allocations from pushing a `VReg` to `SpillSet::vregs` (a `SmallVec`).
- 1895443 allocations from appending `LiveRangeListEntry`s to `LiveBundle::ranges` (a `SmallVec`).
In `insert_use_into_liverange`:

- 2578957 allocations from pushing a `Use` to `LiveRange::uses` (a `SmallVec`).
In `add_liverange_to_vreg`:

- 1432853 allocations from pushing a `LiveRangeListEntry` to `VRegData::ranges` (a `SmallVec`).
In `create_pregs_and_vregs`:

- 948582 allocations from pushing to `Env::vregs`, `Env::allocs`, and `Env::inst_alloc_offsets` (all `Vec`s).
In `Env::new`:

- 1154043 allocations from initializing the various `Vec`s with `Vec::with_capacity`.
## Performance impact
Benchmarking shows that approximately 10% of the time is spent in memory allocation functions (`malloc`, `realloc`, `free`). An additional 10% of the time is spent in `memcpy`/`memmove`, which could be due to vector reallocation.
Compiling with multiple threads can cause additional contention on the global allocator. This was noticed on musl targets, where the global allocator doesn't have thread-local caches: compiler performance on 6 threads (on a 6-core machine) was reduced to that of a single thread, whereas other allocators allow almost-perfect scaling across threads.
## Proposed solution
I believe the best way to improve regalloc2 would be to use a design similar to Cranelift's, with a `Context` structure that preserves memory allocations across multiple invocations of the register allocator.
- Temporary `SmallVec`s and `HashSet`s should be replaced with a single instance in `Context`.
- The many `BTreeMap`s should be replaced with `cranelift-bforest`, which maintains a pool of nodes that can be reused.
- The many `SmallVec`s should be replaced with `EntityList` from `cranelift-entity` and a `ListPool` in `Context`.
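To make the `ListPool` suggestion concrete, here is a toy sketch of the pooled-list idea (this is not `cranelift-entity`'s real API, and unlike the real `ListPool` it doesn't support growing a list after creation): every list's storage lives in one shared pool, so freeing all lists is a single `clear()` that preserves the backing allocation for the next function, instead of one heap free per `SmallVec` spill.

```rust
// Toy sketch of a pooled list: handles index into one shared Vec.
struct ListPool {
    data: Vec<u32>,
}

#[derive(Clone, Copy)]
struct ListHandle {
    start: usize,
    len: usize,
}

impl ListPool {
    fn new() -> Self {
        ListPool { data: Vec::new() }
    }

    // Create a list by appending its elements to the shared pool.
    fn alloc(&mut self, elems: &[u32]) -> ListHandle {
        let start = self.data.len();
        self.data.extend_from_slice(elems);
        ListHandle { start, len: elems.len() }
    }

    fn get(&self, h: ListHandle) -> &[u32] {
        &self.data[h.start..h.start + h.len]
    }

    // One clear per function instead of one free per list.
    fn clear(&mut self) {
        self.data.clear();
    }
}

fn main() {
    let mut pool = ListPool::new();
    let a = pool.alloc(&[1, 2, 3]);
    let b = pool.alloc(&[4, 5]);
    assert_eq!(pool.get(a), &[1, 2, 3]);
    assert_eq!(pool.get(b), &[4, 5]);
    let cap = pool.data.capacity();
    pool.clear();
    assert!(pool.data.capacity() >= cap); // allocation preserved for reuse
}
```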
Since this would require a dependency on the `cranelift-entity` and `cranelift-bforest` crates, this is also a good opportunity to resolve #62 and switch to using proper indexed types internally.
[^1]: In fact I am using the `cranelift-entity` crate directly; it's very well written.