VRAM-efficient multi-GPU and/or multi-node preconditioner computation #100
Conversation
@dataclass(kw_only=True)
class MultiNodeGradientCollector(HookCollectorBase):
Is this going to be a replacement for GradientCollector? It seems like we don't need it if we have this one.
Yes, I will merge this as a separate class for dogfooding and then replace the GradientCollector when we're convinced it's stable
Also, this does a distributed operation with the data every step so that all the preconditioners get all the data, so it will probably be too slow to be our main collector. It's mostly aimed at collecting big preconditioners where you only need to process a small amount of data to get a reasonable estimate. I guess it will be equally fast if you skip the preconditioners, but slower in a scenario where you could fit all the preconditioners on the same rank.
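To make that per-step data flow concrete, here is a minimal sketch assuming a `torch.distributed` setup; the names (`update_sharded_preconditioners`, `owned_preconditioners`) are illustrative, not the PR's actual API:

```python
import torch
import torch.distributed as dist


def update_sharded_preconditioners(
    local_grads: dict[str, torch.Tensor],            # per-module, per-example grads from this rank
    owned_preconditioners: dict[str, torch.Tensor],  # only the d x d accumulators assigned to this rank
) -> None:
    """Every rank shares its gradients; each rank updates only the preconditioners it owns."""
    world_size = dist.get_world_size()
    for name, grads in local_grads.items():
        # All-gather this module's per-example gradients (assumes equal local batch sizes).
        gathered = [torch.empty_like(grads) for _ in range(world_size)]
        dist.all_gather(gathered, grads)

        # Only the owning rank accumulates the outer product, so the large
        # d x d buffer never has to fit on every device at once.
        if name in owned_preconditioners:
            all_grads = torch.cat(gathered, dim=0)   # [global_batch, d]
            owned_preconditioners[name] += all_grads.T @ all_grads
```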
def build_worker(
    rank: int,
    local_rank: int,
Add to the docstring what this does.
@norabelrose do you like this pattern where we have a distributed config dataclass that holds the rank information as properties, which return different values after the local_rank env variables are set? I was thinking of removing the local_rank parameters everywhere and always accessing them via the config object. Or is it important to only initialize and pass in the rank parameters once they're set, so users can't access potentially invalid values except through os.environ?
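For reference, a minimal sketch of the pattern under discussion (the `DistributedConfig` name and fields are hypothetical, not the PR's actual class): rank values are exposed as properties that read the launcher-set environment variables, so they reflect the state at access time rather than at construction time.

```python
import os
from dataclasses import dataclass


@dataclass(kw_only=True)
class DistributedConfig:
    """Static launch settings plus lazily resolved rank information."""
    num_nodes: int = 1
    gpus_per_node: int = 1

    @property
    def rank(self) -> int:
        # Raises KeyError if accessed before the launcher has set RANK,
        # which makes accidental early access loud rather than silently wrong.
        return int(os.environ["RANK"])

    @property
    def local_rank(self) -> int:
        return int(os.environ["LOCAL_RANK"])

    @property
    def world_size(self) -> int:
        return int(os.environ.get("WORLD_SIZE", str(self.num_nodes * self.gpus_per_node)))
```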
@LouisYRYJ I extracted the multi-node args into a config object and updated some names for clarity; going to merge for dogfooding today.
A more VRAM-efficient variant where preconditioners can be spread across an arbitrary number of nodes to compute large outer products. This is useful because preconditioners are often applied to a query and then the query is run across a large dataset, so slow but VRAM-efficient preconditioner computation and usage is a scalable pattern.
The gradients computed from each data point on one device need to be sent to all the other devices for the preconditioners to be updated, so this is not a drop-in replacement for our regular gradient collector.
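As an illustration of how the preconditioners might be spread out (the helper below is hypothetical, not code from this PR), each module's preconditioner can be assigned to exactly one rank, so per-device memory shrinks roughly linearly with the number of ranks:

```python
def assign_preconditioners(module_names: list[str], world_size: int) -> dict[int, list[str]]:
    """Round-robin assignment of preconditioner blocks to ranks."""
    assignment: dict[int, list[str]] = {rank: [] for rank in range(world_size)}
    for i, name in enumerate(module_names):
        assignment[i % world_size].append(name)
    return assignment


# Example: five MLP preconditioners spread over 2 nodes x 2 GPUs = 4 ranks.
print(assign_preconditioners([f"layers.{i}.mlp" for i in range(5)], world_size=4))
# {0: ['layers.0.mlp', 'layers.4.mlp'], 1: ['layers.1.mlp'], 2: ['layers.2.mlp'], 3: ['layers.3.mlp']}
```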