more efficient use of memory buffers for LN recomputation #532

ngc92 · 2024-06-03T15:32:02Z

Saves another ~300 MiB at B=64 with the 774M model.

We now use a single buffer, lnf, for all layernorm computations. This works fine, except that the backward pass was already repurposing that buffer :)

So in backward, we now reuse the last residual connection result instead. It is the same size, and it is also no longer needed once the gradient of lnf has been computed.
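As a rough sanity check on the ~300 MiB figure, here is a back-of-the-envelope sketch. Only B=64 comes from the description above; the sequence length T=1024, channel width C=1280, 2-byte (bf16) activations, and the premise that two (B, T, C) recompute buffers are dropped in favour of lnf are all assumptions made for illustration.

```python
def activation_buffer_mib(batch, seq_len, channels, bytes_per_elem=2):
    """Size in MiB of one (B, T, C) activation buffer."""
    return batch * seq_len * channels * bytes_per_elem / (1024 ** 2)


B = 64     # batch size, from the PR description
T = 1024   # ASSUMPTION: sequence length
C = 1280   # ASSUMPTION: channel width of the 774M model
BYTES = 2  # ASSUMPTION: bf16 activations

one_buffer = activation_buffer_mib(B, T, C, BYTES)

# ASSUMPTION: two same-sized layernorm output buffers are no longer
# allocated because lnf is shared across all layernorm recomputations.
saved = 2 * one_buffer

print(f"one (B, T, C) buffer: {one_buffer:.0f} MiB")
print(f"two buffers saved:    {saved:.0f} MiB")
```

Under these assumptions each buffer is 160 MiB, so dropping two of them gives 320 MiB, which is in line with the "~300 MiB" quoted.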

more efficient use of memory buffers for LN recomputation

0d16b51

ngc92 force-pushed the ln-buffers branch from ab87e76 to 0d16b51 Compare June 3, 2024 15:32

karpathy merged commit d17893b into karpathy:master Jun 3, 2024
8 checks passed

ngc92 deleted the ln-buffers branch June 3, 2024 21:18
