Add support for gradient checkpointing by eric-czech · Pull Request #60 · kuleshov-group/caduceus

eric-czech · 2025-01-11T21:20:16Z

This adds support for gradient checkpointing using both the Mosaic Composer Trainer and Hugging Face Trainer interfaces.

Both interfaces expect checkpointing to be enabled at training time rather than in the model configuration, so the configuration changes very little: the only addition is a gradient_checkpointing_stride setting, which controls how frequently checkpoints are added to the Mamba blocks.
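
For readers unfamiliar with stride-based checkpointing, here is a minimal sketch of the idea, not the code in this PR. It assumes the stride means "checkpoint every k-th block" and uses a plain residual block as a stand-in for a Mamba block. The `Block` and `Backbone` classes and the `gradient_checkpointing` toggle are hypothetical names for illustration; only `torch.utils.checkpoint.checkpoint` is a real PyTorch API.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """Hypothetical stand-in for a Mamba block (a plain residual layer here)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mixer = nn.Linear(d_model, d_model)

    def forward(self, x):
        return x + self.mixer(self.norm(x))


class Backbone(nn.Module):
    """Stack of blocks where every `gradient_checkpointing_stride`-th block is checkpointed."""

    def __init__(self, d_model=16, n_layer=4, gradient_checkpointing_stride=1):
        super().__init__()
        self.layers = nn.ModuleList(Block(d_model) for _ in range(n_layer))
        self.gradient_checkpointing_stride = gradient_checkpointing_stride
        # Assumed to be switched on by the trainer at training time, not by the config.
        self.gradient_checkpointing = False

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            use_ckpt = (
                self.gradient_checkpointing
                and self.training
                and i % self.gradient_checkpointing_stride == 0
            )
            if use_ckpt:
                # Activations inside `layer` are dropped and recomputed during backward.
                x = checkpoint(layer, x, use_reentrant=False)
            else:
                x = layer(x)
        return x
```

Under these assumptions, a stride of 1 checkpoints every block (lowest memory, most recomputation), while a larger stride checkpoints fewer blocks and trades memory for speed.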

I went back and forth on how to validate this functionality and ultimately settled on counting forward-pass executions (through hooks) as the cleanest approach. Let me know if anybody is aware of other ways to test it.
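
As a rough illustration of the hook-counting idea (a sketch under stated assumptions, not the test in this PR): a checkpointed layer should run its forward pass twice per training step, once in the forward pass and once when it is recomputed during backward, while a non-checkpointed layer runs once. The `TinyStack` model and `count_forward_calls` helper are hypothetical. The sketch uses forward pre-hooks rather than post-forward hooks because non-reentrant checkpointing may stop recomputation early, before a post-forward hook would fire.

```python
from collections import Counter

import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class TinyStack(nn.Module):
    """Hypothetical model: linear layers, every `stride`-th one checkpointed."""

    def __init__(self, d_model=8, n_layer=4, stride=2):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_layer))
        self.stride = stride

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            if i % self.stride == 0:
                x = checkpoint(layer, x, use_reentrant=False)
            else:
                x = layer(x)
        return x


def count_forward_calls(model, x):
    """Run one forward/backward step and count how often each layer is invoked."""
    counts = Counter()
    handles = [
        # Pre-hooks fire on every call, including recomputation during backward.
        layer.register_forward_pre_hook(lambda mod, args, i=i: counts.update([i]))
        for i, layer in enumerate(model.layers)
    ]
    try:
        model(x).sum().backward()
    finally:
        for h in handles:
            h.remove()
    return dict(counts)
```

With `stride=2` and four layers, layers 0 and 2 are expected to be counted twice and layers 1 and 3 once, which gives a simple assertion for a unit test.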

linnnnCTCT · 2025-05-15T14:55:07Z

Thanks for providing this implementation!

I've been testing scaling up the model parameters on an A100-80G GPU using gradient checkpointing. However, I hit a limit around 300M parameters (with a caduceus_ph config, d_model:1024, n_layer:48).

Have you tested larger parameter counts? Do you have any recommended strategies for scaling further?

Add support for gradient checkpointing

3683995

eric-czech force-pushed the main branch from c54fee2 to 3683995 on January 11, 2025 21:21

eric-czech mentioned this pull request Jan 13, 2025

Fix rcps test and mamba imports #65

Open

eric-czech wants to merge 1 commit into kuleshov-group:main from eric-czech:main
