Description
When you run `git add` on a file that has a filter configured, git pipes the file contents to the clean filter via stdin. We then read from stdin as if it were a file and load the checkpoint with the deep-learning framework's native checkpoint reader.
This can result in high memory usage, because the whole checkpoint in its on-disk format and the whole checkpoint as a nested dict of tensors are both held in memory. For example, @dptam hit out-of-memory (OOM) errors on an AI cluster node with 24 GB of RAM when trying to `git add` an mt5-xl model (14 GB on disk).
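A back-of-the-envelope estimate shows why 24 GB is not enough. This is a rough sketch, not a measurement: it assumes the deserialized tensors take about as much memory as the on-disk file (plausible for an uncompressed checkpoint loaded in the same dtype, but not guaranteed) and ignores interpreter and framework overhead, so the real peak is likely higher.

```python
def estimated_peak_gb(on_disk_gb: float, tensor_overhead: float = 1.0) -> float:
    """Rough estimate of the clean filter's peak memory, in GB.

    Counts the raw checkpoint bytes read from stdin plus the deserialized
    nested dict of tensors, assumed to take `tensor_overhead` times the
    on-disk size. Interpreter and framework overhead are ignored.
    """
    raw_bytes = on_disk_gb                  # copy 1: on-disk format read from stdin
    tensors = tensor_overhead * on_disk_gb  # copy 2: nested dict of tensors
    return raw_bytes + tensors


need = estimated_peak_gb(14.0)  # mt5-xl checkpoint from the report
print(f"~{need:.0f} GB needed vs 24 GB available")  # prints "~28 GB needed vs 24 GB available"
```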
One possible workaround, which we could codify into a tool, is to run the clean filter on the file manually (passing a disk path instead of stdin) to produce the git-theta metadata file, and then add, track, and commit that metadata. You can then check out the path to pull the real checkpoint back out of git. It's unclear how usable this is beyond the initial add.
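The workaround above could be sketched as a small Python helper. This is a non-authoritative sketch under several assumptions: the checkpoint path is already tracked with the git-theta filter, the repository has at least one commit, and `clean_cmd` is a hypothetical path-based clean command (git-theta's actual CLI may differ) that reads the checkpoint from disk and writes the metadata to a second path. The helper name `commit_via_manual_clean` is also made up. It stages and commits with git plumbing (`hash-object --no-filters`, `update-index --cacheinfo`, `write-tree`, `commit-tree`, `update-ref`) so that git never pipes the checkpoint through the clean filter; the `run` parameter only exists so the commands can be swapped out, e.g. for a dry run.

```python
import subprocess
from typing import Callable, Sequence

# A runner executes one command (argv list) and returns its stdout as text.
Runner = Callable[[Sequence[str]], str]


def run_cmd(argv: Sequence[str]) -> str:
    """Run a command, raise CalledProcessError on failure, return stdout."""
    result = subprocess.run(list(argv), check=True, capture_output=True, text=True)
    return result.stdout


def commit_via_manual_clean(
    checkpoint_path: str,
    metadata_path: str,
    clean_cmd: Sequence[str],
    message: str,
    run: Runner = run_cmd,
) -> str:
    """Hypothetical helper: commit a checkpoint without streaming it via stdin.

    Returns the id of the new commit.
    """
    # 1. Run the clean step manually on a disk path (hypothetical interface).
    run([*clean_cmd, checkpoint_path, metadata_path])
    # 2. Store the metadata as a blob byte-for-byte; --no-filters keeps git
    #    from handing it to the clean filter again.
    blob = run(["git", "hash-object", "-w", "--no-filters", metadata_path]).strip()
    # 3. Point the checkpoint's index entry at the metadata blob.
    run(["git", "update-index", "--add", "--cacheinfo",
         f"100644,{blob},{checkpoint_path}"])
    # 4. Build the commit with plumbing. Porcelain `git commit` refreshes the
    #    index and could re-run the clean filter on the real checkpoint.
    tree = run(["git", "write-tree"]).strip()
    parent = run(["git", "rev-parse", "--verify", "HEAD"]).strip()
    commit = run(["git", "commit-tree", tree, "-p", parent, "-m", message]).strip()
    run(["git", "update-ref", "HEAD", commit])
    # 5. Rewrite the working-tree file from the index via the smudge filter.
    run(["git", "checkout", "--", checkpoint_path])
    return commit
```

Staging with plumbing rather than `git add` is the key design choice: `git add` would hand the file to the clean filter on stdin, which is exactly the step that runs out of memory. Whether the smudge in the final checkout is itself memory-friendly depends on the filter implementation and is not addressed here.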