This PR adds a low-memory mode to git-theta, where some concurrency is sacrificed to keep the memory footprint as low as possible.
The main issues it fixes are:
1. During the clean filter, git pipes the DL-native checkpoint data into the filter on stdin. That data is then passed to the checkpoint loader (for example `torch.load`) and read. This can cause a transient ~2x model-size memory spike: the bytes sitting in the stdin buffer plus the actual model as tensors. With `GIT_THETA_LOW_MEMORY=True`, stdin is first written to a temp file and the checkpoint is then read from disk.
2. During the parameter-cleaning process, there is a map from parameter name to parameter value. Previously, a value was not removed from this map after it was cleaned, so it was never garbage collected. Serializing a parameter can transiently double its memory use (the tensor itself plus the serialized bytes), which is especially apparent for things like embedding tables. This change removes each parameter value from the map once it is serialized, so memory usage goes down as more of the model is written out.
One future update might be to move from a boolean to a numerical value, where something like level 1 low memory only does the checkpoint temp file, and higher levels also do things like reducing concurrency to allow releasing model parameters sooner.
These kinds of changes (loading, then releasing, subsets of parameters) will be important if we want to support really big models with streamable/lazy loading of checkpoints.
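The subset-at-a-time idea above can be sketched as a generator. `load_param` here is a hypothetical per-parameter loader that a streamable checkpoint format would provide; git-theta does not currently expose this.

```python
def stream_parameters(param_names, load_param):
    """Sketch of lazy checkpoint iteration: materialize one parameter at
    a time instead of the whole model, so peak memory stays at roughly a
    single parameter plus bookkeeping. The caller should drop each value
    before pulling the next one from the generator."""
    for name in param_names:
        value = load_param(name)  # only this parameter is resident
        yield name, value
        del value  # released before the next load
```

A consumer would then write each parameter out (or clean it) inside the loop, never holding more than one at a time.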
There are also a few small changes:
- `git_theta.py` -> `git_theta_cli.py`, as the naming was causing weird import issues (the CLI command is still `git-theta`, since that is set by the console-script entry point).
- Getting blobs from git was bugged for files that lived in subdirectories.
- Checking in real torch checkpoints whose parameters last lived on CUDA devices was bugged.