Git-Theta Clean #239

Open
blester125 opened this issue Apr 24, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@blester125
Collaborator

Currently, when a commit is removed from git, the LFS files for its parameters remain in .git/lfs. We should have a command like git theta clean that removes these dangling parameter files. This is especially useful when a merge is undone or an experimental branch is deleted.

Basic steps would probably be:

  • Iterate through all files that are theta tracked (through all history)
  • Iterate through the history of each file
  • Collect the git lfs oid metadata for each parameter in the model
  • Delete all files from .git/lfs that aren't in the git history
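
A minimal sketch of the last step, assuming the earlier steps have already produced the set of oids that are still referenced somewhere in history. How those oids are read out of the git-theta metadata is not shown, since that format is not assumed here. The names `find_dangling_objects` and `clean` are hypothetical, and the sketch assumes stored objects are files named by their 64-character hex sha256 oid.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Assumption: LFS object files are named by their full sha256 oid (64 hex chars).
OID_RE = re.compile(r"[0-9a-f]{64}")


def find_dangling_objects(lfs_objects_dir: Path, referenced_oids: Iterable[str]) -> List[Path]:
    """Return stored object files whose oid is not in referenced_oids."""
    referenced = set(referenced_oids)
    dangling = []
    for path in sorted(lfs_objects_dir.rglob("*")):
        # Skip directories and anything that does not look like an object file.
        if not path.is_file() or not OID_RE.fullmatch(path.name):
            continue
        if path.name not in referenced:
            dangling.append(path)
    return dangling


def clean(lfs_objects_dir: Path, referenced_oids: Iterable[str], dry_run: bool = True) -> List[Path]:
    """Delete dangling objects; with dry_run=True only report what would go."""
    dangling = find_dangling_objects(lfs_objects_dir, referenced_oids)
    if not dry_run:
        for path in dangling:
            path.unlink()
    return dangling
```

Defaulting to a dry run seems prudent here, since a deleted object that turns out to be needed can only be recovered from a remote.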

We might also need to check regular LFS-tracked files to make sure we don't delete one that is needed. Git LFS data seems to be stored in .git/lfs/objects/XX/YY/ directories, where XX and YY are the first and second pairs of characters of the oid.
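
For illustration, the directory layout can be expressed as a small helper. This is a sketch under the assumption that the local store lives under .git/lfs/objects and is sharded by the first two character pairs of the oid; `lfs_object_path` is a hypothetical name, not an existing git-theta or Git LFS function.

```python
from pathlib import Path


def lfs_object_path(git_dir: Path, oid: str) -> Path:
    """Map an LFS oid to the location where its object is expected on disk."""
    # Pointer files write the oid as "sha256:<hex>"; drop the prefix if present.
    prefix = "sha256:"
    if oid.startswith(prefix):
        oid = oid[len(prefix):]
    # Assumed layout: <git_dir>/lfs/objects/<first 2 chars>/<next 2 chars>/<oid>
    return git_dir / "lfs" / "objects" / oid[:2] / oid[2:4] / oid
```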

We would have to check all of the files above even if the tool were scoped to a single model (i.e., git theta clean my-model.pt). Parameters that are shared between models are also shared in .git/lfs, so we would need to make sure no other model still uses a file before deleting it.
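
The shared-parameter check amounts to a set difference against the union of every model's referenced oids. A rough sketch, where `removable_oids` and the shape of `live_oids_by_model` (model path mapped to the oids its history still references) are assumptions made for illustration:

```python
from typing import Dict, Set


def removable_oids(candidates: Set[str], live_oids_by_model: Dict[str, Set[str]]) -> Set[str]:
    """Return the candidate oids that no model references anymore."""
    # Union over ALL models, not only the one named on the command line,
    # because shared parameters point at the same stored object.
    still_used: Set[str] = set().union(*live_oids_by_model.values())
    return candidates - still_used
```

With this shape, scoping the command to one model only narrows `candidates`; the set of oids that must be kept is always computed across every model.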

As outlined, this would only clean up a local clone of the repo. It is unclear how, or whether, we would need to clean up the remote version.

@blester125 added the enhancement (New feature or request) label Apr 24, 2024