#!/usr/bin/env bash
#
# git-clone-subset - clones a subset of a git repository
#
# Copyright (C) 2012 Rodrigo Silva (MestreLion) <[email protected]>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/gpl.html>
#
# Uses git clone and git filter-branch to remove from the clone all files but
# the ones requested, along with their associated commit history.
usage() {
cat <<- USAGE
Usage: $myname [options] <repository> <destination-dir> <pattern>
USAGE
if [[ "$1" ]] ; then
cat >&2 <<- USAGE
Try '$myname --help' for more information.
USAGE
exit 1
fi
cat <<-USAGE
Clones a <repository> into a <destination-dir> and prunes from history
all files except the ones matching <pattern> by running on the clone:
git filter-branch --prune-empty --tree-filter 'git rm ...' -- --all
This effectively creates a clone with a subset of files (and history)
of the original repository. The original repository is not modified.
Useful for creating a new repository out of a set of files from another
repository, migrating (only) their associated history. Very similar to:
git filter-branch --subdirectory-filter
But $myname works on a path pattern instead of just a single directory.
Options:
-h, --help
show this page.
<repository>
URL or local path to the git repository to be cloned.
<destination-dir>
Directory to create the clone in. The same rules as for git-clone apply:
it will be created if it does not exist and it must be empty otherwise.
But, unlike git-clone, this argument is not optional: git-clone uses
several rules to determine the "friendly" basename of a cloned repo,
and $myname will not risk parsing its output, let alone
predicting the chosen name.
<pattern>
Glob pattern to match the desired files/dirs. It is ultimately
evaluated by a call to bash, NOT git or sh, using the extended glob
'!(<pattern>)' rule. Quote or escape it on the command line, so it
does not get evaluated prematurely by your current shell. Only a
single pattern is allowed: if more are required, use extglob's "|"
syntax. Globs are evaluated with bash's shopt dotglob set, so beware:
hidden files (dotfiles) match too. Patterns should not contain spaces
or special chars like
" ' \$ ( ) { } \`, not even quoted or escaped, since that might
interfere with the !() syntax after pattern expansion.
Pattern Examples:
"*.png"
"*.png|*icon*"
"*.h|src/|lib"
Limitations:
- Renames are NOT followed. As a workaround, list the rename history with
'git log --follow --name-status --format='%H' -- file | grep "^[RAD]"'
and include every name the file has had in the pattern, as in
"current_name|old_name|initial_name". As a side effect, if a different
file has taken the place of an old name, it will be preserved too, and
there is no way around this using this tool.
- There is no (easy) way to keep only some files in a dir: using 'dir/foo*'
as pattern will not work. Instead, keep the whole dir and remove the
unwanted files afterwards, using git filter-branch and a (quite complex)
combination of cloning, remote add, rebase, etc.
- Pattern matching is quite limited, and much of bash's escaping and
quoting does not work properly when the pattern is expanded inside !().
Copyright (C) 2013 Rodrigo Silva (MestreLion) <[email protected]>
License: GPLv3 or later. See <http://www.gnu.org/licenses/gpl.html>
USAGE
exit 0
}
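# Illustration only, never executed: the quoted here-document below is fed to
# the no-op ':' builtin. It is a minimal sketch of how the '!(<pattern>)'
# expansion described in the help text selects the entries to REMOVE, assuming
# a bash with the extglob and dotglob options available. The scratch directory,
# the file names and the helper name 'files_to_remove' are hypothetical, made
# up for this demonstration; they are not part of this script.
: <<'EXTGLOB_SKETCH'
```shell
#!/usr/bin/env bash
# Print, one per line, the entries of <dir> that do NOT match <pattern>,
# i.e. the arguments that 'git rm -- !(<pattern>)' would receive.
files_to_remove() {
    local dir=$1 pattern=$2
    # Same invocation style as the tree filter: a child bash started with
    # dotglob and extglob enabled, and the pattern spliced into the command.
    ( cd "$dir" && bash -O dotglob -O extglob -c "printf '%s\n' !($pattern)" )
}

# Hypothetical working tree with four entries
demo=$(mktemp -d)
touch "$demo/logo.png" "$demo/app_icon.svg" "$demo/main.c" "$demo/.hidden"

# Keep PNGs and anything with "icon" in its name; list everything else
files_to_remove "$demo" "*.png|*icon*"

rm -rf "$demo"
```
With these files the sketch should list 'main.c' and, because of dotglob,
'.hidden' as removal candidates, while 'logo.png' and 'app_icon.svg' survive.
EXTGLOB_SKETCH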
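# Helper functions
myname="${0##*/}"
# argerr [message]: print the message to stderr, show the short usage, exit 1
argerr() { printf "%s: %s\n" "$myname" "${1:-error}" >&2 ; usage 1 ; }
invalid() { argerr "invalid option: $1" ; }
# missing [context] [operand-name]: complain about a missing operand
missing() { argerr "missing ${2:+$2 }operand${1:+ from $1}." ; }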
# Argument handling
for arg in "$@"; do case "$arg" in -h|--help) usage ;; esac; done
repo=$1
dir=$2
pattern=$3
[[ "$repo" ]] || missing "" "<repository>"
[[ "$dir" ]] || missing "" "<destination-dir>"
[[ "$pattern" ]] || missing "" "<pattern>"
(($# > 3)) && argerr "too many arguments: $4"
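# Illustration only, never executed: the quoted here-document below is fed to
# the no-op ':' builtin. Since exactly one <pattern> operand is accepted,
# several globs (or the successive names of a renamed file) have to be joined
# with extglob's "|" before calling this script. A minimal sketch of one way to
# build such a pattern; the helper name 'join_pattern' is hypothetical and not
# part of this script.
: <<'PATTERN_SKETCH'
```shell
#!/usr/bin/env bash
# Join all arguments with '|' into a single extglob alternation.
join_pattern() {
    local IFS='|'          # "$*" joins arguments with the first char of IFS
    printf '%s\n' "$*"
}

join_pattern "current_name" "old_name" "initial_name"
# prints: current_name|old_name|initial_name
```
The result can then be passed, quoted, as the single <pattern> operand.
PATTERN_SKETCH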
# Clone the repo and enter it
git clone --no-hardlinks "$repo" "$dir" && cd "$dir" &&
# Remove remotes (a clone is meant to be a different repository)
while read -r remote; do
git remote rm "$remote" || break
done < <(git remote) &&
# The heart of the script
git filter-branch --prune-empty --tree-filter \
"bash -O dotglob -O extglob -c 'git rm -rf --ignore-unmatch -- !($pattern)'" \
-- --all &&
# Work around a bug in filter-branch where empty root commits are not removed
# even with --prune-empty. First, loop over each root commit
while read -r root; do
# Skip it if it's a non-empty commit
if [[ "$(git ls-tree "$root")" ]]; then continue; fi
# Now "remove" it by deleting its child's parent reference
git filter-branch --force --parent-filter "sed 's/-p $root//'" -- --all
done < <(git rev-list --max-parents=0 HEAD) &&
# Delete backups and the reflog, not needed in a clone
if [[ -e .git/refs/original/ ]] ; then
git "for-each-ref" --format="%(refname)" refs/original/ |
xargs -n 1 git update-ref -d
fi &&
git reflog expire --expire=now --all &&
git gc --prune=now