Merge pull request #94 from CCBR/docs_update
docs update; pdq update
kopardev authored Mar 1, 2024
2 parents c2680cf + fefa09e commit 92e1987
Showing 11 changed files with 199 additions and 104 deletions.
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,20 @@

### Bug fixes

## spacesavers2 v0.12.0

### New features

- `spacesavers2_pdq` now counts inodes (not files), including links and directories (#95, @kopardev)
- "pathlib.glob" is replaced with "os.scandir" for faster folder traversal
- `--quite` option added to `spacesavers2_pdq` and `spacesavers2_catalog` to suppress progress bar output when running non-interactively, e.g. as a cron job. This reduces the size of the .err file.
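
The `--quite` behaviour described above can be sketched as follows. This is a minimal illustration, not the tool's code: the argument definition is copied from the `spacesavers2_catalog` hunk in this commit, while `build_parser` and `progress_disabled` are hypothetical helper names. The resulting boolean is what the commit passes to `tqdm.tqdm(..., disable=...)`.

```python
import argparse


def build_parser():
    # Hypothetical helper: only the --quite argument is reproduced here.
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument(
        "-q",
        "--quite",
        dest="quite",
        required=False,
        action=argparse.BooleanOptionalAction,  # also creates --no-quite (Python 3.9+)
        help="Do not show progress",
    )
    return parser


def progress_disabled(argv):
    """Return True when the progress bar should be suppressed."""
    args = build_parser().parse_args(argv)
    # args.quite is None when the flag is absent, so coerce it to a bool.
    return bool(args.quite)


if __name__ == "__main__":
    for argv in ([], ["--quite"], ["-q"], ["--no-quite"]):
        print(argv, "->", progress_disabled(argv))
```

Because `BooleanOptionalAction` leaves the value at `None` when the flag is omitted, the progress bar stays enabled by default in this sketch.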

### Bug fixes

- `spacesavers2_pdq` no longer ignores links and folders (#93, @kopardev)
- `redirect` correctly captures intermediate non-zero exit codes
- "eval" statements removed from `spacesavers2_e2e` to accurately capture non-zero exit codes; this makes sure e2e fails if catalog fails internally

## spacesavers2 0.11.6

### New features
2 changes: 1 addition & 1 deletion bin/redirect
@@ -52,6 +52,6 @@ else
fi

if [[ "$run" == "0" ]]; then
${TOOLDIR}/${TOOLNAME} "$@" || true
${TOOLDIR}/${TOOLNAME} "$@" || exit 1
conda deactivate 2>/dev/null
fi
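
The effect of swapping `|| true` for `|| exit 1` can be checked with a small sketch. It assumes a POSIX `sh` is available on the PATH; `false` stands in for a failing tool and the trailing `true` for the `conda deactivate` step, and `exit_code`, `MASKED` and `PROPAGATED` are illustrative names that do not appear in the repository.

```python
import subprocess


def exit_code(script):
    """Run a shell snippet and return its exit status."""
    return subprocess.run(["sh", "-c", script], check=False).returncode


# Old behaviour: the failure is swallowed, so the wrapper reports success.
MASKED = "false || true\ntrue\n"

# New behaviour: the wrapper stops with a non-zero status as soon as the tool fails.
PROPAGATED = "false || exit 1\ntrue\n"

if __name__ == "__main__":
    print("with '|| true'  :", exit_code(MASKED))
    print("with '|| exit 1':", exit_code(PROPAGATED))
```

With `|| true` a caller such as `spacesavers2_e2e` cannot tell that the wrapped tool failed, which is the situation the changelog entry for `redirect` describes.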
1 change: 0 additions & 1 deletion docs/index.md
@@ -27,7 +27,6 @@
- [spacesavers2_catalog](catalog.md)
- [spacesavers2_mimeo](mimeo.md)
- [spacesavers2_grubbers](grubbers.md)
- [spacesavers2_blamematrix](blamematrix.md)
- [spacesavers2_usurp](usurp.md)
- [spacesavers2_e2e](e2e.md)
- [spacesavers2_pdq](pdq.md)
30 changes: 15 additions & 15 deletions docs/pdq.md
@@ -5,7 +5,7 @@ pdq = Pretty Darn Quick
This uses the `glob` library to list all files in a user-provided folder recursively.

For each user it gathers information like:
- total number of files
- total number of inodes
- total number of bytes

It is a quick tool to gather data points for monitoring filesystem usage. Typically, it can be run once daily and compared with the previous day's run to find large changes.
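
A per-user tally of inodes and bytes can be sketched with `os.scandir`. This is only an illustration of the idea, not the `spacesavers2_pdq` implementation: the function name `tally` is hypothetical, bytes are taken from `st_size`, and hard links are counted once by tracking `(st_dev, st_ino)`. All three are assumptions made for this example.

```python
import os


def tally(folder):
    """Return {uid: {"ninodes": int, "nbytes": int}} for folder and everything below it."""
    totals = {}
    seen = set()  # (device, inode) pairs already counted; assumption: hard links count once

    def add(st):
        key = (st.st_dev, st.st_ino)
        if key in seen:
            return
        seen.add(key)
        user = totals.setdefault(st.st_uid, {"ninodes": 0, "nbytes": 0})
        user["ninodes"] += 1
        user["nbytes"] += st.st_size  # assumption: apparent size, not allocated blocks

    add(os.lstat(folder))  # the top-level folder is an inode too
    stack = [folder]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # lstat semantics: symlinks are counted themselves, never followed
                add(entry.stat(follow_symlinks=False))
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return totals


if __name__ == "__main__":
    for uid, info in tally(".").items():
        print(uid, info["ninodes"], info["nbytes"], sep="\t")
```

Unreadable directories raise `PermissionError` in this sketch; a production scan would need to decide how to handle them.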
@@ -21,12 +21,12 @@ It is quick tool to gather datapoints to monitor filesystem usage. Typically, ca
```bash
usage: spacesavers2_pdq [-h] -f FOLDER [-p THREADS] [-o OUTFILE] [-j JSON] [-v]

spacesavers2_pdq: get quick per user info (number of files and bytes).
spacesavers2_pdq: get quick per user info (number of inodes and bytes).

options:
-h, --help show this help message and exit
-f FOLDER, --folder FOLDER
spacesavers2_pdq will be run on all files in this folder and its subfolders
spacesavers2_pdq will be run on all inodes in this folder and its subfolders
-p THREADS, --threads THREADS
number of threads to be used (default 4)
-o OUTFILE, --outfile OUTFILE
@@ -55,27 +55,27 @@ user3 1499 126442496
The 3 items in the line are as follows:
| Column | Description | Example |
| ------ | ------------------------ | ---------------------------------------------------------------------------------------------- |
| 1 | username | "user1" |
| 2 | total no. of files owned | 1386138 |
| 3 | total no. of bytes occupied | 6089531321856 |
| Column | Description | Example |
| ------ | --------------------------- | ------------- |
| 1 | username | "user1" |
| 2 | total no. of inodes owned | 1386138 |
| 3 | total no. of bytes occupied | 6089531321856 |
## JSON output
Here is an example output:
```
{
"/data/CCBR_Pipeliner/Tools/spacesavers2": {
"37513": {
"username": "kopardevn",
"nfiles": 1267,
"/path/to/some/folder ": {
"1234": {
"username": "user1",
"ninodes": 1267,
"nbytes": 96084992
},
"60731": {
"username": "sovacoolkl",
"nfiles": 895,
"4356": {
"username": "user2",
"ninodes": 895,
"nbytes": 89249280
}
}
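
A consumer of the JSON output might flatten it as sketched below. The structure (folder, then user id, then `username`/`ninodes`/`nbytes`) is inferred from the visible part of the example in `docs/pdq.md`; the function name `summarize` and the embedded sample are illustrative only.

```python
import json

# Sample mirroring the placeholder values shown in docs/pdq.md.
SAMPLE = """
{
  "/path/to/some/folder": {
    "1234": {"username": "user1", "ninodes": 1267, "nbytes": 96084992},
    "4356": {"username": "user2", "ninodes": 895, "nbytes": 89249280}
  }
}
"""


def summarize(report):
    """Flatten the report into (folder, uid, username, ninodes, nbytes) rows, largest first."""
    rows = []
    for folder, users in report.items():
        for uid, info in users.items():
            rows.append((folder, uid, info["username"], info["ninodes"], info["nbytes"]))
    return sorted(rows, key=lambda row: row[4], reverse=True)


if __name__ == "__main__":
    for row in summarize(json.loads(SAMPLE)):
        print(*row, sep="\t")
```

Comparing two such summaries from consecutive days is one way to spot the large changes the documentation mentions.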
63 changes: 40 additions & 23 deletions spacesavers2_catalog
@@ -18,6 +18,7 @@ from pathlib import Path

def task(f):
    fd = FileDetails()
    # print(f"Initiating {f}")
    fd.initialize(
        f,
        buffersize=args.buffersize,
@@ -27,8 +28,28 @@ def task(f):
        bottomhash=args.bottomhash,
        st_block_byte_size=args.st_block_byte_size,
    )
    # print(f"Returning {f}")
    return fd

def process(fd,broken_links,outfh,geezerage,geezersize,geezers):
    uid = fd.get_userid()
    if fd.get_type() == "L": # broken link
        if not uid in broken_links: broken_links[uid] = list()
        broken_links[uid].append(fd.get_filepath())
    else:
        result = "%s" % (fd)
        if not result == "":
            outfh.write(f"{result}\n")
        if fd.get_type() == "f":
            age = fd.get_age()
            size = fd.get_size()
            if age > geezerage and size > geezersize:
                x = list()
                x.append("{0:.2f} yrs".format(age/365))
                x.append(fd.get_size_human_readable())
                x.append(fd.get_filepath())
                if not uid in geezers: geezers[uid] = list()
                geezers[uid].append("\t".join(x))

def main():
    elog = textwrap.dedent(
@@ -131,6 +152,14 @@ def main():
        action=argparse.BooleanOptionalAction,
        help="output per-user geezer files list.",
    )
    parser.add_argument(
        "-q",
        "--quite",
        dest="quite",
        required=False,
        action=argparse.BooleanOptionalAction,
        help="Do not show progress",
    )
    parser.add_argument(
        "-a",
        "--geezerage",
@@ -154,16 +183,17 @@ def main():
    global args
    args = parser.parse_args()

    tqdm_disable = False
    if args.quite: tqdm_disable = True

    global sed
    sed = dict()
    for s in args.se.split(","):
        sed[s] = 1

    folder = args.folder
    p = Path(folder)
    files = [p]
    files2 = p.glob("**/*")
    files.extend(files2)
    p = Path(folder).absolute()
    dirs = [p]

    broken_links = dict()
    geezers = dict()
@@ -174,25 +204,12 @@ def main():
        outfh = sys.stdout

    with Pool(processes=args.threads) as pool:
        for fd in tqdm.tqdm(pool.imap_unordered(task, files),total=len(files)):
            uid = fd.get_userid()
            if fd.get_type() == "L": # broken link
                if not uid in broken_links: broken_links[uid] = list()
                broken_links[uid].append(fd.get_filepath())
            else:
                result = "%s" % (fd)
                if not result == "":
                    outfh.write(f"{result}\n")
                if fd.get_type() == "f":
                    age = fd.get_age()
                    size = fd.get_size()
                    if age > args.geezerage and size > args.geezersize:
                        x = list()
                        x.append("{0:.2f} yrs".format(age/365))
                        x.append(fd.get_size_human_readable())
                        x.append(fd.get_filepath())
                        if not uid in geezers: geezers[uid] = list()
                        geezers[uid].append("\t".join(x))
        for fd in tqdm.tqdm(pool.imap_unordered(task, scantree(p,dirs)),disable=tqdm_disable):
            process(fd,broken_links,outfh,args.geezerage,args.geezersize,geezers)

    with Pool(processes=args.threads) as pool:
        for fd in tqdm.tqdm(pool.imap_unordered(task, dirs),disable=tqdm_disable):
            process(fd,broken_links,outfh,args.geezerage,args.geezersize,geezers)

    if args.outfile:
        outfh.close()
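
The hunk above calls `scantree(p, dirs)`, but its definition is not part of the visible diff. The following is a guess at what such a generator could look like, assuming it yields non-directory entries lazily via `os.scandir` and appends every directory it meets to the caller's `dirs` list. That would explain why directories are processed in a second `Pool` pass, after the first pass has filled `dirs`.

```python
import os
from pathlib import Path


def scantree(path, dirs):
    """Yield non-directory entries below path; record each directory found in dirs.

    Hypothetical reconstruction: the real scantree in spacesavers2 may differ.
    """
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                dirs.append(Path(entry.path))  # handled later, in the second pass
                yield from scantree(entry.path, dirs)
            else:
                # regular files and symlinks (including links to directories)
                yield Path(entry.path)


if __name__ == "__main__":
    root = Path(".").absolute()
    dirs = [root]
    nfiles = sum(1 for _ in scantree(root, dirs))
    print(f"{nfiles} non-directory entries, {len(dirs)} directories")
```

Because the generator is lazy, the total number of entries is unknown up front, which is consistent with the new `tqdm` calls dropping the `total=` argument.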
43 changes: 32 additions & 11 deletions spacesavers2_e2e
@@ -3,6 +3,7 @@
# spacesavers2 end-to-end wrapper script
####################################################################################
set -e -o pipefail
sleep_duration=10

SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

@@ -44,10 +45,12 @@ outfile_blamematrix_err="${OUTFOLDER}/${PREFIX}.blamematrix.err"

if [ ! -d $OUTFOLDER ];then mkdir -p $OUTFOLDER;fi

exit_code=0
# spacesavers2_catalog
if [ -d $OUTFOLDER ];then
echo "Running spacesavers2_catalog..."
echo "Creating File: $outfile_catalog"
spacesavers2_catalog --version
cmd=$(
cat << EOF
spacesavers2_catalog \
@@ -56,15 +59,21 @@ spacesavers2_catalog \
--outfile ${outfile_catalog} \
--bottomhash \
--brokenlink \
--geezers \
> ${outfile_catalog_log} 2> ${outfile_catalog_err}
--geezers --quite
EOF
)
echo $cmd
eval $cmd
$cmd > ${outfile_catalog_log} 2> ${outfile_catalog_err}
exit_code=$?
echo "ExitCode:$exit_code"
if [ $exit_code -ne 0 ];then
exit 1
fi
else # exit if $OUTFOLDER does not exist
exit 1
fi

sleep 60
sleep $sleep_duration

# spacesavers2_mimeo
echo "Running spacesavers2_mimeo..."
@@ -73,6 +82,7 @@ if [ ! -f "${outfile_catalog}" ];then
echo "Creation of ${outfile_catalog} FAILED!!"
exit 1
fi
spacesavers2_mimeo --version
cmd=$(
cat << EOF
spacesavers2_mimeo \
@@ -82,14 +92,19 @@ spacesavers2_mimeo \
--duplicatesonly \
--maxdepth $MAXDEPTH \
--p $PREFIX \
--kronaplot \
> ${outfile_mimeo_log} 2> ${outfile_mimeo_err}
--kronaplot
EOF
)
echo $cmd
eval $cmd
$cmd > ${outfile_mimeo_log} 2> ${outfile_mimeo_err}
exit_code=$?
echo "ExitCode:$exit_code"
if [ $exit_code -ne 0 ];then
exit 1
fi

sleep 60
sleep $sleep_duration

# spacesavers2_grubbers
echo "Running spacesavers2_grubbers..."
@@ -103,17 +118,23 @@ for filegz in `ls ${OUTFOLDER}/${PREFIX}*files.gz`;do
outfile=`echo $filegz|sed "s/mimeo.files.gz/grubbers.tsv/g"`
logfile=`echo $filegz|sed "s/mimeo.files.gz/grubbers.log/g"`
errfile=`echo $filegz|sed "s/mimeo.files.gz/grubbers.err/g"`
spacesavers2_grubbers --version
cmd=$(
cat << EOF
spacesavers2_grubbers \
--filesgz $filegz \
--limit $LIMIT \
--outfile $outfile \
> $logfile 2> $errfile
--outfile $outfile
EOF
)
echo $cmd
eval $cmd
$cmd > $logfile 2> $errfile
exit_code=$?
echo "ExitCode:$exit_code"
if [ $exit_code -ne 0 ];then
exit 1
fi

done

