Commit 2ac5ca8

LICENSE, README, example script
1 parent 6bea3dd commit 2ac5ca8

4 files changed, +116 -7 lines changed

LICENSE

+20
@@ -0,0 +1,20 @@
+The MIT License (MIT)
+
+Copyright (c) 2021 Simon Heumos
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

README.md

+68
@@ -0,0 +1,68 @@
+# pgge
+
+## the pangenome graph evaluator
+
+This pangenome graph evaluation pipeline measures the reconstruction accuracy of a pangenome graph (in the variation graph model).
+Its goal is to give guidance in finding the best pangenome graph construction tool for given input data and a given task.
+
+It has two phases:
+
+1. _[GraphAligner](https://github.com/maickrau/GraphAligner)_: (*alignment*) -- SHORT DESCRIPTION. TODO.
+
+2. _[peanut](https://github.com/subwaystation/rs-peanut)_: (*alignment evaluation*) -- SHORT DESCRIPTION. TODO.
+
+## general usage
+
+TODO
+
+Create our pangenome graph and consensus graphs using `pggb`:
+
+```
+pggb -i cerevisiae.pan.fa -s 50000 -p 90 -w 30000 -n 5 -t 16 -v -Y "#" -S -k 8 -B 10000000 -I 0.7 -o pggb -W -m -S
+```
+Evaluate the smoothed graph.
+```
+for f in pggb/*.smooth.gfa
+do
+pgge -g $f -f cerevisiae.pan.fa -o $f.gaf -t 16
+done
+```
+Evaluate the consensus graphs.
+```
+for f in pggb/*.consensus*.gfa
+do
+pgge -g $f -f cerevisiae.pan.fa -o $f.gaf -t 16
+done
+```
+Print results to stdout:
+```
+for f in pggb/*.pgge
+do
+echo $f | tr "\n" "\t"
+cat $f
+done
+```
+
+These commands can also be found in `scripts/pgge.sh`.
+### output
+
+The output is written to `input.gaf.pgge` in a tab-delimited format:
+```
+0.994424 0.9929550476154135 0.9970526308543786
+```
+The first number is the `aid`, the second number is the [qsm](https://github.com/subwaystation/rs-peanut#query-sequence-match-qsm), and the third number is the [qsamm](https://github.com/subwaystation/rs-peanut#query-sequence-alignment-match-mismatch-qsamm).
+
+## installation
+TODO
+
+## TODOs
+- [ ] Finish README.
+- [ ] Explain `aid`.
+- [ ] Add option to directly start from GAF file.
+- [ ] The user should be able to select options for GraphAligner.
+- [ ] Add usage examples for _`minigraph`_, _`cactus`_, and _`SibeliaZ`_.
+- [ ] Add Dockerfile.
+- [ ] Add a CI building the Dockerfile and emitting evaluation metrics for all tools using `HLA-Zoo` data.
+- [ ] Should _`pgge`_ accept several files as input and output the results in one file?
+- [ ] Add output-folder option.
+- [ ] Integrate into nf-core/pangenome pipeline.
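
The README's output section describes a single tab-delimited line holding `aid`, `qsm`, and `qsamm`, in that order. As a minimal sketch of reading such a line back, the snippet below labels the three fields with `awk`. The helper name `print_pgge_metrics` is hypothetical and not part of `pgge`. The demo file reuses the example values from the README and assumes the real file separates them with tabs, as the README states.

```shell
#!/bin/bash
# Hypothetical helper (not part of pgge): label the three tab-separated
# metrics found in a *.pgge result file.
print_pgge_metrics() {
    # Field order follows the README: aid, qsm, qsamm.
    awk -F '\t' '{ printf "aid=%s\tqsm=%s\tqsamm=%s\n", $1, $2, $3 }' "$1"
}

# Demo input: the example values from the README, written to a temp file.
demo=$(mktemp)
printf '0.994424\t0.9929550476154135\t0.9970526308543786\n' > "$demo"

labelled=$(print_pgge_metrics "$demo")
echo "$labelled"

rm -f "$demo"
```

Keeping the labelling in a separate helper leaves the `.pgge` file itself untouched, so it stays easy to concatenate with other result files.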

pgge

+9-7
@@ -14,7 +14,7 @@ then
 fi
 
 ## TODO add options for GraphAligner
-## TODO add output-folder option?
+## TODO add output-folder option
 
 # read the options
 cmd=$0" "$@
@@ -92,15 +92,17 @@ $timer -f "$fmt" GraphAligner \
 -a $output_gaf \
 -x vg \
 -t $threads \
->/dev/null && cut -f 2,3,4,16 $output_gaf \
+2> >(tee -a $log_file)
+
+($timer -f "$fmt" cut -f 2,3,4,16 $output_gaf \
 | sed s/id:f:// \
 | awk '{ len=$3-$2; tlen+=len; sum+=$4*len; } END { print sum / tlen }' \
-2> >(tee -a $log_file)
+| tr "\n" "\t" \
+1> $output_gaf.pgge) 2> >(tee -a $log_file)
+
 ## TODO
 ## Directly start from GAF
 
-## TODO
-## Emit new evaluation metric
-$timer -f "$fmt" peanut \
+($timer -f "$fmt" peanut \
 -g $output_gaf \
-2> >(tee -a $log_file)
+1>> $output_gaf.pgge) 2> >(tee -a $log_file)
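
The `cut | sed | awk` stage in the hunk above appears to compute `aid` as a mean of per-alignment identity, weighted by the aligned query length (`$3-$2` after `cut`). The sketch below runs the same `sed` and `awk` steps on two toy rows invented for illustration. It assumes the four kept columns are query length, query start, query end, and the `id:f:` tag, which matches GraphAligner's GAF output as far as we know, but that layout is an assumption here.

```shell
#!/bin/bash
# Toy rows, invented for illustration (not real GraphAligner output).
# Columns mimic `cut -f 2,3,4,16`: query length, start, end, identity tag.
toy_alignments=$(printf '300\t0\t300\tid:f:1.0\n300\t0\t100\tid:f:0.8\n')

# Same sed/awk steps as in the pgge script: strip the tag prefix, then
# average the identity weighted by the aligned length (end - start).
aid=$(printf '%s\n' "$toy_alignments" \
    | sed 's/id:f://' \
    | awk '{ len=$3-$2; tlen+=len; sum+=$4*len; } END { print sum / tlen }')

# (300*1.0 + 100*0.8) / (300 + 100) = 0.95
echo "$aid"
```

An unweighted mean of the two identities would be 0.9, so the weighting makes longer alignments count for more in the final score.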

scripts/pgge.sh

+19
@@ -0,0 +1,19 @@
+#!/bin/bash
+
+pggb -i cerevisiae.pan.fa -s 50000 -p 90 -w 30000 -n 5 -t 16 -v -Y "#" -S -k 8 -B 10000000 -I 0.7 -o pggb -W -m -S
+
+for f in pggb/*.smooth.gfa
+do
+pgge -g $f -f cerevisiae.pan.fa -o $f.gaf -t 16
+done
+
+for f in pggb/*.consensus*.gfa
+do
+pgge -g $f -f cerevisiae.pan.fa -o $f.gaf -t 16
+done
+
+for f in pggb/*.pgge
+do
+echo $f | tr "\n" "\t"
+cat $f
+done
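
The last loop of the script prints one row per result file: the file name, a tab, then the three metrics. A possible variant, sketched below, quotes its variables so paths with spaces survive and uses `printf` in place of `echo | tr`. The directory, file names, and metric values are stand-ins made up for the demo and are not real `pgge` output.

```shell
#!/bin/bash
# Stand-in result files with made-up values (not real pgge output).
workdir=$(mktemp -d)
printf '0.99\t0.98\t0.97\n' > "$workdir/a.gfa.gaf.pgge"
printf '0.95\t0.94\t0.93\n' > "$workdir/b.gfa.gaf.pgge"

# Same idea as the final loop of scripts/pgge.sh: prefix each result line
# with its file name, separated by a tab.
table=$(
    for f in "$workdir"/*.pgge
    do
        printf '%s\t' "$(basename "$f")"
        cat "$f"
    done
)
echo "$table"

rm -rf "$workdir"
```

Collecting the rows in one variable (or redirecting the loop to a file) gives a single table that can be sorted or compared across graph construction tools.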
