| 1 | +# pgge |
| 2 | + |
| 3 | +## the pangenome graph evaluator |
| 4 | + |
| 5 | +This pangenome graph evaluation pipeline measures the reconstruction accuracy of a pangenome graph (in the variation graph model). |
| 6 | +Its goal is to help users choose the best pangenome graph construction tool for their input data and task. |
| 7 | + |
| 8 | +It has two phases: |
| 9 | + |
| 10 | +1. _[GraphAligner](https://github.com/maickrau/GraphAligner)_: (*alignment*) -- SHORT DESCRIPTION. TODO. |
| 11 | + |
| 12 | +2. _[peanut](https://github.com/subwaystation/rs-peanut)_: (*alignment evaluation*) -- SHORT DESCRIPTION. TODO. |
| 13 | + |
| 14 | +## general usage |
| 15 | + |
| 16 | +TODO |
| 17 | + |
| 18 | +Create the pangenome graph and its consensus graphs using `pggb`: |
| 19 | + |
| 20 | +``` |
| 21 | +pggb -i cerevisiae.pan.fa -s 50000 -p 90 -w 30000 -n 5 -t 16 -v -Y "#" -S -k 8 -B 10000000 -I 0.7 -o pggb -W -m -S |
| 22 | +``` |
| 23 | +Evaluate the smoothed graph: |
| 24 | +``` |
| 25 | +for f in pggb/*.smooth.gfa |
| 26 | +do |
| 27 | +    pgge -g "$f" -f cerevisiae.pan.fa -o "$f.gaf" -t 16 |
| 28 | +done |
| 29 | +``` |
| 30 | +Evaluate the consensus graphs: |
| 31 | +``` |
| 32 | +for f in pggb/*.consensus*.gfa |
| 33 | +do |
| 34 | +    pgge -g "$f" -f cerevisiae.pan.fa -o "$f.gaf" -t 16 |
| 35 | +done |
| 36 | +``` |
| 37 | +Print results to stdout: |
| 38 | +``` |
| 39 | +for f in pggb/*.pgge |
| 40 | +do |
| 41 | +    printf '%s\t' "$f" |
| 42 | +    cat "$f" |
| 43 | +done |
| 44 | +``` |
| 45 | + |
| 46 | +These commands can also be found in `scripts/pggb.sh`. |
| 47 | +### output |
| 48 | + |
| 49 | +The output is written to `input.gaf.pgge` in a tab-delimited format: |
| 50 | +``` |
| 51 | +0.994424 0.9929550476154135 0.9970526308543786 |
| 52 | +``` |
| 53 | +The first number is the `aid`, the second number is the [qsm](https://github.com/subwaystation/rs-peanut#query-sequence-match-qsm), and the third number is the [qsamm](https://github.com/subwaystation/rs-peanut#query-sequence-alignment-match-mismatch-qsamm). |
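
When comparing several graphs, it can help to collect the per-graph results into one labelled table. The following is a minimal sketch and not part of `pgge`: the function name `summarize_pgge` is hypothetical, and it assumes each `.pgge` file holds a single line with the three values in the order described above (`aid`, `qsm`, `qsamm`).

```shell
# Hypothetical helper (not shipped with pgge): print a labelled,
# tab-delimited table from one or more pgge result files.
# Assumption: each file contains one line with three values: aid, qsm, qsamm.
summarize_pgge() {
    printf 'file\taid\tqsm\tqsamm\n'
    for f in "$@"
    do
        # Skip arguments that are not regular files (e.g. an unmatched glob).
        [ -f "$f" ] || continue
        # Split on any whitespace, emit tab-delimited rows with the file name first.
        awk -v name="$f" 'BEGIN { OFS = "\t" } NF == 3 { print name, $1, $2, $3 }' "$f"
    done
}
```

For example, `summarize_pgge pggb/*.pgge` would print a header followed by one row per evaluated graph.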
| 54 | + |
| 55 | +## installation |
| 56 | +TODO |
| 57 | + |
| 58 | +## TODOs |
| 59 | +- [ ] Finish README. |
| 60 | +- [ ] Explain `aid`. |
| 61 | +- [ ] Add option to directly start from GAF file. |
| 62 | +- [ ] The user should be able to select options for GraphAligner. |
| 63 | +- [ ] Add usage examples for _`minigraph`_, _`cactus`_, and _`SibeliaZ`_. |
| 64 | +- [ ] Add Dockerfile. |
| 65 | +- [ ] Add a CI building the Dockerfile and emitting evaluation metrics for all tools using `HLA-Zoo` data. |
| 66 | +- [ ] Should _`pgge`_ accept several files as input and output the results in one file? |
| 67 | +- [ ] Add output-folder option. |
| 68 | +- [ ] Integrate into nf-core/pangenome pipeline. |