Subcommands
Mumemto currently has several subcommands, representing downstream analyses. See details below for each module.
By default, mumemto runs the main match-finding algorithm. Passing in a module name as the first argument runs that specific module. As such, mumemto -h shows flags relevant to the main algorithm (see Command Line Parameters).
mumemto viz runs the visualization module. Passing in the output prefix of a mumemto run produces a multi-MUM synteny visualization:
mumemto viz -i <output-prefix> -o plot.png
Note
Currently visualization only works for multi-MUMs and partial multi-MUMs.
Use mumemto viz -h for customizable flags.
Important flags to note:
- -f: takes a list of names for each sequence, which are displayed as y-axis labels on the plot. Names should not include a dot (.) or slash (/); otherwise they are interpreted as paths and the filename appears instead.
- -l: controls the minimum multi-MUM length to appear in the figure.
- --interactive: turns on interactive mode. Using plotly, the output synteny plot can be panned and zoomed for interactive visualization. Note that this might be slow for large datasets.
- --no-coll-block: turns off collinear blocking. Blocking groups consecutive multi-MUMs that are collinear into a synteny block and plots this as a single block. This significantly reduces rendering overhead in many cases. However, it may remove multi-MUMs in some cases (though these may be spurious anyway). Turning off collinear blocking may make --interactive mode prohibitively slow.
- --max-gap-len: the maximum gap length allowed between collinear multi-MUMs to consider them a single block. By default this is calculated such that the gap between multi-MUMs is less than a pixel in the rendered plot.
- --mode [new in v1.2]: one of three options for visualization mode. normal concatenates any sequences in an input FASTA file into a single row. delineated adds separators between contigs/sequences in the original FASTA to separate distinct sequences along the row. gapped splits the visualization into multiple windows (similar to the plot in the README), aligning corresponding sequences in a FASTA. This is particularly useful for chromosome-split visualizations of full assemblies.
Note
--mode gapped only works if the number of sequences is identical in each input FASTA. We recommend ordering the chromosomes, one per sequence entry, in the same order in each FASTA if you run mumemto on full assemblies. This results in an orderly chromosome-split visualization of the entire pangenome.
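Because --mode gapped requires the same number of sequences in every input FASTA, it can help to check this before plotting. The following is a minimal sketch and not part of mumemto; the helper names count_fasta_records and same_record_count are hypothetical, and it assumes uncompressed FASTA files.

```python
def count_fasta_records(path):
    """Count sequence entries (header lines starting with '>') in an uncompressed FASTA."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def same_record_count(paths):
    """Return (ok, counts); ok is True when every FASTA has the same number of entries.

    Hypothetical pre-flight check for --mode gapped, not a mumemto function.
    """
    counts = {str(p): count_fasta_records(p) for p in paths}
    return len(set(counts.values())) <= 1, counts
```

If ok is False, the counts dictionary shows which files differ, so they can be fixed before running mumemto viz.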
The mumemto convert module converts between *.mums and *.bumbl (a binary format) files. See Input and Output Files for more info on filetypes. mumemto convert can be run as follows:
### convert to binary format (by default [prefix].bumbl)
mumemto convert -m [prefix].mums
### convert back to human-readable
mumemto convert -b [prefix].bumbl > [prefix].mums # or | less
The mumemto collinear module computes collinear blocks for a set of multi-MUMs. It takes either *.mums or *.bumbl input, and stores the collinear blocks in a new output file, as an annotation:
mumemto collinear -m [prefix].mums -o blocked.mums
In *.mums format, the collinear annotations are added as a new column indicating, for each multi-MUM row, which block it belongs to. A multi-MUM row annotated with "-" is "non-collinear" and is omitted from visualization and other downstream analyses. The parameters for determining collinear blocks can be changed here, and saved blocks are used in other modules (rather than being re-computed).
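The idea of collinear blocking can be illustrated with a small sketch. This is not mumemto's implementation: the function name collinear_blocks, the in-memory (length, starts) representation, and the simplification that all matches lie on the forward strand and do not overlap are assumptions made for illustration.

```python
def collinear_blocks(mums, max_gap):
    """Group consecutive multi-MUMs into collinear blocks (illustrative sketch).

    mums: list of (length, starts) tuples sorted by start in the first sequence;
          starts[i] is the match offset in sequence i (forward strand assumed).
    max_gap: largest gap, in bases, allowed between neighbouring matches
             in any sequence for them to share a block.
    Returns a list of blocks, each a list of indices into mums.
    """
    blocks = []
    for idx, (length, starts) in enumerate(mums):
        if blocks:
            prev_len, prev_starts = mums[blocks[-1][-1]]
            # Gap between the end of the previous match and the start of this one,
            # measured separately in every sequence.
            gaps = [s - (p + prev_len) for p, s in zip(prev_starts, starts)]
            if all(0 <= g <= max_gap for g in gaps):
                blocks[-1].append(idx)
                continue
        blocks.append([idx])
    return blocks


example = [(10, [0, 100]), (10, [15, 115]), (10, [30, 500]), (10, [45, 515])]
print(collinear_blocks(example, max_gap=10))  # [[0, 1], [2, 3]]
```

In the example, the third match jumps far ahead in the second sequence, so it starts a new block even though it stays close in the first sequence.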
Mumemto can also identify large inversion polymorphisms present in the dataset using multi-MUM blocks. This is accomplished by identifying collinear blocks of multi-MUMs that are in reverse order on the opposite strand.
Note
Inversions are reported when present relative to the first sequence. In this case, the first sequence is treated as a reference, and inversions are defined relative to the reference.
mumemto inversion -i <output-prefix>
The output lists the approximate start and end offsets of the inversion region. These are approximate because they correspond to the first and last multi-MUM present in the inverted region, which may be a few bases off from the true coordinates.
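The description above (matches in reverse order on the opposite strand, with coordinates taken from the first and last match) can be sketched for a single reference/query pair. This is a simplified illustration, not mumemto's algorithm: the function name inverted_regions, the (ref_start, query_start, length, strand) tuples, the min_mums threshold, and the assumption that query offsets decrease along an inverted run are all hypothetical.

```python
def inverted_regions(mums, min_mums=2):
    """Report approximate inverted regions relative to the reference (sketch).

    mums: list of (ref_start, query_start, length, strand) tuples sorted by
          ref_start; strand is '+' or '-'. Query offsets are assumed to
          decrease along an inverted run.
    min_mums: hypothetical minimum number of matches needed to report a region.
    Returns a list of (approx_start, approx_end) reference offsets.
    """
    regions, run = [], []

    def flush():
        # Coordinates come from the first and last match in the run,
        # so they are approximate, as noted in the text above.
        if len(run) >= min_mums:
            first, last = run[0], run[-1]
            regions.append((first[0], last[0] + last[2]))
        run.clear()

    for mum in mums:
        ref_start, query_start, length, strand = mum
        if strand == "-" and (not run or query_start < run[-1][1]):
            run.append(mum)
        else:
            flush()
            if strand == "-":
                run.append(mum)
    flush()
    return regions


example = [(0, 0, 10, "+"), (20, 80, 10, "-"), (35, 65, 10, "-"),
           (50, 50, 10, "-"), (100, 100, 10, "+")]
print(inverted_regions(example))  # [(20, 60)]
```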
If the assemblies represent ordered and oriented contigs produced by a reference-based scaffolder like RagTag, mumemto can report inversions that are close to the boundary of contig breakpoints (potentially indicating incorrect orientation). To do this, pass a list of paths to the *.agp output files with --agp-filelist, which defines the order of contigs (assuming the input sequence is concatenated in the same order, with or without N gaps). Also pass in the chromosome number with --chr (indicating which part of the *.agp file to use).
The mumemto coverage module computes the multi-MUM coverage over each input sequence. To run, pass in the output prefix:
mumemto coverage -i <output-prefix>
This outputs the coverage of each sequence by multi-MUMs, as a percentage of sequence length. Overlaps are not double-counted, i.e., a position covered by multiple multi-MUMs is counted only once.
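Counting each covered position only once amounts to taking the union of the match intervals on a sequence. A minimal sketch of that calculation follows; the function name coverage_percent and the (start, length) input format are assumptions for illustration, not mumemto's code.

```python
def coverage_percent(intervals, seq_len):
    """Percentage of a sequence covered by at least one match (sketch).

    intervals: iterable of (start, length) matches on one sequence.
    seq_len: total length of that sequence.
    """
    covered = 0
    cur_start = cur_end = None
    for start, length in sorted(intervals):
        end = start + length
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / seq_len


print(coverage_percent([(0, 10), (5, 10), (30, 10)], seq_len=100))  # 25.0
```

The first two matches overlap on positions 5 to 9, so together they contribute 15 bases rather than 20.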
If there are any questions or suggestions, please submit a GitHub issue or contact me at vshivak1 [at] jhu.edu.