Subcommands
Mumemto currently has several subcommands for downstream analyses: viz, convert, collinear, inversion, and coverage. Each module is described below.
By default, mumemto runs the main match-finding algorithm. Passing a module name as the first argument runs that module instead. As such, mumemto -h shows only the flags relevant to the main algorithm (see Command Line Parameters).
mumemto viz runs the visualization module. Passing in the output prefix of a mumemto run produces a multi-MUM synteny visualization:
mumemto viz -i <output-prefix> -o plot.png
Note
Currently, visualization only works for multi-MUMs and partial multi-MUMs.
Use mumemto viz -h to list the customizable flags.
Important flags to note:
- -f: passes in a list of names for each sequence, which are displayed as y-axis labels in the plot. Names should not include a dot (.) or slash (/); otherwise they are interpreted as paths and the filename appears instead.
- -l: controls the minimum multi-MUM length to appear in the figure.
- --interactive: turns on interactive mode. Using plotly, the output synteny plot can be panned and zoomed. Note that this might be slow for large datasets.
- --no-coll-block: turns off collinear blocking. Blocking groups consecutive collinear multi-MUMs into a synteny block and plots it as a single block. This significantly reduces rendering overhead in many cases, but it may remove some multi-MUMs (though these may be spurious anyway). Turning off collinear blocking may make --interactive mode prohibitively slow.
- --max-gap-len: the maximum gap length allowed between collinear multi-MUMs for them to be considered a single block. By default, this is calculated so that the gap between multi-MUMs is less than a pixel in the rendered plot.
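The idea behind collinear blocking and the maximum gap length can be illustrated with a minimal sketch. This is not mumemto's implementation: the in-memory layout (a length plus one start offset per sequence), the function names, and the decision to ignore strands are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical in-memory model of a multi-MUM: (length, starts), where
# starts[i] is the match offset in sequence i. Strands are ignored here.
Mum = Tuple[int, Tuple[int, ...]]


def _extends(prev: Mum, cur: Mum, max_gap: int) -> bool:
    """True if cur follows prev in every sequence with a gap of 0..max_gap."""
    prev_len, prev_starts = prev
    _, cur_starts = cur
    return all(
        0 <= c - (p + prev_len) <= max_gap
        for p, c in zip(prev_starts, cur_starts)
    )


def collinear_blocks(mums: List[Mum], max_gap: int) -> List[List[Mum]]:
    """Group consecutive multi-MUMs (ordered by offset in the first
    sequence) into blocks, starting a new block whenever the next
    multi-MUM is out of order or too far away in any sequence."""
    blocks: List[List[Mum]] = []
    for mum in sorted(mums, key=lambda m: m[1][0]):
        if blocks and _extends(blocks[-1][-1], mum, max_gap):
            blocks[-1].append(mum)
        else:
            blocks.append([mum])
    return blocks


mums = [(10, (0, 5)), (10, (15, 20)), (10, (30, 200)), (10, (100, 300))]
print([len(b) for b in collinear_blocks(mums, max_gap=10)])    # [2, 1, 1]
print([len(b) for b in collinear_blocks(mums, max_gap=1000)])  # [4]
```

A larger gap threshold merges more multi-MUMs into fewer blocks, which is why blocking reduces the number of shapes the plot has to render.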
mumemto convert converts between *.mums and *.bumbl (a binary format) files. See Input and Output Files for more information on file types. It can be run as follows:
### convert to binary format (by default [prefix].bumbl)
mumemto convert -m [prefix].mums
### convert back to human-readable
mumemto convert -b [prefix].bumbl > [prefix].mums # or | less
The collinear module computes collinear blocks for a set of multi-MUMs. It takes either *.mums or *.bumbl input and stores the collinear blocks in a new output file, as an annotation:
mumemto collinear -m [prefix].mums -o blocked.mums
In *.mums format, the collinear annotations are added as a new column indicating, for each multi-MUM row, which block it belongs to. A multi-MUM row annotated with "-" is "non-collinear" and is omitted from visualization and other downstream analyses. The parameters for determining collinear blocks can be changed here, and saved blocks are reused by other modules rather than recomputed.
Mumemto can also identify large inversion polymorphisms present in the dataset using multi-MUM blocks. This is accomplished by identifying collinear blocks of multi-MUMs that are in reverse order on the opposite strand.
Note
Inversions are reported when present relative to the first sequence. In this case, the first sequence is treated as a reference, and inversions are defined relative to the reference.
mumemto inversion -i <output-prefix>
The output lists the approximate start and end offsets of the inversion region. These are approximate because they correspond to the first and last multi-MUM in the inverted region, which may be a few bases off from the true coordinates.
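To show why the reported boundaries are approximate, here is a simplified pairwise sketch of the idea. It is not mumemto's algorithm, which works on collinear blocks across all sequences. The tuple layout, the function names, and the assumption that query offsets are given in forward-strand coordinates are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical record: (length, ref_start, query_start, strand), where the
# reference is the first sequence and strand is "+" or "-" for the query.
Match = Tuple[int, int, int, str]


def _span(run: List[Match]) -> Tuple[int, int]:
    """Reference interval from the first match start to the last match end."""
    first, last = run[0], run[-1]
    return (first[1], last[1] + last[0])


def inverted_regions(matches: List[Match]) -> List[Tuple[int, int]]:
    """Find runs of reverse-strand matches whose query offsets decrease
    as the reference offset increases, and report each run's span."""
    regions: List[Tuple[int, int]] = []
    run: List[Match] = []
    for match in sorted(matches, key=lambda m: m[1]):
        _, _, query_start, strand = match
        reversed_order = not run or query_start < run[-1][2]
        if strand == "-" and reversed_order:
            run.append(match)
        else:
            if run:
                regions.append(_span(run))
            # A reverse-strand match that breaks the order starts a new run.
            run = [match] if strand == "-" else []
    if run:
        regions.append(_span(run))
    return regions


matches = [
    (20, 0, 0, "+"),
    (20, 100, 340, "-"),
    (20, 200, 240, "-"),
    (20, 300, 140, "-"),
    (20, 500, 500, "+"),
]
print(inverted_regions(matches))  # [(100, 320)]
```

The reported interval starts and ends at matches, so a true breakpoint that falls in the unmatched sequence between offsets 20 and 100, or between 320 and 500, cannot be located more precisely.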
If the assemblies consist of contigs ordered and oriented by a reference-based scaffolder such as RagTag, mumemto can report inversions that lie close to contig breakpoints (potentially indicating incorrect orientation). To do this, pass a list of paths to the *.agp output files with --agp-filelist; these files define the order of contigs (assuming the input sequence is concatenated in the same order, with or without N gaps). Also pass the chromosome number with --chr, indicating which part of the *.agp file to use.
This module computes the multi-MUM coverage over each input sequence. To run, pass in the output prefix:
mumemto coverage -i <output-prefix>
This outputs the coverage of each sequence by non-overlapping multi-MUMs, as a percentage of sequence length; that is, a position covered by multiple multi-MUMs is counted only once.
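The "counted only once" rule amounts to taking the union of the covered intervals before dividing by the sequence length. The sketch below illustrates that calculation for a single sequence; the function name and the (start, length) input layout are assumptions for illustration, not mumemto's internals.

```python
from typing import List, Tuple


def coverage_percent(intervals: List[Tuple[int, int]], seq_length: int) -> float:
    """Percentage of a sequence covered by the union of (start, length)
    intervals, counting overlapping positions only once."""
    covered = 0
    current_end = 0  # end of the region already counted
    for start, length in sorted(intervals):
        end = start + length
        if end > current_end:
            # Count only the part that extends past what is already covered.
            covered += end - max(start, current_end)
            current_end = end
    return 100.0 * covered / seq_length


# Two overlapping matches (0-10 and 5-15) plus a separate one (30-40):
# 25 distinct positions are covered in a sequence of length 100.
print(coverage_percent([(0, 10), (5, 10), (30, 10)], seq_length=100))  # 25.0
```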
If you have any questions or suggestions, please submit a GitHub issue or contact me at vshivak1 [at] jhu.edu.