
Subcommands

Vikram Shivakumar edited this page Feb 21, 2025 · 5 revisions

Mumemto currently provides several subcommands, each a module for a downstream analysis. Details for each module follow.

By default, mumemto runs the main match-finding algorithm. Passing a module name as the first argument runs that module instead. As a result, mumemto -h shows only the flags relevant to the main algorithm (see Command Line Parameters).

Visualization

mumemto viz runs the visualization module. Pass the output prefix of a mumemto run to produce a multi-MUM synteny plot:

mumemto viz -i <output-prefix> -o plot.png

Note

Currently, visualization works only for multi-MUMs and partial multi-MUMs.

Use mumemto viz -h for customizable flags.

Important flags to note:

  • -f passes a list of names, one per sequence, which are displayed as y-axis labels in the plot. Names should not contain a dot (.) or slash (/); otherwise they are interpreted as paths and the filename is shown instead.

  • -l controls the minimum multi-MUM length to appear in the figure

  • --interactive: turns on interactive mode. The output synteny plot is rendered with plotly and can be panned and zoomed. Note that this may be slow for large datasets.

  • --no-coll-block: turns off collinear blocking. Blocking groups consecutive collinear multi-MUMs into a synteny block and plots it as a single block, which significantly reduces rendering overhead in many cases. However, blocking may drop some multi-MUMs (though these may be spurious anyway). Turning off collinear blocking may make --interactive mode prohibitively slow.

  • --max-gap-len: the maximum gap length allowed between collinear multi-MUMs for them to be merged into a single block. By default, this is calculated so that the gap between multi-MUMs is less than one pixel in the rendered plot.
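
The blocking idea described above can be illustrated with a small Python sketch. This is not mumemto's implementation: the `Mum` tuple layout, the forward-strand-only simplification, the requirement that gaps be non-negative, and the `bases_per_pixel` helper are all assumptions made for illustration. The helper only approximates the "less than a pixel" default.

```python
from typing import List, Tuple

# Hypothetical representation of one multi-MUM: (length, starts), where
# starts[i] is its start offset in sequence i. Forward strand only.
Mum = Tuple[int, Tuple[int, ...]]


def bases_per_pixel(longest_seq_len: int, plot_width_px: int) -> int:
    """Rough number of bases drawn per pixel (assumed basis for a default gap)."""
    return max(1, longest_seq_len // plot_width_px)


def collinear_blocks(mums: List[Mum], max_gap: int) -> List[List[Mum]]:
    """Group consecutive multi-MUMs whose gap is at most max_gap in every sequence."""
    blocks: List[List[Mum]] = []
    for mum in sorted(mums, key=lambda m: m[1][0]):  # order by first sequence
        if blocks:
            prev_len, prev_starts = blocks[-1][-1]
            _, starts = mum
            # Gap between the end of the previous MUM and the start of this one,
            # measured separately in each sequence.
            gaps = [s - (p + prev_len) for p, s in zip(prev_starts, starts)]
            if all(0 <= g <= max_gap for g in gaps):
                blocks[-1].append(mum)
                continue
        blocks.append([mum])  # start a new block
    return blocks
```

With `max_gap=10`, two MUMs separated by 5 bases in every sequence fall into one block, while a MUM that is 475 bases away in any single sequence starts a new block.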

Convert

This module converts between *.mums and *.bumbl (a binary format) files. See Input and Output Files for more information on file types. mumemto convert can be run as follows:

### convert to binary format (by default [prefix].bumbl)
mumemto convert -m [prefix].mums
### convert back to human-readable
mumemto convert -b [prefix].bumbl > [prefix].mums # or | less

Collinear

This module computes collinear blocks for a set of multi-MUMs. It takes either *.mums or *.bumbl input and stores the collinear blocks in a new output file as an annotation:

mumemto collinear -m [prefix].mums -o blocked.mums

In *.mums format, the collinear annotations are added as a new column indicating, for each multi-MUM row, which block it belongs to. A multi-MUM row annotated with "-" is non-collinear and is omitted from visualization and other downstream analyses. The parameters for determining collinear blocks can be changed here, and the saved blocks are reused by other modules rather than recomputed.

Inversions

Mumemto can also identify large inversion polymorphisms present in the dataset using multi-MUM blocks. This is accomplished by identifying collinear blocks of multi-MUMs that are in reverse order on the opposite strand.

Note

Inversions are reported relative to the first sequence, which is treated as the reference.

mumemto inversion -i <output-prefix>

The output lists the approximate start and end offsets of each inverted region. These are approximate because they correspond to the first and last multi-MUM in the inverted region, which may be a few bases off from the true coordinates.
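
To make the idea concrete, here is a minimal Python sketch of how a run of reverse-order, opposite-strand matches could be turned into an approximate region. It is an illustration under stated assumptions, not mumemto's algorithm: the `Hit` record, the pairwise (reference versus one other sequence) view, and the `min_hits` threshold are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Hypothetical view of one multi-MUM in the reference and one other sequence."""
    ref_start: int
    length: int
    qry_start: int
    qry_strand: str  # "+" or "-"


def inverted_regions(hits: List[Hit], min_hits: int = 2) -> List[Tuple[int, int]]:
    """Return approximate (start, end) reference offsets of inverted runs."""
    regions: List[Tuple[int, int]] = []
    run: List[Hit] = []

    def flush() -> None:
        # The region spans from the first to the last match in the run, which is
        # why the reported offsets are only approximate.
        if len(run) >= min_hits:
            regions.append((run[0].ref_start, run[-1].ref_start + run[-1].length))
        run.clear()

    for hit in sorted(hits, key=lambda h: h.ref_start):
        inverted = hit.qry_strand == "-"
        # Inside an inversion, offsets in the other sequence decrease while
        # reference offsets increase.
        if inverted and (not run or hit.qry_start < run[-1].qry_start):
            run.append(hit)
        else:
            flush()
            if inverted:
                run.append(hit)
    flush()
    return regions
```

Any inverted sequence between the true breakpoint and the nearest match on either side is not captured by such a run, so the true inversion may extend slightly beyond the reported offsets.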

If the assemblies consist of contigs ordered and oriented by a reference-based scaffolder such as RagTag, mumemto can report inversions that lie close to contig breakpoints (potentially indicating an incorrectly oriented contig). To do this, pass a list of paths to the *.agp output files with --agp-filelist; these files define the order of contigs (assuming the input sequence is concatenated in the same order, with or without N gaps). Also pass the chromosome number with --chr to indicate which part of the *.agp file to use.
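
As a rough illustration of the breakpoint check, the Python sketch below reads contig boundaries from AGP records and tests whether an inverted region ends near one. It relies only on the standard tab-delimited AGP column layout (object, begin, end, part number, component type, ...). The function names, the tolerance `tol`, and the assumption that inversion offsets share the AGP object's coordinate system are hypothetical and are not taken from mumemto.

```python
from typing import Iterable, List, Tuple


def contig_boundaries(agp_lines: Iterable[str], chrom: str) -> List[int]:
    """Collect start and end coordinates of every contig placed on `chrom`."""
    bounds: List[int] = []
    for line in agp_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and AGP header comments
        fields = line.rstrip("\n").split("\t")
        obj, beg, end, comp_type = fields[0], int(fields[1]), int(fields[2]), fields[4]
        if obj == chrom and comp_type not in ("N", "U"):  # skip gap rows
            bounds.extend((beg, end))
    return sorted(set(bounds))


def near_breakpoint(region: Tuple[int, int], bounds: List[int], tol: int) -> bool:
    """True if either end of `region` lies within `tol` bases of a contig boundary."""
    return any(abs(pos - b) <= tol for pos in region for b in bounds)
```

AGP coordinates are 1-based and inclusive, so a tolerance of at least a few bases is needed if the inversion offsets are 0-based.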

Coverage

This module computes the multi-MUM coverage over each input sequence. To run, pass in the output prefix:

mumemto coverage -i <output-prefix>

This outputs the multi-MUM coverage of each sequence as a percentage of its length. Overlaps are not double-counted: a position covered by several multi-MUMs is counted only once.
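
The "counted only once" rule amounts to measuring the union of the multi-MUM intervals. A minimal Python sketch of that calculation follows; the function name and the half-open `(start, end)` interval convention are assumptions for illustration and do not describe mumemto's internals.

```python
from typing import List, Optional, Tuple


def coverage_percent(intervals: List[Tuple[int, int]], seq_len: int) -> float:
    """Percent of a sequence covered by half-open [start, end) intervals."""
    covered = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged interval: close it, open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / seq_len
```

For example, the intervals (0, 10), (5, 20), and (30, 40) on a 100-base sequence cover 30 distinct positions, so the overlap between the first two is not double-counted.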
