Subcommands

Vikram Shivakumar edited this page Feb 21, 2025 · 5 revisions

Mumemto currently provides five subcommands for downstream analyses and file handling: viz, convert, collinear, inversion, and coverage. See details below for each module.

By default, mumemto runs the main match-finding algorithm. Passing a module name as the first argument runs that module instead. As such, mumemto -h shows flags relevant to the main algorithm (see Command Line Parameters).

Visualization

mumemto viz runs the visualization module. Passing in the output prefix of a mumemto run will run a multi-MUM synteny visualization:

mumemto viz -i <output-prefix> -o plot.png

Note

Currently visualization only works for multi-MUMs and partial multi-MUMs.

Use mumemto viz -h for customizable flags.

Important flags to note:

  • -f takes a list of names, one for each sequence, which are displayed as y-axis labels in the plot. Names should not include a dot (.) or slash (/); otherwise they are interpreted as paths and the filename appears instead.

  • -l controls the minimum multi-MUM length to appear in the figure

  • --interactive: turns on interactive mode. Using plotly, the output synteny plot is pan-able and zoom-able for interactive visualization. Note this might be slow for large datasets.

  • --no-coll-block: this flag turns off collinear blocking. Blocking groups consecutive collinear multi-MUMs into a synteny block and plots it as a single block. This significantly reduces rendering overhead in many cases. However, it may remove some multi-MUMs (though these may be spurious anyway). Turning off collinear blocking may make --interactive mode prohibitively slow.

  • --max-gap-len: the maximum gap length allowable between collinear multi-MUMs to consider them a single block. By default this is calculated such that the gap between multi-MUMs is less than a pixel in the rendered plot.

  • [new in v1.2] --mode: one of three options for visualization mode. normal concatenates any sequences in an input FASTA file into a single row. delineated adds separators between contigs/sequences in the original FASTA to separate distinct sequences along the row. gapped splits the visualization into multiple windows (similar to the plot in the README), aligning corresponding sequences across the FASTA files. This is particularly useful for chromosome-split visualizations of full assemblies.

Note

--mode gapped only works if the number of sequences is identical in each input FASTA. We recommend ordering the chromosomes, one per sequence entry, in the same order in each FASTA if you run mumemto on full assemblies. This will result in orderly chromosome-split visualization of the entire pangenome.
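The requirement above can be checked before plotting. The following is a minimal sketch, not part of mumemto: the helper names count_seqs and same_seq_count are hypothetical, and the check only compares the number of FASTA header lines, not their order.

```shell
# Hypothetical helper: count FASTA records (header lines starting with '>').
count_seqs() {
    grep -c '^>' "$1"
}

# Hypothetical helper: succeed only if every given FASTA file
# contains the same number of records as the first one.
same_seq_count() {
    first=$(count_seqs "$1")
    for f in "$@"; do
        [ "$(count_seqs "$f")" = "$first" ] || return 1
    done
}

# Example usage (FASTA file names are placeholders):
# same_seq_count asm1.fa asm2.fa asm3.fa &&
#     mumemto viz -i <output-prefix> -o plot.png --mode gapped
```

Checking the chromosome order itself still has to be done by inspecting the FASTA headers, since matching counts do not guarantee matching order.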

Convert

This module converts between *.mums and *.bumbl (a binary format) files. See Input and Output Files for more info on filetypes. mumemto convert can be run as follows:

### convert to binary format (by default [prefix].bumbl)
mumemto convert -m [prefix].mums
### convert back to human-readable
mumemto convert -b [prefix].bumbl > [prefix].mums # or | less

Collinear

This module computes collinear blocks for a set of multi-MUMs. It takes either *.mums or *.bumbl input and stores the collinear blocks in a new output file, as an annotation:

mumemto collinear -m [prefix].mums -o blocked.mums

In *.mums format, the collinear annotations are added as a new column, indicating for each multi-MUM row, which block it belongs to. If a multi-MUM row is annotated with "-", it is "non-collinear", and is omitted from visualization and other downstream analyses. The parameters for determining collinear blocks can be changed here, and saved blocks are used in other modules (rather than re-computing).
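As a quick sanity check on a blocked file, the non-collinear rows can be counted. This is a minimal sketch under a stated assumption: it treats the file as tab-separated with the block annotation in the last column, which should be verified against Input and Output Files. The helper name count_noncollinear is hypothetical.

```shell
# Hypothetical helper: count rows whose block annotation is "-".
# ASSUMPTION: rows are tab-separated and the annotation is the LAST column.
count_noncollinear() {
    awk -F'\t' '$NF == "-" { n++ } END { print n + 0 }' "$1"
}

# Example usage:
# count_noncollinear blocked.mums
```

The comparison is an exact match on the whole field, so a strand field such as "+,-" is not mistaken for a non-collinear annotation.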

Inversions

Mumemto can also identify large inversion polymorphisms present in the dataset using multi-MUM blocks. This is accomplished by identifying collinear blocks of multi-MUMs that are in reverse order on the opposite strand.

Note

Inversions are reported relative to the first sequence, which is treated as the reference.

mumemto inversion -i <output-prefix>

The output lists the approximate start and end offsets of each inversion region. These are approximate because they correspond to the first and last multi-MUM in the inverted region, which may be a few bases off from the true coordinates.

If the assemblies consist of contigs that were ordered and oriented with a reference-based scaffolder such as RagTag, mumemto can report inversions that lie close to contig breakpoints (potentially indicating an incorrectly oriented contig). To do this, pass a list of paths to the *.agp output files with --agp-filelist; these files define the order of contigs (assuming the input sequence is concatenated in the same order, with or without N gaps). Also pass the chromosome number with --chr, indicating which part of each *.agp file to use.
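A file list for --agp-filelist might be assembled as follows. This is a sketch under assumptions: the list format (one path per line) is assumed rather than confirmed here, the helper name write_agp_filelist is hypothetical, and the AGP paths and the --chr value are placeholders.

```shell
# Hypothetical helper: write the given AGP paths to a list file,
# one path per line, failing if any path does not exist.
# ASSUMPTION: --agp-filelist expects one path per line.
write_agp_filelist() {
    out=$1
    shift
    : > "$out"
    for p in "$@"; do
        [ -f "$p" ] || { echo "missing AGP file: $p" >&2; return 1; }
        printf '%s\n' "$p" >> "$out"
    done
}

# Example usage (paths and chromosome number are placeholders):
# write_agp_filelist agp_list.txt sample1.agp sample2.agp &&
#     mumemto inversion -i <output-prefix> --agp-filelist agp_list.txt --chr 1
```

The helper preserves the order in which paths are given, so the AGP files can be listed in whatever order the module expects.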

Coverage

This module computes the multi-MUM coverage over each input sequence. To run, pass in the output prefix:

mumemto coverage -i <output-prefix>

This outputs the coverage of each sequence by non-overlapping multi-MUMs, as a percentage of sequence length; that is, a position covered by multiple multi-MUMs is counted only once.
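The idea of counting overlapping positions once can be illustrated with a small interval-merging sketch. This is not mumemto's implementation: the helper name nonoverlap_coverage and its input format (one "start length" pair per line for a single sequence, with the sequence length as the argument) are assumptions made for illustration.

```shell
# Hypothetical helper: read "start length" pairs on stdin for ONE sequence
# and print the percentage of that sequence covered, counting each
# position once. $1 is the sequence length.
nonoverlap_coverage() {
    sort -n -k1,1 | awk -v total="$1" '
        {
            s = $1; e = $1 + $2                 # half-open interval [s, e)
            if (NR == 1) { cs = s; ce = e; next }
            if (s <= ce) {                      # overlaps or touches current run
                if (e > ce) ce = e
            } else {                            # gap: close the current run
                covered += ce - cs
                cs = s; ce = e
            }
        }
        END {
            if (NR > 0) covered += ce - cs      # close the final run
            printf "%.2f\n", 100 * covered / total
        }'
}

# Example usage:
# printf "0 10\n5 10\n30 5\n" | nonoverlap_coverage 100
```

In the example, the first two intervals merge into positions 0 to 14, and the third adds 5 more, so 20 of 100 positions are covered and the function prints 20.00.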