Identify 24-locus MIRU-VNTR profiles for Mycobacterium tuberculosis complex (MTBC) directly from long reads generated by Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) platforms. Also works on assembled genomes.
- Linux
- primersearch from EMBOSS
- install from the official website or
- install via conda
conda install -c bioconda emboss
- Ensure the primersearch command is in your environment path, so that the program can be executed directly by typing
primersearch
on the command line
- pandas
- can be installed via conda
conda install pandas
or via PyPI
pip install pandas
- statistics
- can be installed via PyPI (only needed for Python 2; the module is part of the standard library since Python 3.4)
pip install statistics
git clone https://github.com/phglab/MIRUReader.git
- Added a check to ensure primersearch is executable prior to MIRUReader program execution
- Updated documentation in the README
- Updated output format for option '--details'.
- Added automatic conversion of fastq to fasta.
For single-sample analysis:
python /your/path/to/MIRUReader.py -r sample.fasta -p sampleID > miru.txt
For multi-sample analysis:
- Create a mapping file (mappingFile.txt) that looks like:
sample_001.fasta sample_001
sample_002.fasta sample_002
...
- Then run the program:
cat mappingFile.txt | while read -a line; do python /your/path/to/MIRUReader.py -r ${line[0]} -p ${line[1]}; done > miru.multiple.txt
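The shell loop above can also be sketched in Python, which may be convenient when the batch run is part of a larger script. This is only an illustration: `build_commands` is a hypothetical helper, not part of MIRUReader, and the script path is the same placeholder used in the commands above. It assumes each mapping line holds a reads file and a sample ID separated by whitespace.

```python
import shlex


def build_commands(mapping_text, script="/your/path/to/MIRUReader.py"):
    """Turn mapping-file text into one MIRUReader command per sample."""
    commands = []
    for line in mapping_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        reads, prefix = fields[0], fields[1]
        commands.append(["python", script, "-r", reads, "-p", prefix])
    return commands


example = "sample_001.fasta sample_001\nsample_002.fasta sample_002\n"
commands = build_commands(example)
for cmd in commands:
    # Print a shell-safe rendering of each command for inspection.
    print(" ".join(shlex.quote(part) for part in cmd))
```

To actually run the samples, each command list could be passed to `subprocess.run` with its standard output redirected to a shared results file.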
sample_prefix 0154 0424 0577 0580 0802 0960 1644 1955 2059 2163b 2165 2347 2401 2461 2531 2687 2996 3007 3171 3192 3690 4052 4156 4348
sample_001 2 4 4 2 3 3 3 2 2 5 4 4 4 2 5 1 6 3 3 5 3 7 2 3
Notes:
- The program is compatible with Python 2 and Python 3.
- Accepted read file formats are '.fastq', '.fastq.gz', '.fasta', and '.fasta.gz'.
- The program output is tab-delimited plain text, which can be copied into or opened in an Excel spreadsheet.
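Because the output is tab-delimited, it can also be read programmatically. The following is a minimal sketch using Python's standard `csv` module; `read_profiles` is a hypothetical helper, and the example text is the sample output shown above, rebuilt with tabs. It assumes a single header line whose first column is `sample_prefix`, and it keeps repeat counts as strings rather than guessing how unusual values should be interpreted.

```python
import csv
import io

header = ("sample_prefix 0154 0424 0577 0580 0802 0960 1644 1955 2059 2163b "
          "2165 2347 2401 2461 2531 2687 2996 3007 3171 3192 3690 4052 4156 4348")
row = "sample_001 2 4 4 2 3 3 3 2 2 5 4 4 4 2 5 1 6 3 3 5 3 7 2 3"
# Rebuild the example as tab-delimited text (assumed layout of miru.txt).
example_output = "\t".join(header.split()) + "\n" + "\t".join(row.split()) + "\n"


def read_profiles(handle):
    """Return {sample_prefix: {locus: repeat count as string}}."""
    profiles = {}
    for record in csv.DictReader(handle, delimiter="\t"):
        sample = record.pop("sample_prefix")
        profiles[sample] = record
    return profiles


profiles = read_profiles(io.StringIO(example_output))
print(profiles["sample_001"]["2163b"])
```

For a real file, `read_profiles(open("miru.txt"))` would take the place of the in-memory example.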
Main options | Description |
---|---|
-r READS | Input reads file in fastq or fasta format, optionally gzipped. |
-p PREFIX | Sample ID required for naming output file. |
--table TABLE | Allele calling table, default is MIRU_table. Can be user-defined in fixed format. However, providing a custom allele calling table for other VNTR schemes has not been tested. |
--primers PRIMERS | Primer sequences, default is MIRU_primers. Can be user-defined in fixed format. |
Optional options | Description |
---|---|
--amplicons | Use output from primersearch ("prefix.18.primersearch.out") and summarize MIRU profile directly. |
--details | This option is for further inspection. It displays the repeat count details for each locus, together with the total number of mismatches in the primer sequence alignment. |
--nofasta | Delete the fasta file generated when the input reads are in fastq format. |
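For readers curious what a fastq-to-fasta conversion involves, here is a minimal sketch. It is not MIRUReader's own implementation; `open_reads` and `fastq_to_fasta` are hypothetical helpers, and the sketch assumes Python 3 and strictly four-line FASTQ records (header, sequence, `+`, quality).

```python
import gzip
import io


def open_reads(path):
    """Open a plain or gzipped reads file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Write each 4-line FASTQ record as a 2-line FASTA record; return the count."""
    count = 0
    for header in iter(fastq_handle.readline, ""):
        if not header.strip():
            continue  # tolerate blank lines between or after records
        if not header.startswith("@"):
            raise ValueError("not a 4-line FASTQ record: " + header.strip())
        sequence = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality line, ignored
        fasta_handle.write(">" + header[1:].rstrip("\n") + "\n" + sequence + "\n")
        count += 1
    return count


fastq = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n")
fasta = io.StringIO()
n = fastq_to_fasta(fastq, fasta)
print(fasta.getvalue())
```

With files on disk, `open_reads("sample.fastq.gz")` would supply the input handle in place of the in-memory example.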
- Why are there two MIRU allele calling tables (MIRU_table and MIRU_table_0580)?
MIRU locus 0580 (MIRU_table_0580) uses a different numbering system for determining repeat numbers than the other 23 MIRU loci (MIRU_table) in MTBC isolates.
- If the error message
OSError: primersearch is not found.
appears, please ensure your primersearch executable is in your environment path (echo $PATH) and can be called directly.
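The same check can be made from Python before launching an analysis. This is a small sketch using the standard library's `shutil.which` (Python 3.3 or later); it is not the check MIRUReader itself performs, and `find_executable` is a hypothetical helper name.

```python
import shutil


def find_executable(name="primersearch"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


path = find_executable()
if path is None:
    print("primersearch not found on PATH; install EMBOSS or update PATH")
else:
    print("primersearch found at " + path)
```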