Scripts for the analysis of extending sequences of small RNA. They were written for Lastik et al. (2025). Please cite the following paper if you use them:
Lastik, D., Kounosu, A., Dayi, M. et al. Small non-coding RNAs have predicted roles in reproductive biology and transposable element regulation in the parasitic worm Strongyloides venezuelensis. Sci Rep 15, 20608 (2025). https://doi.org/10.1038/s41598-025-01968-2
These scripts use the Python libraries anytree, biopython, numpy and matplotlib. These can be installed using pip:
$ pip3 install anytree biopython numpy matplotlib
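As a quick sanity check after installation, a short script along the following lines can confirm that the four libraries are importable. This is only a sketch and is not part of the repository: the function name `missing_packages` and the `REQUIRED` mapping are hypothetical. The one detail worth knowing is that the biopython package is imported under the name `Bio`.

```python
import importlib.util

# Map each pip package name to the module name used when importing it.
# biopython is the only one whose import name differs ("Bio").
REQUIRED = {
    "anytree": "anytree",
    "biopython": "Bio",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be located."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If any package is reported missing, rerunning the pip command above for that package should resolve it.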
To use these scripts, you need a file of small RNA sequences in either FASTA or FASTQ format. First, to collapse and analyse the sequences, run them through the trieCollapse.py script as follows:
$ python3 trieCollapse.py <input-file>
This will produce the file collapsedSequences.fasta, which contains, in its headers, information about which sequences are shorter versions of others. To extract the longest chain from each one and produce various plots with the information about these sequences, run the script analyseExtendingSequences.py on this file with the command:
$ python3 analyseExtendingSequences.py collapsedSequences.fasta
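To illustrate the general idea of a chain of extending sequences, here is a minimal sketch using a plain nested-dictionary trie. It is not the code in trieCollapse.py or analyseExtendingSequences.py, and it rests on an assumption: that a "shorter version" of a sequence means a prefix of it. The function names `build_trie` and `longest_chains` and the example sequences are hypothetical.

```python
def build_trie(sequences):
    """Insert each sequence into a nested-dict trie; "$" marks a sequence end."""
    root = {}
    for seq in sequences:
        node = root
        for base in seq:
            node = node.setdefault(base, {})
        node["$"] = True
    return root


def longest_chains(sequences):
    """Map each maximal sequence to the input sequences that are its prefixes.

    A sequence is treated as maximal when no other input sequence extends it.
    Each chain is ordered from shortest to longest and ends with the maximal
    sequence itself.
    """
    root = build_trie(sequences)
    chains = {}
    for seq in set(sequences):
        node = root
        chain = []
        for length, base in enumerate(seq, start=1):
            node = node[base]
            if "$" in node:
                # An input sequence ends here, so it is a prefix of seq.
                chain.append(seq[:length])
        # Only the end marker below this node means nothing extends seq.
        if all(key == "$" for key in node):
            chains[seq] = chain
    return chains


if __name__ == "__main__":
    reads = ["UGAGG", "UGAGGUA", "UGAGGUAGU", "ACGU", "ACGA"]
    chains = longest_chains(reads)
    print(chains["UGAGGUAGU"])  # ['UGAGG', 'UGAGGUA', 'UGAGGUAGU']
```

A trie suits this task because every sequence sharing a prefix follows the same path from the root, so all shorter versions of a sequence are found in a single walk along that sequence.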
Note that trieCollapse.py has a few other filtering options not described here. Use python3 trieCollapse.py --help to see them.