DeePhy is a deep learning framework for inferring triplet tree topology from three unaligned nucleotide sequences.
DeePhy is developed with Python 3.6 and PyTorch 1.9, and has been tested on Linux-based systems. The prerequisites are as follows:
- Python 3.6
- Numpy 1.19
- scikit-learn 0.24.2
- scipy 1.5.4
- BioPython 1.79
- Json 0.1.1
- PyTorch 1.9
- CudaToolkit 10.2.89
- torchvision 0.10.0
As an alternative, a conda environment file, dphy.yml, is also provided in this package.
DeePhy takes the Genomic Footprint (GFP) of triplet data as input, so the GFP of each nucleotide sequence must be derived first. The tool for deriving the GFP of a nucleotide sequence is provided at the following link,
Create the ground truth of the triplet tree(s). The three OTUs are one-hot encoded: the two siblings are denoted by 0 and the outgroup by 1.
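The encoding described above can be illustrated with a minimal sketch. The function name `triplet_label` and the OTU names are hypothetical and are not part of DeePhy; only the 0/1 convention (siblings 0, outgroup 1) comes from this README.

```python
def triplet_label(otus, outgroup):
    """Return a length-3 list: 0 for each sibling, 1 for the outgroup.

    Hypothetical helper for illustration; DeePhy's own label format may differ.
    """
    if len(otus) != 3 or len(set(otus)) != 3:
        raise ValueError("a triplet needs exactly three distinct OTUs")
    if outgroup not in otus:
        raise ValueError("outgroup must be one of the three OTUs")
    # Position i is 1 only where the OTU is the outgroup.
    return [1 if name == outgroup else 0 for name in otus]


# For the topology ((A,B),C), C is the outgroup.
label = triplet_label(["A", "B", "C"], outgroup="C")
print(label)  # [0, 0, 1]
```

Each of the three possible topologies of a triplet then corresponds to exactly one position holding the value 1.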
This step partitions the entire dataset into training, validation, and test data. To only test with an already trained model, set the training and validation sets to empty lists.
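As a rough sketch of such a partition, assuming each triplet is identified by a string ID and the split is stored as three lists: the function name `partition`, the keys `train`, `val`, and `test`, and the split fractions are all assumptions for illustration, not DeePhy's actual format.

```python
import json
import random


def partition(triplet_ids, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle the IDs and split them into train/val/test lists (hypothetical layout)."""
    ids = list(triplet_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # everything left over is test data
    }


ids = [f"triplet_{i}" for i in range(10)]
split = partition(ids)

# Test-only mode: training and validation are empty lists.
test_only = partition(ids, train_frac=0.0, val_frac=0.0)
print(json.dumps({name: len(items) for name, items in test_only.items()}))
```

Setting both fractions to zero reproduces the test-only case, where every triplet lands in the test list.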
Based on the training, validation, and test partition, DeePhy predicts the triplet topology.
python --dataset [location of dataset] --subdir [name of subdirectory]
dataset path
subdirectory of dataset, e.g. GFP
number of data loading workers (default=16)
batch size (default=512)
the program uses CUDA
output folder
path to saved model
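The options listed above could be declared with `argparse` roughly as follows. Only `--dataset` and `--subdir` appear in this README; every other flag name here (`--workers`, `--batch-size`, `--cuda`, `--outf`, `--model`) is a placeholder assumed for illustration and may not match the real script. The defaults of 16 and 512 are the ones stated above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the option list; not DeePhy's actual code."""
    parser = argparse.ArgumentParser(description="Triplet topology prediction (sketch)")
    # These two flags are the ones shown in the README's command line.
    parser.add_argument("--dataset", required=True, help="dataset path")
    parser.add_argument("--subdir", required=True, help="subdirectory of dataset, e.g. GFP")
    # The flag names below are assumptions; only the descriptions and defaults are sourced.
    parser.add_argument("--workers", type=int, default=16, help="number of data loading workers")
    parser.add_argument("--batch-size", type=int, default=512, help="batch size")
    parser.add_argument("--cuda", action="store_true", help="the program uses CUDA")
    parser.add_argument("--outf", default=".", help="output folder")
    parser.add_argument("--model", default="", help="path to saved model")
    return parser


args = build_parser().parse_args(["--dataset", "data", "--subdir", "GFP"])
print(args.workers, args.batch_size, args.cuda)  # 16 512 False
```

Check the script's own `--help` output for the actual flag names before relying on any of the placeholders.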