GERBIL is a tool for generative modeling of protein sequence families. It is built upon the deepnet package, created by Nitish Srivastava.
The models are described in thesis.pdf.
Currently the following generative models are supported:
- Restricted Boltzmann Machines
- Deep Boltzmann Machines
NOTE : Currently under development. Usage should be considered experimental.
See INSTALL.txt for installation instructions
Alternatively, you can run on AWS using the Gerbil AMI (see below)
We provide a precreated AMI with all libraries and paths set.
Create a
g2.2xlargeinstance and use the Gerbil specific AMIami-d18ab9e1.Pull the latest gerbil version
$ cd gerbil/ && git pull origin master
Export the following variables:
$ export GERBILPATH=/path/to/gerbil $ export PYTHONPATH=${GERBILPATH}:${PYTHONPATH}First create the datasets. In this example we will consider the PF00240 family
$ cd deepnet/datasets $ wget http://gremlin.bakerlab.org/fasta_2013/PF00240.fas -P /tmp/ $ python create_dataset.py --fastafile /tmp/PF00240.fas --familyname PF00240
To see full range of options available
$ python generate_experiments.py -h
Create a new experiment for learning the parameters of the RBM
$ cd deepnet/experiments $ python generate_experiments.py --start_job_id 1 \ --model rbm --data_dir PF00240Learn the params of the model
$ cd exp1 && ./runall.sh
Calculate imputation error. Job file written out to
./run_in_parallel.sh$ cd deepnet/ $ python impute_parallel_run.py --start_expid 1 --end_expid 1 --model_prefix rbm $ ./run_in_parallel.sh
Create a csv table with results
$ python create_results_csv.py --expid 1 $ less results/imperr_exp1.csv
A DBM needs a RBM to warmstart from. Continuing from exp1 we will create
another experiment exp2 to train a DBM.
Create a new experiment
$ cd deepnet/experiments $ python generate_experiments.py --start_job_id 2 \ --model dbm --data_dir PF00240Copy over the best RBM model in
exp1ontoexp2$ python choose_best_model.py --impute_dir likelihoods/exp1/ \ --model_dir exp1/dbm_models/ --output_dir exp2/dbm_modelsLearn the params of the model
$ cd exp2 && ./runall.sh
Calculate imputation error. Job file written out to
./run_in_parallel.sh$ cd deepnet/ $ python impute_parallel_run.py --start_expid 2 --end_expid 2 --model_prefix dbm $ ./run_in_parallel.sh
Create a csv table with results
$ python create_results_csv.py --expid 2 $ less results/imperr_exp2.csv
Get the best model from an experiment
$ cd deepnet/experiments $ python choose_best_model.py --impute_dir likelihoods/exp1/ \ --model_dir exp1/dbm_models/ --print_only exp1/dbm_models/rbm1_1430616201Extract the params from the best model
$ cd deepnet/ $ python write_model_to_mat.py \ experiments/exp1/dbm_models/rbm1_1430616201 \ /path/to/my/folder/rbm1_1430616201.mat