Make sure you have the environment set up (see ../README.md).
In this step, we'll navigate to https://kipoi.org, select the model and run the example.
Task:
- Choose a model of interest (example: Basset)
- Run the commands for the CLI (all except kipoi env install)
Notes:
- You can use the search bar at the top of https://kipoi.org/groups/ to search for models. Try, for example, 'DNA binding'.
Let's try to run model prediction on new data. We'll use enhancer regions in H1-hESC as annotated by ChromHMM (download link).
Unzip the fasta file.

Linux:

    zcat input/hg19.chr22.fa.gz > input/hg19.chr22.fa

OSX:

    gzcat input/hg19.chr22.fa.gz > input/hg19.chr22.fa

Activate the right environment:

    source activate `kipoi env get DeepBind`

Run prediction:

    kipoi predict DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF \
      --dataloader_args='{"intervals_file": "input/enhancer-regions.hg19.chr22.bed.gz",
                          "fasta_file": "input/hg19.chr22.fa"}' \
      -o preds.tsv

Investigate the output:

    less preds.tsv

Skim through the help of kipoi predict:

    kipoi predict --help

Further things to try:
- Try out different file formats:
  - output.h5
  - output.bed
- Use a different batch size: --batch_size=256
- Try using multiple workers in parallel: -n 8
- (Optional) Use a GPU:
  - Install a new GPU environment by adding --gpu to kipoi env create
  - Run kipoi predict as before
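The options listed above can be combined in a single call. As a rough sketch, the Python helper below assembles the same `kipoi predict` command with the dataloader arguments serialized by `json.dumps`, which avoids hand-quoting the JSON. The helper name `build_predict_command` is made up for this illustration and is not part of Kipoi. It uses only the flags that appear in this step.

```python
import json
import shlex


def build_predict_command(model, intervals_file, fasta_file, output,
                          batch_size=None, num_workers=None):
    """Return the argument list for a `kipoi predict` call (hypothetical helper)."""
    # json.dumps guarantees a valid JSON string for --dataloader_args.
    dataloader_args = json.dumps({
        "intervals_file": intervals_file,
        "fasta_file": fasta_file,
    })
    argv = ["kipoi", "predict", model,
            "--dataloader_args=" + dataloader_args,
            "-o", output]
    # Optional settings from the "further things to try" list.
    if batch_size is not None:
        argv.append("--batch_size={}".format(batch_size))
    if num_workers is not None:
        argv += ["-n", str(num_workers)]
    return argv


argv = build_predict_command(
    "DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF",
    intervals_file="input/enhancer-regions.hg19.chr22.bed.gz",
    fasta_file="input/hg19.chr22.fa",
    output="output.h5",
    batch_size=256,
    num_workers=8,
)
# Print a shell-quoted version that can be pasted into the terminal.
print(" ".join(shlex.quote(arg) for arg in argv))
```

To run the command instead of printing it, the list can be passed to `subprocess.run(argv, check=True)` from within the activated environment.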
Now, let's run model predictions in parallel. We'll use Snakemake for this.
First, explore the Snakefile.
Next, run:

    snakemake

This will run model prediction for many different models.
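Snakemake only runs the jobs whose output files are missing or out of date, so an interrupted run can simply be restarted. The sketch below mimics a simplified version of that check (file existence only) in plain Python. The file naming scheme in `output_path` is an assumption made for illustration and may differ from the one in this repository's Snakefile.

```python
import os


def output_path(model, out_dir="output"):
    """Map a model name to a per-model output file (hypothetical naming scheme)."""
    # Model names contain '/', so flatten them into a single file name.
    return os.path.join(out_dir, model.replace("/", "__") + ".tsv")


def pending_models(models, out_dir="output"):
    """Return the models whose prediction file does not exist yet."""
    return [m for m in models if not os.path.exists(output_path(m, out_dir))]


models = ["Basset", "DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF"]
# Lists the models that would still need a prediction run.
print(pending_models(models))
```

Snakemake itself can report the planned jobs without executing them via `snakemake -n` (dry run).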
Now that we have the predictions stored under output/, let's load them into Python, join them into a table and perform a very simple analysis. Go through the load-visualize.ipynb notebook.
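As a rough idea of what "join them into a table" means, the standard-library sketch below reads one TSV per model, keys each row by its genomic interval, and collects the predictions side by side. It is not the notebook's code. The column names in `KEY_COLUMNS` and `PRED_COLUMN` are assumptions about the TSV header and should be checked against your own preds.tsv. For models with several output tasks, only the one assumed column is read.

```python
import csv

# Assumed column names; verify them against the header of your own output files.
KEY_COLUMNS = ("metadata/ranges/chr", "metadata/ranges/start", "metadata/ranges/end")
PRED_COLUMN = "preds/0"


def read_predictions(path):
    """Read one prediction TSV into {(chr, start, end): prediction}."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {tuple(row[c] for c in KEY_COLUMNS): float(row[PRED_COLUMN])
                for row in reader}


def join_predictions(paths_by_model):
    """Join per-model predictions into {interval: {model: prediction}}."""
    table = {}
    for model, path in paths_by_model.items():
        for interval, value in read_predictions(path).items():
            table.setdefault(interval, {})[model] = value
    return table


def mean_prediction(table, model):
    """Average one model's prediction over all intervals it scored."""
    values = [row[model] for row in table.values() if model in row]
    return sum(values) / len(values) if values else float("nan")
```

A call such as `join_predictions({"model_a": "output/a.tsv", "model_b": "output/b.tsv"})` (placeholder names and paths) would give one entry per interval, which is the shape needed to compare models region by region.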
Next step: 2-score_variants