Setup

Make sure you have the environment set up (see ../README.md).

Tutorial

1. Quick start

Explore kipoi.org/groups

In this step, we'll navigate to https://kipoi.org, select a model, and run its example.

Task:

  • Choose a model of interest (Example: Basset)
  • Run the commands for the CLI (all except kipoi env install)

Notes:

2. Run kipoi predict on new data

Let's try to run model prediction on new data. We'll use enhancer regions in H1-hESC as annotated by ChromHMM (download link).

Unzip the FASTA file:

Linux

zcat input/hg19.chr22.fa.gz > input/hg19.chr22.fa

OSX

gzcat input/hg19.chr22.fa.gz > input/hg19.chr22.fa
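If neither zcat nor gzcat is available, the same decompression can be done from Python. The following is a minimal sketch using only the standard library; the helper name gunzip is made up for this example and is not part of Kipoi.

```python
import gzip
import shutil


def gunzip(src, dst):
    """Decompress the gzip file `src` into `dst`.

    The data is streamed in chunks, so the whole file is never
    held in memory at once.
    """
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


# Usage with the paths from this tutorial:
# gunzip("input/hg19.chr22.fa.gz", "input/hg19.chr22.fa")
```

This writes the same output file as the shell commands above, so the remaining steps are unchanged.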

Activate the right environment

source activate `kipoi env get DeepBind`

Run prediction

kipoi predict DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF \
   --dataloader_args='{"intervals_file": "input/enhancer-regions.hg19.chr22.bed.gz",
                       "fasta_file": "input/hg19.chr22.fa"}' \
  -o preds.tsv

Investigate the output:

less preds.tsv
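For a quick numeric summary instead of paging through the file, a short Python script can help. The sketch below assumes preds.tsv is tab-separated with a header row; the column names in the toy input are invented for illustration and may not match the columns Kipoi actually writes.

```python
import csv
import io


def read_tsv(handle):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def column_means(rows):
    """Return the mean of every column whose values all parse as floats."""
    means = {}
    if not rows:
        return means
    for name in rows[0]:
        try:
            values = [float(row[name]) for row in rows]
        except (TypeError, ValueError):
            continue  # skip non-numeric columns such as chromosome names
        means[name] = sum(values) / len(values)
    return means


# Toy stand-in for preds.tsv; these column names are hypothetical.
toy = io.StringIO(
    "chr\tstart\tend\tscore\n"
    "chr22\t100\t200\t0.25\n"
    "chr22\t300\t400\t0.75\n"
)
rows = read_tsv(toy)
print(column_means(rows))

# With the real file:
# with open("preds.tsv", newline="") as handle:
#     print(column_means(read_tsv(handle)))
```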

Skim through the help for kipoi predict (kipoi predict --help). Further things to try:

  • Try different output file formats by changing the extension of the -o file:
    • output.h5
    • output.bed
  • Use a different batch size
    • --batch_size=256
  • Try using multiple workers in parallel
    • -n 8
  • (Optional) Use a GPU
    • Install a new GPU environment by adding --gpu to kipoi env create
    • Run kipoi predict as before
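When trying several of these options, quoting the JSON passed to --dataloader_args by hand gets error-prone. One way to avoid that is to assemble the command in Python, as in the sketch below. It only uses flags that appear in this tutorial (--dataloader_args, -o, --batch_size, -n); the helper name build_predict_command is hypothetical and not part of Kipoi.

```python
import json
import shlex


def build_predict_command(model, dataloader_args, output,
                          batch_size=None, num_workers=None):
    """Return the `kipoi predict` call as an argument list.

    `dataloader_args` is a plain dict; json.dumps takes care of
    the quoting that is easy to get wrong on the command line.
    """
    cmd = [
        "kipoi", "predict", model,
        "--dataloader_args=" + json.dumps(dataloader_args),
        "-o", output,
    ]
    if batch_size is not None:
        cmd.append(f"--batch_size={batch_size}")
    if num_workers is not None:
        cmd += ["-n", str(num_workers)]
    return cmd


cmd = build_predict_command(
    "DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF",
    {"intervals_file": "input/enhancer-regions.hg19.chr22.bed.gz",
     "fasta_file": "input/hg19.chr22.fa"},
    "output.h5",
    batch_size=256,
    num_workers=8,
)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal

# To run it directly instead of printing:
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list to subprocess.run avoids a second round of shell quoting altogether.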

3. Run kipoi predict for multiple models using Snakemake

Now, let's run model predictions in parallel. We'll use Snakemake for this.

First, explore the Snakefile.

Next, run:

snakemake

This will run model prediction for many different models.
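Running many models means each one needs its own output file, and Kipoi model names contain slashes. The sketch below shows one possible naming scheme in plain Python. It is an illustration only: the actual Snakefile may lay out output/ differently, and the function name output_path and the "__" separator are assumptions made for this example.

```python
from pathlib import PurePosixPath


def output_path(model, out_dir="output", ext="tsv"):
    """Map a model name to a single flat file under `out_dir`.

    Slashes in the model name are replaced with "__" so that every
    model ends up as one file rather than a nested directory tree.
    """
    flat_name = model.replace("/", "__") + "." + ext
    return str(PurePosixPath(out_dir) / flat_name)


# Model names taken from earlier in this tutorial.
models = [
    "Basset",
    "DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF",
]
for model in models:
    print(model, "->", output_path(model))
```

Keep in mind that different models can require different environments, which is part of what the workflow has to handle for each model.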

4. Load and analyze predictions in Python

Now that we have the predictions stored under output/, let's load them into Python, join them into a table, and perform a very simple analysis. Go through the load-visualize.ipynb notebook.
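The joining step can be pictured as follows. This is a rough standard-library sketch, not the notebook's code: it assumes each model's predictions have already been reduced to a mapping from an interval (chromosome, start, end) to one score, and the model names and values are toy placeholders.

```python
def join_predictions(per_model):
    """Join {model: {interval: score}} into {interval: {model: score}}.

    Only intervals that every model has a prediction for are kept,
    which corresponds to an inner join on the interval key.
    """
    if not per_model:
        return {}
    shared = set.intersection(*(set(scores) for scores in per_model.values()))
    return {
        key: {model: scores[key] for model, scores in per_model.items()}
        for key in sorted(shared)
    }


# Toy input: two hypothetical models scored on partly overlapping intervals.
per_model = {
    "model_a": {("chr22", 100, 200): 0.2, ("chr22", 300, 400): 0.8},
    "model_b": {("chr22", 100, 200): 0.5, ("chr22", 500, 600): 0.1},
}
table = join_predictions(per_model)
for interval, scores in table.items():
    print(interval, scores)
```

Each entry of the joined table holds one interval with one score per model, which is a convenient shape for comparing models side by side.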

Links

Next step: 2-score_variants