examples/2-score_variants at master · kipoi/examples


Setup

Make sure you have set up the environment (see ../README.md) and extracted the FASTA file:

zcat ../1-predict/input/hg19.chr22.fa.gz > ../1-predict/input/hg19.chr22.fa
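
If `zcat` is not available on your system, the same extraction can be sketched in Python with the standard library only. The function name `extract_gzip` is hypothetical and not part of Kipoi.

```python
import gzip
import shutil


def extract_gzip(src, dst):
    """Decompress the gzip file `src` into `dst`, streaming in chunks."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        # copyfileobj streams the data, so the whole file is never held in memory
        shutil.copyfileobj(fin, fout)
```

Calling `extract_gzip("../1-predict/input/hg19.chr22.fa.gz", "../1-predict/input/hg19.chr22.fa")` should produce the same file as the `zcat` command above.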

Tutorial

1. Run `kipoi veff score_variants` for a single model

Have a look at input/clinvar_20180429.pathogenic.chr22.vcf.gz:

zless -S input/clinvar_20180429.pathogenic.chr22.vcf.gz

This file contains genetic variants from the ClinVar database. We filtered the original ClinVar VCF file to chromosome 22 and included only pathogenic variants.
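
To check the filtering described above without paging through the file, here is a minimal sketch that counts header lines and variant records per chromosome. It uses only the standard library; `summarize_vcf` is a hypothetical helper name, not a Kipoi function.

```python
import gzip
from collections import Counter


def summarize_vcf(path):
    """Count header lines and variant records per chromosome in a gzipped VCF."""
    n_header = 0
    per_chrom = Counter()
    with gzip.open(path, "rt") as f:
        for line in f:
            if line.startswith("#"):
                # both '##' meta lines and the '#CHROM' column line count as header
                n_header += 1
            elif line.strip():
                # CHROM is the first tab-separated column of a VCF record
                per_chrom[line.split("\t", 1)[0]] += 1
    return n_header, dict(per_chrom)
```

Running `summarize_vcf("input/clinvar_20180429.pathogenic.chr22.vcf.gz")` should list records for chromosome 22 only, if the file was filtered as described.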

Let's score the impact of these genetic variants on different molecular phenotypes (e.g. transcription factor binding affinity or DNA accessibility) using models in Kipoi.

First, activate the right environment:

source activate `kipoi env get DeepBind`
mkdir -p output

Next, run model predictions for sequences containing the reference allele and the alternative allele, and write both predictions and their difference to a file:

kipoi veff score_variants DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF \
   --dataloader_args='{"fasta_file": "input/hg19.chr22.fa"}' \
   -i input/clinvar_20180429.pathogenic.chr22.vcf.gz \
   -s ref alt diff \
   -o /tmp/annotated.vcf

Let's investigate the results:

less -S /tmp/annotated.vcf

As you can see, new entries were added to the INFO field of the vcf:

##INFO=<ID=KV:kipoi:DeepBind/Homo_sapiens/TF/D00328.018_ChIP_seq_CTCF:REF,...
##INFO=<ID=KV:kipoi:DeepBind/Homo_sapiens/TF/D00328.018_ChIP_seq_CTCF:ALT,...
##INFO=<ID=KV:kipoi:DeepBind/Homo_sapiens/TF/D00328.018_ChIP_seq_CTCF:DIFF,...
##INFO=<ID=KV:kipoi:DeepBind/Homo_sapiens/TF/D00328.018_ChIP_seq_CTCF:rID,...

REF, ALT and DIFF correspond to the scoring functions requested with -s ref alt diff.
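
To pull these entries out of a record programmatically, here is a minimal sketch that splits the INFO column into key/value pairs and keeps the keys starting with `KV:`. It assumes only the standard VCF convention of semicolon-separated `key=value` items. How Kipoi encodes the values themselves is not shown here, so they are kept as raw strings. Both function names are hypothetical.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    fields = {}
    if info == ".":  # '.' marks an empty INFO column
        return fields
    for item in info.split(";"):
        if not item:
            continue
        # partition splits on the first '=' only, so values may contain '='
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def kipoi_entries(info, prefix="KV:"):
    """Keep only the INFO entries whose key starts with `prefix`."""
    return {k: v for k, v in parse_info(info).items() if k.startswith(prefix)}
```

Apply `kipoi_entries` to the eighth tab-separated column of each non-header line in /tmp/annotated.vcf.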

Let's write the scores to a tsv file instead of the vcf:

kipoi veff score_variants DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF \
   --dataloader_args='{"fasta_file": "input/hg19.chr22.fa"}' \
   -i input/clinvar_20180429.pathogenic.chr22.vcf.gz \
   -s ref alt diff \
   -e /tmp/annotated.tsv

less -S /tmp/annotated.tsv
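
The tsv file can also be inspected from Python. The sketch below reads the column names and the first few rows with the standard `csv` module. It makes no assumption about which columns Kipoi writes, and `peek_tsv` is a hypothetical helper name.

```python
import csv
import itertools


def peek_tsv(path, limit=5):
    """Return the column names and the first `limit` rows of a TSV file."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        # islice stops after `limit` rows without reading the rest of the file
        rows = list(itertools.islice(reader, limit))
        return reader.fieldnames, rows
```

For example, `columns, rows = peek_tsv("/tmp/annotated.tsv")` gives the header and up to five records as dictionaries.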

Your turn

Use the Basset model to run model predictions.
Write only the predictions for A549 of the Basset model. Hint: use kipoi veff score_variants --help and --model_outputs.
Run variant effect predictions from python:

import kipoi_veff.snv_predict as sp
sp.score_variants(model='Basset',
                  dl_args={'fasta_file': 'input/hg19.chr22.fa'},
                  input_vcf='input/clinvar_20180429.pathogenic.chr22.vcf.gz',
                  output_vcf='/tmp/py-annotated.vcf')

2. Run `kipoi veff score_variants` for multiple models using Snakemake

Now, let's run model predictions in parallel. We'll use Snakemake for this.

First, explore the Snakefile.

Next, run:

snakemake -j 5

This will run variant effect prediction for many different models. The -j 5 option runs up to 5 jobs in parallel.
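
To make concrete what one job per model amounts to, the sketch below builds the `kipoi veff score_variants` command from section 1 for a list of models. It is not the contents of the Snakefile: the helper name `score_variants_cmd` and the `output/<model>.vcf` naming are assumptions made for illustration.

```python
import json
import shlex


def score_variants_cmd(model, fasta, vcf, out_dir="output"):
    """Build the argument list for one `kipoi veff score_variants` call."""
    # Hypothetical output naming: slashes in the model name become underscores
    out = "{}/{}.vcf".format(out_dir, model.replace("/", "_"))
    return [
        "kipoi", "veff", "score_variants", model,
        "--dataloader_args=" + json.dumps({"fasta_file": fasta}),
        "-i", vcf,
        "-s", "ref", "alt", "diff",
        "-o", out,
    ]


models = ["DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF", "Basset"]
for m in models:
    cmd = score_variants_cmd(
        m, "input/hg19.chr22.fa",
        "input/clinvar_20180429.pathogenic.chr22.vcf.gz")
    # shlex.quote keeps the JSON argument intact when pasted into a shell
    print(" ".join(shlex.quote(a) for a in cmd))
```

The script only prints the commands. Each model may need its own environment activated first, as in the `source activate` step of section 1.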

3. Load and analyze predictions in python

Now that we have the predictions stored under output/, let's load them into python, join them into a table and do a simple analysis. Go through the 1-load-visualize.ipynb notebook.

Links

Next step: 3-interpret
