This repository has been archived by the owner on Jan 18, 2020. It is now read-only.

Data Model

bruth edited this page Oct 8, 2014 · 1 revision

The Varify Data Warehouse (VDW) data model can be broken up into two primary groups: variant-related and sample-related data.

VDW ER Diagram

Dump Variants and Annotations

pg_dump --data-only --no-privileges --dbname=DATABASE_NAME \
        -t chromosome \
        -t variant -t variant_type \
        -t variant_effect -t effect -t functional_class -t effect_impact -t effect_region \
        -t evs -t '"1000g"' -t polyphen2 -t sift \
        -t phenotype -t variant_phenotype \
        -t gene -t gene_family -t synonym -t gene_phenotype -t exon -t transcript \
| bzip2 --compress --stdout > vdw_variants.sql.bz2
  • Omit --data-only to include the schema definitions, e.g. CREATE TABLE.
  • --jobs=N is not used here: pg_dump only accepts it with the directory output format (--format=directory), which writes to a directory rather than stdout and so cannot be piped to bzip2. A parallel dump finishes sooner but puts more load on the database.
  • Remember to specify the database name and any other connection parameters such as --host.
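As a rough sketch of how the dump command above could be wrapped so that the database name and extra connection options are passed in, consider the function below. The names dump_variants, VDW_VARIANT_TABLES and DRY_RUN are hypothetical and not part of Varify; the table list is copied from the command above, and --jobs is left out because pg_dump only accepts it with the directory output format.

```shell
#!/bin/sh
# Hypothetical wrapper around the dump command; not part of Varify.

# Tables in the variant/annotation group, copied from the command above.
# "1000g" keeps its double quotes because the name starts with a digit.
VDW_VARIANT_TABLES='chromosome variant variant_type
variant_effect effect functional_class effect_impact effect_region
evs "1000g" polyphen2 sift
phenotype variant_phenotype
gene gene_family synonym gene_phenotype exon transcript'

# Usage: dump_variants DATABASE_NAME [extra pg_dump options, e.g. --host=HOST]
# Set DRY_RUN=1 (an assumed convention) to print the command instead of running it.
dump_variants() {
    if [ "$#" -lt 1 ]; then
        echo "usage: dump_variants DATABASE_NAME [pg_dump options]" >&2
        return 2
    fi
    database=$1
    shift
    # Rebuild the argument list: fixed options first, then the caller's extras.
    set -- pg_dump --data-only --no-privileges "--dbname=$database" "$@"
    # Append one "-t TABLE" pair per table in the list.
    for table in $VDW_VARIANT_TABLES; do
        set -- "$@" -t "$table"
    done
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@" | bzip2 --compress --stdout > vdw_variants.sql.bz2
    fi
}
```

For example, DRY_RUN=1 dump_variants DATABASE_NAME --host=HOST prints the full pg_dump command so it can be checked before any load is put on the database.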

Load Variants and Annotations

If the dump was made with --data-only, as above, the target database must already contain the table definitions. Decompress the file and pipe it to the psql command:

bzip2 --decompress --stdout vdw_variants.sql.bz2 | psql --dbname DATABASE_NAME
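A slightly more defensive variant of the load command above might look like the sketch below. The function name load_variants is hypothetical; bzip2 --test and psql's ON_ERROR_STOP variable are standard options, used here on the assumption that stopping at the first SQL error is preferable to continuing a partially failed load.

```shell
#!/bin/sh
# Hypothetical helper; not part of Varify.
# Usage: load_variants DUMP_FILE DATABASE_NAME
load_variants() {
    dump_file=$1
    database=$2
    # Refuse to start if the archive is missing or unreadable.
    if [ ! -r "$dump_file" ]; then
        echo "cannot read $dump_file" >&2
        return 1
    fi
    # Check the archive's integrity before sending anything to the database.
    bzip2 --test "$dump_file" || return 1
    # ON_ERROR_STOP=1 makes psql stop at the first SQL error and exit non-zero.
    bzip2 --decompress --stdout "$dump_file" \
        | psql --dbname "$database" --set ON_ERROR_STOP=1
}
```

Called as load_variants vdw_variants.sql.bz2 DATABASE_NAME, the function returns psql's exit status, so a failed load can be detected by the calling script.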