TAD_pathways #19

Open
wants to merge 46 commits into
base: master
Changes from 45 commits
f0b585d
Add files via upload
Colin-Cheng Jul 5, 2018
a28cf0d
Delete generate_index_files.py
Colin-Cheng Jul 5, 2018
4007048
Add files via upload
Colin-Cheng Jul 5, 2018
624cea3
Add files via upload
Colin-Cheng Jul 6, 2018
6003eba
add the TAD_boundary argument
Colin-Cheng Jul 6, 2018
ad0eeec
Update build_custom_tad_genelist.py
Colin-Cheng Jul 6, 2018
38caa54
Update build_custom_tad_genelist.py
Colin-Cheng Jul 11, 2018
19c6d5f
update data directory
Colin-Cheng Jul 12, 2018
9dc1f02
add tables directory
Colin-Cheng Jul 12, 2018
74b0476
Delete example_pipeline_bmd.sh
Colin-Cheng Jul 12, 2018
4c2fa01
Delete example_pipeline_t2d.sh
Colin-Cheng Jul 12, 2018
3e536d2
Delete bmd_gene_evidence.csv
Colin-Cheng Jul 12, 2018
c76577b
Delete bmd_gene_evidence_summary.tsv
Colin-Cheng Jul 12, 2018
e3655f9
Delete t2d_gene_evidence.csv
Colin-Cheng Jul 12, 2018
bc5b510
Delete t2d_gene_evidence_summary.tsv
Colin-Cheng Jul 12, 2018
178411e
Delete venn_bmd.tiff
Colin-Cheng Jul 12, 2018
2e9267e
Delete venn_t2d.tiff
Colin-Cheng Jul 12, 2018
dcb14b5
Delete bmd_LD_complete_gestalt.tsv
Colin-Cheng Jul 12, 2018
9ea7284
Delete bmd_LD_gestalt.tsv
Colin-Cheng Jul 12, 2018
aceee43
Delete bmd_LD_pvalues.tsv
Colin-Cheng Jul 12, 2018
c47a888
Delete bmd_complete_gestalt.tsv
Colin-Cheng Jul 12, 2018
fa036a8
Delete bmd_gestalt.tsv
Colin-Cheng Jul 12, 2018
4711a4a
Delete bmd_nearest_gene_complete_gestalt.tsv
Colin-Cheng Jul 12, 2018
2830904
Delete bmd_nearest_gene_gestalt.tsv
Colin-Cheng Jul 12, 2018
5a46c16
Delete bmd_nearest_gene_pvalues.tsv
Colin-Cheng Jul 12, 2018
5eab148
Delete bmd_pvalues.tsv
Colin-Cheng Jul 12, 2018
091deea
Delete t2d_complete_gestalt.tsv
Colin-Cheng Jul 12, 2018
f91fbff
Delete t2d_gestalt.tsv
Colin-Cheng Jul 12, 2018
591b08f
Delete t2d_pvalues.tsv
Colin-Cheng Jul 12, 2018
74f7924
add index directory
Colin-Cheng Jul 13, 2018
5561a9d
Add output files from visualize
Colin-Cheng Jul 13, 2018
e19aad4
add output form generate_index_files.py
Colin-Cheng Jul 13, 2018
c109aa0
Add files via upload
Colin-Cheng Jul 13, 2018
6c165c6
Add output file from visualize
Colin-Cheng Jul 13, 2018
f225acb
add generate_index_file and visualize
Colin-Cheng Jul 13, 2018
8a53ef1
Update build_custom_tad_genelist.py
Colin-Cheng Jul 13, 2018
cc38837
add run_pipeline.sh
Colin-Cheng Jul 17, 2018
101e8c8
Update README.md
Colin-Cheng Jul 17, 2018
74c353a
Update README.md
Colin-Cheng Jul 17, 2018
445da96
Add files via upload
Colin-Cheng Jul 17, 2018
250f701
Update README.md
Colin-Cheng Jul 17, 2018
2a76ac1
Update README.md
Colin-Cheng Jul 17, 2018
285ed4d
Update README.md
Colin-Cheng Jul 17, 2018
8c8fc74
Update README.md
Colin-Cheng Jul 17, 2018
9103347
Update README.md
Colin-Cheng Jul 18, 2018
eca3507
Update build_custom_tad_genelist.py
Colin-Cheng Dec 18, 2018
65 changes: 20 additions & 45 deletions README.md
@@ -33,7 +33,7 @@ experimental validation at
First, clone the repository and navigate into the top directory:

```bash
git clone git@github.com:greenelab/tad_pathways_pipeline.git
git clone https://github.com/marislab/tad_pathways_pipeline.git
# Review comment (Collaborator): This is not an acceptable change to the master pipeline. This change can live in the marislab fork.

cd tad_pathways_pipeline
```

@@ -53,75 +53,50 @@ from an existing GWAS or the custom pipeline example for insight on how to run

### Examples

We provide three different examples for a TAD pathways analysis pipeline. To run
each of the analyses:
We provide an example for a TAD pathways analysis pipeline. To run this example:

```bash
source activate tad_pathways

# Example using Bone Mineral Density GWAS
bash example_pipeline_bmd.sh

# Example using Type 2 Diabetes GWAS
bash example_pipeline_t2d.sh

# Example using custom input SNPs
bash example_pipeline_custom.sh
```

### General Usage

There are two ways to implement a TAD_Pathways analysis:
To perform a `TAD_Pathways` analysis, users need to specify three inputs:

1. GWAS
2. Custom
1. name of the TAD cell type:

#### GWAS
E.g.: 'hESC'

To perform a `TAD_Pathways` analysis on publicly available GWAS results, simply
browse the `data/gwas_catalog/` directory to select a valid GWAS file. These
files contain a curation of all significant SNPs mapped to specific traits as
distributed by the [NHGRI-EBI GWAS Catalog](https://www.ebi.ac.uk/gwas/).
2. path to the TAD domain file:

Each file in this directory is a tab separated text file of genome-wide
significant SNPs and their genomic location along with their reported nearest
gene and associated PUBMED id. For complete information on how these files were
constructed, refer to https://github.com/greenelab/tad_pathways.
The TAD domain file is a 3-column tab-separated BED file. The first column is the chromosome number, the second column is the start position of the TAD, and the third column is the end position of the TAD.

Each GWAS has 3 associated files, including files in `data/gwas_catalog/`. The
other files are located in `data/gwas_tad_snps/` and `data/gwas_tad_genes/`.
All files are important for performing a `TAD_Pathways` analysis. See the
GWAS example files for instructions on how to implement the necessary scripts.
E.g.: [`hESC_domains_hg19.bed`](hESC_domains_hg19.bed)

#### Custom
3. path to the SNPs file:

To perform a `TAD_Pathways` analysis on a list of custom SNPs, generate a comma
separated text file. The first row of the text file should have group names and
subsequent rows should list the rs numbers of interest. There can be many
columns with variable length rows.
The SNPs file is a comma-separated text file. The first row of the text file should have group names and
subsequent rows should list the rs numbers of interest. There can be many columns with variable length rows.

E.g.: [`custom_example.csv`](custom_example.csv)
E.g.: [`custom_example.csv`](custom_example.csv)

| Group 1 | Group 2 |
| ------- | ------- |
| rs12345 | rs67891 |
| rs19876 | rs54321 |
| ... | ... |
| Group 1 | Group 2 |
| ------- | ------- |
| rs12345 | rs67891 |
| rs19876 | rs54321 |
| ... | ... |
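
Both input files described above can be checked before the pipeline is run. The following is a minimal Python sketch, not part of this pull request: the function names `read_snp_groups` and `read_tad_domains` are hypothetical, the in-memory sample values are made up for illustration, and the check that each TAD start is smaller than its end is an assumption about well-formed BED input.

```python
import csv
import io


def read_snp_groups(handle):
    """Return {group name: [rs IDs]} from a comma-separated SNP file."""
    reader = csv.reader(handle)
    header = [name.strip() for name in next(reader)]
    groups = {name: [] for name in header}
    for row in reader:
        # Columns may have different lengths, so empty cells are skipped.
        for name, cell in zip(header, row):
            cell = cell.strip()
            if cell:
                groups[name].append(cell)
    return groups


def read_tad_domains(handle):
    """Return [(chromosome, start, end)] from a 3-column tab-separated BED file."""
    domains = []
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 columns, got {len(fields)}"
            )
        chromosome, start, end = fields[0], int(fields[1]), int(fields[2])
        # Assumption: a valid TAD has start < end.
        if start >= end:
            raise ValueError(f"line {line_number}: start must be less than end")
        domains.append((chromosome, start, end))
    return domains


# Made-up in-memory samples standing in for the SNP and TAD domain files.
snp_text = "Group 1,Group 2\nrs12345,rs67891\nrs19876,\n"
bed_text = "chr1\t100\t200\nchr1\t300\t400\n"

print(read_snp_groups(io.StringIO(snp_text)))
print(read_tad_domains(io.StringIO(bed_text)))
```

To check real files, pass open file handles (for example `open("custom_example.csv", newline="")`) instead of the `io.StringIO` samples.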

Then, perform the following steps:

```bash
source activate tad_pathways

# Map custom SNPs to genomic locations
Rscript --vanilla scripts/build_snp_list.R \
--snp_file "custom_example.csv" \
--output_file "mapped_results.tsv"

# Build TAD based genelists for each group
python scripts/build_custom_TAD_genelist.py \
--snp_data_file "mapped_results.tsv" \
--output_file "custom_tad_genelist.tsv"
bash run_pipeline.sh --TAD-Boundary hESC \
--TAD-File hESC_domains_hg19.bed \
--SNP-File custom_example.csv
```

The output of these steps are Group specific text files with all genes in TADs
31 changes: 0 additions & 31 deletions example_pipeline_bmd.sh

This file was deleted.

14 changes: 13 additions & 1 deletion example_pipeline_custom.sh
@@ -6,6 +6,7 @@
set -o errexit

# Define filenames
tad_file='data/hESC_domains_hg19.bed'
candidate_snp_file='custom_example.csv'
candidate_snp_location_file='results/custom_example_location.tsv'
candidate_snp_tad_file='results/custom_example_tad_results.tsv'
@@ -14,6 +15,16 @@ trait='custom'
evidence_file='results/custom_gene_evidence.csv'
pathway_p_values_file='gestalt/custom_pvals.tsv'

# Generate index files (maps to TAD identifiers to enable fast lookup)
# 1000G SNP / genes / repeat elements
python scripts/generate_index_files.py --TAD-Boundary 'hESC' --TAD-File $tad_file

# Visualize SNPs and Genes in TADs
# Output histograms and line graphs of SNP/Gene/Repeat locations in TADs
# and gc content distribution across human and mouse tads
python scripts/visualize_genomic_elements.py --TAD-Boundary 'hESC'
python scripts/visualize_gc_and_divergence.py --TAD-Boundary 'hESC' --TAD-File $tad_file

# Map SNPs to genomic location
Rscript --vanilla scripts/build_snp_list.R \
--snp_file $candidate_snp_file \
@@ -22,7 +33,8 @@ Rscript --vanilla scripts/build_snp_list.R \
# Build a customized genelist based on SNP locations
python scripts/build_custom_tad_genelist.py \
--snp_data_file $candidate_snp_location_file \
--output_file $candidate_snp_tad_file
--output_file $candidate_snp_tad_file \
--TAD-Boundary 'hESC'

# Perform WebGestalt pathway analysis and parse results
Rscript --vanilla scripts/webgestalt_run.R \
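
The comment on the index-generation step in `example_pipeline_custom.sh` says that elements are mapped to TAD identifiers to enable fast lookup. The Python sketch below shows one way such a lookup could work, using binary search over sorted TAD start positions. It is an illustration only and not the implementation in `scripts/generate_index_files.py`, which this diff does not show. The names `build_tad_index` and `find_tad` are hypothetical, and the sketch assumes that TADs on a chromosome do not overlap and that intervals are half-open (start included, end excluded).

```python
import bisect
from collections import defaultdict


def build_tad_index(domains):
    """Map each chromosome to (sorted starts, sorted (start, end, tad_id) tuples)."""
    by_chromosome = defaultdict(list)
    for tad_id, (chromosome, start, end) in enumerate(domains):
        by_chromosome[chromosome].append((start, end, tad_id))
    index = {}
    for chromosome, tads in by_chromosome.items():
        tads.sort()
        # Keep the start positions separately so each lookup is a binary search.
        index[chromosome] = ([tad[0] for tad in tads], tads)
    return index


def find_tad(index, chromosome, position):
    """Return the ID of the TAD containing position, or None if there is none."""
    if chromosome not in index:
        return None
    starts, tads = index[chromosome]
    # Find the last TAD whose start is <= position.
    i = bisect.bisect_right(starts, position) - 1
    if i < 0:
        return None
    start, end, tad_id = tads[i]
    return tad_id if start <= position < end else None


# Made-up coordinates for illustration.
domains = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 150)]
index = build_tad_index(domains)
print(find_tad(index, "chr1", 150))  # inside the first chr1 TAD
print(find_tad(index, "chr1", 250))  # between TADs
```

Building the index once and reusing it for every SNP, gene, or repeat keeps each lookup logarithmic in the number of TADs on a chromosome.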
33 changes: 0 additions & 33 deletions example_pipeline_t2d.sh

This file was deleted.

Binary file added figures/alu_divergence_hg19_hESC.pdf
Binary file not shown.
Binary file added figures/gc_distribution_hg19_hESC.pdf
Binary file not shown.
Binary file added figures/gene_count_hg19_hESC.pdf
Binary file not shown.
Binary file not shown.
Binary file added figures/gene_types_hg19_hESC.pdf
Binary file not shown.
Binary file added figures/repeat_count_hg19_hESC.pdf
Binary file not shown.
Binary file added figures/repeat_divergence_hg19_hESC.pdf
Binary file not shown.
Binary file added figures/repeat_type_all_distrib_hg19_hESC.pdf
Binary file not shown.
Binary file added figures/repeat_type_hg19_hESC_.pdf
Binary file not shown.
Binary file added figures/snp_count_hg19_hESC.pdf
Binary file not shown.
Binary file added figures/snp_tad_distrib_chromosomes_hg19_hESC.pdf
Binary file not shown.
Binary file added figures/snp_tad_distribution_hg19_hESC.pdf
Binary file not shown.