forked from igordot/sns
Showing 36 changed files with 1,333 additions and 91 deletions.
@@ -0,0 +1,12 @@
# Route: atac

ATAC-seq using Bowtie2 and MACS.

Segments:

* Align to the reference genome (Bowtie2).
* Remove duplicate reads (Sambamba).
* Generate genome browser tracks.
* Call peaks (MACS).
* Call nucleosomes (NucleoATAC).

@@ -0,0 +1,4 @@
# Route: rna-snv

Variant detection in RNA-seq data.
Can be run following `rna-star`.

@@ -0,0 +1,37 @@
# Route: rna-star-groups-dge

Differential gene expression using DESeq2 for the `rna-star` results.

## Usage

After individual samples are processed with the `rna-star` route,
manually define proper group names in the `samples.groups.csv` sample sheet.

Run the `rna-star-groups-dge` route from the same directory as `rna-star`.

```
sns/run rna-star-groups-dge
```
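
Before running the route, it can help to confirm that the group names were entered as intended. The following is a minimal sketch, not part of the pipeline. It assumes that `samples.groups.csv` is comma-separated with the sample name in the first column and the group in the second, and that header or comment lines start with `#`; none of this is confirmed by this README. The function names `summarize_groups` and `check_replicates` are hypothetical.

```shell
# Sketch only: the sample sheet layout below is an assumption
# (comma-separated, sample in column 1, group in column 2, "#" lines skipped).

# Print "<group><TAB><number of samples>" for each group, sorted by group name.
summarize_groups() {
    local sheet="$1"
    grep -v '^#' "$sheet" \
        | awk -F',' 'NF >= 2 { n[$2]++ } END { for (g in n) print g "\t" n[g] }' \
        | sort
}

# Return non-zero if any group has fewer than two samples (no replicates).
check_replicates() {
    local sheet="$1"
    summarize_groups "$sheet" \
        | awk -F'\t' '$2 < 2 { bad = 1 } END { exit (bad ? 1 : 0) }'
}
```

Running `summarize_groups samples.groups.csv` from the project directory would list each group with its sample count, which makes typos in group names (for example, `ctrl` versus `Ctrl`) easy to spot.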

## Output

The `rna-star-groups-dge` route will create a `DGE-DESeq2-*` directory with the results. The directory name contains the strand (determined automatically) and the number of samples in the sample sheet. The sample sheet can be modified to exclude problematic samples or change groupings for an alternative analysis.

Results:

* `counts.raw.csv`: Matrix of raw counts.
* `counts.norm.csv`: Matrix of normalized counts that can be used to check the expression levels of specific genes across samples.
* `counts.norm.xlsx`: Matrix of normalized counts in Excel format to avoid potential auto-conversion of gene names.
* `counts.vst.csv`: Matrix of counts after variance stabilizing transformation (VST), intended for clustering samples or other machine learning applications. The values are on a log2 scale and normalized with respect to library size. The point of VST is to remove the dependence of the variance on the mean.
* `plot.pca.png`: PCA plot that shows the samples based on their first two principal components. Useful for visualizing the overall effect of experimental covariates and batch effects.
* `dge.*`: Differential gene expression results between different groups.
* `plot.heatmap.*`: Heatmaps based on differentially expressed genes using multiple cutoffs.
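
As an illustration of checking the expression of a specific gene in `counts.norm.csv`, here is a minimal sketch. It assumes the file is a plain CSV with a header row of sample names and the gene identifier in the first column (possibly quoted), with no commas inside fields. The function name `gene_counts` is hypothetical.

```shell
# Sketch only: assumes a header row and the gene identifier in column 1.

# Print the header row followed by the row for exactly one gene.
gene_counts() {
    local gene="$1" csv="$2"
    awk -F',' -v gene="$gene" '
        NR == 1 { print; next }              # always keep the header row
        { id = $1; gsub(/"/, "", id) }       # strip optional quotes from the ID
        id == gene { print }                 # exact match on the gene ID
    ' "$csv"
}
```

An exact comparison is used instead of a plain `grep` so that a query such as `TP53` does not also return `TP53BP1`. A call might look like `gene_counts TP53 DGE-DESeq2-*/counts.norm.csv`, assuming a single matching results directory.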

Additional output:

* `input.groups.csv`: Input sample sheet.
* `input.counts.txt`: Input gene-sample matrix of raw counts.
* `deseq2.dds.RData`: DESeq2 object (dds) that can be loaded and modified in R if more complex analysis is needed.
* `deseq2.vsd.RData`: VST-transformed DESeq2 object (vsd) that can be loaded and modified in R if more complex analysis is needed.

General pipeline info: https://github.com/igordot/sns

@@ -0,0 +1,79 @@
# Route: rna-star

Alignment and quantification of RNA-seq data.

Segments:

* Align to the reference genome (STAR).
* Align to other species and common contaminants (fastq_screen).
* Generate normalized genome browser tracks.
* Determine the distribution of the bases within the transcripts and 5'/3' biases (Picard).
* Determine if the library is stranded and the strand orientation.
* Generate genes-samples counts matrix (featureCounts).

For differential expression analysis, follow up by running `rna-star-groups-dge`.

## Usage

Navigate to a clean new project directory.

```
cd <project dir>
```

Download the code from GitHub.

```
git clone --depth 1 https://github.com/igordot/sns
```

Generate a sample sheet of FASTQ files (`samples.fastq-raw.csv`).

```
sns/gather-fastqs <fastq dir>
```
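
After generating the sample sheet, a quick look at how many FASTQ entries were found per sample can catch naming problems early. The sketch below is not part of the pipeline. It assumes that `samples.fastq-raw.csv` is comma-separated with the sample name in the first column and one line per FASTQ file or pair, and that sample names contain no spaces; the layout is not documented in this README. The function name `fastqs_per_sample` is hypothetical.

```shell
# Sketch only: assumes sample name in column 1, one line per FASTQ file/pair.

# Print "<sample><TAB><number of sample sheet lines>" for each sample.
fastqs_per_sample() {
    local sheet="$1"
    cut -d',' -f1 "$sheet" | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}
```

Running `fastqs_per_sample samples.fastq-raw.csv` would list each sample once. A sample that appears under two slightly different names would show up as two rows, which suggests the sample sheet should be corrected before the route is run.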

Specify a reference genome, such as `hg19` or `mm10`. The setting is stored in `settings.txt`.

```
sns/generate-settings <genome>
```

Run the `rna-star` route.

```
sns/run rna-star
```

Check for potential problems.

```
grep "ERROR:" logs-qsub/*
```
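
When there are many log files, a per-file count can be easier to scan than the raw matches. The following is a small sketch built on the same `grep` pattern. The function name `errors_per_log` is hypothetical, and the `-H` option (always print the file name) is assumed to be available, as it is in GNU and BSD `grep`.

```shell
# Sketch only: summarizes the "ERROR:" check per log file.

# Print "<file>:<count>" for every log file with at least one "ERROR:" line.
errors_per_log() {
    local log_dir="${1:-logs-qsub}"
    # -H: always show the file name, -c: count matching lines, -s: hide read errors
    grep -Hcs "ERROR:" "$log_dir"/* | awk -F':' '$NF > 0'
}
```

Calling `errors_per_log` with no argument checks `logs-qsub`. No output means that no log file contains an `ERROR:` line.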

## Output

Results:

* `BAM-STAR`: BAM files. Can be used for visual inspection of individual reads or additional analysis.
* `BIGWIG`: BigWig files normalized to the total number of reads. Can be used for visual inspection of relative expression levels.
* `quant.featurecounts.counts.txt`: Matrix of raw counts for all genes and samples.
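
As a sanity check on the counts matrix, the per-sample column totals can be compared across samples. This is a minimal sketch that assumes `quant.featurecounts.counts.txt` is tab-delimited with one header row, the gene identifier in the first column, and one numeric column per sample; this layout is an assumption. The function name `library_sizes` is hypothetical.

```shell
# Sketch only: assumes a tab-delimited matrix with a header row,
# gene IDs in column 1, and one count column per sample.

# Print "<sample><TAB><total raw counts>" for each sample column.
library_sizes() {
    local matrix="$1"
    awk -F'\t' '
        NR == 1 { ncol = NF; for (i = 2; i <= NF; i++) name[i] = $i; next }
        { for (i = 2; i <= ncol; i++) total[i] += $i }
        END { for (i = 2; i <= ncol; i++) print name[i] "\t" (total[i] + 0) }
    ' "$matrix"
}
```

A sample whose total is far lower than the others may deserve a closer look in the run metrics before it is included in downstream analysis.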

Run metrics:

* `summary-combined.rna-star.csv`: Summary table that includes the number of reads, unique and multi-mapping alignment rates, number of counts assigned to genes, and fractions of coding/UTR/intronic/intergenic bases.
* `summary.fastqscreen.png`: Alignment rates for common species and contaminants.
* `summary.qc-picard-rnaseqmetrics.png`: Distribution of the bases within the transcripts to determine potential 5'/3' biases.

Additional output (can usually be deleted or used for troubleshooting):

* `logs-*`: Logs and intermediate files for various segments.
* `samples.*.csv`: Sample sheets for segments that generate large files. The route will not attempt to regenerate the files listed there, so if those files were deleted to save space, additional samples can still be added to the same analysis without reprocessing the older samples.
* `summary`: Summary files for individual samples and segments.
* `summary.*.csv`: Combined summary files for each segment.
* `QC-*`: Results of QC steps for individual samples.
* `FASTQ-CLEAN`: Merged FASTQs (one per sample).
* `genes.featurecounts.txt`: Table of genes based on the reference GTF.
* `quant-*`: Raw counts for all genes for individual samples.

General pipeline info: https://github.com/igordot/sns