Merge pull request #49 from CCBR/feat_sage
Feat sage
dnousome authored May 30, 2024
2 parents 76043b0 + 5d83d8e commit ea50c67
Showing 15 changed files with 308 additions and 201 deletions.
38 changes: 29 additions & 9 deletions README.md
@@ -35,17 +35,17 @@ LOGAN supports either
LOGAN supports inputs of either
1) paired end fastq files

`--fastq_input`- A glob can be used to include all FASTQ files. Like `--fastq_input "*R{1,2}.fastq.gz"`. Globbing requires quotes
`--fastq_input`- A glob can be used to include all FASTQ files. Like `--fastq_input "*R{1,2}.fastq.gz"`. Globbing requires quotes.

2) Pre aligned BAM files with BAI indices

`--bam_input`- A glob can be used to include all FASTQ files. Like `--bam_input "*.bam"`. Globbing requires quotes
`--bam_input`- A glob can be used to include all BAM files. Like `--bam_input "*.bam"`. Globbing requires quotes.

3) A sheet that indicates the sample name and either FASTQs or BAM file locations

`--fastq_file_input`- A headerless tab delimited sheet that has the sample name, R1, and R2 file locations

`--bam_file_input` - A headerless tab delimited sheet that has the sample name, bam and bai file locations
`--bam_file_input` - A headerless tab delimited sheet that has the sample name, bam, and bam index (bai) file locations
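
A minimal sketch of producing the headerless, tab-delimited sheet described for `--fastq_file_input`. The helper name `write_fastq_sheet` and the sample paths are hypothetical and only illustrate the three-column layout (sample name, R1, R2); they are not part of LOGAN.

```python
import csv
import io


def write_fastq_sheet(rows, handle):
    """Write (sample_name, R1, R2) tuples as a headerless tab-delimited sheet."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for sample_name, r1_path, r2_path in rows:
        writer.writerow([sample_name, r1_path, r2_path])


# Hypothetical sample and file locations, for illustration only.
rows = [("sampleA", "/data/sampleA_R1.fastq.gz", "/data/sampleA_R2.fastq.gz")]

buffer = io.StringIO()
write_fastq_sheet(rows, buffer)
sheet_text = buffer.getvalue()
print(sheet_text, end="")
```

A sheet for `--bam_file_input` would follow the same pattern with the BAM and BAI paths in the second and third columns.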

### Operating Modes

@@ -64,30 +64,50 @@ No flags are required

Adding flags determines SNV (germline and/or somatic), SV, and/or CNV calling modes

`--vc`- Enables somatic SNV calling using mutect2, vardict, varscan, octopus, MUSE (TN only), and lofreq (TN only)
`--vc`- Enables somatic SNV calling using mutect2, vardict, varscan, octopus, sage, MUSE (TN only), and lofreq (TN only)


`--germline`- Enables germline variant calling using DeepVariant (DV)

`--sv`- Enables somatic SV calling using Manta and SVABA

`--vc`- Enables somatic CNV calling using FREEC, Sequenza, and Purple (hg38 only)
`--cnv`- Enables somatic CNV calling using FREEC, Sequenza, and Purple (hg38 only)



#### Optional Arguments
`--indelrealign` - Enables indel realignment when running alignment steps. May be helpful for certain callers (VarScan, VarDict)

`--callers`- Comma separated argument for callers, the default is to use all available. Example: `--callers mutect2,octopus,vardict,varscan`
`--callers`- Comma separated argument for callers, the default is to use all available.
Example: `--callers mutect2,octopus`

`--cnvcallers`- Comma separated argument for CNV callers. Adding this flag restricts the run to the listed callers.
Example: `--cnvcallers purple`
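
To make the comma separated format concrete, here is a small sketch of how a value such as `mutect2,octopus` could be split and checked. This is not LOGAN's own parsing (the pipeline is driven by Nextflow); `parse_callers` and `KNOWN_CALLERS` are hypothetical names, and the caller list is copied from the `--vc` description above.

```python
# Caller names as listed in the --vc description; the pipeline's
# internal list may differ from this assumption.
KNOWN_CALLERS = {"mutect2", "vardict", "varscan", "octopus", "sage", "muse", "lofreq"}


def parse_callers(value):
    """Split a comma separated caller string and reject unknown names."""
    names = [name.strip().lower() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in KNOWN_CALLERS]
    if unknown:
        raise ValueError("Unknown caller(s): " + ", ".join(unknown))
    return names


selected = parse_callers("mutect2,octopus")
print(selected)
```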


## Running LOGAN
Example of Tumor_Normal calling mode
```bash
# copy the logan config files to your current directory
logan init
# preview the logan jobs that will run
logan run --mode local -profile ci_stub --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" -preview --vc --sv --cnv
# run a stub/dryrun of the logan jobs
logan run --mode local -profile ci_stub --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" -stub --vc --sv --cnv
# launch a logan run on slurm with the test dataset
logan run --mode slurm -profile biowulf,slurm --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" --vc --sv --cnv
```

Example of Tumor only calling mode
```bash
# copy the logan config files to your current directory
logan init
# preview the logan jobs that will run
logan run --mode local -profile ci_stub --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" -preview --vc --sv --cnv
logan run --mode local -profile ci_stub --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 -preview --vc --sv --cnv
# run a stub/dryrun of the logan jobs
logan run --mode local -profile ci_stub --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" -stub --vc --sv --cnv
logan run --mode local -profile ci_stub --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 -stub --vc --sv --cnv
# launch a logan run on slurm with the test dataset
logan run --mode slurm -profile biowulf,slurm --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --vc --sv --cnv
logan run --mode slurm -profile biowulf,slurm --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 --vc --sv --cnv
```

We currently support the hg38, hg19 (in progress), and mm10 genomes.
17 changes: 2 additions & 15 deletions bin/flowcell_lane.py
@@ -40,19 +40,6 @@ def usage(message = '', exitcode = 0):
sys.exit(exitcode)


def reader(fname):
"""Returns correct file object handler or reader for gzipped
or non-gzipped FastQ files based on the file extension. Assumes
gzipped files endwith the '.gz' extension.
"""
if fname.endswith('.gz'):
# Opens up file with gzip handler
return gzip.open
else:
# Opens up file normal, uncompressed handler
return open


def get_flowcell_lane(sequence_identifer):
"""Returns flowcell and lane information for different fastq formats.
FastQ files generated with older versions of Casava or downloaded from
@@ -130,10 +117,10 @@ def md5sum(filename, blocksize = 65536):
md5 = md5sum(filename)

# Get Flowcell and Lane information
handle = reader(filename)
handle = gzip.open if filename.endswith('.gz') else open
meta = {'flowcell': [], 'lane': [], 'flowcell_lane': []}
i = 0 # keeps track of line number
with handle(filename, 'r') as file:
with handle(filename, 'rt') as file:
print('sample_name\ttotal_read_pairs\tflowcell_ids\tlanes\tflowcell_lanes\tmd5_checksum')
for line in file:
line = line.strip()
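
Review note: the switch from `'r'` to `'rt'` matters because `gzip.open` treats `'r'` as binary mode and yields `bytes`, whereas the built-in `open` treats `'r'` as text. The sketch below demonstrates the difference on a temporary file; `open_fastq` is a hypothetical helper and the read header is made up for illustration.

```python
import gzip
import os
import tempfile


def open_fastq(filename):
    """Open a gzipped or plain FastQ file in text mode, based on its extension."""
    opener = gzip.open if filename.endswith(".gz") else open
    return opener(filename, "rt")


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.fastq.gz")
    # Hypothetical sequence identifier, not taken from real data.
    with gzip.open(path, "wt") as out:
        out.write("@INSTR:1:FCID123:2:1101:1000:2000 1:N:0:ATCACG\n")

    # 'r' on gzip.open means binary: lines come back as bytes.
    with gzip.open(path, "r") as binary_file:
        binary_line = binary_file.readline()

    # 'rt' gives str lines for both gzip.open and open.
    with open_fastq(path) as text_file:
        text_line = text_file.readline()

print(type(binary_line).__name__, type(text_line).__name__)
```

With text mode, string operations such as `line.strip()` and `line.startswith('@')` behave the same for compressed and uncompressed inputs.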
Empty file modified bin/split_Bed_into_equal_regions.py
100644 → 100755
Empty file.
27 changes: 18 additions & 9 deletions conf/genomes.config
@@ -28,15 +28,18 @@ params {
octopus_gforest= "--forest /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/octopus/germline.v0.7.4.forest"
SEQUENZAGC = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/SEQUENZA/hg38_gc50Base.txt.gz"
chromosomes = ['chr1','chr2','chr3','chr4','chr5','chr6','chr7','chr8','chr9','chr10','chr11','chr12','chr13','chr14','chr15','chr16','chr17','chr18','chr19','chr20','chr21','chr22','chrX','chrY','chrM']
//HMFTOOLS
GENOMEVER = "38"
HOTSPOTS = "-hotspots /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/variants/KnownHotspots.somatic.38.vcf.gz"
PANELBED = "-panel_bed /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/variants/ActionableCodingPanel.38.bed.gz"
HCBED = "-high_confidence_bed /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/variants/HG001_GRCh38_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel_noCENorHET7.bed.gz"
ENSEMBLCACHE = "-ensembl_data_dir /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/common/ensembl_data"
//PURPLE
GERMLINEHET = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/GermlineHetPon.38.vcf.gz"
GCPROFILE = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/GC_profile.1000bp.38.cnp"
DIPLODREG = '/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/DiploidRegions.38.bed.gz'
ENSEMBLCACHE = '/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/ensembl_data/'
DRIVERS = '/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/DriverGenePanel.38.tsv'
HOTSPOTS = '/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/KnownHotspots.somatic.38.vcf.gz'

}
GERMLINEHET = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/copy_number/AmberGermlineSites.38.tsv.gz"
GCPROFILE = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/copy_number/GC_profile.1000bp.38.cnp"
DIPLODREG = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/copy_number/DiploidRegions.38.bed.gz"
DRIVERS = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/common/DriverGenePanel.38.tsv"
}

'hg19' {
genome = "/data/CCBR_Pipeliner/db/PipeDB/lib/hg19.with_extra.fa"
@@ -65,8 +68,14 @@ params {
octopus_gforest= "" //"--forest /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/octopus/germline.v0.7.4.forest"
SEQUENZAGC = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/SEQUENZA/hg38_gc50Base.txt.gz"
chromosomes = ['chr1','chr2','chr3','chr4','chr5','chr6','chr7','chr8','chr9','chr10','chr11','chr12','chr13','chr14','chr15','chr16','chr17','chr18','chr19','chr20','chr21','chr22','chrX','chrY','chrM']
//HMFTOOLS
GENOMEVER = "37"
HOTSPOTS = "-hotspots /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/variants/KnownHotspots.38.vcf.gz"
PANELBED = "-panel_bed /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/variants/ActionableCodingPanel.38.bed.gz"
HCBED = "-high_confidence_bed /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/variants/HG001_GRCh38_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel_noCENorHET7.bed.gz"
ENSEMBLCACHE = "-ensembl_data_dir /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/common/ensembl_data"
//PURPLE
GERMLINEHET = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/GermlineHetPon.38.vcf.gz"
GERMLINEHET = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/copy_number/AmberGermlineSites.38.tsv.gz"
GCPROFILE = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/GC_profile.1000bp.38.cnp"
DIPLODREG = '/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/DiploidRegions.38.bed.gz'
ENSEMBLCACHE = '/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/ensembl_data/'
59 changes: 0 additions & 59 deletions docker/lofreq/Dockerfile

This file was deleted.

11 changes: 0 additions & 11 deletions docker/lofreq/build.sh

This file was deleted.

82 changes: 58 additions & 24 deletions docker/logan_base/Dockerfile
@@ -20,16 +20,18 @@ WORKDIR /opt2
RUN apt-get update \
&& apt-get -y upgrade \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y \
bc
bc \
openjdk-17-jdk

# Common bioinformatics tools
# bwa/0.7.17-4 bowtie/1.2.3 bowtie2/2.3.5.1
# bedtools/2.27.1 bedops/2.4.37 samtools/1.10
# bcftools/1.10.2 vcftools/0.1.16
# Previous tools already installed trimmomatic/0.39 tabix/1.10.2
# Previous tools already installed tabix/1.10.2 trimmomatic/0.39
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y \
tabix \
trimmomatic
tabix \
libhts-dev


# Install BWA-MEM2 v2.2.1
RUN wget https://github.com/bwa-mem2/bwa-mem2/releases/download/v2.2.1/bwa-mem2-2.2.1_x64-linux.tar.bz2 \
@@ -44,13 +46,17 @@ RUN wget https://github.com/biod/sambamba/releases/download/v0.8.1/sambamba-0.8.
&& mv /opt2/sambamba-0.8.1-linux-amd64-static /opt2/sambamba \
&& chmod a+rx /opt2/sambamba

# Install GATK4 (GATK/4.3.0.0)
# Requires Java8 or 1.8
RUN wget https://github.com/broadinstitute/gatk/releases/download/4.3.0.0/gatk-4.3.0.0.zip \
&& unzip /opt2/gatk-4.3.0.0.zip \
&& rm /opt2/gatk-4.3.0.0.zip \
&& /opt2/gatk-4.3.0.0/gatk --list
ENV PATH="/opt2/gatk-4.3.0.0:$PATH"
# Install GATK4 (GATK/4.4.0.0)
# Requires Java17
RUN wget https://github.com/broadinstitute/gatk/releases/download/4.4.0.0/gatk-4.4.0.0.zip \
&& unzip /opt2/gatk-4.4.0.0.zip \
&& rm /opt2/gatk-4.4.0.0.zip \
&& /opt2/gatk-4.4.0.0/gatk --list
ENV PATH="/opt2/gatk-4.4.0.0:$PATH"

# Use DISCVRSeq For CombineVariants Replacement
RUN wget https://github.com/BimberLab/DISCVRSeq/releases/download/1.3.62/DISCVRSeq-1.3.62.jar
ENV DISCVRSeq_JAR="/opt2/DISCVRSeq-1.3.62.jar"

# Install last release of GATK3 (GATK/3.8-1)
# Only being used for the CombineVariants
@@ -168,29 +174,57 @@ RUN wget https://github.com/AstraZeneca-NGS/VarDictJava/releases/download/v1.8.3
ENV PATH="/opt2/VarDict-1.8.3/bin:$PATH"

# Fastp From Opengene github
RUN wget http://opengene.org/fastp/fastp.0.23.2 \
RUN wget http://opengene.org/fastp/fastp.0.23.4 \
&& mkdir fastp \
&& mv fastp.0.23.2 fastp/fastp \
&& mv fastp.0.23.4 fastp/fastp \
&& chmod a+x fastp/fastp
ENV PATH="/opt2/fastp:$PATH"

# HMFtools for PURPLE/COBALT/AMBER
RUN wget https://github.com/hartwigmedical/hmftools/releases/download/amber-v3.9/amber-3.9.jar \
&& wget https://github.com/hartwigmedical/hmftools/releases/download/cobalt-v1.15.1/cobalt_v1.15.1.jar \
&& wget https://github.com/hartwigmedical/hmftools/releases/download/purple-v3.9/purple_v3.9.jar \
&& mkdir hmftools \
&& mv amber-3.9.jar hmftools/amber.jar \
&& mv cobalt_v1.15.1.jar hmftools/cobalt.jar \
&& mv purple_v3.9.jar hmftools/purple.jar \
&& chmod a+x hmftools/amber.jar
ENV PATH="/opt2/hmftools:$PATH"
# ASCAT
RUN Rscript -e 'devtools::install_github("VanLoo-lab/ascat/ASCAT")'

# SvABA
RUN wget -O svaba_1.2.0 https://github.com/walaj/svaba/releases/download/v1.2.0/svaba \
&& mkdir svaba \
&& mv svaba_1.2.0 svaba/svaba
&& mv svaba_1.2.0 svaba/svaba \
&& chmod a+x svaba/svaba

ENV PATH="/opt2/svaba:$PATH"

# LOFREQ
RUN git clone https://github.com/CSB5/lofreq \
&& cd /opt2/lofreq \
&& ./bootstrap \
&& ./configure --prefix=/opt2/lofreq/ \
&& make \
&& make install

ENV PATH="/opt2/lofreq/bin:$PATH"

# MUSE
RUN wget -O muse_2.0.4.tar.gz https://github.com/wwylab/MuSE/archive/refs/tags/v2.0.4.tar.gz \
&& tar -xzf muse_2.0.4.tar.gz \
&& cd MuSE-2.0.4 \
&& ./install_muse.sh \
&& mv MuSE /opt2/ \
&& chmod a+x /opt2/MuSE \
&& rm -R /opt2/MuSE-2.0.4 \
&& rm /opt2/muse_2.0.4.tar.gz

ENV PATH="/opt2/MuSE:$PATH"

# HMFtools for PURPLE/COBALT/AMBER
RUN wget https://github.com/hartwigmedical/hmftools/releases/download/amber-v4.0/amber-4.0.jar \
&& wget https://github.com/hartwigmedical/hmftools/releases/download/cobalt-v1.16/cobalt_v1.16.jar \
&& wget https://github.com/hartwigmedical/hmftools/releases/download/purple-v4.0/purple_v4.0.jar \
&& wget https://github.com/hartwigmedical/hmftools/releases/download/sage-v3.4/sage_v3.4.jar \
&& mkdir hmftools \
&& mv amber-4.0.jar hmftools/amber.jar \
&& mv cobalt_v1.16.jar hmftools/cobalt.jar \
&& mv purple_v4.0.jar hmftools/purple.jar \
&& mv sage_v3.4.jar hmftools/sage.jar \
&& chmod a+x hmftools/amber.jar
ENV PATH="/opt2/hmftools:$PATH"

# Add Dockerfile and argparse.bash script
# and export environment variables
12 changes: 7 additions & 5 deletions docker/logan_base/build.sh
@@ -1,14 +1,17 @@

# Build image
#docker buildx create --platform linux/amd64 --use
#docker buildx use upbeat_ganguly
#docker buildx inspect upbeat_ganguly
#docker buildx build --platform linux/amd64 -f Dockerfile -t dnousome/ccbr_logan_base:v0.3.0 -t dnousome/ccbr_logan_base:latest --push .

docker build --platform linux/amd64 --tag ccbr_logan_base:v0.3.0 -f Dockerfile .
docker tag ccbr_logan_base:v0.3.0 dnousome/ccbr_logan_base:v0.3.0
docker tag ccbr_logan_base:v0.3.0 dnousome/ccbr_logan_base
docker build --platform linux/amd64 --tag ccbr_logan_base:v0.3.5 -f Dockerfile .

docker tag ccbr_logan_base:v0.3.5 dnousome/ccbr_logan_base:v0.3.5
docker tag ccbr_logan_base:v0.3.5 dnousome/ccbr_logan_base

docker push dnousome/ccbr_logan_base:v0.3.0

docker push dnousome/ccbr_logan_base:v0.3.5
docker push dnousome/ccbr_logan_base:latest


@@ -21,4 +24,3 @@ docker push dnousome/ccbr_logan_base:latest
# Push image to DockerHub
#docker push nciccbr/ccbr_wgs_base:v0.1.0
#docker push nciccbr/ccbr_wgs_base:latest

2 changes: 1 addition & 1 deletion docker/logan_base/meta.yml
@@ -1,4 +1,4 @@
dockerhub_namespace: dnousome
image_name: ccbr_logan_base
version: v0.3.4
version: v0.3.5
container: "$(dockerhub_namespace)/$(image_name):$(version)"
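
Review note: the `container` field is a template built from the other three keys. How the repository's tooling expands `$(...)` is not shown in this commit, so the following is only an assumed illustration of the substitution; `resolve` is a hypothetical helper and the values are copied from the new `meta.yml`.

```python
import re

# Values copied from the updated meta.yml.
meta = {
    "dockerhub_namespace": "dnousome",
    "image_name": "ccbr_logan_base",
    "version": "v0.3.5",
    "container": "$(dockerhub_namespace)/$(image_name):$(version)",
}


def resolve(template, values):
    """Replace each $(key) placeholder with the matching value."""
    return re.sub(r"\$\((\w+)\)", lambda match: values[match.group(1)], template)


container = resolve(meta["container"], meta)
print(container)
```

Under this assumption the resolved name matches the tag pushed by `build.sh` (`dnousome/ccbr_logan_base:v0.3.5`).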