This table shows information about each task in the workflow. Use the search box on the right
+ to filter rows for specific values. Clicking headers will sort the table by that value and
+ scrolling side to side will reveal more columns.
+ (tasks table omitted because the dataset is too big)
\ No newline at end of file
diff --git a/README.md b/README.md
index d1f2064..1b2548d 100644
--- a/README.md
+++ b/README.md
@@ -1,15 +1,16 @@
+
+
# Welcome to QUARS ~ **QUA**lity control for **R**na_**S**eq
QUARS creates a [MultiQC](http://multiqc.info) report out of the [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and [Fastp](https://github.com/OpenGene/fastp) results of both single and paired end RNAseq raw reads (.fq and simmilars). Powered by [Nextflow](https://www.nextflow.io).
## Motivation
-If like me, you have waited several hours or days for some RNAseq pipelines to finish, just to realise that the initial quality control step required further trimming! or simply that my raw reads were so bad they need to be excluded from further analysis. Then, like me, you will enjoy QUARS.
+If, like me, you have waited several hours or days for an RNAseq pipeline to finish, only to realise that the initial quality control step required further trimming, or that your raw reads were so bad they had to be excluded from further analysis, then you will enjoy QUARS.
QUARS is an attemp to:
-1. Make **only** a quality control of RNAseq raw reads, before the big steps of annotation quantification and differential expression.
-2. Run in a parallel/threating environment quality control of RNAseq.
+1. Perform **only** quality control of RNAseq raw reads, before the big steps of annotation, quantification and differential expression.
+2. Run this pre-alignment quality control for RNAseq in a parallel/multithreaded environment.
## Getting Started
-So you have decided to use QUARS, here is what you'll need
### Prerequisites
QUARS builds upon
@@ -20,46 +21,58 @@ And supported for now,
- fastQC (< v0.11.7)
- multiQC (< v1.6.dev0)
+In future releases, Docker will simplify the use of QUARS, as downloading individual packages (*e.g.* FastQC) will no longer be necessary. Only [Nextflow](https://www.nextflow.io) and [Docker](https://www.docker.com) will be required.
+
### Installation
Integration with [Docker](https://www.docker.com) is in progress.
-Thanks to nextflow. installation is not a must, you just have to call it from command line as:
+Thanks to Nextflow, installation is not required; just call QUARS from the command line:
- nextflow run TainVelasco-Luquez/QUARS
+ nextflow run TainVelasco-Luquez/QUARS --fastq_files '*.fastq.gz' --cpus 16
or
- nextflow run https://github.com/TainVelasco-Luquez/QUARS
+ nextflow run https://github.com/TainVelasco-Luquez/QUARS --fastq_files '*.fastq.gz' --cpus 16
Alternatively you can clone the repo and run it locally by
git clone https://github.com/TainVelasco-Luquez/QUARS
- nextflow run TainVelasco-Luquez/quars.nf
+ nextflow run QUARS/quars.nf --fastq_files '*.fastq.gz' --cpus 16
If you are running on a cluster with [HTCondor](https://research.cs.wisc.edu/htcondor/):
- nextflow run TainVelasco-Luquez/QUARS -profile condor
+ nextflow run TainVelasco-Luquez/QUARS --fastq_files '*.fastq.gz' --cpus 16 -profile condor
+
+To modify memory, CPUs and other options when running on a cluster, edit [nextflow.config](https://github.com/TainVelasco-Luquez/QUARS/blob/master/nextflow.config).
### Typical usage
-* For paired end fastq files:
+* For single end fastq files:
- nextflow run quars.nf --fastq_files 'mydir/*.fastq.gz'
+ nextflow run QUARS --fastq_files '*.fastq.gz' --cpus 16
- Which produces this [multiqc_report_paired.html](https://github.com/TainVelasco-Luquez/QUARS/blob/master/Docs/multiqc_report_paired.html)
+ Which produces this [Docs/QUARS_single.html](https://cdn.rawgit.com/TainVelasco-Luquez/QUARS/f2290cb7/Docs/QUARS_single.html)
-* For single end fastq files:
+* For paired end fastq files:
+
+ nextflow run QUARS --fastq_files '*_{1,2}.fastq.gz' --singleEnd false --cpus 16
- nextflow run quars.nf --fastq_files 'mydir/*_{1,2}.fastq.gz' --singleEnd false
+ Which produces this [Docs/QUARS_paired.html](https://cdn.rawgit.com/TainVelasco-Luquez/QUARS/f2290cb7/Docs/QUARS_paired.html)
- Which produces this [multiqc_report_single.html](https://github.com/TainVelasco-Luquez/QUARS/blob/master/Docs/multiqc_report_single.html)
+#### Timeline and report
+In addition to the main `QUARS.html` report, QUARS also generates a process execution timeline (see [Docs/timeline_QUARS.html](https://cdn.rawgit.com/TainVelasco-Luquez/QUARS/f2290cb7/Docs/timeline_QUARS.html)) and an execution report with a brief summary of the tasks and their consumption of computational resources (see [Docs/report_QUARS.html](https://cdn.rawgit.com/TainVelasco-Luquez/QUARS/f2290cb7/Docs/report_QUARS.html)).
#### Arguments
- `--fastq_files` Absolute path to input .fastq data (must be enclosed with single quotes). If no path specified, the default behaviour is search in the current dir for the folder "Data" (_i.e._ "./Data/")
- `--singleEnd` Logical indicating whether the files are single ("true". This is the default beahaviour) or paired end ("false").
+ - `--fastq_files` Absolute path to the input .fastq data (must be enclosed in single quotes). If no path is specified, the default behaviour is to search the current dir for the folder "Data" (_i.e._ "./Data/").
+ - `--singleEnd` Logical indicating whether the files are single end ("true", the default behaviour) or paired end ("false").
+ - `--cpus` Integer specifying the number of cores to use. Be aware of the limits of your machine.
#### Options
- `--outdir ` Absolute path to the output data (must be enclosed in quotes). If no path specified, the default behaviour is search in the current dir for the folder "Results" (_i.e._ "./Results/"). Be sure to add the final "/" to the path.
- `--cpus` Integer specifying the number of cores to use. Be aware of the limits of your machine.
- `-profile condor` Used when in a cluster with the HTCondor executor. For configuration of the HTCondor parameters go to `nextflow.config` and change the required settings.
+ - `--outdir` Absolute path to the output data (must be enclosed in quotes). If no path is specified, the default behaviour is to create the folder "Results" in the current dir (_i.e._ "./Results/").
+ - `-profile condor` Use when running on a cluster with the HTCondor executor. To configure the HTCondor parameters, edit `nextflow.config`.
+ - `--multiqc_config` Path to a `.yaml` file that configures the MultiQC title, comments, subtitles and more. If not supplied, QUARS assumes "./multiqc_config.yaml". Customisable items are fully described in the [MultiQC documentation](http://multiqc.info/docs/#customising-reports).
+
+#### Getting Help
+
+ nextflow run QUARS --help
## Credits
@TainVelasco-Luquez
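
The paired-end usage in the README relies on mates being named `*_1.fastq.gz` and `*_2.fastq.gz`. As a rough pre-flight check before launching the pipeline, the sketch below lists the samples in a directory that have both mates. The function name `list_complete_pairs` is hypothetical and not part of QUARS; Nextflow does its own matching from the quoted `--fastq_files` pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QUARS): print the sample names in a
# directory that have both mates, following the *_{1,2}.fastq.gz naming.
list_complete_pairs() {
    local dir="$1" r1 sample
    for r1 in "$dir"/*_1.fastq.gz; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$r1" ] || continue
        sample="$(basename "$r1" _1.fastq.gz)"
        if [ -e "$dir/${sample}_2.fastq.gz" ]; then
            echo "$sample"
        else
            echo "warning: $sample has no _2 mate" >&2
        fi
    done
}

# Usage (assumed directory name): list_complete_pairs mydir
```

Samples reported with a warning would be worth checking before running with `--singleEnd false`.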
diff --git a/multiqc_config.yaml b/multiqc_config.yaml
new file mode 100644
index 0000000..f6d3cab
--- /dev/null
+++ b/multiqc_config.yaml
@@ -0,0 +1,9 @@
+# Heading
+title: "QUARS ~ QUAlity control for Rna_Seq"
+intro_text: "Easy and straightforward pre-alignment quality control for your RNAseq experiments. Let's QUARS!"
+custom_logo: "https://cdn.rawgit.com/TainVelasco-Luquez/QUARS/873d7f38/Docs/QUARS_logo.png"
+custom_logo_url: "https://github.com/TainVelasco-Luquez/QUARS"
+custom_logo_title: "QUARS"
+# Output
+output_fn_name: QUARS.html
+data_dir_name: QUARS_data_from_multiqc
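
A custom file passed with `--multiqc_config` could be generated along the following lines. This is a minimal sketch: `write_multiqc_config` is a hypothetical helper, the values are placeholders, and only keys that already appear in `multiqc_config.yaml` are used. Note that the `multiQC` process in `quars.nf` publishes files matching `*QUARS.html`, so the sketch keeps `output_fn_name` unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal MultiQC config to the given path.
# Keys mirror multiqc_config.yaml in this change; values are examples only.
write_multiqc_config() {
    local path="$1" title="$2"
    cat > "$path" <<EOF
# Heading
title: "$title"
intro_text: "Pre-alignment quality control report."
# Output: keep a name ending in QUARS.html so quars.nf still publishes it.
output_fn_name: QUARS.html
data_dir_name: QUARS_data_from_multiqc
EOF
}

# Example (not executed here; file name and title are assumptions):
#   write_multiqc_config my_config.yaml "My project"
#   nextflow run QUARS --fastq_files '*.fastq.gz' --multiqc_config my_config.yaml
```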
diff --git a/nextflow.config b/nextflow.config
index 8545699..1d1e344 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -12,29 +12,34 @@ Caution note: This code is heavily influenced by NGI-RNAseq by Phil Ewels.
// for github
manifest {
author = '@TainVelasco-Luquez'
- description = "QUAlity control for RNA_Seq, a nextflow pipeline"
+ description = "Pre-alignment QUAlity control for RNA_Seq, a nextflow pipeline"
mainScript = 'quars.nf'
}
/*
-Params.arg can be supplied when running the command, and automatically replace the default ones, by ussing the format: --arg value (e.g. nextflow run file.nf --fastq_files '/home/tain/')
+Any params.arg can be supplied on the command line to replace its default value, using the format: --arg value (e.g. nextflow run file.nf --fastq_files '/home/tain/*_{1,2}.fastq')
*/
params {
- fastq_files = './Data/*_{1,2}.fastq'
- outdir = './Results/'
+ fastq_files = "./Data/*_{1,2}.fastq"
+ outdir = "./Results/"
singleEnd = true
cpus = 6
+ multiqc_config = "$baseDir/multiqc_config.yaml"
}
// Pipeline instrospection
timeline {
enabled = true
- file = "${params.outdir}/timeline_RNAseqQC.html"
+ file = "${params.outdir}/timeline_QUARS.html"
}
report {
enabled = true
- file = "${params.outdir}/report_RNAseqQC.html"
+ file = "${params.outdir}/report_QUARS.html"
+}
+dag {
+ enabled = true
+ file = "${params.outdir}/DAG_QUARS.html"
}
// Minimal nextflow version
@@ -53,10 +58,10 @@ profiles {
condor {
process.executor = 'condor'
// How much memory the process is allowed to use
- process.memory = '10GB'
+ process.memory = '15GB'
// Number of (logical) CPU required by the process' task
process.cpus = 20
// How much local disk storage the process is allowed to use
- process.disk = '2 GB'
+ process.disk = '15 GB'
}
}
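
With the `timeline`, `report` and `dag` scopes enabled as above, a finished run should leave three introspection files in the output directory. The sketch below checks for them after a run; `check_reports` is a hypothetical helper, and the file names are taken from the config in this change.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the introspection files configured in
# nextflow.config exist in the given output directory.
check_reports() {
    local outdir="${1%/}" name missing=0   # drop one trailing slash
    for name in timeline_QUARS.html report_QUARS.html DAG_QUARS.html; do
        if [ ! -f "$outdir/$name" ]; then
            echo "missing: $outdir/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_reports ./Results/ && echo "all introspection files present"
```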
diff --git a/quars.nf b/quars.nf
index 11ee674..ae4b340 100644
--- a/quars.nf
+++ b/quars.nf
@@ -44,18 +44,22 @@ def usage() {
QUARS creates a MultiQC report out of the FastQC and Fastp results of both single and paired end RNAseq reads. It also saves the Fastp cleaned files to a new dir as .fq.gz.
Typical usage:
- nextflow run quars.nf --fastq_files 'mydir/*_{1,2}.fastq.gz' --singleEnd false
+ nextflow run QUARS --fastq_files 'mydir/*_{1,2}.fastq.gz' --singleEnd false --cpus 16
- nextflow run quars.nf --fastq_files 'mydir/*.fastq.gz'
+ nextflow run QUARS --fastq_files 'mydir/*.fastq.gz' --cpus 16
Mandatory arguments:
- --fastq_files Absolute path to input .fastq data (must be enclosed with single quotes). If no path specified, the default behaviour is search in the current dir for the folder "Data" (i.e. "./Data/")
+ --fastq_files Absolute path to the input .fastq data (must be enclosed in single quotes). If no path is specified, the default behaviour is to search the current dir for the folder "Data" (i.e. "$baseDir/Data/")
--singleEnd Logical indicating whether the files are single ("true". This is the default beahaviour) or paired end ("false").
+ --cpus Integer specifying the number of cores to use. Be aware of the limits of your machine.
Options:
- --outdir Absolute path to the output data (must be enclosed in quotes). If no path specified, the default behaviour is search in the current dir for the folder "Results" (i.e. "./Results/"). Be sure to add the final "/" to the path.
- --cpus Integer specifying the number of cores to use. Be aware of the limits of your machine.
+ --outdir Absolute path to the output data (must be enclosed in quotes). If no path is specified, the default behaviour is to create the folder "Results" in the current dir (i.e. "$baseDir/Results/").
+ --multiqc_config Path to a .yaml file that configures the MultiQC title, comments, subtitles and more. If not supplied, QUARS assumes "$baseDir/multiqc_config.yaml".
-profile condor Used when in a cluster with the HTCondor executor. For configuration of the HTCondor parameters go to nextflow.config and change the required settings.
+
+ Getting Help
+ nextflow run QUARS --help
""".stripIndent()
}
@@ -76,7 +80,7 @@ Channel
.ifEmpty { error "Cannot find any reads matching: ${params.fastq_files}" }
.set { files_QC_ch }
-println ("\n Fastp is about to run... \n")
+log.info " Fastp is about to run ... "
process fastp {
@@ -101,11 +105,11 @@ process fastp {
if (params.singleEnd == true) {
"""
- fastp -p -w ${params.cpus} --dont_overwrite -i ${fastq_file[0]} -o ${samplename}_fastp.fastq.gz -h ${samplename}_fastp.html -j ${samplename}_fastp.json
+ fastp -w ${params.cpus} --dont_overwrite -i ${fastq_file[0]} -o ${samplename}_fastp.fastq.gz -h ${samplename}_fastp.html -j ${samplename}_fastp.json
"""
} else {
"""
- fastp -p -w ${params.cpus} --dont_overwrite -i ${fastq_file[0]} -I ${fastq_file[1]} -o ${samplename}_1_fastp.fastq.gz -O ${samplename}_2_fastp.fastq.gz -h ${samplename}_fastp.html -j ${samplename}_fastp.json
+ fastp -w ${params.cpus} --dont_overwrite --detect_adapter_for_pe -i ${fastq_file[0]} -I ${fastq_file[1]} -o ${samplename}_1_fastp.fastq.gz -O ${samplename}_2_fastp.fastq.gz -h ${samplename}_fastp.html -j ${samplename}_fastp.json
"""
}
}
@@ -120,7 +124,7 @@ process fastp {
.ifEmpty { error "Cannot find any reads matching: ${params.fastq_files}" }
.set { files_QC_2_ch }
-println ("\n fastQC is about to run... \n")
+log.info " FastQC is about to run ... "
process fastQC {
tag { fastqc_tag }
@@ -156,26 +160,28 @@ process fastQC {
* This step has the code structure from https://github.com/SciLifeLab/NGI-RNAseq/blob/master/main.nf all credit is for its authors.
*/
- println ("\n mutiQC is about to run... \n")
+ log.info " MultiQC is about to run ... "
+ multiqc_config = file(params.multiqc_config)
process multiQC {
- publishDir pattern: "*multiqc_report.html", path: { params.outdir + "multiQC/" }, mode: 'copy'
+ publishDir pattern: "*QUARS.html", path: { params.outdir }, mode: 'copy'
input:
file ('fastQC/*') from fastqc_results_ch.collect()
file ('fastp/*') from fastp_results_ch.collect()
+ file multiqc_config
output:
- file('*multiqc_report.html')
+ file('*QUARS.html')
script:
"""
- multiqc . -f -m fastqc -m fastp
+ multiqc . -f --config ${multiqc_config} -m fastqc -m fastp
"""
}
workflow.onComplete {
println "\n Pipeline completed at: $workflow.complete"
- println "\n Execution status: ${ workflow.success ? 'OK' : 'failed' }"
+ println "\n Execution status: ${ workflow.success ? 'OK' : 'Failed' }"
}