diff --git a/Docs/QUARS_logo.png b/Docs/QUARS_logo.png
new file mode 100644
index 0000000..5c5a016
Binary files /dev/null and b/Docs/QUARS_logo.png differ
diff --git a/Docs/multiqc_report_paired.html b/Docs/QUARS_paired.html
similarity index 100%
rename from Docs/multiqc_report_paired.html
rename to Docs/QUARS_paired.html
diff --git a/Docs/multiqc_report_single.html b/Docs/QUARS_single.html
similarity index 82%
rename from Docs/multiqc_report_single.html
rename to Docs/QUARS_single.html
index 2698183..ec9ddda 100644
--- a/Docs/multiqc_report_single.html
+++ b/Docs/QUARS_single.html
@@ -24,7 +24,7 @@
@@ -999,6 +999,9 @@
 width: 120px;
 margin: 0 8px;
 }
+#fastqc_seq_heatmap_key_pos {
+ width: 80px;
+}
 .fastqc_seq_heatmap_key > span, .fastqc_seq_heatmap_key div > span {
 display:inline-block;
 width: 60px;
@@ -1006,7 +1009,8 @@
 #fastqc_seq_heatmap_key_t { border-bottom: 1px solid red; }
 #fastqc_seq_heatmap_key_c { border-bottom: 1px solid blue; }
 #fastqc_seq_heatmap_key_a { border-bottom: 1px solid green; }
-#fastqc_seq_heatmap_key_g { border-bottom: 1px solid black; }
+#fastqc_seq_heatmap_key_g { border-bottom: 1px solid black; }
+
+}
+
+

FastQC is a quality control tool for high throughput sequence data, written by Simon Andrews at the Babraham Institute in Cambridge.

@@ -5851,7 +5881,7 @@

@@ -6040,7 +6070,7 @@

but doesn't appear at the start of the file for some reason could be missed by this module.

-
2 samples had less than 1% of reads made up of overrepresented sequences
+
16 samples had less than 1% of reads made up of overrepresented sequences

@@ -6098,7 +6128,7 @@

- MultiQC v1.6.dev0
+ MultiQC v1.7.dev0
 - Written by Phil Ewels, available on GitHub.
diff --git a/Docs/report_QUARS.html b/Docs/report_QUARS.html
new file mode 100644
index 0000000..67fea55
--- /dev/null
+++ b/Docs/report_QUARS.html
@@ -0,0 +1,995 @@
+ + + + + + + + + +
+ [furious_bell] Nextflow Workflow Report
+ + + + + + + +
+
+ +

Nextflow workflow report

+

[furious_bell] (resumed run)

+ + +
+ Workflow execution completed successfully! +
+ + +
+
Run times
+
+ Fri Sep 14 09:31:51 COT 2018 - Fri Sep 14 09:33:29 COT 2018 + (duration: 1m 38s) +
+ +
+
+
  2 succeeded  
+
  31 cached  
+
  0 ignored  
+
  0 failed  
+
+
+ +
Nextflow command
+
nextflow run QUARS --fastq_files ./*.fastq.gz --cpus 16 -resume
+
+ +
+
CPU-Hours
+
2.2 (98.9% cached)
+ +
Launch directory
+
/home/tain/Documents/RNA_seq/santos/Raw_files
+ +
Work directory
+
/home/tain/Documents/RNA_seq/santos/Raw_files/work
+ +
Project directory
+
/home/tain/.nextflow/assets/TainVelasco-Luquez/QUARS
+ + +
Script name
+
quars.nf
+ + + +
Script ID
+
275ca572143a96a8d7fb8981869079d3
+ + +
Workflow session
+
3a609b55-661c-4365-8ccd-9d541d845585
+ + +
Workflow repository
+
https://github.com/TainVelasco-Luquez/QUARS.git, revision master (commit hash 1b9ebb3296c81ebeb4410ddbe307490a38fe06f0)
+ + +
Workflow profile
+
standard
+ + + +
Nextflow version
+
version 0.30.2, build 4867 (16-06-2018 17:49 UTC)
+
+
+
+ +
+

Resource Usage

+

These plots give an overview of the distribution of resource usage for each process.

+ +

CPU Usage


Memory Usage


Job Duration


Disk I/O


Tasks

+

This table shows information about each task in the workflow. Use the search box on the right + to filter rows for specific values. Clicking headers will sort the table by that value and + scrolling side to side will reveal more columns.

+
+ + +
+
+
+
+
+
+
+ (tasks table omitted because the dataset is too big) +
+ +
+
+ Generated by Nextflow, version 0.30.2 +
+
+ + + + +
diff --git a/Docs/timeline_QUARS.html b/Docs/timeline_QUARS.html
new file mode 100644
index 0000000..0d0b3e2
--- /dev/null
+++ b/Docs/timeline_QUARS.html
@@ -0,0 +1,233 @@
+ + + + + + + + + + + +
+

Processes execution timeline

+

+ Launch time:
+ Elapsed time: +

+
+
+ + + + + +
\ No newline at end of file
diff --git a/README.md b/README.md
index d1f2064..1b2548d 100644
--- a/README.md
+++ b/README.md
@@ -1,15 +1,16 @@
+QUARS_logo
+
 # Welcome to QUARS ~ **QUA**lity control for **R**na_**S**eq
 QUARS creates a [MultiQC](http://multiqc.info) report out of the [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and [Fastp](https://github.com/OpenGene/fastp) results of both single and paired end RNAseq raw reads (.fq and simmilars). Powered by [Nextflow](https://www.nextflow.io).
 ## Motivation
-If like me, you have waited several hours or days for some RNAseq pipelines to finish, just to realise that the initial quality control step required further trimming! or simply that my raw reads were so bad they need to be excluded from further analysis. Then, like me, you will enjoy QUARS.
+If like me, you have waited several hours, or days, for some RNAseq pipelines to finish, just to realise that the initial quality control step required further trimming! Or, simply, that my raw reads were so bad they need to be excluded from further analysis! Then, like me, you will enjoy QUARS.
 QUARS is an attemp to:
-1. Make **only** a quality control of RNAseq raw reads, before the big steps of annotation quantification and differential expression.
-2. Run in a parallel/threating environment quality control of RNAseq.
+1. Make **only** quality control for RNAseq raw reads, before the big steps of annotation quantification and differential expression.
+2. Run in a parallel/threating environment a pre-alignment quality control test for RNAseq.
 ## Getting Started
-So you have decided to use QUARS, here is what you'll need
 ### Prerequisites
 QUARS builds upon
@@ -20,46 +21,58 @@ And supported for now,
 - fastQC (< v0.11.7)
 - multiQC (< v1.6.dev0)
+In future releases, Docker will simplify your use of QUARS, as there will no longer be necessary downloading individual packages (*e.g.* FastQC). Only [Nextflow](https://www.nextflow.io) and [Docker](https://www.docker.com) will be required.
+
 ### Installation
 Integration with [Docker](https://www.docker.com) is in progress.
-Thanks to nextflow. installation is not a must, you just have to call it from command line as:
+Thanks to nextflow, installation is not a must, you just have to call it from command line as:
- nextflow run TainVelasco-Luquez/QUARS
+ nextflow run TainVelasco-Luquez/QUARS --fastq_files '*.fastq.gz' --cpus 16
 or
- nextflow run https://github.com/TainVelasco-Luquez/QUARS
+ nextflow run https://github.com/TainVelasco-Luquez/QUARS --fastq_files '*.fastq.gz' --cpus 16
 Alternatively you can clone the repo and run it locally by
 git clone https://github.com/TainVelasco-Luquez/QUARS
- nextflow run TainVelasco-Luquez/quars.nf
+ nextflow run QUARS/quars.nf --fastq_files '*.fastq.gz' --cpus 16
 If you are running on a cluster with [HTCondor](https://research.cs.wisc.edu/htcondor/):
- nextflow run TainVelasco-Luquez/QUARS -profile condor
+ nextflow run TainVelasco-Luquez/QUARS --fastq_files '*.fastq.gz' --cpus 16 -profile condor
+
+To modify memory, cpus and more options when running in clusters, go to [nextflow.config](https://github.com/TainVelasco-Luquez/QUARS/nextflow.config).
 ### Typical usage
-* For paired end fastq files:
+* For single end fastq files:
- nextflow run quars.nf --fastq_files 'mydir/*.fastq.gz'
+ nextflow run QUARS --fastq_files '*.fastq.gz' --cpus 16
- Which produces this [multiqc_report_paired.html](https://github.com/TainVelasco-Luquez/QUARS/blob/master/Docs/multiqc_report_paired.html)
+ Which produces this [Docs/QUARS_single.html](https://cdn.rawgit.com/TainVelasco-Luquez/QUARS/f2290cb7/Docs/QUARS_single.html)
-* For single end fastq files:
+* For paired end fastq files:
+
+ nextflow run QUARS --fastq_files '*_{1,2}.fastq.gz' --singleEnd false --cpus 16
- nextflow run quars.nf --fastq_files 'mydir/*_{1,2}.fastq.gz' --singleEnd false
+ Which produces this [Docs/QUARS_paired.html](https://cdn.rawgit.com/TainVelasco-Luquez/QUARS/f2290cb7/Docs/QUARS_paired.html)
- Which produces this [multiqc_report_single.html](https://github.com/TainVelasco-Luquez/QUARS/blob/master/Docs/multiqc_report_single.html)
+#### Timeline and report
+In addition to the main `QUARS.html` report, QUARS also generates a processess execution timeline (see [Docs/timeline_QUARS.html](https://cdn.rawgit.com/TainVelasco-Luquez/QUARS/f2290cb7/Docs/timeline_QUARS.html))) and an execution report, with a brief summary of the tasks and their consumption of computational resurces (see [Docs/report_QUARS.html](https://cdn.rawgit.com/TainVelasco-Luquez/QUARS/f2290cb7/Docs/report_QUARS.html)).
 #### Arguments
- `--fastq_files` Absolute path to input .fastq data (must be enclosed with single quotes). If no path specified, the default behaviour is search in the current dir for the folder "Data" (_i.e._ "./Data/")
- `--singleEnd` Logical indicating whether the files are single ("true". This is the default beahaviour) or paired end ("false").
+ - `--fastq_files` Absolute path to input .fastq data (must be enclosed with single quotes). If no path specified, the default behaviour is search in the current dir for the folder "Data" (_i.e._ "./Data/")
+ - `--singleEnd` Logical indicating whether the files are single ("true". This is the default beahaviour) or paired end ("false").
+ - `--cpus` Integer specifying the number of cores to use. Be aware of the limits of your machine.
 #### Options
- `--outdir ` Absolute path to the output data (must be enclosed in quotes). If no path specified, the default behaviour is search in the current dir for the folder "Results" (_i.e._ "./Results/"). Be sure to add the final "/" to the path.
- `--cpus` Integer specifying the number of cores to use. Be aware of the limits of your machine.
- `-profile condor` Used when in a cluster with the HTCondor executor. For configuration of the HTCondor parameters go to `nextflow.config` and change the required settings.
+ - `--outdir ` Absolute path to the output data (must be enclosed in quotes). If no path specified, the default behaviour is to create in the current dir the folder "Results" (_i.e._ "./Results/").
+ - `-profile condor` Used when in a cluster with the HTCondor executor. For configuration of the HTCondor parameters go to `nextflow.config` and change the required settings.
+ - `--multiqc_config` Input a `.yaml` file to configure multiqc title, comments, subtitles and more. if no supplied, then QUARS assumes is "./multiqc_config.yaml". Customisable items are fully described in [MultiQC documentation](http://multiqc.info/docs/#customising-reports).
+
+#### Getting Help
+
+ nextflow run QUARS --help
 ## Credits
 @TainVelasco-Luquez
diff --git a/multiqc_config.yaml b/multiqc_config.yaml
new file mode 100644
index 0000000..f6d3cab
--- /dev/null
+++ b/multiqc_config.yaml
@@ -0,0 +1,9 @@
+# Heading
+title: "QUARS ~ QUAlity control for Rna_Seq"
+intro_text: "Easy and straightforward pre-alignment quality control for your RNAseq experiments. Let's QUARS!"
+custom_logo: "https://cdn.rawgit.com/TainVelasco-Luquez/QUARS/873d7f38/Docs/QUARS_logo.png"
+custom_logo_url: "https://github.com/TainVelasco-Luquez/QUARS"
+custom_logo_title: "QUARS"
+# Output
+output_fn_name: QUARS.html
+data_dir_name: QUARS_data_from_multiqc
diff --git a/nextflow.config b/nextflow.config
index 8545699..1d1e344 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -12,29 +12,34 @@ Caution note: This code is heavily influenced by NGI-RNAseq by Phil Ewels.
 // for github
 manifest {
 author = '@TainVelasco-Luquez'
- description = "QUAlity control for RNA_Seq, a nextflow pipeline"
+ description = "Pre-alignment QUAlity control for RNA_Seq, a nextflow pipeline"
 mainScript = 'quars.nf'
 }
 /*
-Params.arg can be supplied when running the command, and automatically replace the default ones, by ussing the format: --arg value (e.g. nextflow run file.nf --fastq_files '/home/tain/')
+Params.arg can be supplied when running the command, and automatically replace the default ones, by ussing the format: --arg value (e.g. nextflow run file.nf --fastq_files '/home/tain/*_{1,2}.fastq')
 */
 params {
- fastq_files = './Data/*_{1,2}.fastq'
- outdir = './Results/'
+ fastq_files = "./Data/*_{1,2}.fastq"
+ outdir = "./Results/"
 singleEnd = true
 cpus = 6
+ multiqc_config = "$baseDir/multiqc_config.yaml"
 }
 // Pipeline instrospection
 timeline {
 enabled = true
- file = "${params.outdir}/timeline_RNAseqQC.html"
+ file = "${params.outdir}/timeline_QUARS.html"
 }
 report {
 enabled = true
- file = "${params.outdir}/report_RNAseqQC.html"
+ file = "${params.outdir}/report_QUARS.html"
+}
+dag {
+ enabled = true
+ file = "${params.outdir}/DAG_QUARS.html"
 }
 // Minimal nextflow version
@@ -53,10 +58,10 @@ profiles {
 condor {
 process.executor = 'condor'
 // How much memory the process is allowed to use
- process.memory = '10GB'
+ process.memory = '15GB'
 // Number of (logical) CPU required by the process' task
 process.cpus = 20
 // How much local disk storage the process is allowed to use
- process.disk = '2 GB'
+ process.disk = '15 GB'
 }
 }
diff --git a/quars.nf b/quars.nf
index 11ee674..ae4b340 100644
--- a/quars.nf
+++ b/quars.nf
@@ -44,18 +44,22 @@ def usage() {
 QUARS creates a MultiQC report out of the FastQC and Fastp results of both single and paired end RNAseq reads. It also saves the Fastp cleaned files to a new dir as .fq.gz.
 Typical usage:
- nextflow run quars.nf --fastq_files 'mydir/*_{1,2}.fastq.gz' --singleEnd false
+ nextflow run QUARS --fastq_files 'mydir/*_{1,2}.fastq.gz' --singleEnd false --cpus 16
- nextflow run quars.nf --fastq_files 'mydir/*.fastq.gz'
+ nextflow run QUARS --fastq_files 'mydir/*.fastq.gz' --cpus 16
 Mandatory arguments:
- --fastq_files Absolute path to input .fastq data (must be enclosed with single quotes). If no path specified, the default behaviour is search in the current dir for the folder "Data" (i.e. "./Data/")
+ --fastq_files Absolute path to input .fastq data (must be enclosed with single quotes). If no path specified, the default behaviour is search in the current dir for the folder "Data" (i.e. "$baseDir/Data/")
 --singleEnd Logical indicating whether the files are single ("true". This is the default beahaviour) or paired end ("false").
+ --cpus Integer specifying the number of cores to use. Be aware of the limits of your machine.
 Options:
- --outdir Absolute path to the output data (must be enclosed in quotes). If no path specified, the default behaviour is search in the current dir for the folder "Results" (i.e. "./Results/"). Be sure to add the final "/" to the path.
- --cpus Integer specifying the number of cores to use. Be aware of the limits of your machine.
+ --outdir Absolute path to the output data (must be enclosed in quotes). If no path specified, the default behaviour is to create in the current dir the folder "Results" (i.e. "$baseDir/Results/").
+ --multiqc_config Input .yaml file to configure multiqc title, comments, subtitles and more. if no supplied, then QUARS assumes is "$baseDir/multiqc_config.yaml".
 -profile condor Used when in a cluster with the HTCondor executor. For configuration of the HTCondor parameters go to nextflow.config and change the required settings.
+
+ Getting Help
+ nextflow run QUARS --help
 """.stripIndent()
 }
@@ -76,7 +80,7 @@ Channel
 .ifEmpty { error "Cannot find any reads matching: ${params.fastq_files}" }
 .set { files_QC_ch }
-println ("\n Fastp is about to run... \n")
+log.info " Fastp is about to run ... "
 process fastp {
@@ -101,11 +105,11 @@ process fastp {
 if (params.singleEnd == true) {
 """
- fastp -p -w ${params.cpus} --dont_overwrite -i ${fastq_file[0]} -o ${samplename}_fastp.fastq.gz -h ${samplename}_fastp.html -j ${samplename}_fastp.json
+ fastp -w ${params.cpus} --dont_overwrite -i ${fastq_file[0]} -o ${samplename}_fastp.fastq.gz -h ${samplename}_fastp.html -j ${samplename}_fastp.json
 """
 } else {
 """
- fastp -p -w ${params.cpus} --dont_overwrite -i ${fastq_file[0]} -I ${fastq_file[1]} -o ${samplename}_1_fastp.fastq.gz -O ${samplename}_2_fastp.fastq.gz -h ${samplename}_fastp.html -j ${samplename}_fastp.json
+ fastp -w ${params.cpus} --dont_overwrite --detect_adapter_for_pe -i ${fastq_file[0]} -I ${fastq_file[1]} -o ${samplename}_1_fastp.fastq.gz -O ${samplename}_2_fastp.fastq.gz -h ${samplename}_fastp.html -j ${samplename}_fastp.json
 """
 }
 }
@@ -120,7 +124,7 @@ process fastp {
 .ifEmpty { error "Cannot find any reads matching: ${params.fastq_files}" }
 .set { files_QC_2_ch }
-println ("\n fastQC is about to run... \n")
+log.info " FastQC is about to run ... "
 process fastQC {
 tag { fastqc_tag }
@@ -156,26 +160,28 @@ process fastQC {
 * This step has the code structure from https://github.com/SciLifeLab/NGI-RNAseq/blob/master/main.nf all credit is for its authors.
 */
- println ("\n mutiQC is about to run... \n")
+ log.info " MutiQC is about to run ... "
+ multiqc_config = file(params.multiqc_config)
 process multiQC {
- publishDir pattern: "*multiqc_report.html", path: { params.outdir + "multiQC/" }, mode: 'copy'
+ publishDir pattern: "*QUARS.html", path: { params.outdir }, mode: 'copy'
 input:
 file ('fastQC/*') from fastqc_results_ch.collect()
 file ('fastp/*') from fastp_results_ch.collect()
+ file multiqc_config
 output:
- file('*multiqc_report.html')
+ file('*QUARS.html')
 script:
 """
- multiqc . -f -m fastqc -m fastp
+ multiqc . -f --config ${multiqc_config} -m fastqc -m fastp
 """
 }
 workflow.onComplete {
 println "\n Pipeline completed at: $workflow.complete"
- println "\n Execution status: ${ workflow.success ? 'OK' : 'failed' }"
+ println "\n Execution status: ${ workflow.success ? 'OK' : 'Failed' }"
 }
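
Reviewer note: the paired-end usage in this change relies on the `*_{1,2}.fastq.gz` naming pattern and on `--singleEnd false`. A minimal pre-flight sketch, assuming that naming convention, could confirm that every `_1` file has its `_2` mate before the pipeline is launched. The helper name `check_pairs` is hypothetical and is not part of QUARS or Nextflow.

```shell
#!/bin/sh
# Hypothetical helper (not part of QUARS): checks that every *_1.fastq.gz
# file in a directory has a matching *_2.fastq.gz mate.
# The body runs in a subshell, so its variables do not leak to the caller.
check_pairs() (
    dir="${1:-.}"
    missing=0
    for r1 in "$dir"/*_1.fastq.gz; do
        # If the glob matched nothing, the literal pattern is kept as-is.
        [ -e "$r1" ] || { echo "no *_1.fastq.gz files in $dir" >&2; exit 2; }
        r2="${r1%_1.fastq.gz}_2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1" >&2
            missing=$((missing + 1))
        fi
    done
    # Exit status 0 only when every forward read has a mate.
    [ "$missing" -eq 0 ]
)
```

It could be chained in front of the paired-end command from the README, for example `check_pairs ./Data && nextflow run QUARS --fastq_files './Data/*_{1,2}.fastq.gz' --singleEnd false --cpus 16`, so that an incomplete pair is reported before any process starts.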