From eb51d5677e73cbe8c22c8a386ef6833dec3f8e87 Mon Sep 17 00:00:00 2001 From: robert-a-forsyth Date: Fri, 28 Nov 2025 16:13:06 +0100 Subject: [PATCH 01/10] snap, version update --- .nf-core.yml | 2 +- CHANGELOG.md | 6 +++++- README.md | 5 ++--- assets/multiqc_config.yml | 2 +- nextflow.config | 2 +- ro-crate-metadata.json | 18 +++++++++--------- tests/default.nf.test.snap | 4 ++-- 7 files changed, 21 insertions(+), 18 deletions(-) diff --git a/.nf-core.yml b/.nf-core.yml index 662ad88c..f15c6cf9 100644 --- a/.nf-core.yml +++ b/.nf-core.yml @@ -39,4 +39,4 @@ template: outdir: . skip_features: - fastqc - version: 1.0.0 + version: 1.1.0dev diff --git a/CHANGELOG.md b/CHANGELOG.md index 0f89d7ad..e1e04cb4 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,7 +3,11 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## v1.0.0dev - [date] +## v1.1.0 - [date] + + + +## v1.0.0 - [28 Nov 2025] Initial release of IntGenomicsLab/lrsomatic, created with the [nf-core](https://nf-co.re/) template. diff --git a/README.md b/README.md index cc18d01b..3a577e73 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,7 @@ # IntGenomicsLab/lrsomatic [![GitHub Actions CI Status](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/nf-test.yml/badge.svg)](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/nf-test.yml) -[![GitHub Actions Linting Status](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/linting.yml/badge.svg)](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/linting.yml)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX) +[![GitHub Actions Linting Status](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/linting.yml/badge.svg)](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/linting.yml)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.17751829-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.17751829) [![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com) [![Nextflow](https://img.shields.io/badge/version-%E2%89%A525.04.0-green?style=flat&logo=nextflow&logoColor=white&color=%230DC09D&link=https%3A%2F%2Fnextflow.io)](https://www.nextflow.io/) @@ -159,8 +159,7 @@ If you would like to contribute to this pipeline, please see the [contributing g ## Citations - - +If you use IntGenomicsLab/lrsomatic for your analysis, please cite it using the following doi: [10.5281/zenodo.17751829](https://doi.org/10.5281/zenodo.17751829) An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file. diff --git a/assets/multiqc_config.yml b/assets/multiqc_config.yml index 4705d98d..95c83b7c 100644 --- a/assets/multiqc_config.yml +++ b/assets/multiqc_config.yml @@ -1,5 +1,5 @@ report_comment: > - This report has been generated by the IntGenomicsLab/lrsomatic analysis pipeline. + This report has been generated by the IntGenomicsLab/lrsomatic analysis pipeline. report_section_order: "IntGenomicsLab-lrsomatic-methods-description": order: -1000 diff --git a/nextflow.config b/nextflow.config index 8b7c724c..b604225b 100644 --- a/nextflow.config +++ b/nextflow.config @@ -344,7 +344,7 @@ manifest { mainScript = 'main.nf' defaultBranch = 'main' nextflowVersion = '!>=25.04.0' - version = '1.0.0' + version = '1.1.0dev' doi = '' } diff --git a/ro-crate-metadata.json b/ro-crate-metadata.json index 67aa24f4..184415cf 100644 --- a/ro-crate-metadata.json +++ b/ro-crate-metadata.json @@ -21,8 +21,8 @@ { "@id": "./", "@type": "Dataset", - "creativeWorkStatus": "Stable", - "datePublished": "2025-11-28T12:51:54+00:00", + "creativeWorkStatus": "InProgress", + "datePublished": "2025-11-28T14:50:58+00:00", "description": "# IntGenomicsLab/lrsomatic\n\n[![GitHub Actions CI Status](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/nf-test.yml/badge.svg)](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/nf-test.yml)\n[![GitHub Actions Linting Status](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/linting.yml/badge.svg)](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/linting.yml)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)\n[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)\n\n[![Nextflow](https://img.shields.io/badge/version-%E2%89%A525.04.0-green?style=flat&logo=nextflow&logoColor=white&color=%230DC09D&link=https%3A%2F%2Fnextflow.io)](https://www.nextflow.io/)\n[![nf-core template version](https://img.shields.io/badge/nf--core_template-3.3.2-green?style=flat&logo=nfcore&logoColor=white&color=%2324B064&link=https%3A%2F%2Fnf-co.re)](https://github.com/nf-core/tools/releases/tag/3.3.2)\n[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)\n[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)\n[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)\n[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/IntGenomicsLab/lrsomatic)\n\n## Introduction\n\n**IntGenomicsLab/lrsomatic** is a robust bioinformatics pipeline designed for processing and analyzing **somatic DNA sequencing** data for long-read sequencing technologies from **Oxford Nanopore** and **PacBio**. It supports both canonical base DNA and modified base calling, including specialized applications such as **Fiber-seq**.\n\nThis **end-to-end pipeline** handles the entire workflow \u2014 **from raw read processing and alignment, to comprehensive somatic variant calling**, including single nucleotide variants, indels, structural variants, copy number alterations, and modified bases.\n\nIt can be run in both **matched tumour-normal** and **tumour-only mode**, offering flexibility depending on the users study design.\n\nDeveloped using **Nextflow DSL2**, it offers high portability and scalability across diverse computing environments. By leveraging Docker or Singularity containers, installation is streamlined and results are highly reproducible. Each process runs in an isolated container, simplifying dependency management and updates. Where applicable, pipeline components are sourced from **nf-core/modules**, promoting reuse, interoperability, and consistency within the broader Nextflow and nf-core ecosystems.\n\n## Pipeline summary\n\n**1) Pre-processing:**\n\na. Raw read QC ([`cramino`](https://github.com/wdecoster/cramino))\n\nb. Alignment to the reference genome ([`minimap2`](https://github.com/lh3/minimap2))\n\nc. Post alignment QC ([`cramino`](https://github.com/wdecoster/cramino), [`samtools idxstats`](https://github.com/samtools/samtools), [`samtools flagstats`](https://github.com/samtools/samtools), [`samtools stats`](https://github.com/samtools/samtools))\n\nd. Specific for calling modified base calling ([`Modkit`](https://github.com/nanoporetech/modkit), [`Fibertools`](https://github.com/fiberseq/fibertools-rs))\n\n**2i) Matched mode: small variant calling:**\n\na. Calling Germline SNPs ([`Clair3`](https://github.com/HKU-BAL/Clair3))\n\nb. Phasing and Haplotagging the SNPs in the normal and tumour BAM ([`LongPhase`](https://github.com/twolinin/longphase))\n\nc. Calling somatic SNVs ([`ClairS`](https://github.com/HKU-BAL/ClairS))\n\n**2ii) Tumour only mode: small variant calling:**\n\na. Calling Germline SNPs and somatic SNVs ([`ClairS-TO`](https://github.com/HKU-BAL/ClairS-TO))\n\nb. Phasing and Haplotagging germline SNPs in tumour BAM ([`LongPhase`](https://github.com/twolinin/longphase))\n\n**3) Large variant calling:**\n\na. Somatic structural variant calling ([`Severus`](https://github.com/KolmogorovLab/Severus))\n\nb. Copy number alterion calling; long read version of ([`ASCAT`](https://github.com/VanLoo-lab/ascat))\n\n**4) Annotation:**\n\na. Small variant annotation ([`VEP`](https://github.com/Ensembl/ensembl-vep))\n\nb. Structural variant annotation ([`VEP`](https://github.com/Ensembl/ensembl-vep))\n\n\n\n## Usage\n\n> [!NOTE]\n> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.\n\nFirst prepare a samplesheet with your input data that looks as follows:\n\n```csv\nsample,bam_tumor,bam_normal,platform,sex,fiber\nsample1,tumour.bam,normal.bam,ont,female,n\nsample2,tumour.bam,,ont,female,y\nsample3,tumour.bam,,pb,male,n\nsample4,tumour.bam,normal.bam,pb,male,y\n```\n\nEach row represents a sample. The bam files should always be unaligned bam files. All fields except for `bam_normal` are required. If `bam_normal` is empty, the pipeline will run in tumour only mode. `platform` should be either `ont` or `pb` for Oxford Nanopore Sequencing or PacBio sequencing, respectively. `sex` refers to the biological sex of the sample and should be either `female` or `male`. Finally, `fiber` specifies whether your sample is Fiber-seq data or not and should have either `y` for Yes or `n` for No.\n\nNow, you can run the pipeline using:\n\n```bash\nnextflow run IntGenomicsLab/lrsomatic \\\n -profile \\\n --input samplesheet.csv \\\n --outdir \n```\n\nMore detail is given in our [usage documentation](/docs/usage.md)\n\n> [!WARNING]\n> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).\n\n## Credits\n\nIntGenomicsLab/lr_somatic was originally written by Luuk Harbers, Robert Forsyth, Alexandra Pan\u010d\u00edkov\u00e1, Marios Eftychiou, Ruben Cools, Laurens Lambrechts, and Jonas Demeulemeester.\n\n## Pipeline output\n\nThis pipeline produces a series of different output files. The main output is an aligned and phased tumour bam file. This bam file can be used by any typical downstream tool that uses bam files as input. Furthermore, we have sample-specific QC outputs from `cramino` (fastq), `cramino` (bam), `mosdepth`, `samtools` (stats/flagstat/idxstats), and optionally `fibertools`. Finally, we have a `multiqc` report from that combines the output from `mosdepth` and `samtools` into one html report.\n\nBesides QC and the aligned and phased bam file, we have output from (structural) variant and copy number callers, of which some are optional. The output from these variant callers can be found in their respective folders. For small and structural variant callers (`clairS`, `clairS-TO`, and `severus`) these will contain, among others, `vcf` files with called variants. For `ascat` these contain files with final copy number information and plots of the copy number profiles.\n\nExample output directory structure:\n\n```\n\u251c\u2500\u2500 Sample 1\n\u2502 \u251c\u2500\u2500 ascat\n\u2502 \u251c\u2500\u2500 bamfiles\n\u2502 \u251c\u2500\u2500 qc\n\u2502 \u2502 \u251c\u2500\u2500 tumor\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_aln\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_ubam\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 fibertoolsrs\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 mosdepth\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 samtools\n\u2502 \u251c\u2500\u2500 variants\n\u2502 \u2502 \u251c\u2500\u2500clairS-TO\n\u2502 \u2502 \u251c\u2500\u2500severus\n\u2502 \u251c\u2500\u2500 vep\n\u2502 \u2502 \u251c\u2500\u2500 germline\n\u2502 \u2502 \u251c\u2500\u2500 somatic\n\u2502 \u2502 \u251c\u2500\u2500 SVs\n\u2502\n\u251c\u2500\u2500 Sample 2\n\u2502 \u251c\u2500\u2500 ascat\n\u2502 \u251c\u2500\u2500 bamfiles\n\u2502 \u251c\u2500\u2500 qc\n\u2502 \u2502 \u251c\u2500\u2500 tumor\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_aln\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_ubam\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 fibertoolsrs\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 mosdepth\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 samtools\n\u2502 \u2502 \u251c\u2500\u2500 normal\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_aln\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_ubam\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 fibertoolsrs\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 mosdepth\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 samtools\n\u2502 \u251c\u2500\u2500 variants\n\u2502 \u2502 \u251c\u2500\u2500 clair3\n\u2502 \u2502 \u251c\u2500\u2500 clairS\n\u2502 \u2502 \u251c\u2500\u2500 severus\n\u2502 \u251c\u2500\u2500 vep\n\u2502 \u2502 \u251c\u2500\u2500 germline\n\u2502 \u2502 \u251c\u2500\u2500 somatic\n\u2502 \u2502 \u251c\u2500\u2500 SVs\n\u251c\u2500\u2500 pipeline_info\n```\n\nmore detail is given in our [output documentation](/docs/output.md)\n\n## Contributions and Support\n\nIf you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).\n\n## Citations\n\n\n\n\nAn extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.\n\nThis pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/main/LICENSE).\n\n> **The nf-core framework for community-curated bioinformatics pipelines.**\n>\n> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.\n>\n> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).\n", "hasPart": [ { @@ -96,7 +96,7 @@ }, "mentions": [ { - "@id": "#abec6cbc-500e-43f6-bc1f-0f2530f2956d" + "@id": "#1ab0baa8-fedd-4ff3-867b-bf19e43d77c6" } ], "name": "IntGenomicsLab/lrsomatic" @@ -124,7 +124,7 @@ "ComputationalWorkflow" ], "dateCreated": "", - "dateModified": "2025-11-28T13:51:54Z", + "dateModified": "2025-11-28T15:50:58Z", "dct:conformsTo": "https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE/", "keywords": [ "nf-core", @@ -144,10 +144,10 @@ }, "url": [ "https://github.com/IntGenomicsLab/lrsomatic", - "https://nf-co.re/IntGenomicsLab/lrsomatic/1.0.0/" + "https://nf-co.re/IntGenomicsLab/lrsomatic/dev/" ], "version": [ - "1.0.0" + "1.1.0dev" ] }, { @@ -163,11 +163,11 @@ "version": "!>=25.04.0" }, { - "@id": "#abec6cbc-500e-43f6-bc1f-0f2530f2956d", + "@id": "#1ab0baa8-fedd-4ff3-867b-bf19e43d77c6", "@type": "TestSuite", "instance": [ { - "@id": "#7f53c2d3-49e7-4c7c-bec5-7e617642b3b6" + "@id": "#e32a6c88-125c-436a-bfa8-ea405ea002d5" } ], "mainEntity": { @@ -176,7 +176,7 @@ "name": "Test suite for IntGenomicsLab/lrsomatic" }, { - "@id": "#7f53c2d3-49e7-4c7c-bec5-7e617642b3b6", + "@id": "#e32a6c88-125c-436a-bfa8-ea405ea002d5", "@type": "TestInstance", "name": "GitHub Actions workflow for testing IntGenomicsLab/lrsomatic", "resource": "repos/IntGenomicsLab/lrsomatic/actions/workflows/nf-test.yml", diff --git a/tests/default.nf.test.snap b/tests/default.nf.test.snap index e8071e7c..1c8662ed 100644 --- a/tests/default.nf.test.snap +++ b/tests/default.nf.test.snap @@ -82,7 +82,7 @@ "wget": "1.21.4" }, "Workflow": { - "IntGenomicsLab/lrsomatic": "v1.0.0" + "IntGenomicsLab/lrsomatic": "v1.1.0dev" } }, [ @@ -369,6 +369,6 @@ "nf-test": "0.9.3", "nextflow": "25.10.0" }, - "timestamp": "2025-11-28T14:26:44.508445086" + "timestamp": "2025-11-28T16:07:44.541483428" } } \ No newline at end of file From b5938f2355188292f7219b1855a5033679bb7d40 Mon Sep 17 00:00:00 2001 From: robert-a-forsyth Date: Fri, 28 Nov 2025 16:14:18 +0100 Subject: [PATCH 02/10] prettier --- CHANGELOG.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index e1e04cb4..9d4c1dab 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,8 +5,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## v1.1.0 - [date] - - ## v1.0.0 - [28 Nov 2025] Initial release of IntGenomicsLab/lrsomatic, created with the [nf-core](https://nf-co.re/) template. From 5b24d2098474ee8f6f4b54398a1e40615e25bcad Mon Sep 17 00:00:00 2001 From: robert-a-forsyth Date: Tue, 2 Dec 2025 15:53:55 +0100 Subject: [PATCH 03/10] view --- subworkflows/local/tumor_normal_happhase.nf | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/subworkflows/local/tumor_normal_happhase.nf b/subworkflows/local/tumor_normal_happhase.nf index 91c7014c..7c32f267 100644 --- a/subworkflows/local/tumor_normal_happhase.nf +++ b/subworkflows/local/tumor_normal_happhase.nf @@ -212,12 +212,15 @@ workflow TUMOR_NORMAL_HAPPHASE { ch_versions = ch_versions.mix(SAMTOOLS_INDEX.out.versions) + mixed_bams_vcf.view() + // Add index to channel mixed_bams_vcf .join(mixed_hapbams) .join(SAMTOOLS_INDEX.out.bai) .set{ mixed_hapbams } + mixed_hapbams.view() // mixed_hapbams -> meta: [id, paired_data, platform, sex, type, fiber, basecall_model] // bams: haplotagged aligned bams // bais: indexes for bam files @@ -246,7 +249,7 @@ workflow TUMOR_NORMAL_HAPPHASE { .join(LONGPHASE_PHASE.out.vcf) .join(LONGPHASE_PHASE.out.tbi) .set{tumor_normal_severus} - + tumor_normal_severus.view() // tumor_normal_severus -> meta: [id, paired_data, platform, sex, fiber, basecall_model] // tumor_bam: haplotagged aligned bam for tumor // tumor_bai: indexes for tumor bam files From 19b15b42a666520d72452fbd83372414a7aba4de Mon Sep 17 00:00:00 2001 From: robert-a-forsyth Date: Tue, 2 Dec 2025 16:08:54 +0100 Subject: [PATCH 04/10] view --- subworkflows/local/tumor_normal_happhase.nf | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/subworkflows/local/tumor_normal_happhase.nf b/subworkflows/local/tumor_normal_happhase.nf index 7c32f267..a10f843d 100644 --- a/subworkflows/local/tumor_normal_happhase.nf +++ b/subworkflows/local/tumor_normal_happhase.nf @@ -163,6 +163,9 @@ workflow TUMOR_NORMAL_HAPPHASE { // Add phased vcf to tumour bams and type information // mix with the normal bams + tumor_bams.view() + LONGPHASE_PHASE.out.vcf.view() + tumor_bams .join(LONGPHASE_PHASE.out.vcf) .map { meta, bam, bai, vcf -> @@ -212,15 +215,12 @@ workflow TUMOR_NORMAL_HAPPHASE { ch_versions = ch_versions.mix(SAMTOOLS_INDEX.out.versions) - mixed_bams_vcf.view() - // Add index to channel mixed_bams_vcf .join(mixed_hapbams) .join(SAMTOOLS_INDEX.out.bai) .set{ mixed_hapbams } - mixed_hapbams.view() // mixed_hapbams -> meta: [id, paired_data, platform, sex, type, fiber, basecall_model] // bams: haplotagged aligned bams // bais: indexes for bam files @@ -249,7 +249,6 @@ workflow TUMOR_NORMAL_HAPPHASE { .join(LONGPHASE_PHASE.out.vcf) .join(LONGPHASE_PHASE.out.tbi) .set{tumor_normal_severus} - tumor_normal_severus.view() // tumor_normal_severus -> meta: [id, paired_data, platform, sex, fiber, basecall_model] // tumor_bam: haplotagged aligned bam for tumor // tumor_bai: indexes for tumor bam files From 77ae92fdbeab4232546c6b83afde5f5a398010a7 Mon Sep 17 00:00:00 2001 From: robert-a-forsyth Date: Tue, 2 Dec 2025 16:28:30 +0100 Subject: [PATCH 05/10] view --- subworkflows/local/tumor_normal_happhase.nf | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/subworkflows/local/tumor_normal_happhase.nf b/subworkflows/local/tumor_normal_happhase.nf index a10f843d..8e9778cb 100644 --- a/subworkflows/local/tumor_normal_happhase.nf +++ b/subworkflows/local/tumor_normal_happhase.nf @@ -22,6 +22,8 @@ workflow TUMOR_NORMAL_HAPPHASE { somatic_vep = Channel.empty() germline_vep = Channel.empty() + mixed_bams.view() + // Branch input bams in normal and tumour mixed_bams .branch{ meta, bam, bai -> @@ -39,7 +41,7 @@ workflow TUMOR_NORMAL_HAPPHASE { return [basecall_model, meta, file] } .set{downloaded_model_files} - + downloaded_model_files.view() mixed_bams.normal .map{ meta, bam, bai -> def basecall_model = (!meta.clair3_model || meta.clair3_model.toString().trim() in ['', '[]']) ? meta.basecall_model : meta.clair3_model @@ -53,6 +55,7 @@ workflow TUMOR_NORMAL_HAPPHASE { return [ basecall_model, new_meta, bam, bai ] } .set { normal_bams_model } + normal_bams_model.view() normal_bams_model .combine(downloaded_model_files,by:0) @@ -61,6 +64,7 @@ workflow TUMOR_NORMAL_HAPPHASE { return [meta, bam, bai, model, platform] } .set{ normal_bams } + normal_bams.view() // normal_bams -> meta: [id, paired_data, platform, sex, fiber, basecall_model] // bam: list of concatenated aligned bams @@ -83,6 +87,7 @@ workflow TUMOR_NORMAL_HAPPHASE { return[new_meta, bam, bai] } .set{ tumor_bams } + tumor_bams.view() // tumor_bams -> meta: [id, paired_data, platform, sex, fiber, basecall_model] // bam: list of concatenated aligned bams From ada3e8f5d42b84805e46864bff9e2ce7eca2d608 Mon Sep 17 00:00:00 2001 From: robert-a-forsyth Date: Tue, 2 Dec 2025 16:53:17 +0100 Subject: [PATCH 06/10] channel fix --- subworkflows/local/tumor_normal_happhase.nf | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/subworkflows/local/tumor_normal_happhase.nf b/subworkflows/local/tumor_normal_happhase.nf index 8e9778cb..83c7f3b2 100644 --- a/subworkflows/local/tumor_normal_happhase.nf +++ b/subworkflows/local/tumor_normal_happhase.nf @@ -23,7 +23,7 @@ workflow TUMOR_NORMAL_HAPPHASE { germline_vep = Channel.empty() mixed_bams.view() - + // Branch input bams in normal and tumour mixed_bams .branch{ meta, bam, bai -> @@ -50,7 +50,7 @@ workflow TUMOR_NORMAL_HAPPHASE { platform: meta.platform, sex: meta.sex, fiber: meta.fiber, - basecall_model: meta.basecall_model, + basecall_model: basecall_model, clairS_model: meta.clairS_model] return [ basecall_model, new_meta, bam, bai ] } @@ -77,12 +77,13 @@ workflow TUMOR_NORMAL_HAPPHASE { // remove type from so that information can be merged easier later mixed_bams.tumor .map{ meta, bam, bai -> + def basecall_model = (!meta.clair3_model || meta.clair3_model.toString().trim() in ['', '[]']) ? meta.basecall_model : meta.clair3_model def new_meta = [id: meta.id, paired_data: meta.paired_data, platform: meta.platform, sex: meta.sex, fiber: meta.fiber, - basecall_model: meta.basecall_model, + basecall_model: basecall_model, clairS_model: meta.clairS_model] return[new_meta, bam, bai] } From 742f0ee695309378decb2118d66e2dcb927416c4 Mon Sep 17 00:00:00 2001 From: robert-a-forsyth Date: Thu, 4 Dec 2025 10:07:40 +0100 Subject: [PATCH 07/10] update ascat for pdf plotting --- conf/modules.config | 3 ++- modules/nf-core/ascat/main.nf | 11 ++++++----- nextflow.config | 1 + 3 files changed, 9 insertions(+), 6 deletions(-) diff --git a/conf/modules.config b/conf/modules.config index d82196bb..081c0892 100644 --- a/conf/modules.config +++ b/conf/modules.config @@ -278,7 +278,8 @@ process { "min_map_qual": params.ascat_min_map_qual, "longread_bins": params.ascat_longread_bins, "allele_counter_flags": params.ascat_allelecounter_flags, - "penalty": params.ascat_penalty + "penalty": params.ascat_penalty, + "pdf_plots": params.ascat_pdf_plots ] } publishDir = [ path: { "${params.outdir}/${meta.id}/ascat" }, diff --git a/modules/nf-core/ascat/main.nf b/modules/nf-core/ascat/main.nf index 366a4ba4..366c9efe 100644 --- a/modules/nf-core/ascat/main.nf +++ b/modules/nf-core/ascat/main.nf @@ -42,6 +42,7 @@ process ASCAT { def penalty = args.penalty ? "$args.penalty" : "NULL" def gc_input = gc_file ? "$gc_file" : "NULL" def rt_input = rt_file ? "$rt_file" : "NULL" + def pdf_plots = args.pdf_plots ? "$args.pdf_plots" : "NULL" def minCounts_arg = args.minCounts ? ",minCounts = $args.minCounts" : "" def bed_file_arg = bed_file ? ",BED_file = '$bed_file'": "" @@ -56,7 +57,7 @@ process ASCAT { def normal_bam = input_normal ? ",normalseqfile = '$input_normal'" : "" def normal_name = input_normal ? ",normalname = '${prefix}.normal'" : "" def longread_bins = args.longread_bins ? ",loci_binsize = $args.longread_bins" : "" - def allele_counter_flags = args.allele_counter_flags ? ",additional_allelecounter_flags = '$args.allele_counter_flags'" : "" + def allele_counter_flags = args.allele_counter_flags ? ",additional_allelecounter_flags = '$args.allele_counter_flags'" : "" """ #!/usr/bin/env Rscript library(RColorBrewer) @@ -153,13 +154,13 @@ process ASCAT { #Run ASCAT to fit every tumor to a model, inferring ploidy, normal cell contamination, and discrete copy numbers #If psi and rho are manually set: if (!is.null($purity) && !is.null($ploidy)){ - ascat.output <- ascat.runAscat(ascat.bc, gamma=1, rho_manual=$purity, psi_manual=$ploidy) + ascat.output <- ascat.runAscat(ascat.bc, gamma=1, rho_manual=$purity, psi_manual=$ploidy, pdfPlot = $pdf_plots) } else if(!is.null($purity) && is.null($ploidy)){ - ascat.output <- ascat.runAscat(ascat.bc, gamma=1, rho_manual=$purity) + ascat.output <- ascat.runAscat(ascat.bc, gamma=1, rho_manual=$purity, pdfPlot = $pdf_plots) } else if(!is.null($ploidy) && is.null($purity)){ - ascat.output <- ascat.runAscat(ascat.bc, gamma=1, psi_manual=$ploidy) + ascat.output <- ascat.runAscat(ascat.bc, gamma=1, psi_manual=$ploidy, pdfPlot = $pdf_plots) } else { - ascat.output <- ascat.runAscat(ascat.bc, gamma=1) + ascat.output <- ascat.runAscat(ascat.bc, gamma=1, pdfPlot = $pdf_plots) } #Extract metrics from ASCAT profiles diff --git a/nextflow.config b/nextflow.config index b604225b..43a2a123 100644 --- a/nextflow.config +++ b/nextflow.config @@ -55,6 +55,7 @@ params { ascat_penalty = 150 ascat_purity = null ascat_longread_bins = 2000 + ascat_pdf_plots = false ascat_allelecounter_flags = "-f 0" ascat_chroms = null // Only use if running on a subset of chromosomes (c(1:22, 'X', 'Y')) From feeee9040366a03fac69920f3196cceccd30493f Mon Sep 17 00:00:00 2001 From: robert-a-forsyth Date: Thu, 4 Dec 2025 10:43:08 +0100 Subject: [PATCH 08/10] change var --- modules/nf-core/ascat/main.nf | 8 ++++---- nextflow.config | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/modules/nf-core/ascat/main.nf b/modules/nf-core/ascat/main.nf index 366c9efe..1a809687 100644 --- a/modules/nf-core/ascat/main.nf +++ b/modules/nf-core/ascat/main.nf @@ -154,13 +154,13 @@ process ASCAT { #Run ASCAT to fit every tumor to a model, inferring ploidy, normal cell contamination, and discrete copy numbers #If psi and rho are manually set: if (!is.null($purity) && !is.null($ploidy)){ - ascat.output <- ascat.runAscat(ascat.bc, gamma=1, rho_manual=$purity, psi_manual=$ploidy, pdfPlot = $pdf_plots) + ascat.output <- ascat.runAscat(ascat.bc, gamma=1, rho_manual=$purity, psi_manual=$ploidy, pdfPlot = "$pdf_plots") } else if(!is.null($purity) && is.null($ploidy)){ - ascat.output <- ascat.runAscat(ascat.bc, gamma=1, rho_manual=$purity, pdfPlot = $pdf_plots) + ascat.output <- ascat.runAscat(ascat.bc, gamma=1, rho_manual=$purity, pdfPlot = "$pdf_plots") } else if(!is.null($ploidy) && is.null($purity)){ - ascat.output <- ascat.runAscat(ascat.bc, gamma=1, psi_manual=$ploidy, pdfPlot = $pdf_plots) + ascat.output <- ascat.runAscat(ascat.bc, gamma=1, psi_manual=$ploidy, pdfPlot = "$pdf_plots") } else { - ascat.output <- ascat.runAscat(ascat.bc, gamma=1, pdfPlot = $pdf_plots) + ascat.output <- ascat.runAscat(ascat.bc, gamma=1, pdfPlot = "$pdf_plots") } #Extract metrics from ASCAT profiles diff --git a/nextflow.config b/nextflow.config index 43a2a123..12a79dea 100644 --- a/nextflow.config +++ b/nextflow.config @@ -55,7 +55,7 @@ params { ascat_penalty = 150 ascat_purity = null ascat_longread_bins = 2000 - ascat_pdf_plots = false + ascat_pdf_plots = "False" ascat_allelecounter_flags = "-f 0" ascat_chroms = null // Only use if running on a subset of chromosomes (c(1:22, 'X', 'Y')) From 0e9e2569653737743206183c906f271290242c1d Mon Sep 17 00:00:00 2001 From: robert-a-forsyth Date: Thu, 4 Dec 2025 18:01:36 +0100 Subject: [PATCH 09/10] prettier and linting --- ro-crate-metadata.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ro-crate-metadata.json b/ro-crate-metadata.json index 184415cf..f1344b13 100644 --- a/ro-crate-metadata.json +++ b/ro-crate-metadata.json @@ -23,7 +23,7 @@ "@type": "Dataset", "creativeWorkStatus": "InProgress", "datePublished": "2025-11-28T14:50:58+00:00", - "description": "# IntGenomicsLab/lrsomatic\n\n[![GitHub Actions CI Status](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/nf-test.yml/badge.svg)](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/nf-test.yml)\n[![GitHub Actions Linting Status](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/linting.yml/badge.svg)](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/linting.yml)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)\n[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)\n\n[![Nextflow](https://img.shields.io/badge/version-%E2%89%A525.04.0-green?style=flat&logo=nextflow&logoColor=white&color=%230DC09D&link=https%3A%2F%2Fnextflow.io)](https://www.nextflow.io/)\n[![nf-core template version](https://img.shields.io/badge/nf--core_template-3.3.2-green?style=flat&logo=nfcore&logoColor=white&color=%2324B064&link=https%3A%2F%2Fnf-co.re)](https://github.com/nf-core/tools/releases/tag/3.3.2)\n[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)\n[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)\n[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)\n[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/IntGenomicsLab/lrsomatic)\n\n## Introduction\n\n**IntGenomicsLab/lrsomatic** is a robust bioinformatics pipeline designed for processing and analyzing **somatic DNA sequencing** data for long-read sequencing technologies from **Oxford Nanopore** and **PacBio**. It supports both canonical base DNA and modified base calling, including specialized applications such as **Fiber-seq**.\n\nThis **end-to-end pipeline** handles the entire workflow \u2014 **from raw read processing and alignment, to comprehensive somatic variant calling**, including single nucleotide variants, indels, structural variants, copy number alterations, and modified bases.\n\nIt can be run in both **matched tumour-normal** and **tumour-only mode**, offering flexibility depending on the users study design.\n\nDeveloped using **Nextflow DSL2**, it offers high portability and scalability across diverse computing environments. By leveraging Docker or Singularity containers, installation is streamlined and results are highly reproducible. Each process runs in an isolated container, simplifying dependency management and updates. Where applicable, pipeline components are sourced from **nf-core/modules**, promoting reuse, interoperability, and consistency within the broader Nextflow and nf-core ecosystems.\n\n## Pipeline summary\n\n**1) Pre-processing:**\n\na. Raw read QC ([`cramino`](https://github.com/wdecoster/cramino))\n\nb. Alignment to the reference genome ([`minimap2`](https://github.com/lh3/minimap2))\n\nc. Post alignment QC ([`cramino`](https://github.com/wdecoster/cramino), [`samtools idxstats`](https://github.com/samtools/samtools), [`samtools flagstats`](https://github.com/samtools/samtools), [`samtools stats`](https://github.com/samtools/samtools))\n\nd. Specific for calling modified base calling ([`Modkit`](https://github.com/nanoporetech/modkit), [`Fibertools`](https://github.com/fiberseq/fibertools-rs))\n\n**2i) Matched mode: small variant calling:**\n\na. Calling Germline SNPs ([`Clair3`](https://github.com/HKU-BAL/Clair3))\n\nb. Phasing and Haplotagging the SNPs in the normal and tumour BAM ([`LongPhase`](https://github.com/twolinin/longphase))\n\nc. Calling somatic SNVs ([`ClairS`](https://github.com/HKU-BAL/ClairS))\n\n**2ii) Tumour only mode: small variant calling:**\n\na. Calling Germline SNPs and somatic SNVs ([`ClairS-TO`](https://github.com/HKU-BAL/ClairS-TO))\n\nb. Phasing and Haplotagging germline SNPs in tumour BAM ([`LongPhase`](https://github.com/twolinin/longphase))\n\n**3) Large variant calling:**\n\na. Somatic structural variant calling ([`Severus`](https://github.com/KolmogorovLab/Severus))\n\nb. Copy number alterion calling; long read version of ([`ASCAT`](https://github.com/VanLoo-lab/ascat))\n\n**4) Annotation:**\n\na. Small variant annotation ([`VEP`](https://github.com/Ensembl/ensembl-vep))\n\nb. Structural variant annotation ([`VEP`](https://github.com/Ensembl/ensembl-vep))\n\n\n\n## Usage\n\n> [!NOTE]\n> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.\n\nFirst prepare a samplesheet with your input data that looks as follows:\n\n```csv\nsample,bam_tumor,bam_normal,platform,sex,fiber\nsample1,tumour.bam,normal.bam,ont,female,n\nsample2,tumour.bam,,ont,female,y\nsample3,tumour.bam,,pb,male,n\nsample4,tumour.bam,normal.bam,pb,male,y\n```\n\nEach row represents a sample. The bam files should always be unaligned bam files. All fields except for `bam_normal` are required. If `bam_normal` is empty, the pipeline will run in tumour only mode. `platform` should be either `ont` or `pb` for Oxford Nanopore Sequencing or PacBio sequencing, respectively. `sex` refers to the biological sex of the sample and should be either `female` or `male`. Finally, `fiber` specifies whether your sample is Fiber-seq data or not and should have either `y` for Yes or `n` for No.\n\nNow, you can run the pipeline using:\n\n```bash\nnextflow run IntGenomicsLab/lrsomatic \\\n -profile \\\n --input samplesheet.csv \\\n --outdir \n```\n\nMore detail is given in our [usage documentation](/docs/usage.md)\n\n> [!WARNING]\n> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).\n\n## Credits\n\nIntGenomicsLab/lr_somatic was originally written by Luuk Harbers, Robert Forsyth, Alexandra Pan\u010d\u00edkov\u00e1, Marios Eftychiou, Ruben Cools, Laurens Lambrechts, and Jonas Demeulemeester.\n\n## Pipeline output\n\nThis pipeline produces a series of different output files. The main output is an aligned and phased tumour bam file. This bam file can be used by any typical downstream tool that uses bam files as input. Furthermore, we have sample-specific QC outputs from `cramino` (fastq), `cramino` (bam), `mosdepth`, `samtools` (stats/flagstat/idxstats), and optionally `fibertools`. Finally, we have a `multiqc` report from that combines the output from `mosdepth` and `samtools` into one html report.\n\nBesides QC and the aligned and phased bam file, we have output from (structural) variant and copy number callers, of which some are optional. The output from these variant callers can be found in their respective folders. For small and structural variant callers (`clairS`, `clairS-TO`, and `severus`) these will contain, among others, `vcf` files with called variants. For `ascat` these contain files with final copy number information and plots of the copy number profiles.\n\nExample output directory structure:\n\n```\n\u251c\u2500\u2500 Sample 1\n\u2502 \u251c\u2500\u2500 ascat\n\u2502 \u251c\u2500\u2500 bamfiles\n\u2502 \u251c\u2500\u2500 qc\n\u2502 \u2502 \u251c\u2500\u2500 tumor\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_aln\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_ubam\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 fibertoolsrs\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 mosdepth\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 samtools\n\u2502 \u251c\u2500\u2500 variants\n\u2502 \u2502 \u251c\u2500\u2500clairS-TO\n\u2502 \u2502 \u251c\u2500\u2500severus\n\u2502 \u251c\u2500\u2500 vep\n\u2502 \u2502 \u251c\u2500\u2500 germline\n\u2502 \u2502 \u251c\u2500\u2500 somatic\n\u2502 \u2502 \u251c\u2500\u2500 SVs\n\u2502\n\u251c\u2500\u2500 Sample 2\n\u2502 \u251c\u2500\u2500 ascat\n\u2502 \u251c\u2500\u2500 bamfiles\n\u2502 \u251c\u2500\u2500 qc\n\u2502 \u2502 \u251c\u2500\u2500 tumor\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_aln\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_ubam\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 fibertoolsrs\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 mosdepth\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 samtools\n\u2502 \u2502 \u251c\u2500\u2500 normal\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_aln\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_ubam\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 fibertoolsrs\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 mosdepth\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 samtools\n\u2502 \u251c\u2500\u2500 variants\n\u2502 \u2502 \u251c\u2500\u2500 clair3\n\u2502 \u2502 \u251c\u2500\u2500 clairS\n\u2502 \u2502 \u251c\u2500\u2500 severus\n\u2502 \u251c\u2500\u2500 vep\n\u2502 \u2502 \u251c\u2500\u2500 germline\n\u2502 \u2502 \u251c\u2500\u2500 somatic\n\u2502 \u2502 \u251c\u2500\u2500 SVs\n\u251c\u2500\u2500 pipeline_info\n```\n\nmore detail is given in our [output documentation](/docs/output.md)\n\n## Contributions and Support\n\nIf you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).\n\n## Citations\n\n\n\n\nAn extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.\n\nThis pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/main/LICENSE).\n\n> **The nf-core framework for community-curated bioinformatics pipelines.**\n>\n> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.\n>\n> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).\n", + "description": "# IntGenomicsLab/lrsomatic\n\n[![GitHub Actions CI Status](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/nf-test.yml/badge.svg)](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/nf-test.yml)\n[![GitHub Actions Linting Status](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/linting.yml/badge.svg)](https://github.com/IntGenomicsLab/lrsomatic/actions/workflows/linting.yml)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.17751829-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.17751829)\n[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)\n\n[![Nextflow](https://img.shields.io/badge/version-%E2%89%A525.04.0-green?style=flat&logo=nextflow&logoColor=white&color=%230DC09D&link=https%3A%2F%2Fnextflow.io)](https://www.nextflow.io/)\n[![nf-core template version](https://img.shields.io/badge/nf--core_template-3.3.2-green?style=flat&logo=nfcore&logoColor=white&color=%2324B064&link=https%3A%2F%2Fnf-co.re)](https://github.com/nf-core/tools/releases/tag/3.3.2)\n[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)\n[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)\n[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)\n[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/IntGenomicsLab/lrsomatic)\n\n## Introduction\n\n**IntGenomicsLab/lrsomatic** is a robust bioinformatics pipeline designed for processing and analyzing **somatic DNA sequencing** data for long-read sequencing technologies from **Oxford Nanopore** and **PacBio**. It supports both canonical base DNA and modified base calling, including specialized applications such as **Fiber-seq**.\n\nThis **end-to-end pipeline** handles the entire workflow \u2014 **from raw read processing and alignment, to comprehensive somatic variant calling**, including single nucleotide variants, indels, structural variants, copy number alterations, and modified bases.\n\nIt can be run in both **matched tumour-normal** and **tumour-only mode**, offering flexibility depending on the users study design.\n\nDeveloped using **Nextflow DSL2**, it offers high portability and scalability across diverse computing environments. By leveraging Docker or Singularity containers, installation is streamlined and results are highly reproducible. Each process runs in an isolated container, simplifying dependency management and updates. Where applicable, pipeline components are sourced from **nf-core/modules**, promoting reuse, interoperability, and consistency within the broader Nextflow and nf-core ecosystems.\n\n## Pipeline summary\n\n**1) Pre-processing:**\n\na. Raw read QC ([`cramino`](https://github.com/wdecoster/cramino))\n\nb. Alignment to the reference genome ([`minimap2`](https://github.com/lh3/minimap2))\n\nc. Post alignment QC ([`cramino`](https://github.com/wdecoster/cramino), [`samtools idxstats`](https://github.com/samtools/samtools), [`samtools flagstats`](https://github.com/samtools/samtools), [`samtools stats`](https://github.com/samtools/samtools))\n\nd. Specific for calling modified base calling ([`Modkit`](https://github.com/nanoporetech/modkit), [`Fibertools`](https://github.com/fiberseq/fibertools-rs))\n\n**2i) Matched mode: small variant calling:**\n\na. Calling Germline SNPs ([`Clair3`](https://github.com/HKU-BAL/Clair3))\n\nb. Phasing and Haplotagging the SNPs in the normal and tumour BAM ([`LongPhase`](https://github.com/twolinin/longphase))\n\nc. Calling somatic SNVs ([`ClairS`](https://github.com/HKU-BAL/ClairS))\n\n**2ii) Tumour only mode: small variant calling:**\n\na. Calling Germline SNPs and somatic SNVs ([`ClairS-TO`](https://github.com/HKU-BAL/ClairS-TO))\n\nb. Phasing and Haplotagging germline SNPs in tumour BAM ([`LongPhase`](https://github.com/twolinin/longphase))\n\n**3) Large variant calling:**\n\na. Somatic structural variant calling ([`Severus`](https://github.com/KolmogorovLab/Severus))\n\nb. Copy number alterion calling; long read version of ([`ASCAT`](https://github.com/VanLoo-lab/ascat))\n\n**4) Annotation:**\n\na. Small variant annotation ([`VEP`](https://github.com/Ensembl/ensembl-vep))\n\nb. Structural variant annotation ([`VEP`](https://github.com/Ensembl/ensembl-vep))\n\n\n\n## Usage\n\n> [!NOTE]\n> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.\n\nFirst prepare a samplesheet with your input data that looks as follows:\n\n```csv\nsample,bam_tumor,bam_normal,platform,sex,fiber\nsample1,tumour.bam,normal.bam,ont,female,n\nsample2,tumour.bam,,ont,female,y\nsample3,tumour.bam,,pb,male,n\nsample4,tumour.bam,normal.bam,pb,male,y\n```\n\nEach row represents a sample. The bam files should always be unaligned bam files. All fields except for `bam_normal` are required. If `bam_normal` is empty, the pipeline will run in tumour only mode. `platform` should be either `ont` or `pb` for Oxford Nanopore Sequencing or PacBio sequencing, respectively. `sex` refers to the biological sex of the sample and should be either `female` or `male`. Finally, `fiber` specifies whether your sample is Fiber-seq data or not and should have either `y` for Yes or `n` for No.\n\nNow, you can run the pipeline using:\n\n```bash\nnextflow run IntGenomicsLab/lrsomatic \\\n -profile \\\n --input samplesheet.csv \\\n --outdir \n```\n\nMore detail is given in our [usage documentation](/docs/usage.md)\n\n> [!WARNING]\n> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).\n\n## Credits\n\nIntGenomicsLab/lr_somatic was originally written by Luuk Harbers, Robert Forsyth, Alexandra Pan\u010d\u00edkov\u00e1, Marios Eftychiou, Ruben Cools, Laurens Lambrechts, and Jonas Demeulemeester.\n\n## Pipeline output\n\nThis pipeline produces a series of different output files. The main output is an aligned and phased tumour bam file. This bam file can be used by any typical downstream tool that uses bam files as input. Furthermore, we have sample-specific QC outputs from `cramino` (fastq), `cramino` (bam), `mosdepth`, `samtools` (stats/flagstat/idxstats), and optionally `fibertools`. Finally, we have a `multiqc` report from that combines the output from `mosdepth` and `samtools` into one html report.\n\nBesides QC and the aligned and phased bam file, we have output from (structural) variant and copy number callers, of which some are optional. The output from these variant callers can be found in their respective folders. For small and structural variant callers (`clairS`, `clairS-TO`, and `severus`) these will contain, among others, `vcf` files with called variants. For `ascat` these contain files with final copy number information and plots of the copy number profiles.\n\nExample output directory structure:\n\n```\n\u251c\u2500\u2500 Sample 1\n\u2502 \u251c\u2500\u2500 ascat\n\u2502 \u251c\u2500\u2500 bamfiles\n\u2502 \u251c\u2500\u2500 qc\n\u2502 \u2502 \u251c\u2500\u2500 tumor\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_aln\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_ubam\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 fibertoolsrs\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 mosdepth\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 samtools\n\u2502 \u251c\u2500\u2500 variants\n\u2502 \u2502 \u251c\u2500\u2500clairS-TO\n\u2502 \u2502 \u251c\u2500\u2500severus\n\u2502 \u251c\u2500\u2500 vep\n\u2502 \u2502 \u251c\u2500\u2500 germline\n\u2502 \u2502 \u251c\u2500\u2500 somatic\n\u2502 \u2502 \u251c\u2500\u2500 SVs\n\u2502\n\u251c\u2500\u2500 Sample 2\n\u2502 \u251c\u2500\u2500 ascat\n\u2502 \u251c\u2500\u2500 bamfiles\n\u2502 \u251c\u2500\u2500 qc\n\u2502 \u2502 \u251c\u2500\u2500 tumor\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_aln\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_ubam\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 fibertoolsrs\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 mosdepth\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 samtools\n\u2502 \u2502 \u251c\u2500\u2500 normal\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_aln\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 cramino_ubam\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 fibertoolsrs\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 mosdepth\n\u2502 \u2502 \u2502 \u251c\u2500\u2500 samtools\n\u2502 \u251c\u2500\u2500 variants\n\u2502 \u2502 \u251c\u2500\u2500 clair3\n\u2502 \u2502 \u251c\u2500\u2500 clairS\n\u2502 \u2502 \u251c\u2500\u2500 severus\n\u2502 \u251c\u2500\u2500 vep\n\u2502 \u2502 \u251c\u2500\u2500 germline\n\u2502 \u2502 \u251c\u2500\u2500 somatic\n\u2502 \u2502 \u251c\u2500\u2500 SVs\n\u251c\u2500\u2500 pipeline_info\n```\n\nmore detail is given in our [output documentation](/docs/output.md)\n\n## Contributions and Support\n\nIf you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).\n\n## Citations\n\nIf you use IntGenomicsLab/lrsomatic for your analysis, please cite it using the following doi: [10.5281/zenodo.17751829](https://doi.org/10.5281/zenodo.17751829)\n\nAn extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.\n\nThis pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/main/LICENSE).\n\n> **The nf-core framework for community-curated bioinformatics pipelines.**\n>\n> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.\n>\n> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).\n", "hasPart": [ { "@id": "main.nf" From c2c0c155dc6507d562f88159a9675680d2ff1f11 Mon Sep 17 00:00:00 2001 From: robert-a-forsyth Date: Mon, 8 Dec 2025 11:07:25 +0100 Subject: [PATCH 10/10] views --- workflows/lrsomatic.nf | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/workflows/lrsomatic.nf b/workflows/lrsomatic.nf index 2b84c973..3614365d 100644 --- a/workflows/lrsomatic.nf +++ b/workflows/lrsomatic.nf @@ -580,11 +580,13 @@ workflow LRSOMATIC { } .join(SEVERUS.out.all_vcf) .set { wakhan_input } - + wakhan_input.view() + ch_fasta.view() WAKHAN ( wakhan_input, ch_fasta ) + ch_versions = ch_versions.mix(WAKHAN.out.versions) }