nf-core · ochkalova · Jun 4, 2026 · Apr 29, 2026 · May 12, 2026 · May 12, 2026
diff --git a/README.md b/README.md
@@ -22,11 +22,12 @@
 ## Introduction
 
 **nf-core/seqsubmit** is a Nextflow pipeline for submitting sequence data to [ENA](https://www.ebi.ac.uk/ena/browser/home).
-Currently, the pipeline supports three submission modes, each routed to a dedicated workflow and requiring its own input samplesheet structure:
+Currently, the pipeline supports four submission modes, each routed to a dedicated workflow and requiring its own input samplesheet structure:
 
 - `mags` for Metagenome Assembled Genomes (MAGs) submission with `GENOMESUBMIT` workflow
 - `bins` for bins submission with `GENOMESUBMIT` workflow
 - `metagenomic_assemblies` for assembly submission with `ASSEMBLYSUBMIT` workflow
+- `reads` for raw sequencing reads submission with `READSUBMIT` workflow
 
 ![seqsubmit workflow diagram](assets/seqsubmit_schema.png)
 
@@ -123,6 +124,38 @@ assembly_2,data/contigs_2.fasta.gz,,,42.7,ERR011323,MEGAHIT,1.2.9
 > [!IMPORTANT]
 > **Samplesheet column requirements**: All columns shown in the example above must be present in your samplesheet, even if some values are empty. Columns must be in exactly the same order as shown.
 
+### `reads` mode (`READSUBMIT`)
+
+The input must follow `assets/schema_input_reads.json`.
+
+Required columns:
+
+- `sample`
+- `sample_accession`
+- `fastq_1`
+- `fastq_2` (column must be present; leave the value empty for single-end reads)
+- `platform`
+- `instrument`
+- `library_source`
+- `library_selection`
+- `library_strategy`
+
+Optional columns:
+
+- `insert_size`
+- `library_name`
+- `description`
+
+Example `samplesheet_reads.csv`:
+
+```csv
+sample,sample_accession,fastq_1,fastq_2,platform,instrument,library_source,library_selection,library_strategy,insert_size,library_name,description
+illumina_run_001,SAMEA1234567,data/reads_R1.fastq.gz,data/reads_R2.fastq.gz,ILLUMINA,Illumina HiSeq 2000,GENOMIC,RANDOM,WGS,500,HiSeq_library_001,Illumina sequencing of sample XYZ
+```
+
+> [!IMPORTANT]
+> **Samplesheet column requirements**: All columns shown in the example above must be present in your samplesheet, even if some values are empty. Columns must be in exactly the same order as shown.
+
 ## Usage
 
 > [!NOTE]
@@ -142,7 +175,7 @@ The `mags`/`bins` workflow requires databases for completeness/contamination est
 
 | Parameter                                  | Description                                                                                                       |
 | ------------------------------------------ | ----------------------------------------------------------------------------------------------------------------- |
-| `--mode`                                   | Type of the data to be submitted. Options: `[mags, bins, metagenomic_assemblies]`                                 |
+| `--mode`                                   | Type of the data to be submitted. Options: `[mags, bins, metagenomic_assemblies, reads]`                          |
 | `--input`                                  | Path to the samplesheet describing the data to be submitted                                                       |
 | `--outdir`                                 | Path to the output directory for pipeline results                                                                 |
 | `--submission_study` OR `--study_metadata` | ENA study accession (PRJ/ERP) to submit the data to OR metadata file in JSON/TSV/CSV format to register new study |
@@ -161,7 +194,7 @@ General command template:
 ```bash
 nextflow run nf-core/seqsubmit \
    -profile <docker/singularity/...> \
-   --mode <mags|bins|metagenomic_assemblies> \
+   --mode <mags|bins|metagenomic_assemblies|reads> \
    --input <samplesheet.csv> \
    --centre_name <your_centre> \
    --submission_study <your_study> \
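
The `reads` samplesheet section added above requires every column to be present and in the documented order. A minimal sketch of a pre-flight check for that rule follows; it is not part of the pipeline. The function name `check_header` is hypothetical, and the column order is copied from the `samplesheet_reads.csv` example in this README change.

```python
import csv
import io

# Column order copied from the example samplesheet_reads.csv in this README change.
EXPECTED_COLUMNS = [
    "sample", "sample_accession", "fastq_1", "fastq_2", "platform",
    "instrument", "library_source", "library_selection", "library_strategy",
    "insert_size", "library_name", "description",
]


def check_header(csv_text):
    """Return a list of problems with the header row (an empty list means OK)."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    extra = [c for c in header if c not in EXPECTED_COLUMNS]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    if extra:
        problems.append("unexpected columns: " + ", ".join(extra))
    # Same set of names, but a different sequence (reordered or duplicated).
    if not missing and not extra and header != EXPECTED_COLUMNS:
        problems.append("columns do not match the documented order")
    return problems
```

Calling `check_header(open("samplesheet_reads.csv").read())` before launching the run would surface a reordered or incomplete header early. The pipeline's own schema validation remains the authoritative check.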

diff --git a/assets/schema_input_reads.json b/assets/schema_input_reads.json
@@ -0,0 +1,223 @@
+{
+    "$schema": "https://json-schema.org/draft/2020-12/schema",
+    "$id": "https://raw.githubusercontent.com/nf-core/seqsubmit/main/assets/schema_input_reads.json",
+    "title": "nf-core/seqsubmit pipeline - params.input schema",
+    "description": "Schema for the sample sheet provided with params.input if params.mode is set to 'reads'",
+    "type": "array",
+    "items": {
+        "type": "object",
+        "properties": {
+            "sample": {
+                "type": "string",
+                "pattern": "^\\S+$",
+                "errorMessage": "Sample must be provided and cannot contain spaces",
+                "meta": ["id"],
+                "description": "Unique experiment/run name"
+            },
+            "sample_accession": {
+                "type": "string",
+                "pattern": "^\\S+$",
+                "errorMessage": "Sample accession must be provided and cannot contain spaces",
+                "description": "ENA sample accession of the sample used to generate the reads"
+            },
+            "fastq_1": {
+                "type": "string",
+                "format": "file-path",
+                "exists": true,
+                "pattern": "^\\S+\\.(fq|fastq)(\\.gz)?$",
+                "errorMessage": "FASTQ file must have extension '.fq' or '.fastq' (optionally gzipped)",
+                "description": "Forward reads FASTQ file (single-end or paired-end)"
+            },
+            "fastq_2": {
+                "anyOf": [
+                    {
+                        "type": "string",
+                        "format": "file-path",
+                        "exists": true,
+                        "pattern": "^\\S+\\.(fq|fastq)(\\.gz)?$"
+                    },
+                    {
+                        "type": "string",
+                        "maxLength": 0
+                    }
+                ],
+                "errorMessage": "FASTQ file for reverse reads must have extension '.fq' or '.fastq' (optionally gzipped)",
+                "description": "Reverse reads FASTQ file if paired-end. Leave empty for single-end reads"
+            },
+            "platform": {
+                "type": "string",
+                "enum": [
+                    "BGISEQ",
+                    "CAPILLARY",
+                    "DNBSEQ",
+                    "ELEMENT",
+                    "GENAPSYS",
+                    "GENEMIND",
+                    "HELICOS",
+                    "ILLUMINA",
+                    "ION_TORRENT",
+                    "LS454",
+                    "OXFORD_NANOPORE",
+                    "PACBIO_SMRT",
+                    "TAPESTRI",
+                    "ULTIMA",
+                    "VELA_DIAGNOSTICS"
+                ],
+                "description": "Sequencing platform. Must be one of the ENA controlled vocabulary values listed in the enum."
+            },
+            "instrument": {
+                "type": "string",
+                "pattern": "^[^\\n]+$",
+                "errorMessage": "Instrument must be provided and cannot span multiple lines",
+                "description": "Sequencer model (e.g., 'Illumina HiSeq 2000', 'PacBio Sequel')"
+            },
+            "library_source": {
+                "type": "string",
+                "enum": [
+                    "GENOMIC",
+                    "GENOMIC SINGLE CELL",
+                    "TRANSCRIPTOMIC",
+                    "TRANSCRIPTOMIC SINGLE CELL",
+                    "METAGENOMIC",
+                    "METATRANSCRIPTOMIC",
+                    "SYNTHETIC",
+                    "VIRAL RNA",
+                    "OTHER"
+                ],
+                "description": "Library source. Must be one of the ENA controlled vocabulary values listed in the enum."
+            },
+            "library_selection": {
+                "type": "string",
+                "enum": [
+                    "RANDOM",
+                    "PCR",
+                    "RANDOM PCR",
+                    "RT-PCR",
+                    "HMPR",
+                    "MF",
+                    "repeat fractionation",
+                    "size fractionation",
+                    "MSLL",
+                    "cDNA",
+                    "cDNA_randomPriming",
+                    "cDNA_oligo_dT",
+                    "PolyA",
+                    "Oligo-dT",
+                    "Inverse rRNA",
+                    "Inverse rRNA selection",
+                    "ChIP",
+                    "ChIP-Seq",
+                    "MNase",
+                    "DNase",
+                    "Hybrid Selection",
+                    "Reduced Representation",
+                    "Restriction Digest",
+                    "5-methylcytidine antibody",
+                    "MBD2 protein methyl-CpG binding domain",
+                    "CAGE",
+                    "RACE",
+                    "MDA",
+                    "padlock probes capture method",
+                    "other",
+                    "unspecified"
+                ],
+                "description": "Library selection. Must be one of the ENA controlled vocabulary values listed in the enum."
+            },
+            "library_strategy": {
+                "type": "string",
+                "enum": [
+                    "WGS",
+                    "WGA",
+                    "WXS",
+                    "RNA-Seq",
+                    "snRNA-seq",
+                    "ssRNA-seq",
+                    "miRNA-Seq",
+                    "ncRNA-Seq",
+                    "FL-cDNA",
+                    "EST",
+                    "Hi-C",
+                    "ATAC-seq",
+                    "WCS",
+                    "RAD-Seq",
+                    "CLONE",
+                    "POOLCLONE",
+                    "AMPLICON",
+                    "CLONEEND",
+                    "FINISHING",
+                    "ChIP-Seq",
+                    "MNase-Seq",
+                    "Ribo-Seq",
+                    "DNase-Hypersensitivity",
+                    "Bisulfite-Seq",
+                    "CTS",
+                    "ChM-Seq",
+                    "GBS",
+                    "MRE-Seq",
+                    "MeDIP-Seq",
+                    "MBD-Seq",
+                    "NOMe-Seq",
+                    "Tn-Seq",
+                    "VALIDATION",
+                    "FAIRE-seq",
+                    "SELEX",
+                    "RIP-Seq",
+                    "ChIA-PET",
+                    "Synthetic-Long-Read",
+                    "Targeted-Capture",
+                    "Tethered Chromatin Conformation Capture",
+                    "OTHER"
+                ],
+                "description": "Library strategy. Must be one of the ENA controlled vocabulary values listed in the enum."
+            },
+            "insert_size": {
+                "anyOf": [
+                    {
+                        "type": "number",
+                        "minimum": 0
+                    },
+                    {
+                        "type": "string",
+                        "maxLength": 0
+                    }
+                ],
+                "errorMessage": "Insert size must be a non-negative number or empty",
+                "description": "Fragment/insert size for paired-end reads (optional)"
+            },
+            "library_name": {
+                "anyOf": [
+                    {
+                        "type": "string"
+                    },
+                    {
+                        "type": "string",
+                        "maxLength": 0
+                    }
+                ],
+                "description": "Descriptive library name (optional)"
+            },
+            "description": {
+                "anyOf": [
+                    {
+                        "type": "string"
+                    },
+                    {
+                        "type": "string",
+                        "maxLength": 0
+                    }
+                ],
+                "description": "Free-text description of the experiment (optional)"
+            }
+        },
+        "required": [
+            "sample",
+            "sample_accession",
+            "fastq_1",
+            "platform",
+            "instrument",
+            "library_source",
+            "library_selection",
+            "library_strategy"
+        ]
+    }
+}
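
To illustrate what the schema above accepts and rejects, here is a rough standalone sketch that mirrors a subset of its rules for one samplesheet row. It is not how the pipeline validates input. The function name `validate_row` is hypothetical, empty cells are assumed to count as missing values, and the `exists` file check and the three `library_*` enums are left out for brevity.

```python
import re

# Patterns and values copied from assets/schema_input_reads.json in this diff.
NO_WHITESPACE = re.compile(r"\S+")
FASTQ = re.compile(r"\S+\.(fq|fastq)(\.gz)?")
PLATFORMS = {
    "BGISEQ", "CAPILLARY", "DNBSEQ", "ELEMENT", "GENAPSYS", "GENEMIND",
    "HELICOS", "ILLUMINA", "ION_TORRENT", "LS454", "OXFORD_NANOPORE",
    "PACBIO_SMRT", "TAPESTRI", "ULTIMA", "VELA_DIAGNOSTICS",
}
REQUIRED = [
    "sample", "sample_accession", "fastq_1", "platform", "instrument",
    "library_source", "library_selection", "library_strategy",
]


def validate_row(row):
    """Return a list of error strings for one row (dict of column -> string)."""
    errors = []
    # Assumption: an empty cell is treated the same as a missing value.
    for field in REQUIRED:
        if not row.get(field):
            errors.append(f"{field}: required value is missing")
    for field in ("sample", "sample_accession"):
        value = row.get(field, "")
        if value and not NO_WHITESPACE.fullmatch(value):
            errors.append(f"{field}: must not contain whitespace")
    # fastq_2 may be empty (single-end); fastq_1 emptiness is caught above.
    for field in ("fastq_1", "fastq_2"):
        value = row.get(field, "")
        if value and not FASTQ.fullmatch(value):
            errors.append(f"{field}: must end in .fq or .fastq, optionally .gz")
    platform = row.get("platform", "")
    if platform and platform not in PLATFORMS:
        errors.append(f"platform: '{platform}' is not an allowed value")
    insert_size = row.get("insert_size", "")
    if insert_size:
        try:
            if float(insert_size) < 0:
                errors.append("insert_size: must not be negative")
        except ValueError:
            errors.append("insert_size: must be a number or empty")
    return errors
```

The enum comparison is case-sensitive, as in the schema, so `illumina` is rejected while `ILLUMINA` passes. Rows could be fed in from `csv.DictReader`, which yields the string-valued dictionaries this sketch expects.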
diff --git a/assets/seqsubmit_schema.png b/assets/seqsubmit_schema.png
diff --git a/conf/modules.config b/conf/modules.config
@@ -176,7 +176,7 @@ process {
         ]
     }
 
-    withName: 'REGISTERSTUDY|GENERATE_ASSEMBLY_MANIFEST' {
+    withName: 'REGISTERSTUDY|GENERATE_ASSEMBLY_MANIFEST|CREATE_READS_MANIFEST' {
         publishDir = [
             enabled: false
         ]

diff --git a/conf/test_reads_paired.config b/conf/test_reads_paired.config
@@ -0,0 +1,34 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Nextflow config file for running minimal tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Defines input files and everything required to run a fast and simple pipeline test.
+
+    Use as follows:
+        nextflow run nf-core/seqsubmit -profile test_reads_paired,<docker/singularity> --outdir <OUTDIR>
+
+----------------------------------------------------------------------------------------
+*/
+
+process {
+    resourceLimits = [
+        cpus: 2,
+        memory: '8.GB',
+        time: '1.h'
+    ]
+}
+
+params {
+    config_profile_name        = 'Test --mode reads profile'
+    config_profile_description = 'Minimal test profile for reads submission'
+
+    // Input data
+    input  = params.pipelines_testdata_base_path + 'seqsubmit/samplesheets/samplesheet_reads.csv'
+    outdir = 'test_output'
+
+    mode             = "reads"
+    submission_study = "PRJEB98843"
+    centre_name      = "TEST_CENTER"
+
+    test_upload      = true
+}
diff --git a/docs/output.md b/docs/output.md
@@ -8,7 +8,7 @@ The directories listed below will be created in the results directory (set with
 
 ## Pipeline overview
 
-The pipeline is built using [Nextflow](https://www.nextflow.io/) and performs automated submission of sequence data to ENA. Exact steps and generated outputs depend on the data type and `--mode` executed (`mags`, `bins` or `metagenomic_assemblies`).
+The pipeline is built using [Nextflow](https://www.nextflow.io/) and performs automated submission of sequence data to ENA. Exact steps and generated outputs depend on the data type and `--mode` executed (`mags`, `bins`, `metagenomic_assemblies` or `reads`).
 
 ## `mags` and `bins` outputs
 
@@ -59,6 +59,20 @@ Assembly study registration, manifest generation, and Webin-CLI submission are e
 > Users should read the ENA documentation on referencing submitted data: \
 > metagenomic assemblies: https://ena-docs.readthedocs.io/en/latest/submit/assembly/metagenome/primary.html#assigned-accession-numbers
 
+## `reads` outputs
+
+When `--mode reads` is used, results are written under `reads/`.
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `reads/`
+  - `upload/reads_accessions.tsv`: run accessions assigned to submitted reads.
+
+</details>
+
+Manifest generation and Webin-CLI submission are executed by the workflow, but their intermediate outputs are not currently published into `--outdir` by the pipeline.
+
 ## Common outputs
 
 ### MultiQC