Merged
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -25,6 +25,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#165](https://github.com/nf-core/riboseq/issues/165) - Add `--extended_orf_analysis` (default `false`) which routes the hybrid GTF into the genome-BAM ORF callers (Ribo-TISH `predict`, Ribotricer `prepare-orfs`) so novel intergenic ORFs are within scope. RiboCode and transcriptome-BAM consumers stay on the canonical backbone. When the flag is set without a novel-transcript source, the pipeline warns and falls back to canonical ([@pinin4fjords](https://github.com/pinin4fjords))
- [#171](https://github.com/nf-core/riboseq/issues/171) - Under `--extended_orf_analysis true`, run a second STAR pass for Ribo-seq samples against the hybrid transcriptome so RiboCode can call ORFs on novel transcripts; RiboCode then consumes the hybrid transcriptome BAM and hybrid GTF. Outputs land under `<outdir>/hybrid_star/`; the default-off path is unchanged ([@pinin4fjords](https://github.com/pinin4fjords))
- [#170](https://github.com/nf-core/riboseq/issues/170) - Add PRICE (Erhard et al. 2018) as an opt-in ORF caller via `--run_price` (default `false`). Invoked one-shot across the riboseq cohort (`gedi -e IndexGenome` + `gedi -e Price`); calls flow into the cross-caller ORF catalogue. Container via `bioconda::gedi=1.0.6a` ([@pinin4fjords](https://github.com/pinin4fjords))
- [#169](https://github.com/nf-core/riboseq/issues/169) - Add Rp-Bp (Malone et al. 2017) as an opt-in ORF caller via `--run_rpbp`. Implemented via the upstream `fasta_gtf_bam_rpbp` subworkflow (per-tool processes for extract-metagene-profiles, estimate-metagene-profile-bayes-factors, select-periodic-offsets, get-periodic-lengths-offsets, extract-orf-profiles, estimate-orf-bayes-factors, select-final-prediction-set, plus a shared prepare-genome). Honours `--extended_orf_analysis` by feeding the hybrid GTF when active. Per-sample predicted-ORF BED + DNA + protein FASTA under `<outdir>/orf_predictions/rpbp/`. Expect ~20-24h per replicate at genome-wide scale; pipeline emits a runtime warning when enabled ([@pinin4fjords](https://github.com/pinin4fjords))

### `Fixed`

@@ -82,6 +83,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
| | `--run_price` |
| | `--extra_price_indexgenome_args` |
| | `--extra_price_price_args` |
| | `--run_rpbp` |
| | `--extra_rpbp_preparegenome_args` |
| | `--extra_rpbp_predictorfs_args` |

### `Dependencies`

16 changes: 16 additions & 0 deletions conf/modules.config
@@ -1266,4 +1266,20 @@ process {
]
}

withName: 'RPBP_PREPAREGENOME' {
ext.args = { params.extra_rpbp_preparegenome_args ?: '' }
}
withName: 'RPBP_ESTIMATEMETAGENEBAYESFACTORS|RPBP_ESTIMATEORFBAYESFACTORS' {
// Stan MCMC, 20-24h per replicate at genome scale; first-attempt headroom.
time = { 30.h * task.attempt }
}
withName: '.*:FASTA_GTF_BAM_RPBP:RPBP_SELECTFINALPREDICTIONSET' {
ext.args = { "--select-longest-by-stop --select-best-overlapping ${params.extra_rpbp_predictorfs_args ?: ''}".trim() }
publishDir = [
path: { "${params.outdir}/orf_predictions/rpbp" },
mode: params.publish_dir_mode,
pattern: "*.predicted-orfs.{bed.gz,dna.fa,protein.fa}"
]
}

}
16 changes: 16 additions & 0 deletions docs/output.md
@@ -37,6 +37,8 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Ribo-TISH predict](#ribo-tish-predict)
- [Ribotricer detect-orfs](#ribotricer-detect-orfs)
- [RiboCode](#ribocode)
- [Rp-Bp](#rp-bp)
- [PRICE](#price)
- [P-site identification](#p-site-identification)
- [riboWaltz](#ribowaltz)
- [plastid](#plastid)
@@ -441,6 +443,20 @@ The `-f0_percent`, `-pv1`, and `-pv2` parameters belong to the **metaplots** ste

If RiboCode is not needed for your analysis, you can skip it entirely with `--skip_ribocode`.

### Rp-Bp

<details markdown="1">
<summary>Output files</summary>

- `orf_predictions/rpbp/`
- `*.predicted-orfs.bed.gz`: per-sample predicted-ORF BED with Bayes factor scores (column 5) after the final-prediction-set filter (`--select-longest-by-stop --select-best-overlapping`).
- `*.predicted-orfs.dna.fa`: per-sample predicted-ORF nucleotide FASTA matching the BED.
- `*.predicted-orfs.protein.fa`: per-sample predicted-ORF protein FASTA matching the BED.

</details>

Produced only when `--run_rpbp true` is set. Rp-Bp's Bayesian fit is slow (~20-24h per replicate at genome-wide scale); see [Rp-Bp in usage.md](usage.md#rp-bp-opt-in-overnight). Rp-Bp's Bayes factor is stable across replicates and is retained in the cross-caller rank-aggregation set. When `--extended_orf_analysis true` is set, Rp-Bp consumes the hybrid GTF and so reports novel intergenic ORFs alongside annotated ones.
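
As a rough illustration of working with these outputs, the sketch below ranks predicted ORFs by the Bayes factor in column 5 of a `*.predicted-orfs.bed.gz` file. It assumes a tab-separated BED-like layout with the ORF identifier in column 4; the demo file, its header row, and the function name `read_top_orfs` are hypothetical and exist only to make the example self-contained.

```python
import gzip
import os
import tempfile


def read_top_orfs(bed_gz_path, n=5):
    """Return up to n (orf_id, score) pairs, highest column-5 score first."""
    records = []
    with gzip.open(bed_gz_path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#") or len(fields) < 5:
                continue  # skip comments and short lines
            try:
                score = float(fields[4])  # column 5 (1-based): Bayes factor
            except ValueError:
                continue  # skip a header row whose score field is not numeric
            records.append((fields[3], score))  # column 4 assumed to hold the ORF id
    records.sort(key=lambda rec: rec[1], reverse=True)
    return records[:n]


# Hypothetical demo data; real files come from <outdir>/orf_predictions/rpbp/.
demo_path = os.path.join(tempfile.mkdtemp(), "sampleA.predicted-orfs.bed.gz")
with gzip.open(demo_path, "wt") as out:
    out.write("seqname\tstart\tend\tid\tscore\tstrand\n")
    out.write("chr1\t100\t400\torf_1\t12.5\t+\n")
    out.write("chr1\t900\t1200\torf_2\t48.0\t-\n")
    out.write("chr2\t50\t200\torf_3\t3.2\t+\n")

top_orfs = read_top_orfs(demo_path, n=2)
print(top_orfs)
```

Check the actual column layout of your files before relying on the column-4 assumption.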

### PRICE

<details markdown="1">
18 changes: 18 additions & 0 deletions docs/usage.md
@@ -334,6 +334,22 @@ By default the pipeline calls ORFs with two tools, Ribo-TISH `predict` and RiboC

Ribotricer is available as a third caller but is off by default. Enable it with `--run_ribotricer true` for broader recall, after which an ORF is agreed on a majority vote (2 of 3). It is opt-in because its ORF-score column is unstable across biological replicates even though its binary call set is reproducible. When enabled, its binary calls count toward agreement but its score is excluded from cross-caller rank aggregation, and the pipeline warns at runtime.

### Rp-Bp (opt-in, overnight)

[Rp-Bp](https://github.com/dieterich-lab/rp-bp) (Malone et al., 2017) is a stringent Bayesian ORF caller that complements RiboCode's more permissive canonical-CDS calls. It is the recommended second caller when statistical rigour matters more than turnaround time. Activate it with `--run_rpbp true`.

> :warning: **Runtime cost.** Rp-Bp's Bayesian MCMC fit dominates wall-clock and takes roughly **20-24 hours per replicate** at genome-wide scale. The pipeline emits a runtime warning when `--run_rpbp` is set. Plan compute time, queue limits and instance lifetimes accordingly.
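
For planning purposes, the pipeline's `conf/modules.config` requests `30.h * task.attempt` for the two Bayes-factor steps. The sketch below reproduces that arithmetic and checks which attempts fit under a queue limit. The three-attempt ceiling and the 72-hour queue limit are illustrative assumptions, not pipeline settings, and the function names are hypothetical.

```python
def requested_hours(attempt, base_hours=30):
    """Walltime requested on a given attempt, mirroring 30.h * task.attempt."""
    return base_hours * attempt


def attempts_within_limit(queue_limit_hours, max_attempts=3, base_hours=30):
    """List the attempt numbers whose walltime request fits the queue limit."""
    return [
        attempt
        for attempt in range(1, max_attempts + 1)
        if requested_hours(attempt, base_hours) <= queue_limit_hours
    ]


# Walltime requested on attempts 1 to 3 (max_attempts=3 is an assumption).
schedule = {attempt: requested_hours(attempt) for attempt in (1, 2, 3)}
print(schedule)

# With a hypothetical 72-hour queue limit, the third attempt would not be schedulable.
fits = attempts_within_limit(72)
print(fits)
```

If the retried request exceeds what your scheduler allows, consider overriding the `time` directive for these processes in a custom config.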

Rp-Bp's score column (Bayes factor) is stable across replicates and is retained in the cross-caller rank-aggregation set alongside RiboCode and Ribo-TISH. This contrasts with Ribotricer, whose score column is excluded because of its instability across replicates.
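
To make the distinction between binary agreement and score-based ranking concrete, here is a minimal sketch. It is not the pipeline's implementation: the mean-rank aggregation, the `min_callers` threshold, the assumption that a higher score is better for every caller, and all ORF names and scores are hypothetical choices made for illustration.

```python
def agreed_orfs(calls_by_caller, min_callers=2):
    """Return ORFs called by at least min_callers callers (binary agreement)."""
    counts = {}
    for orfs in calls_by_caller.values():
        for orf in set(orfs):
            counts[orf] = counts.get(orf, 0) + 1
    return {orf for orf, n in counts.items() if n >= min_callers}


def mean_rank(scores_by_caller, orfs, excluded=frozenset({"ribotricer"})):
    """Average each ORF's per-caller rank, skipping callers with unstable scores."""
    ranks = {orf: [] for orf in orfs}
    for caller, scores in scores_by_caller.items():
        if caller in excluded:
            continue  # binary calls still counted elsewhere; score is ignored here
        ordered = sorted(
            (orf for orf in scores if orf in ranks),
            key=lambda orf: scores[orf],
            reverse=True,  # assumes higher score means stronger evidence
        )
        for rank, orf in enumerate(ordered, start=1):
            ranks[orf].append(rank)
    return {orf: sum(r) / len(r) for orf, r in ranks.items() if r}


# Hypothetical scores per caller.
scores = {
    "ribocode": {"orfA": 9.0, "orfB": 4.0},
    "rpbp": {"orfA": 20.0, "orfB": 35.0, "orfC": 5.0},
    "ribotricer": {"orfB": 0.9, "orfC": 0.8},
}
calls = {caller: set(per_orf) for caller, per_orf in scores.items()}

agreed = agreed_orfs(calls, min_callers=2)
ranked = mean_rank(scores, agreed)
print(sorted(agreed), ranked)
```

In this toy data, `orfC` reaches agreement thanks to Ribotricer's binary call, but its rank comes from Rp-Bp's score alone.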

Rp-Bp runs through the upstream `nf-core/rpbp/*` modules driven by the `FASTA_GTF_BAM_RPBP` nf-core subworkflow, which orchestrates `prepare-rpbp-genome`, `extract-metagene-profiles`, `estimate-metagene-profile-bayes-factors`, `select-periodic-offsets`, `get-periodic-lengths-offsets`, `extract-orf-profiles`, `estimate-orf-bayes-factors` and `select-final-prediction-set` from your `--fasta` / `--gtf` inputs without you having to author a YAML config. Tool CLI overrides are exposed via `--extra_rpbp_preparegenome_args` and `--extra_rpbp_predictorfs_args`.

Per-sample final-prediction outputs - filtered BED of predicted ORFs (with Bayes factor in column 5), plus matched nucleotide and protein FASTAs - are published under `<outdir>/orf_predictions/rpbp/`.

**Annotation.** Rp-Bp is given the full multi-isoform `--gtf` annotation, not the one-transcript-per-gene canonical backbone that the pipeline uses elsewhere to disambiguate P-site quantification. Rp-Bp enumerates candidate ORFs across every transcript isoform (deduplicating identical ORFs by genomic coordinate) and then resolves redundant and overlapping ORFs itself: it keeps the longest ORF per stop codon, then the highest Bayes factor among overlaps ([Malone et al., 2017](https://academic.oup.com/nar/article/45/6/2960/2953491)). Collapsing the annotation to one isoform per gene would silently remove ORFs that exist only on non-canonical isoforms (alternative-5'UTR uORFs, isoform-specific N-terminal extensions or truncations, retained-intron and alternative-exon ORFs) and bias the reported ORF types toward canonical CDS, with no compensating benefit. PRICE is handled the same way for the same reason. Under `--extended_orf_analysis true`, Rp-Bp instead receives the hybrid GTF, so novel transcripts are within discovery scope in the same way as for Ribo-TISH `predict` and Ribotricer.
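
The two-stage resolution described above (longest ORF per stop codon, then highest Bayes factor among overlaps) can be sketched as follows. This is an illustration of the idea only, not Rp-Bp's own code: it treats each ORF as a single unspliced genomic interval, uses a greedy pass for the overlap step, and the `Orf` class, function names, and example records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    orf_id: str
    chrom: str
    strand: str
    start: int  # 0-based, inclusive
    end: int  # 0-based, exclusive
    bayes_factor: float

    @property
    def stop_key(self):
        # The stop codon sits at the 3' end: `end` on "+", `start` on "-".
        stop = self.end if self.strand == "+" else self.start
        return (self.chrom, self.strand, stop)


def longest_per_stop(orfs):
    """Keep only the longest ORF among those sharing a stop codon."""
    best = {}
    for orf in orfs:
        current = best.get(orf.stop_key)
        if current is None or (orf.end - orf.start) > (current.end - current.start):
            best[orf.stop_key] = orf
    return list(best.values())


def best_among_overlaps(orfs):
    """Greedily keep the highest-Bayes-factor ORF in each overlapping group."""
    kept = []
    for orf in sorted(orfs, key=lambda o: o.bayes_factor, reverse=True):
        overlaps_kept = any(
            k.chrom == orf.chrom
            and k.strand == orf.strand
            and k.start < orf.end
            and orf.start < k.end
            for k in kept
        )
        if not overlaps_kept:
            kept.append(orf)
    return kept


candidates = [
    Orf("a_long", "chr1", "+", 100, 400, 5.0),
    Orf("a_short", "chr1", "+", 250, 400, 9.0),  # same stop as a_long, shorter
    Orf("b", "chr1", "+", 350, 500, 7.0),  # overlaps a_long, higher Bayes factor
    Orf("c", "chr1", "-", 100, 400, 1.0),  # opposite strand, no conflict
]

final = best_among_overlaps(longest_per_stop(candidates))
final_ids = [orf.orf_id for orf in final]
print(final_ids)
```

Note that `a_short` is dropped in the first stage despite its higher Bayes factor, which shows why the order of the two filters matters.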

> :information_source: **STAR alignment params vs upstream rpbp.** rpbp's own pipeline runs STAR with Ribo-seq-tuned settings (`outFilterMismatchNmax 1`, `outFilterMismatchNoverLmax 0.04`, `outFilterType BySJout`, `sjdbOverhang 33`, `winAnchorMultimapNmax 100`, `seedSearchStartLmaxOverLread 0.5`). We use the pipeline's standard STAR alignment (shared with the RNA-seq side of paired runs), which is more permissive. Practical impact: rpbp processes whatever alignments it gets, but periodicity / Bayes-factor distributions will differ from a standalone rpbp run on the same FASTQs. To bring the alignments as close as possible to a standalone rpbp run, override with `--extra_star_align_args '--outFilterMismatchNmax 1 --outFilterMismatchNoverLmax 0.04 --outFilterType BySJout --winAnchorMultimapNmax 100 --seedSearchStartLmaxOverLread 0.5'`. This override cannot cover `sjdbOverhang`, which is baked into the STAR index and cannot be changed post-hoc: matching it would require regenerating the index with `--sjdbOverhang 33`, and that change would only be appropriate for a Ribo-seq-only run (RNA-seq reads are too long for that setting). Tracked for future work: [#173](https://github.com/nf-core/riboseq/issues/173).

### PRICE (opt-in)

[PRICE](https://github.com/erhard-lab/gedi/wiki/Price) (Erhard et al., 2018) is a Bayesian ORF caller distributed as part of the [Gedi](https://github.com/erhard-lab/gedi) Java framework. Unlike the per-sample callers, PRICE estimates a shared codon-position model across the riboseq cohort by EM and is invoked one-shot rather than per-sample. Activate with `--run_price true`.
@@ -342,6 +358,8 @@ Ribotricer is available as a third caller but is off by default. Enable it with

The pipeline builds a binary `.oml` genome index via `gedi -e IndexGenome` once per run, then calls PRICE once across the cohort with the index plus the riboseq BAMs. PRICE's primary output is `${prefix}.orfs.tsv`, a table of all called ORFs with start-codon score, range score, p-value (uncorrected) and per-condition / total read counts. Tool CLI arguments can be appended via `--extra_price_indexgenome_args` and `--extra_price_price_args`.

**Annotation.** Like Rp-Bp, PRICE is given the full multi-isoform `--gtf` annotation rather than the one-transcript-per-gene canonical backbone: it resolves overlapping ORFs and rescues multimappers with its own EM, so restricting it to a single isoform per gene would only narrow ORF discovery and bias ORF-type classification toward canonical CDS.

When `--extended_orf_analysis true` is set, PRICE's IndexGenome receives the hybrid GTF so ORFs on novel intergenic transcripts are within its discovery scope.

PRICE's CLI banner reports `Price version 1.0.4` while the Bioconda package is `gedi 1.0.6a` (Price is one tool inside the Gedi umbrella). The pipeline captures the package version via `gedi -e Version` for `versions.yml`.
45 changes: 45 additions & 0 deletions modules.json
@@ -200,6 +200,46 @@
"git_sha": "b59f74e059a49fce82f19fbf684e2876da85ee39",
"installed_by": ["modules"]
},
"rpbp/estimatemetagenebayesfactors": {
"branch": "master",
"git_sha": "617c552da369f63371648679983483736e52f3b8",
"installed_by": ["fasta_gtf_bam_rpbp", "modules"]
},
"rpbp/estimateorfbayesfactors": {
"branch": "master",
"git_sha": "bbd642353cae3464d405ab6ae7366532648164e7",
"installed_by": ["fasta_gtf_bam_rpbp", "modules"]
},
"rpbp/extractmetageneprofiles": {
"branch": "master",
"git_sha": "df05573454925ff87ff6ea6c4afaef70c64b7248",
"installed_by": ["fasta_gtf_bam_rpbp", "modules"]
},
"rpbp/extractorfprofiles": {
"branch": "master",
"git_sha": "ad585e77451e4638e876bcf895185f0ac7f85fae",
"installed_by": ["fasta_gtf_bam_rpbp", "modules"]
},
"rpbp/getperiodiclengthsoffsets": {
"branch": "master",
"git_sha": "caba7d6afa153b377581449609109fcd43775cab",
"installed_by": ["fasta_gtf_bam_rpbp", "modules"]
},
"rpbp/preparegenome": {
"branch": "master",
"git_sha": "50b83e9601f411ef670b67b205db78e707a88f01",
"installed_by": ["fasta_gtf_bam_rpbp", "modules"]
},
"rpbp/selectfinalpredictionset": {
"branch": "master",
"git_sha": "c2b3bb2252858a79872a27c7918cdb4b2a43f778",
"installed_by": ["fasta_gtf_bam_rpbp", "modules"]
},
"rpbp/selectperiodicoffsets": {
"branch": "master",
"git_sha": "2f2f5662c0719221637811c6a6e443ec3414e0df",
"installed_by": ["fasta_gtf_bam_rpbp", "modules"]
},
"rsem/preparereference": {
"branch": "master",
"git_sha": "004e773fc35ebd24063ca4cbef057c94a24208aa",
@@ -377,6 +417,11 @@
"git_sha": "2fc6aef2691483864904e31973ccafd2ed68fd56",
"installed_by": ["subworkflows"]
},
"fasta_gtf_bam_rpbp": {
"branch": "master",
"git_sha": "2111854cad9111c2b8057a6e050e0656d41acf32",
"installed_by": ["subworkflows"]
},
"fastq_align_star": {
"branch": "master",
"git_sha": "cebe21bbd158c15c8fab172e37cfe97a239f4b77",

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

42 changes: 42 additions & 0 deletions modules/nf-core/rpbp/estimatemetagenebayesfactors/main.nf
79 changes: 79 additions & 0 deletions modules/nf-core/rpbp/estimatemetagenebayesfactors/meta.yml