feat: Rp-Bp opt-in ORF caller #186

Merged: pinin4fjords merged 9 commits into dev from feat/169-rpbp on Jun 25, 2026
@pinin4fjords (Member) commented May 22, 2026

Summary

Adds Rp-Bp (Malone et al. 2017) as an opt-in ORF caller via --run_rpbp (default false), wired into the orf_caller_dispatch subworkflow through the upstream fasta_gtf_bam_rpbp nf-core subworkflow (8 modules: seven per-tool steps plus a shared prepare-genome). The pipeline's standard STAR alignment is reused rather than re-running Rp-Bp's own alignment, and each step caches independently on resume.

Changes

  • Install the nf-core rpbp/* modules (8) + the fasta_gtf_bam_rpbp subworkflow.
  • New params: --run_rpbp, --extra_rpbp_preparegenome_args, --extra_rpbp_predictorfs_args.
  • orf_caller_dispatch gains a --run_rpbp-gated Rp-Bp block; rpbp joins the runtime-enabled caller set and (unlike Ribotricer) its Bayes-factor score is retained in cross-caller rank aggregation.
  • Honours --extended_orf_analysis (hybrid GTF when active).
  • conf/modules.config: time budget on the Bayesian-fit modules, final-prediction-set filtering args, publish to <outdir>/orf_predictions/rpbp/.
  • Docs (usage + output) and CHANGELOG.
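
To make the rank-aggregation point in the list above concrete, here is a minimal sketch, assuming a simple mean-rank scheme. The PR does not describe the pipeline's actual aggregation method, so the function names, the caller name `other_caller`, the ORF IDs and all scores are hypothetical; the only behaviour taken from the description is that a caller with a retained score (Rp-Bp) contributes a ranking while a caller without one (Ribotricer) does not.

```python
def ranks_desc(scores):
    """Rank ORF IDs by score, highest score first (rank 1 = best)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {orf: rank for rank, orf in enumerate(ordered, start=1)}


def aggregate(caller_scores):
    """Mean rank per ORF across callers that contribute a score.

    Assumption: a caller mapped to None has no retained score and is skipped.
    """
    per_caller = [ranks_desc(s) for s in caller_scores.values() if s is not None]
    orfs = set().union(*per_caller)
    result = []
    for orf in orfs:
        ranks = [r[orf] for r in per_caller if orf in r]
        result.append((orf, sum(ranks) / len(ranks)))
    # Best (lowest) mean rank first; ORF ID breaks ties deterministically.
    return sorted(result, key=lambda item: (item[1], item[0]))


# Hypothetical scores: higher is better for both scoring callers.
caller_scores = {
    "rpbp": {"orf_a": 120.0, "orf_b": 15.0, "orf_c": 40.0},  # Bayes factors
    "other_caller": {"orf_a": 0.90, "orf_b": 0.99, "orf_c": 0.20},
    "ribotricer": None,  # score not retained, so no ranking contributed
}

for orf, mean_rank in aggregate(caller_scores):
    print(orf, mean_rank)
```

With these toy numbers `orf_a` ranks first overall (mean rank 1.5) even though each scoring caller prefers a different ORF individually, which is the kind of consensus a rank-based scheme is meant to surface.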

Annotation: full multi-isoform GTF

Rp-Bp is given the full --gtf annotation, not the one-transcript-per-gene canonical backbone (mirroring PRICE). Rp-Bp enumerates candidate ORFs across every transcript isoform and resolves redundant/overlapping ORFs itself (longest-per-stop, best Bayes factor), so collapsing to canonical would silently drop ORFs that exist only on non-canonical isoforms (alternative-5'UTR uORFs, isoform-specific N-terminal extensions/truncations, retained-intron and alternative-exon ORFs) and bias the reported ORF types toward canonical CDS, with no compensating benefit. Rationale and citations are in the Rp-Bp section of docs/usage.md (Malone et al. 2017, NAR). Under --extended_orf_analysis Rp-Bp instead takes the hybrid GTF so novel transcripts are within discovery scope.
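
The effect described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Rp-Bp's implementation: the sequences, coordinates and transcript names are invented, only the forward strand and ATG starts are considered, and the Bayes-factor step is omitted. It enumerates ORFs per isoform, keys them by genomic coordinates, and keeps the longest ORF per stop.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, coords):
    """Return (genomic_start, genomic_stop_end, spliced_length) per ATG..stop ORF."""
    orfs = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        # Walk codon by codon from the start until the first in-frame stop.
        for j in range(i, len(seq) - 2, 3):
            if seq[j:j + 3] in STOPS:
                orfs.append((coords[i], coords[j + 2], j + 3 - i))
                break
    return orfs


def call_orfs(transcripts):
    """Pool ORFs over all given isoforms, keeping the longest per genomic stop."""
    best = {}
    for seq, coords in transcripts.values():
        for orf in find_orfs(seq, coords):
            stop = orf[1]
            if stop not in best or orf[2] > best[stop][2]:
                best[stop] = orf
    return sorted(best.values())


# Hypothetical gene: the alternative isoform has an extra upstream exon
# (genomic 50-58) carrying a uORF, spliced onto the shared exon (100-112).
transcripts = {
    "tx_canonical": ("CCATGAAATAGCC", list(range(100, 113))),
    "tx_alt_5utr": ("ATGCCTTGA" + "CCATGAAATAGCC",
                    list(range(50, 59)) + list(range(100, 113))),
}

all_isoforms = call_orfs(transcripts)
canonical_only = call_orfs({"tx_canonical": transcripts["tx_canonical"]})

print("full annotation:   ", all_isoforms)
print("canonical backbone:", canonical_only)
```

The main ORF (102-110) is found once in both runs because the two isoforms share its genomic coordinates, while the uORF at 50-58 appears only when the alternative isoform is supplied.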

Caveats

  • Rp-Bp's Stan MCMC fit takes ~20-24h per replicate at genome-wide scale; the pipeline emits a runtime warning and sets a 30h-per-attempt time budget on the Bayes-factor steps.
  • Ribo-seq footprints are aligned with the pipeline's general-purpose STAR settings (shared with the RNA-seq side of paired runs), which are looser for short reads than rp-bp's own defaults (src/rpbp/defaults.py): most notably STAR-default mismatch filtering (outFilterMismatchNmax 10 / outFilterMismatchNoverLmax 0.3, vs rp-bp's 1 / 0.04), and the absence of outFilterType BySJout, outFilterIntronMotifs RemoveNoncanonicalUnannotated, seedSearchStartLmaxOverLread 0.5 and winAnchorMultimapNmax 100. This affects every genome-BAM ORF caller, not just Rp-Bp. We will address this in #173 by applying Ribo-seq-tuned STAR parameters per sample type (tightened for Ribo-seq, unchanged for RNA-seq); in the meantime --extra_star_align_args lets you opt in for Ribo-seq-only runs. Rp-Bp processes whatever alignments it is given, so this shifts which footprints are counted rather than being a correctness issue. See docs/usage.md for the full parameter list and override hint.
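
As a convenience for the interim opt-in mentioned in the last caveat, the following sketch assembles the STAR options and values quoted there into a single string suitable for --extra_star_align_args. The helper name and the dictionary are illustrative only, and how the pipeline combines this string with its existing STAR arguments is not covered here; docs/usage.md remains the reference for the full parameter list.

```python
import shlex

# STAR options and values as quoted in the caveat above (rp-bp's defaults).
RPBP_STYLE_STAR_OPTIONS = {
    "outFilterMismatchNmax": "1",
    "outFilterMismatchNoverLmax": "0.04",
    "outFilterType": "BySJout",
    "outFilterIntronMotifs": "RemoveNoncanonicalUnannotated",
    "seedSearchStartLmaxOverLread": "0.5",
    "winAnchorMultimapNmax": "100",
}


def star_arg_string(options):
    """Join option/value pairs into one STAR-style argument string."""
    return " ".join(f"--{name} {value}" for name, value in options.items())


extra_star_align_args = star_arg_string(RPBP_STYLE_STAR_OPTIONS)

# Quote the whole string so a shell passes it as ONE value to the parameter.
print("--extra_star_align_args " + shlex.quote(extra_star_align_args))
```

Because these settings tighten alignment for short reads, they are only appropriate for Ribo-seq-only runs until #173 applies them per sample type.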

Part of the #174 modernisation stack.

Closes #169

🤖 Generated with Claude Code

Add Rp-Bp (Malone et al. 2017) as an opt-in ORF caller via
--run_rpbp (default false). Implemented via the upstream
fasta_gtf_bam_rpbp subworkflow (nf-core/modules#11695):
six per-tool processes (extract-metagene-profiles,
estimate-metagene-profile-bayes-factors, select-periodic-offsets,
extract-orf-profiles, estimate-orf-bayes-factors,
select-final-prediction-set) plus a shared prepare-genome.

Honours --extended_orf_analysis: the upstream subworkflow takes the
hybrid GTF when extended-ORF mode is active.

New params:
  --run_rpbp
  --extra_rpbp_preparegenome_args
  --extra_rpbp_predictorfs_args

Per-sample predicted-ORF BED + DNA + protein FASTA under
<outdir>/orf_predictions/rpbp/. Stan MCMC fit takes ~20-24h per
replicate at genome-wide scale; pipeline emits a runtime warning
when enabled.

modules.json currently pins rpbp modules + fasta_gtf_bam_rpbp
subworkflow to nf-core/modules#11695 (branch
rpbp-add-modules-and-subworkflows, sha c07e919d). Once #11695 merges,
run nf-core modules update + nf-core subworkflows update to swap the
pins to master.
@nf-core-bot (Member)

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.5.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

Base automatically changed from feat/170-price to dev June 25, 2026 08:48
pinin4fjords and others added 3 commits June 25, 2026 10:21
Reconcile the Rp-Bp slice onto the post-#185 dev tree: relocate Rp-Bp from
the pre-refactor inline workflow form into orf_caller_dispatch (gated on
--run_rpbp), take the upstream rpbp modules and fasta_gtf_bam_rpbp
subworkflow at their merged-to-master pins, and land the --run_rpbp
wiring, config, schema, docs and changelog entry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Rp-Bp builds its candidate ORF set per transcript isoform and deduplicates
by genomic coordinates, then resolves overlaps itself (longest-per-stop,
best Bayes factor). Feeding it the one-transcript-per-gene canonical backbone
silently dropped isoform-specific ORFs (alt-5'UTR uORFs, N-terminal
extensions/truncations, retained-intron/alt-exon ORFs) and biased ORF-type
labelling toward canonical CDS, with no compensating benefit. Route it to the
full ch_gtf (default) like PRICE; extended mode keeps the hybrid GTF for novel
transcripts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…annotation

Document, in the Rp-Bp and PRICE usage sections, that both callers receive the
full --gtf rather than the canonical one-transcript-per-gene backbone: each
enumerates/handles ORFs across all isoforms and resolves overlaps itself, so
collapsing to canonical would drop isoform-specific ORFs and bias ORF-type
labels. Cites Malone et al. 2017.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 05bd7a9

  • ✅ 283 tests passed
  • ❔ 6 tests were ignored
  • ❗ 5 tests had warnings

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_if_empty_null - ifEmpty(null) found in /home/runner/work/riboseq/riboseq/subworkflows/local/prepare_genome/main.nf: versions = ch_versions.ifEmpty(null) // channel: [ versions.yml ]
  • schema_lint - Input mimetype is missing or empty

❔ Tests ignored:

  • nextflow_config - Config default ignored: params.ribo_database_manifest
  • nf_test_content - nf_test_content
  • files_unchanged - File ignored due to lint config: assets/nf-core-riboseq_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-riboseq_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-riboseq_logo_dark.png
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.1
  • Run at 2026-06-25 10:05:55

@pinin4fjords pinin4fjords merged commit b7428c9 into dev Jun 25, 2026
35 checks passed
@pinin4fjords pinin4fjords deleted the feat/169-rpbp branch June 25, 2026 12:47
@pinin4fjords (Member, Author)

Thanks @suhrig!