feat: Rp-Bp opt-in ORF caller #186
Merged
Add Rp-Bp (Malone et al. 2017) as an opt-in ORF caller via --run_rpbp (default false).

Implemented via the upstream fasta_gtf_bam_rpbp subworkflow (nf-core/modules#11695): six per-tool processes (extract-metagene-profiles, estimate-metagene-profile-bayes-factors, select-periodic-offsets, extract-orf-profiles, estimate-orf-bayes-factors, select-final-prediction-set) plus a shared prepare-genome. Honours --extended_orf_analysis: the upstream subworkflow takes the hybrid GTF when extended-ORF mode is active.

New params:
- --run_rpbp
- --extra_rpbp_preparegenome_args
- --extra_rpbp_predictorfs_args

Per-sample predicted-ORF BED + DNA + protein FASTA under <outdir>/orf_predictions/rpbp/.

Stan MCMC fit takes ~20-24h per replicate at genome-wide scale; pipeline emits a runtime warning when enabled.

modules.json currently pins rpbp modules + fasta_gtf_bam_rpbp subworkflow to nf-core/modules#11695 (branch rpbp-add-modules-and-subworkflows, sha c07e919d). Once #11695 merges, run nf-core modules update + nf-core subworkflows update to swap the pins to master.
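Given the ~20-24h Stan fit noted above, a site-level config may need a higher wall-time limit for the Rp-Bp processes. The fragment below is a minimal sketch, not the pipeline's shipped configuration. It assumes the installed modules follow the usual nf-core naming, so that their process names contain RPBP; the file name and the 48-hour value are illustrative only.

```groovy
// custom.config (hypothetical file name); supply it with: -c custom.config
process {
    // Regex selector. ASSUMPTION: the Rp-Bp process names contain "RPBP".
    withName: '.*RPBP.*' {
        // Illustrative ceiling above the ~20-24h per-replicate fit time.
        time = 48.h
    }
}
```

The selector as written applies to every matching process, including the quick ones. Narrowing the pattern to the Bayes-factor steps would confine the longer limit to the jobs that need it.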
Reconcile the Rp-Bp slice onto the post-#185 dev tree: relocate Rp-Bp from the pre-refactor inline workflow form into orf_caller_dispatch (gated on --run_rpbp), take the upstream rpbp modules and fasta_gtf_bam_rpbp subworkflow at their merged-to-master pins, and land the --run_rpbp wiring, config, schema, docs and changelog entry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Rp-Bp builds its candidate ORF set per transcript isoform and deduplicates by genomic coordinates, then resolves overlaps itself (longest-per-stop, best Bayes factor). Feeding it the one-transcript-per-gene canonical backbone silently dropped isoform-specific ORFs (alt-5'UTR uORFs, N-terminal extensions/truncations, retained-intron/alt-exon ORFs) and biased ORF-type labelling toward canonical CDS, with no compensating benefit. Route it to the full ch_gtf (default) like PRICE; extended mode keeps the hybrid GTF for novel transcripts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…annotation

Document, in the Rp-Bp and PRICE usage sections, that both callers receive the full --gtf rather than the canonical one-transcript-per-gene backbone: each enumerates/handles ORFs across all isoforms and resolves overlaps itself, so collapsing to canonical would drop isoform-specific ORFs and bias ORF-type labels. Cites Malone et al. 2017.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
suhrig
approved these changes
Jun 25, 2026
Thanks @suhrig!
Summary
Adds Rp-Bp (Malone et al. 2017) as an opt-in ORF caller via --run_rpbp (default false), wired into the orf_caller_dispatch subworkflow through the upstream fasta_gtf_bam_rpbp nf-core subworkflow (8 modules: seven per-tool steps plus a shared prepare-genome). The pipeline's standard STAR alignment is reused rather than re-running Rp-Bp's own alignment, and each step caches independently on resume.

Changes
- rpbp/* modules (8) + the fasta_gtf_bam_rpbp subworkflow.
- --run_rpbp, --extra_rpbp_preparegenome_args, --extra_rpbp_predictorfs_args.
- orf_caller_dispatch gains a --run_rpbp-gated Rp-Bp block; rpbp joins the runtime-enabled caller set and (unlike Ribotricer) its Bayes-factor score is retained in cross-caller rank aggregation.
- --extended_orf_analysis (hybrid GTF when active).
- conf/modules.config: time budget on the Bayesian-fit modules, final-prediction-set filtering args, publish to <outdir>/orf_predictions/rpbp/.

Annotation: full multi-isoform GTF
Rp-Bp is given the full --gtf annotation, not the one-transcript-per-gene canonical backbone (mirroring PRICE). Rp-Bp enumerates candidate ORFs across every transcript isoform and resolves redundant/overlapping ORFs itself (longest-per-stop, best Bayes factor), so collapsing to canonical would silently drop ORFs that exist only on non-canonical isoforms (alternative-5'UTR uORFs, isoform-specific N-terminal extensions/truncations, retained-intron and alternative-exon ORFs) and bias the reported ORF types toward canonical CDS, with no compensating benefit. Rationale and citations are in the Rp-Bp section of docs/usage.md (Malone et al. 2017, NAR). Under --extended_orf_analysis Rp-Bp instead takes the hybrid GTF so novel transcripts are within discovery scope.

Caveats
… src/rpbp/defaults.py): most notably STAR-default mismatch filtering (outFilterMismatchNmax 10 / outFilterMismatchNoverLmax 0.3, vs rp-bp's 1 / 0.04), and the absence of outFilterType BySJout, outFilterIntronMotifs RemoveNoncanonicalUnannotated, seedSearchStartLmaxOverLread 0.5 and winAnchorMultimapNmax 100. This affects every genome-BAM ORF caller, not just Rp-Bp. We will address this in #173 by applying Ribo-seq-tuned STAR parameters per sample type (tightened for Ribo-seq, unchanged for RNA-seq); in the meantime --extra_star_align_args lets you opt in for Ribo-seq-only runs. Rp-Bp processes whatever alignments it is given, so this shifts which footprints are counted rather than being a correctness issue. See docs/usage.md for the full parameter list and override hint.

Part of the #174 modernisation stack.
Closes #169
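For a Ribo-seq-only run, the opt-in described in the caveat above might look like the following config fragment. It is a sketch under assumptions rather than the documented override: the file name is hypothetical, the STAR values are the ones quoted above as rp-bp's, and whether --extra_star_align_args appends to or replaces the pipeline's default STAR arguments is not stated here, so check docs/usage.md before relying on it.

```groovy
// ribo_only.config (hypothetical file name); supply it with: -c ribo_only.config
params {
    // Enable the opt-in Rp-Bp caller (default false).
    run_rpbp = true

    // STAR settings quoted in the caveat as rp-bp's values.
    // ASSUMPTION: the string is passed through to STAR unchanged.
    extra_star_align_args = [
        '--outFilterMismatchNmax 1',
        '--outFilterMismatchNoverLmax 0.04',
        '--outFilterType BySJout',
        '--outFilterIntronMotifs RemoveNoncanonicalUnannotated',
        '--seedSearchStartLmaxOverLread 0.5',
        '--winAnchorMultimapNmax 100'
    ].join(' ')
}
```

Because the argument string is not sample-type aware, these settings would also apply to any RNA-seq samples in the same run, which is why the caveat limits the opt-in to Ribo-seq-only runs.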
🤖 Generated with Claude Code