feat: cross-sample ORF catalogue #187
Merged
Gather per-sample, per-caller ORF predictions (Ribo-TISH, RiboCode,
Ribotricer, Rp-Bp, PRICE), normalise each to a unified BED12 + sidecar
TSV, then merge into a cohort-level catalogue with a class-aware
strategy (transcript-ID grouping for annotated multi-exon CDS, 80%
reciprocal overlap for single-exon novel intergenic and smORFs).
Emits orf_catalogue.{bed12,tsv}, orf_to_gene.tsv, and an AA FASTA
under <outdir>/orf_catalogue/, plus a MultiQC custom-content per-class
count table.
Implementation uses the upstream orftable_fasta_gtf_buildorfcatalogue
subworkflow (nf-core/modules#11740): CUSTOM_ORFNORMALISE per caller,
CUSTOM_ORFMERGE for cohort-level merge, BEDTOOLS_GETFASTA +
SEQKIT_TRANSLATE to produce the catalogue AA FASTA.
Per-caller prediction channels (ch_*_predictions) default to
Channel.empty() and are overridden inside each caller's if-block,
gating the catalogue invocation on extended_orf_active +
at-least-one-caller.
modules.json currently pins custom/orfnormalise, custom/orfmerge,
and the orftable_fasta_gtf_buildorfcatalogue subworkflow to
nf-core/modules#11740 (branch custom-orf-catalogue, sha 6597190c).
Once #11740 merges, run nf-core modules update / subworkflows update
to swap pins to master.
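
The 80% reciprocal-overlap rule for single-exon ORFs can be illustrated with a minimal sketch. This is not the custom/orfmerge code: the function names, the tuple layout, and the same-chromosome / same-strand requirement are assumptions made for the example.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions for half-open (BED-style) intervals."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    # Reciprocal: the overlap must cover the required fraction of BOTH intervals,
    # so take the minimum of the two fractions.
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def same_single_exon_orf(a, b, threshold=0.8):
    """a, b: hypothetical (chrom, start, end, strand) tuples for single-exon ORFs."""
    # Assumption: ORFs on different chromosomes or strands are never merged.
    if a[0] != b[0] or a[3] != b[3]:
        return False
    return reciprocal_overlap(a[1], a[2], b[1], b[2]) >= threshold
```

Under these assumptions, two 100 nt ORFs offset by 10 nt overlap reciprocally by 0.9 and would be grouped, while an offset of 50 nt (0.5) would keep them separate.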
Rebuild the cross-sample ORF catalogue slice on the reconciled dev base (ORF_CALLER_DISPATCH structure).
The catalogue's per-caller prediction channels now come from ORF_CALLER_DISPATCH.out.* rather than the inline caller blocks of the original branch base. Catalogue components are taken from the upstream-pinned form: custom/orfnormalise, custom/orfmerge, custom/orfcollapse, mmseqs/easycluster, bedtools/getfasta, seqkit/translate and the orftable_fasta_gtf_buildorfcatalogue subworkflow, all pinned to nf-core/modules master.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
custom/orfmerge gains --min-callers / --min-samples (default 1, no
filtering) and emits an additional consensus catalogue view
(*.consensus.{bed12,tsv,orf_to_gene.tsv}) restricted to ORFs supported by
at least that many distinct callers and recurring in at least that many
samples. The full unfiltered catalogue is still published as before; the
consensus view is published under <outdir>/orf_catalogue/consensus/.
Surfaced via --orf_min_callers / --orf_min_samples; setting either to 2
or more gives a consensus catalogue that reduces the downstream ORF-level
multiple-testing burden.
The peptide-level smORF collapse is now behind --skip_orf_collapse
(default off, so collapse runs as before), wired to the catalogue
subworkflow's val_collapse argument.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
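
The threshold semantics described in this commit can be sketched as follows. This is an illustration only: OrfRecord and consensus_view are hypothetical names, and the real module works on BED12/TSV files rather than in-memory records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrfRecord:
    """Hypothetical in-memory stand-in for one catalogue row."""
    orf_id: str
    callers: frozenset   # distinct callers that reported this ORF
    samples: frozenset   # samples in which this ORF was seen


def consensus_view(catalogue, min_callers=1, min_samples=1):
    """Keep ORFs supported by >= min_callers callers AND >= min_samples samples."""
    return [
        rec for rec in catalogue
        if len(rec.callers) >= min_callers and len(rec.samples) >= min_samples
    ]


catalogue = [
    OrfRecord("orf1", frozenset({"ribotish", "ribocode"}), frozenset({"s1", "s2"})),
    OrfRecord("orf2", frozenset({"price"}), frozenset({"s1"})),
]
```

With the defaults of 1 the view equals the full catalogue; raising either threshold to 2 keeps only orf1 in this toy input.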
The smORF-only restriction and locus-agnostic amino-acid clustering are this pipeline's design choices, not properties of the GENCODE Ribo-seq ORF consolidation. MMseqs2 global identity (--min-seq-id 0.9) approximates rather than reproduces GENCODE's longest-shared-substring / P-site-overlap collapse_cutoff 0.9. Docstring reworded to state both departures explicitly.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
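
To show why an identity threshold and a shared-substring threshold can disagree, here is a minimal sketch of a longest-shared-substring fraction. It reproduces neither GENCODE's nor MMseqs2's scoring: the function name and the normalisation by the shorter peptide are assumptions made for the example.

```python
from difflib import SequenceMatcher


def longest_shared_substring_fraction(pep_a: str, pep_b: str) -> float:
    """Length of the longest contiguous shared stretch / length of the shorter peptide."""
    if not pep_a or not pep_b:
        return 0.0
    # autojunk=False disables difflib's heuristic for long, repetitive sequences.
    matcher = SequenceMatcher(None, pep_a, pep_b, autojunk=False)
    match = matcher.find_longest_match(0, len(pep_a), 0, len(pep_b))
    return match.size / min(len(pep_a), len(pep_b))
```

A single mid-sequence substitution, as in MKTAY versus MKAAY, leaves 4 of 5 residues identical but cuts the longest shared stretch to 2 residues, so the two measures can rank the same pair very differently.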
Add a --min-samples 2 case asserting the consensus catalogue is a non-empty strict subset of the full catalogue and every retained ORF meets the recurrence threshold, while the full catalogue is unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
novel_gtf and stringtie_extended now produce the cross-sample ORF catalogue (orf_catalogue/, including the consensus/ view and the normalised/ per-caller inputs) under --extended_orf_analysis true. Regenerated on x86_64 (NXF 25.04.8, nf-test 0.9.3, --profile=+docker).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…onsensus-view code [skip ci]
Match the vendored orfmerge template (two-pass catalogue write), its regenerated module snapshot, and the orftable subworkflow (consensus emits + regenerated snapshot) to the upstream nf-core/modules implementation, so the eventual re-pin is a modules.json sha bump with no file changes. Output is byte-identical, so the pipeline snapshots are unaffected.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…th upstream [skip ci]
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sync custom/orfcollapse + the orftable subworkflow to the upstream consensus-after-collapse implementation, and route the published orf_catalogue/consensus/ from CUSTOM_ORFCOLLAPSE (the de-redundified catalogue) when collapse runs, falling back to CUSTOM_ORFMERGE when --skip_orf_collapse is set.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
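
Why the ordering matters can be seen in a small sketch of consensus-after-collapse, in which evidence is pooled across the members of a peptide cluster before the thresholds are applied. The data layout and function names are hypothetical and do not mirror the custom/orfcollapse code.

```python
def pool_cluster_evidence(members):
    """Union the callers and samples of every locus folded into one representative."""
    callers, samples = set(), set()
    for member in members:
        callers |= member["callers"]
        samples |= member["samples"]
    return callers, samples


def consensus_after_collapse(clusters, min_callers=1, min_samples=1):
    """clusters: hypothetical {representative_id: [member dicts]} mapping."""
    kept = []
    for rep_id, members in clusters.items():
        callers, samples = pool_cluster_evidence(members)
        if len(callers) >= min_callers and len(samples) >= min_samples:
            kept.append(rep_id)
    return kept


clusters = {
    "smorf_A": [
        {"callers": {"ribotish"}, "samples": {"s1"}},   # locus 1
        {"callers": {"ribocode"}, "samples": {"s2"}},   # locus 2, same peptide
    ],
    "smorf_B": [{"callers": {"price"}, "samples": {"s1"}}],
}
```

In this toy input smorf_A passes --orf_min_callers 2 only because its two loci are judged together; filtering before the collapse would have dropped both loci.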
…s #12167 merge
custom/orfmerge, custom/orfcollapse and the orftable_fasta_gtf_buildorfcatalogue subworkflow now match their pinned sha (76e959312e), clearing the module_changes divergence carried while #12167 was in review.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
suhrig
approved these changes
Jun 26, 2026
Summary
Builds a cohort-level cross-caller ORF catalogue under --extended_orf_analysis true. Each enabled caller's per-sample output (Ribo-TISH, RiboCode, Ribotricer, Rp-Bp, PRICE) is normalised to a unified BED12 + sidecar TSV, then merged class-aware (transcript-ID grouping for annotated multi-exon CDS; 80% reciprocal overlap for single-exon novel intergenic and smORFs ≤ 100 aa), recording cross-caller (called_by_* / score_*) and cross-sample (n_samples) evidence per ORF.

smORFs are then peptide-level deduplicated with MMseqs2 (--min-seq-id 0.9 -c 0.8), folding micropeptides encoded at multiple loci down to one representative (GENCODE Ribo-seq ORF catalogue convention, Mudge et al. 2022); opt out with --skip_orf_collapse.

Outputs land under <outdir>/orf_catalogue/: *.catalogue.{bed12,tsv,orf_to_gene.tsv,fasta} plus a MultiQC custom-content per-class count table.

Consensus view
A consensus view is published under <outdir>/orf_catalogue/consensus/, filtered to ORFs supported by at least --orf_min_callers distinct callers and recurring in at least --orf_min_samples samples (both default 1, i.e. no filtering, so it equals the full catalogue out of the box). The filter runs after the smORF collapse, so it is the high-confidence subset of the de-redundified catalogue, and a micropeptide folded across loci is judged on its combined cross-caller / cross-sample evidence. The full unfiltered catalogue is always published regardless; raising either threshold (e.g. --orf_min_callers 2) gives a consensus catalogue that tames downstream ORF-level multiple testing.

Components
Built from upstream nf-core/modules components (all pinned to master): the orftable_fasta_gtf_buildorfcatalogue subworkflow plus custom/orfnormalise, custom/orfmerge, custom/orfcollapse, mmseqs/easycluster, bedtools/getfasta and seqkit/translate.

The catalogue runs once per pipeline invocation, gated on --extended_orf_analysis true and a non-empty enabled-caller set; the default-off path is unchanged.

Closes #167
🤖 Generated with Claude Code