Add MAGMA as --mode magma (pinned to v0.3.0; version-isolated; experimental) #18
Draft
abhi18av wants to merge 8 commits into
Integrates the TORCH-consortium/magma nf-core port (mycobactopia-org/MAGMA
branch nf-core-tbanalyzer) as the second analysis mode, following the
orchestration-layer principle: each bundled pipeline keeps its OWN modules and
tool versions; tbanalyzer only facilitates execution.
- Vendored MAGMA's complete module set version-isolated under
modules/local/magma/ (17 custom) + modules/local/magma/nf-core/ (24 nf-core
module copies at MAGMA's pins) so they never collide with tbanalyzer's
existing modules/nf-core/{gatk4,samtools,bcftools,fastqc,...}.
- subworkflows/local/magma/ (7), workflows/magmanf.nf (entry workflow MAGMA),
bin/ (15 MAGMA python scripts), data/magma-references/ (~28MB bundled H37Rv +
VQSR truth VCFs + GVCF panel + SnpEff DB).
- Include paths rewritten to the magma-scoped layout; pipeline-boilerplate util
repointed to tbanalyzer's utils_nfcore_tbanalyzer_pipeline; nf-core utils
subworkflow shared.
- conf/magma_base.config (61 magma_ param defaults, refs -> data/magma-references)
and conf/magma_modules.config (per-process ext.args/when/publishDir), both
mode-gated in nextflow.config. test_magma profile + conf/magma_test.config
(3 downsampled Zenodo samples). main.nf dispatches --mode magma -> MAGMA.
- nextflow_schema.json: magma_options group (61 params) + mode enum adds 'magma'.
Config parses under strict v2 for both modes. NOTE: PIPELINE_INITIALISATION
still validates --input against the mtbseq-oriented schema_input.json; MAGMA's
native 9-column samplesheet needs the samplesheet-handling reconciled next.
Per the mode-permissive samplesheet decision: replace the single required [sample, fastq_1] with an anyOf so the shared input schema validates BOTH the mtbseq lowercase shape (sample+fastq_1) and MAGMA's native capitalized shape (Sample+R1). MAGMA's extra columns stay 'additional' (not added to the parsed tuple, so mtbseq's [meta,fastq_1,fastq_2] destructure is unaffected); MAGMA's own samplesheet_validation.py does the real per-row validation from params.input.
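The anyOf rule described above can be pictured with a small Python sketch. It is only a toy model of the header-level check, not the actual schema_input.json or the nf-schema validator; the function name and the extra column name are hypothetical, and only the two required shapes come from the commit message.

```python
# Toy model of an anyOf over required columns: a samplesheet header is
# accepted if it satisfies at least one of the listed shapes.
REQUIRED_SHAPES = [
    {"sample", "fastq_1"},  # mtbseq lowercase shape
    {"Sample", "R1"},       # MAGMA native capitalized shape
]


def header_is_accepted(columns):
    """Return True if the header matches any one of the required shapes."""
    present = set(columns)
    # Extra columns are allowed ("additional"); only the required ones matter.
    return any(shape <= present for shape in REQUIRED_SHAPES)


# Hypothetical headers for illustration.
mtbseq_header = ["sample", "fastq_1", "fastq_2"]
magma_header = ["Sample", "R1", "extra_column"]  # extra_column is made up
mixed_header = ["sample", "R1"]                  # satisfies neither shape
```

A header that mixes the two conventions (for example lowercase sample with capitalized R1) satisfies neither shape and is rejected. Per-row checks are left to MAGMA's own validator, as the message says.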
…modules)

The mode-gated includeConfig in nextflow.config evaluates params.mode at parse time (default 'mtbseq'), so -profile test_magma alone didn't trigger the magma base/modules includes. conf/magma_test.config now includes them explicitly, so the profile works without also passing --mode magma on the CLI.

Verified via -stub-run -profile test_magma: config loads, the MAGMA-native samplesheet passes the anyOf schema_input.json validation, and the full MAGMA DAG compiles/schedules. Execution then hits a PRE-EXISTING upstream port bug (GATK_MARK_DUPLICATES_DELLY call/signature mismatch, magmanf.nf:410; identical in mycobactopia-org/MAGMA feat/plan1-nfcore-modules) from the incomplete nf-core module migration. Per the don't-modify-pipeline-code principle, that's to be fixed upstream and re-vendored, not patched here.
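The ordering problem can be illustrated with a toy Python model. It does not reproduce Nextflow's real config resolution; it only assumes, as the message states, that the mode gate is decided from the CLI value or the default before anything a profile pulls in. The function and variable names are hypothetical.

```python
DEFAULT_MODE = "mtbseq"
MAGMA_CONFIGS = ["conf/magma_base.config", "conf/magma_modules.config"]


def loaded_configs(cli_mode=None, profile_includes=()):
    """Toy model: list the config files that end up loaded, in order."""
    # The gate sees only the CLI value or the default, never the profile.
    mode = cli_mode or DEFAULT_MODE
    loaded = ["nextflow.config"]
    if mode == "magma":
        loaded.extend(MAGMA_CONFIGS)
    # Files named by the selected profile are added afterwards.
    for path in profile_includes:
        if path not in loaded:
            loaded.append(path)
    return loaded


# Before the fix: the profile only brings in its own test config.
before_fix = loaded_configs(profile_includes=["conf/magma_test.config"])

# After the fix: the test config explicitly includes the MAGMA configs.
after_fix = loaded_configs(
    profile_includes=["conf/magma_test.config", *MAGMA_CONFIGS]
)
```

In this model, before_fix lacks the MAGMA base and modules configs unless --mode magma is also passed, while after_fix carries them on the profile alone.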
Re-syncs the 3 fixes made on mycobactopia-org/MAGMA@nf-core-tbanalyzer:
- magmanf.nf: GATK_MARK_DUPLICATES_DELLY uses SAMTOOLS_MERGE_DELLY.out.bam (was .out → expanded to multiple args → 'declares 3 inputs but called with N').
- tbprofiler/collate + utils/eliminate_annotation: def prefix/annotationPrefix added to stub blocks.

-stub-run -profile test_magma now compiles the full MAGMA DAG and clears the GATK error; it then stops at a splitJson() on the SAMPLESHEET_VALIDATION stub's empty output. That is a stub-mode limitation (the workflow parses real JSON emitted by the validator at runtime), so biological validation is via the Docker run.
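The stub-mode limitation has a simple Python analogue: parsing an empty file as JSON fails, whereas the same parser works on real validator output. This is a sketch under assumptions, not MAGMA's code; the record layout below is hypothetical, and the exact failure mode of Nextflow's splitJson() may differ from Python's.

```python
import json


def split_validated_samples(text):
    """Parse validator output and return one record per sample."""
    records = json.loads(text)
    return list(records)


# A stub typically just creates an empty output file.
stub_output = ""

# Hypothetical shape of the JSON a real validator run might emit.
runtime_output = '[{"Sample": "S1"}, {"Sample": "S2"}]'


def stub_output_is_parseable():
    """Return False when the stub's empty output cannot be parsed."""
    try:
        split_validated_samples(stub_output)
    except json.JSONDecodeError:
        return False
    return True
```

The real output yields two records here, while the empty stub output cannot be parsed at all. That is why a stub run can show that the DAG compiles but cannot go past this step.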
Replaces the earlier vendoring (from the divergent feat/plan1-nfcore-modules tip) with the SciVer-validated v0.3.0 release tag, per the inclusion recommendation. Source: mycobactopia-org/MAGMA branch tbanalyzer-v0.3.0 (v0.3.0 + magma_ namespacing + stub-def fixes).
- v0.3.0 splits the workflow into 16 subworkflows (adds call_wf/map_wf/merge_wf/quality_check_wf/reports_wf/structural_variants_analysis_wf/validate_fastqs_wf); re-vendored under subworkflows/local/magma with sibling ./ includes preserved.
- modules/local/magma (36) + modules/local/magma/nf-core (23 version-isolated copies); conf/magma_base.config + conf/magma_modules.config (96 process scopes) + magma_options schema regenerated from v0.3.0; data/magma-references re-synced.
- The GATK_MARK_DUPLICATES_DELLY bug I had to hand-fix on feat/plan1 is ABSENT in v0.3.0 (correctly structured there); deferred trio (variantrecalibrator, snpsites, select_variants PHYLOGENY) intentionally local (hermetic, no impact).

Verified: both modes parse (strict v2); full MAGMA DAG compiles & schedules under -stub-run (stops only at splitJson on stub output, a stub-mode limitation; biological validation via Docker).
Re-based the vendoring onto the pinned MAGMA v0.3.0 tag (
Both modes parse (strict v2); full MAGMA DAG compiles under |
MAGMA's local modules (python utils + tool wrappers) carry no container directive — they rely on per-process-name container assignments that lived in MAGMA's conf/abc_cluster.config (not vendored). On Nomad/Docker this surfaced as '[NOMAD] Missing container image for process ...:MAGMA:SAMPLESHEET_VALIDATION'. Adds conf/magma_containers.config (mode-gated + loaded by test_magma) mirroring MAGMA v0.3.0's validated container layout: magma-container-1 (python utils, GATK/LOFREQ/FASTQC/MULTIQC/ISMAPPER), magma-container-2 (BWA/SAMTOOLS/IQTREE/ BCFTOOLS/BGZIP/SNPEFF/SNPSITES/CLUSTERPICKER/SNPDISTS), per-tool biocontainers (SPOTYPING/RDANALYZER/TB-/NTM-PROFILER), and the functional DELLY/BCFTOOLS_VIEW overrides. Dropped MAGMA's errorStrategy=ignore (so failures surface) and the seedling-specific gcp-spot nomadOptions.
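The container layout described above amounts to a lookup from process name to image, with a missing entry reported loudly rather than ignored. The following Python sketch models only that idea. The real assignment lives in Nextflow config, the "PIPELINE" prefix in the examples is hypothetical, the image labels are the short names from the message rather than full image references, and the per-tool biocontainers and the DELLY/BCFTOOLS_VIEW overrides are left out.

```python
# Tool prefixes grouped as in the commit message (partial, for illustration).
CONTAINER_BY_TOOL_PREFIX = {
    "GATK": "magma-container-1",
    "LOFREQ": "magma-container-1",
    "FASTQC": "magma-container-1",
    "MULTIQC": "magma-container-1",
    "BWA": "magma-container-2",
    "SAMTOOLS": "magma-container-2",
    "IQTREE": "magma-container-2",
    "SNPEFF": "magma-container-2",
}

# Python utility processes are matched by exact name.
CONTAINER_BY_EXACT_NAME = {
    "SAMPLESHEET_VALIDATION": "magma-container-1",
}


def container_for(process_path):
    """Resolve a container label from a colon-separated process path."""
    simple_name = process_path.split(":")[-1]
    if simple_name in CONTAINER_BY_EXACT_NAME:
        return CONTAINER_BY_EXACT_NAME[simple_name]
    for prefix, image in CONTAINER_BY_TOOL_PREFIX.items():
        if simple_name.startswith(prefix):
            return image
    # Surface the problem instead of silently continuing.
    raise LookupError(f"Missing container image for process {process_path}")
```

Raising on an unmapped process mirrors the decision to drop errorStrategy=ignore: a process without a container fails visibly instead of being skipped.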
.gitignore excludes data/, so the 30 data/magma-references files (H37Rv genome + BWA indices, VQSR truth VCFs, GVCF panel, SnpEff DB, region lists) were never committed — the cluster clone lacked them and BWA_MEM_DELLY failed with 'Can't stage file NC-000962-3-H37Rv.fa -- file does not exist'. Force-added (git add -f), mirroring how data/mtbseq-references is tracked.
NTMPROFILER_PROFILE failed on the downsampled (~11x) test samples on the cluster. It's an optional contamination QC profiler whose output only feeds NTMPROFILER_COLLATE -> reports (not the variant/cohort/phylogeny path), so it's safe to skip for the minimal test — consistent with the already-skipped spotyping/rdanalyzer/ismapper/tbprofiler_fastq profilers. Full NTMprofiler coverage remains in MAGMA's own SciVer validation.
Add MAGMA as the second analysis mode (--mode magma)

Integrates the TORCH-consortium/magma nf-core port (from mycobactopia-org/MAGMA, branch nf-core-tbanalyzer) into the tbanalyzer meta-pipeline as the second bundled pipeline, after MTBseq (--mode mtbseq, #17).

Warning: MAGMA mode is experimental / not yet biologically validated. The full Docker run (GATK/VQSR/phylogeny) is deferred to a follow-up. --mode mtbseq is unaffected. Draft until the MAGMA Docker test is green.

Design principle: version-isolated, orchestration-only

Each bundled pipeline may need different versions of the same tools, so tbanalyzer does not dedup/share modules and does not modify the pipelines' logic; it only facilitates execution. MAGMA's complete module set is vendored under a magma-scoped path with its own container pins:

- modules/local/magma/ (17 custom modules) + modules/local/magma/nf-core/ (24 nf-core module copies at MAGMA's own versions, not sharing tbanalyzer's modules/nf-core/{gatk4,samtools,bcftools,fastqc,…})
- subworkflows/local/magma/ (7), workflows/magmanf.nf (entry workflow MAGMA), bin/ (15 scripts), data/magma-references/ (~28 MB H37Rv + VQSR truth VCFs + GVCF panel + SnpEff DB)

Wiring

- main.nf dispatches --mode magma → MAGMA; mode enum + magma_options schema group (61 params, all magma_-namespaced; tool *_path params dropped)
- conf/magma_base.config + conf/magma_modules.config; test_magma profile + conf/magma_test.config (3 downsampled M. tuberculosis samples on Zenodo, doi 10.5281/zenodo.20671479)
- assets/schema_input.json uses anyOf so the shared input schema accepts both the mtbseq (sample, fastq_1) and MAGMA-native (Sample, R1) samplesheet shapes; MAGMA self-validates its own CSV

Upstream fixes (made on mycobactopia-org/MAGMA, then re-vendored)

Completed gaps in the in-progress nf-core-module migration: GATK_MARK_DUPLICATES_DELLY call (.out → .out.bam), and def prefix / def annotationPrefix in two stub blocks. Port nextflow lint is clean; the MAGMA DAG now compiles.

Validation

-stub-run can't fully exercise MAGMA (its workflow does splitJson() on validator output, which stubs leave empty; this is a stub-mode limitation). Biological validation = the deferred Docker run.

dev → master is the intended release flow.