tbpore-modules and tests#9
Conversation
Add modules for magma
…he new nf-core template
Add mtbseq workflow
|
Thanks for all your efforts here @ycode-sh , please allow me sometime to go through this PR especially the
|
I'm not sure but I think there is a problem with how nextflow and tbpore interacts. |
|
This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation. |
# Conflicts: # main.nf # nextflow.config
Bring the validated, refactored MTBseq-nf components into the tbanalyzer meta-pipeline: - Sync the 11 param-free tb* modules (B refactor) + environment.yml, and add the per-step MTBseq ext.args block to conf/modules.config. - Add the self-contained MTBSEQ subworkflow (subworkflows/local/mtbseq/main.nf, D refactor): QC + serial/parallel MTBseq, emits multiqc_files + versions. - Rewire workflows/mtbseqnf.nf (MTBSEQ_NF) to call the MTBSEQ subworkflow and use the 4.0.2 meta-tuple MULTIQC call. - quality_check: drop the FASTQC .out.versions emit (4.x emits versions via the 'versions' topic). - main.nf NFCORE_TBANALYZER: initialise multiqc_report (Channel.empty) and error on unknown --mode; default params.mode = 'mtbseq'. Verified: -stub-run completes end-to-end in serial and parallel modes (RENAME -> FASTQC -> TBFULL -> TBJOIN -> TBAMEND -> TBGROUPS -> MULTIQC), with process.resourceLimits capping the heavy MTBseq labels.
…s groups) Document and validate the mtbseq mode params in nextflow_schema.json so runs no longer emit unrecognised-parameter warnings and nf-core schema lint passes (51 params). Adds the 'mode' selector (enum: mtbseq), execution_modes (mtbseq_only_qc/mtbseq_parallel) and mtbseq_options (coverage/quality knobs with 0-100 bounds for percentages, plus reference paths).
- nextflow lint (strict): env(USER) -> env('USER') across the 11 tb* modules
(the bareword was flagged as an undefined variable).
- prettier: nextflow_schema.json + assets/schema_input.json.
(Pre-existing nextflow-lint errors in modules/nf-core/{snpsites,bcftools/view}
are unrelated to this integration and present on dev.)
- conf/test.config: point -profile test at downsampled M. tuberculosis reads (nf-core/test-datasets, branch tbanalyzer) in mtbseq mode, with a cohort TSV; drop the template viralrecon/R64 placeholders. - tests/default.nf.test: assert successful completion + presence of key MTBseq (Strain_Classification.tab) and MultiQC outputs, instead of a content snapshot (MTBseq's BWA/GATK calls are not bit-stable run-to-run).
…zenodo.20665781) Point the test/test_mtbseq/test_mtbseq_parallel profiles at the published Zenodo record (samplesheet + cohort + 2 downsampled M. tuberculosis samples) instead of the (push-restricted) nf-core/test-datasets branch.
… reads The v1 reads were named <sample>_<library>_R*.fastq.gz, colliding with the name RENAME_FILES copies to (cp same-file error). Renamed to ENA-accession-based files (ERR553286_*/ERR551038_*) in a new Zenodo version; samplesheet maps sample/library to them. Verified: local -profile test stub-run completes end-to-end with the real Zenodo reads staged.
- nextflow.config: replace the 'if (params.mode==...)' includeConfig with a ternary include. The new Nextflow v2 config parser (used by 'nextflow config -flat' in nf-core lint) rejects if-statements mixed with config statements. - .pre-commit-config.yaml: exclude vendored modules/nf-core and subworkflows/nf-core from the nextflow-lint hook, matching the existing trailing-whitespace/end-of-file-fixer exclusions. Vendored nf-core code is linted upstream; the hook was flagging pre-existing false-positives in bcftools/view (def index across script/stub) and snpsites (env CONSTANT_SITES). - .nf-core.yml: ignore .pre-commit-config.yaml in files_unchanged so the customised hook config does not trip the template lint check.
…m whitespace hooks
- conf/mtbseq_{base,test,parallel_test}.config: drop trailing blank lines so
the end-of-file-fixer hook is satisfied.
- .pre-commit-config.yaml: exclude data/mtbseq-references/ from the
trailing-whitespace/end-of-file-fixer hooks. These are vendored MTBseq
reference files (resistance lists, gene categories) that must be preserved
byte-for-byte; the hook was otherwise rewriting their CRLF line endings.
…lback
- modules/local/utils/rename_files.nf -> modules/local/utils/rename_files/
{main.nf,environment.yml,meta.yml}. nf-core 4.x container-config lint expects
every included module to be a directory containing meta.yml; the flat-file
form made the linter try to open 'rename_files.nf/meta.yml' and crash with
NotADirectoryError. Updated the include in mtbseq/quality_check.nf and added
a stub block to match the other local modules.
- nextflow.config: use conf/empty.config (a real, empty placeholder) instead of
/dev/null as the no-op fallback for the mode-conditional includeConfig, so the
lint's config walk resolves to a regular file.
Verified: -profile test -stub-run completes (9 processes), nextflow config -flat
parses under the strict v2 parser, and the container-config lint no longer crashes.
Extends the test dataset with the three canonical MAGMA pipeline test runs (SRA SRR26331590, SRR26331595, SRR26331599), each seqtk-downsampled to ~11x coverage of H37Rv to match the existing MTBseq-nf samples (ENA ERR553286, ERR551038). The test cohort is now 5 M. tuberculosis samples. Reads + updated test_samplesheet.csv / test_cohort.tsv published as a new version of the Zenodo record (concept DOI 10.5281/zenodo.20665780, version DOI 10.5281/zenodo.20671479). conf/test.config, conf/mtbseq_test.config and conf/mtbseq_parallel_test.config now reference record 20671479. Verified: 5-sample -stub-run completes the full pipeline locally, and the published Zenodo URLs stage correctly via -profile test.
- nextflow_schema.json: correct the mtbseq reference-data default paths to
data/mtbseq-references/... (were data/references/...).
- .nf-core.yml: ignore the config_defaults check for the five mtbseq reference
paths — they legitimately use ${projectDir}, whose resolved absolute value
can never equal a portable schema default; and disable nf_test_content, since
the default nf-test deliberately asserts success + key-output existence rather
than content snapshots (the pipeline wraps non-deterministic BWA/GATK tools).
Integrate MTBseq as --mode mtbseq + nf-core template 4.0.2
Hi @abhi18av and @Mxrcon
urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution. So decontamination db has to be passed using a Channel (fromPath)Error calling minimap2 -aL2 -x map-ont -t 6 -o .tbpore/test.decontaminated.sam remove_contam.map-ont.mmi .tbpore/test.fq.gz (return code -9)This seems like a memory problem but I doubt it. I ran the test first on a 32GB RAM PC, and for the benefit of doubt on a 128GB Workstation. Spent about 15 hours of my precious time trying to debug to know avail.