Skip to content

tbpore-modules and tests#9

Open
ycode-sh wants to merge 45 commits into
tbpore-modulesfrom
dev
Open

tbpore-modules and tests#9
ycode-sh wants to merge 45 commits into
tbpore-modulesfrom
dev

Conversation

@ycode-sh

@ycode-sh ycode-sh commented Nov 3, 2024

Copy link
Copy Markdown

Hi @abhi18av and @Mxrcon

  1. I have wrapped tbpore-process and tbpore-cluster as separate modules
  2. Could not do the same with tbpore-download (the download keeps failing
    urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution. So decontamination db has to be passed using a Channel (fromPath)
  3. Wrote tests for both processes (tbpore-process and tbpore-cluster)
  4. The tests too keep failing (not becaus of the code but because of minimap2 runtime failure)
    Error calling minimap2 -aL2 -x map-ont -t 6 -o .tbpore/test.decontaminated.sam remove_contam.map-ont.mmi .tbpore/test.fq.gz (return code -9) This seems like a memory problem but I doubt it. I ran the test first on a 32GB RAM PC, and for the benefit of doubt on a 128GB Workstation. Spent about 15 hours of my precious time trying to debug to know avail.

@abhi18av

abhi18av commented Nov 6, 2024

Copy link
Copy Markdown
Member

Thanks @ycode-sh , does this mean that we can close #6 without merging?

@abhi18av

abhi18av commented Nov 6, 2024

Copy link
Copy Markdown
Member

Thanks for all your efforts here @ycode-sh , please allow me sometime to go through this PR especially the minimap2 issue 🤔

The tests too keep failing (not becaus of the code but because of minimap2 runtime failure)
Error calling minimap2 -aL2 -x map-ont -t 6 -o .tbpore/test.decontaminated.sam remove_contam.map-ont.mmi .tbpore/test.fq.gz (return code -9) This seems like a memory problem but I doubt it. I ran the test first on a 32GB RAM PC, and for the benefit of doubt on a 128GB Workstation. Spent about 15 hours of my precious time trying to debug to know avail.

@ycode-sh

ycode-sh commented Nov 7, 2024

Copy link
Copy Markdown
Author

Thanks @ycode-sh , does this mean that we can close #6 without merging?

The modules are okay to merge

@ycode-sh

ycode-sh commented Nov 7, 2024

Copy link
Copy Markdown
Author

Thanks for all your efforts here @ycode-sh , please allow me sometime to go through this PR especially the minimap2 issue 🤔

I'm not sure but I think there is a problem with how nextflow and tbpore interacts. minimap2 did not fail when running tbpore without nextflow. return code -9 indicates insufficient memory. I'm wondering how that's even possible since the process was run with a 128GB memory workstation and label 'process_high'

@github-advanced-security

Copy link
Copy Markdown

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

nf-core-bot and others added 19 commits June 3, 2025 11:02
# Conflicts:
#	main.nf
#	nextflow.config
Bring the validated, refactored MTBseq-nf components into the tbanalyzer
meta-pipeline:

- Sync the 11 param-free tb* modules (B refactor) + environment.yml, and
  add the per-step MTBseq ext.args block to conf/modules.config.
- Add the self-contained MTBSEQ subworkflow (subworkflows/local/mtbseq/main.nf,
  D refactor): QC + serial/parallel MTBseq, emits multiqc_files + versions.
- Rewire workflows/mtbseqnf.nf (MTBSEQ_NF) to call the MTBSEQ subworkflow and
  use the 4.0.2 meta-tuple MULTIQC call.
- quality_check: drop the FASTQC .out.versions emit (4.x emits versions via the
  'versions' topic).
- main.nf NFCORE_TBANALYZER: initialise multiqc_report (Channel.empty) and error
  on unknown --mode; default params.mode = 'mtbseq'.

Verified: -stub-run completes end-to-end in serial and parallel modes
(RENAME -> FASTQC -> TBFULL -> TBJOIN -> TBAMEND -> TBGROUPS -> MULTIQC), with
process.resourceLimits capping the heavy MTBseq labels.
…s groups)

Document and validate the mtbseq mode params in nextflow_schema.json so runs
no longer emit unrecognised-parameter warnings and nf-core schema lint passes
(51 params). Adds the 'mode' selector (enum: mtbseq), execution_modes
(mtbseq_only_qc/mtbseq_parallel) and mtbseq_options (coverage/quality knobs with
0-100 bounds for percentages, plus reference paths).
- nextflow lint (strict): env(USER) -> env('USER') across the 11 tb* modules
  (the bareword was flagged as an undefined variable).
- prettier: nextflow_schema.json + assets/schema_input.json.

(Pre-existing nextflow-lint errors in modules/nf-core/{snpsites,bcftools/view}
are unrelated to this integration and present on dev.)
- conf/test.config: point -profile test at downsampled M. tuberculosis reads
  (nf-core/test-datasets, branch tbanalyzer) in mtbseq mode, with a cohort TSV;
  drop the template viralrecon/R64 placeholders.
- tests/default.nf.test: assert successful completion + presence of key MTBseq
  (Strain_Classification.tab) and MultiQC outputs, instead of a content snapshot
  (MTBseq's BWA/GATK calls are not bit-stable run-to-run).
…zenodo.20665781)

Point the test/test_mtbseq/test_mtbseq_parallel profiles at the published
Zenodo record (samplesheet + cohort + 2 downsampled M. tuberculosis samples)
instead of the (push-restricted) nf-core/test-datasets branch.
… reads

The v1 reads were named <sample>_<library>_R*.fastq.gz, colliding with the name
RENAME_FILES copies to (cp same-file error). Renamed to ENA-accession-based files
(ERR553286_*/ERR551038_*) in a new Zenodo version; samplesheet maps sample/library
to them. Verified: local -profile test stub-run completes end-to-end with the real
Zenodo reads staged.
- nextflow.config: replace the 'if (params.mode==...)' includeConfig with a
  ternary include. The new Nextflow v2 config parser (used by 'nextflow config
  -flat' in nf-core lint) rejects if-statements mixed with config statements.
- .pre-commit-config.yaml: exclude vendored modules/nf-core and
  subworkflows/nf-core from the nextflow-lint hook, matching the existing
  trailing-whitespace/end-of-file-fixer exclusions. Vendored nf-core code is
  linted upstream; the hook was flagging pre-existing false-positives in
  bcftools/view (def index across script/stub) and snpsites (env CONSTANT_SITES).
- .nf-core.yml: ignore .pre-commit-config.yaml in files_unchanged so the
  customised hook config does not trip the template lint check.
…m whitespace hooks

- conf/mtbseq_{base,test,parallel_test}.config: drop trailing blank lines so
  the end-of-file-fixer hook is satisfied.
- .pre-commit-config.yaml: exclude data/mtbseq-references/ from the
  trailing-whitespace/end-of-file-fixer hooks. These are vendored MTBseq
  reference files (resistance lists, gene categories) that must be preserved
  byte-for-byte; the hook was otherwise rewriting their CRLF line endings.
…lback

- modules/local/utils/rename_files.nf -> modules/local/utils/rename_files/
  {main.nf,environment.yml,meta.yml}. nf-core 4.x container-config lint expects
  every included module to be a directory containing meta.yml; the flat-file
  form made the linter try to open 'rename_files.nf/meta.yml' and crash with
  NotADirectoryError. Updated the include in mtbseq/quality_check.nf and added
  a stub block to match the other local modules.
- nextflow.config: use conf/empty.config (a real, empty placeholder) instead of
  /dev/null as the no-op fallback for the mode-conditional includeConfig, so the
  lint's config walk resolves to a regular file.

Verified: -profile test -stub-run completes (9 processes), nextflow config -flat
parses under the strict v2 parser, and the container-config lint no longer crashes.
Extends the test dataset with the three canonical MAGMA pipeline test runs
(SRA SRR26331590, SRR26331595, SRR26331599), each seqtk-downsampled to ~11x
coverage of H37Rv to match the existing MTBseq-nf samples (ENA ERR553286,
ERR551038). The test cohort is now 5 M. tuberculosis samples.

Reads + updated test_samplesheet.csv / test_cohort.tsv published as a new
version of the Zenodo record (concept DOI 10.5281/zenodo.20665780, version
DOI 10.5281/zenodo.20671479). conf/test.config, conf/mtbseq_test.config and
conf/mtbseq_parallel_test.config now reference record 20671479.

Verified: 5-sample -stub-run completes the full pipeline locally, and the
published Zenodo URLs stage correctly via -profile test.
- nextflow_schema.json: correct the mtbseq reference-data default paths to
  data/mtbseq-references/... (were data/references/...).
- .nf-core.yml: ignore the config_defaults check for the five mtbseq reference
  paths — they legitimately use ${projectDir}, whose resolved absolute value
  can never equal a portable schema default; and disable nf_test_content, since
  the default nf-test deliberately asserts success + key-output existence rather
  than content snapshots (the pipeline wraps non-deterministic BWA/GATK tools).
Integrate MTBseq as --mode mtbseq + nf-core template 4.0.2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants