Bug fix consensus #165
Open
robert-a-forsyth wants to merge 14 commits into
Supports `bcftools view -T <targets>` to filter a VCF to only positions present in a targets VCF file. Used to strip germline variants from the phased somatic+germline VCF before emitting `phased_somatic_vcf`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After `LONGPHASE_PHASE_SOMATIC`, use `BCFTOOLS_VIEW -T` to filter the combined phased somatic+germline VCF against the original somatic VCF positions. The emitted `phased_somatic_vcf` (used by `SOMATIC_VEP`) now contains only somatic variants. Phase tags (PS/HP) are preserved. This prevents germline variants from being double-annotated by both `GERMLINE_VEP` and `SOMATIC_VEP`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
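The filtering step these two commits describe can be illustrated with a small Python sketch. It is not the module's implementation and not a reimplementation of bcftools: it assumes records are matched on `(CHROM, POS)` only and ignores alleles, compressed input, and index handling, and `target_positions`/`filter_to_targets` are hypothetical names. It does show why phase tags survive: kept records are copied unchanged.

```python
# Sketch only: position-based target filtering in the spirit of `bcftools view -T`.
# Assumption: records are matched on (CHROM, POS) alone; alleles, indexing and
# bgzip input, which the real tool handles, are ignored here.
from typing import Iterable, List, Set, Tuple

Position = Tuple[str, int]


def target_positions(vcf_lines: Iterable[str]) -> Set[Position]:
    """Collect (CHROM, POS) pairs from the data lines of a targets VCF."""
    positions: Set[Position] = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos = line.split("\t")[:2]
        positions.add((chrom, int(pos)))
    return positions


def filter_to_targets(vcf_lines: Iterable[str], targets: Set[Position]) -> List[str]:
    """Keep header lines plus records whose (CHROM, POS) is a target.

    Kept records are copied verbatim, so FORMAT fields such as PS/HP survive.
    """
    kept: List[str] = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)
        elif line.strip():
            chrom, pos = line.split("\t")[:2]
            if (chrom, int(pos)) in targets:
                kept.append(line)
    return kept


somatic_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.",
]
phased_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTUMOR",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT:PS\t0|1:100",  # germline record
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT:PS\t1|0:100",  # somatic record
]

somatic_only = filter_to_targets(phased_vcf, target_positions(somatic_vcf))
```

Here the germline record at `chr1:100` is dropped, while the somatic record at `chr1:200` keeps its `GT:PS` values.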
Scoped to the `PHASING_HAPLOTYPING` subworkflow to avoid matching any future `BCFTOOLS_VIEW` calls. Output is intermediate (`publishDir` disabled).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
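Why scoping the selector matters can be sketched with a simplified model of process selectors. This is a hedged Python illustration, not Nextflow's implementation: it assumes a `withName` pattern is a regular expression that must match either the simple process name or its fully qualified `WORKFLOW:...:PROCESS` name, and the qualified names below are hypothetical.

```python
# Simplified model of a Nextflow `withName` selector (assumption: the selector is a
# regex that must fully match either the simple or the fully qualified process name).
import re


def selector_matches(selector: str, qualified_name: str) -> bool:
    simple_name = qualified_name.split(":")[-1]
    return bool(
        re.fullmatch(selector, simple_name) or re.fullmatch(selector, qualified_name)
    )


# Hypothetical fully qualified names; the pipeline's real workflow prefix may differ.
PHASING_VIEW = "NFCORE_LRSOMATIC:LRSOMATIC:PHASING_HAPLOTYPING:BCFTOOLS_VIEW"
FUTURE_VIEW = "NFCORE_LRSOMATIC:LRSOMATIC:SOME_OTHER_SUBWORKFLOW:BCFTOOLS_VIEW"

BARE = "BCFTOOLS_VIEW"                           # matches every BCFTOOLS_VIEW call
SCOPED = ".*:PHASING_HAPLOTYPING:BCFTOOLS_VIEW"  # matches only the phasing call
```

With the bare name both calls match; with the scoped pattern only the phasing call does, which is the behavior the commit is after.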
…PHASE_SOMATIC

`LONGPHASE_PHASE_SOMATIC` now produces an intermediate combined somatic+germline VCF (renamed to `somatic_smallvariants_combined`, publish disabled). `BCFTOOLS_VIEW` produces the final somatic-only phased VCF, published as `somatic_smallvariants` to `variants/phased/`, matching the previous output name and location.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- clair_only: rewritten to use `test_sheet_2.csv` (5 samples). Adds assertions for sample4 replicate merging (rep1/rep2 QC dirs, single merged output), the sample5 custom ONT model override (VCF header check), phased VCF existence/size, BAM files, QC dirs, and VEP outputs.
- deep_only: enhanced with phased VCF size assertions and a loop-based pattern consistent with the other tests.
- consensus: new test with both callers, `germline_var_combine='consensus'` and `somatic_var_combine='consensus'`. Asserts that both callers' raw outputs and the phased consensus VCFs exist and contain data.
- union: new test with both callers, `germline_var_combine='all'` and `somatic_var_combine='all'`. An explicit test for the union combine path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
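The two combine paths exercised by these tests can be pictured as set operations. This is an assumption-laden sketch, not the pipeline's logic (the review overview says consensus mode relies on `bcftools isec`, whose exact options are not shown here): it assumes variants are identified by `(CHROM, POS, REF, ALT)` and that `'consensus'` means "reported by every selected caller". `combine_calls` is a hypothetical name.

```python
# Hedged sketch of the two combine modes exercised by the new tests.
# Assumptions: variants are identified by (CHROM, POS, REF, ALT), and
# 'consensus' means "reported by every selected caller".
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]


def combine_calls(calls: Dict[str, Set[Variant]], mode: str) -> Set[Variant]:
    per_caller = list(calls.values())
    if not per_caller:
        return set()
    if mode == "all":  # union path
        return set().union(*per_caller)
    if mode == "consensus":  # intersection path
        return set(per_caller[0]).intersection(*per_caller[1:])
    raise ValueError(f"unknown combine mode: {mode!r}")


calls = {
    "clair": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    "deepvariant": {("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")},
}
union = combine_calls(calls, "all")
consensus = combine_calls(calls, "consensus")
```

Under these assumptions the consensus set is always a subset of the union set, which is a cheap invariant a test could check alongside the "exists with data" assertions.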
…n PRs

Test tagging:
- `default.nf.test`: tag "small", runs on every PR.
- clair_only, deep_only, consensus, union: tag "extended", skipped on PRs.

Assertion enhancements (deep_only, consensus, and union now match clair_only):
- BAM `.bai` index checks added to all three tests.
- QC directory checks (cramino_aln, nanoplot_aln, mosdepth, samtools).
- VEP output directory checks (germline, somatic, and SVs for paired).
- Severus SV outputs added to deep_only.

CI (`nf-test.yml`):
- Pass `tags: "small"` to the get-shards and nf-test actions on `pull_request` events.
- Releases and `workflow_dispatch` continue to run the full extended suite.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
This PR appears to address filename collisions and output correctness in the small-variant consensus/phasing parts of the lrsomatic Nextflow pipeline, and expands nf-test coverage around different caller-selection / combine-mode scenarios.
Changes:

- Avoid VCF basename collisions by renaming DeepVariant/DeepSomatic outputs and by re-sorting/renaming `bcftools isec` consensus outputs in consensus mode.
- Add a local `bcftools view` module and use it in phasing to publish a somatic-only phased VCF (filtering the phased somatic+germline VCF using the somatic calls as targets).
- Add multiple new nf-test scenarios (union/all, consensus, deep-only, clair-only) and adjust CI to run only `small`-tagged tests on pull requests.
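Nextflow stages a task's input files into its work directory by file name and reports a name collision when two inputs share one, even if they come from different directories. A minimal check for that situation is sketched below in Python; the file names are hypothetical, since the pipeline's actual names before and after this PR are not shown here.

```python
# Hedged sketch: detect outputs that would share a basename when staged together.
# The file names below are hypothetical, not the pipeline's actual outputs.
from collections import Counter
from pathlib import PurePosixPath
from typing import List


def basename_collisions(paths: List[str]) -> List[str]:
    counts = Counter(PurePosixPath(p).name for p in paths)
    return sorted(name for name, n in counts.items() if n > 1)


before = ["deepvariant/sample1.vcf.gz", "deepsomatic/sample1.vcf.gz"]
after = ["deepvariant/sample1_germline.vcf.gz", "deepsomatic/sample1_somatic.vcf.gz"]
```

Adding a caller-type suffix such as `_germline` or `_somatic` makes the basenames distinct, so both files can be staged into the same task.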
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `subworkflows/local/small_variant_consensus.nf` | Renames/sorts consensus-mode outputs to prevent downstream filename collisions. |
| `modules/nf-core/deepvariant/postprocessvariants/main.nf` | Changes DeepVariant output prefix to include `_germline` to avoid ambiguous basenames. |
| `modules/local/deepsomatic/postprocessvariants/main.nf` | Changes DeepSomatic output prefix to include `_somatic` to avoid ambiguous basenames. |
| `subworkflows/local/phasing_haplotyping.nf` | Adds `bcftools view` filtering so the published phased somatic VCF is somatic-only. |
| `modules/local/bcftools/view/main.nf` | New local module implementing `bcftools view -T` filtering with index writing. |
| `modules/local/bcftools/view/meta.yml` | Metadata for the new `bcftools view` module. |
| `modules/local/bcftools/view/environment.yml` | Conda environment for the new `bcftools view` module. |
| `conf/modules.config` | Adjusts Longphase somatic prefix/publishing; publishes filtered somatic VCF; configures consensus-sort renaming. |
| `subworkflows/local/paired/paired_smallvar_germline.nf` | Normalizes `params.germline_var_keep` handling (list vs scalar) and initializes channels. |
| `subworkflows/local/paired/paired_smallvar_somatic.nf` | Normalizes `params.somatic_var_keep` handling (list vs scalar) and initializes channels. |
| `subworkflows/local/tumor_only/tumoronly_smallvar.nf` | Normalizes `*_var_keep` handling (list vs scalar) and initializes channels. |
| `tests/default.nf.test` | Adds `small` tag and updates expected DeepVariant/DeepSomatic filenames. |
| `tests/default.nf.test.snap` | Snapshot updates for renamed outputs and added `BCFTOOLS_VIEW` version reporting. |
| `tests/consensus.nf.test` | New nf-test for consensus combine mode. |
| `tests/consensus.nf.test.snap` | Snapshot for consensus combine mode. |
| `tests/union.nf.test` | New nf-test for “all/union” combine mode. |
| `tests/union.nf.test.snap` | Snapshot for “all/union” combine mode. |
| `tests/deep_only.nf.test` | New nf-test for running only DeepVariant+DeepSomatic. |
| `tests/deep_only.nf.test.snap` | Snapshot for deep-only mode. |
| `tests/clair_only.nf.test` | New nf-test for running only Clair callers (uses an extended samplesheet). |
| `tests/clair_only.nf.test.snap` | Snapshot for clair-only mode. |
| `conf/test.config` | Changes test profile genome to GRCh38. |
| `nextflow.config` | Changes default caller selection/priorities (global defaults). |
| `.github/workflows/nf-test.yml` | Runs only `small`-tagged nf-tests on pull requests. |
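The list-vs-scalar normalization of `*_var_keep` noted for the three small-variant subworkflows can be sketched as follows. This is Python rather than the pipeline's Groovy, and `normalize_keep`/`caller_enabled` are hypothetical names; it assumes the parameter arrives either as a single caller name or as a list of names.

```python
# Hedged sketch of list-vs-scalar normalization for a *_var_keep parameter.
# Assumption: the parameter arrives either as a single caller name or as a list.
from typing import Iterable, List, Optional, Union


def normalize_keep(value: Optional[Union[str, Iterable[str]]]) -> List[str]:
    if value is None:
        return []
    if isinstance(value, str):
        return [value]  # wrap a single caller name instead of iterating its characters
    return list(value)


def caller_enabled(value: Optional[Union[str, Iterable[str]]], caller: str) -> bool:
    # Exact element match on the normalized list, never a substring test on a string.
    return caller in normalize_keep(value)
```

Normalizing first matters because a membership test on a bare string is a substring test, both with Python's `in` and with Groovy's `String.contains`, whereas on a list it compares whole caller names.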
Comment on lines +260 to +264

```groovy
// MODULE: BCFTOOLS_VIEW (label: process_medium)
// Filter the phased somatic+germline VCF to somatic-only positions.
// Uses the original somatic VCF as a targets (-T) file so only positions
// called as somatic are retained. Phase tags (PS/HP) on somatic variants
// are preserved; germline records are dropped.
```
Comment on lines 70 to 77

```json
"germline_var_keep": {
    "type": "array",
    "description": "List of germline variant callers to use. Must include at least one of [deepvariant, clair].",
    "items": {
        "type": "string",
        "default": "['clair']",
        "enum": ["deepvariant", "clair"]
    },
```
Comment on lines 83 to +85

```json
"items": {
    "type": "string",
    "default": "['clair']",
```
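Both excerpts place `"default": "['clair']"` inside `items`, where a default describes a single array element, and that string is not one of the enum values. Whether this is what the review comments flagged is not visible on this page. The Python check below is a hedged sketch that assumes the usual JSON Schema convention of giving an array property a list default at the property level (`"default": ["clair"]`); `default_problems` and the `SUGGESTED` shape are hypothetical.

```python
# Hedged check for the pattern in the excerpts above: a string default inside
# `items` rather than an array default on the property itself.
import json

EXCERPT = """
{
  "germline_var_keep": {
    "type": "array",
    "items": {"type": "string", "default": "['clair']", "enum": ["deepvariant", "clair"]}
  }
}
"""

SUGGESTED = """
{
  "germline_var_keep": {
    "type": "array",
    "default": ["clair"],
    "items": {"type": "string", "enum": ["deepvariant", "clair"]}
  }
}
"""


def default_problems(properties: dict) -> list:
    problems = []
    for name, prop in properties.items():
        items = prop.get("items", {})
        allowed = items.get("enum")
        if allowed is None:
            continue
        # An item-level default describes one element, so it must be an allowed value.
        if "default" in items and items["default"] not in allowed:
            problems.append(f"{name}: items.default is not an enum value")
        # A property-level default for an array should be a list of allowed values.
        prop_default = prop.get("default")
        if prop_default is not None and (
            not isinstance(prop_default, list) or any(v not in allowed for v in prop_default)
        ):
            problems.append(f"{name}: default is not a list of enum values")
    return problems
```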
`@@ -121,7 +121,7 @@ process DEEPSOMATIC_POSTPROCESSVARIANTS {`

```groovy
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
    error "DEEPVARIANT module does not support Conda. Please use Docker / Singularity / Podman instead."
```
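The Conda guard in this hunk compares whole profile names rather than substrings. Its logic can be mirrored in Python as a quick illustration; `uses_conda_profile` is a hypothetical name, and the empty-token filtering reflects that Groovy's `tokenize(',')` drops empty tokens while Python's `split(',')` keeps them.

```python
# Python mirror of the Groovy guard:
#   workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1
def uses_conda_profile(profile: str) -> bool:
    # tokenize(',') drops empty tokens, so filter them out after split(',')
    tokens = {token for token in profile.split(",") if token}
    return len(tokens & {"conda", "mamba"}) >= 1
```

Because the check is a set intersection of whole tokens, a profile list such as `test,docker` passes while `test,conda` or `mamba` triggers the error.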
```groovy
when {
    params {
        outdir = "$outputDir"
        input = "https://raw.githubusercontent.com/IntGenomicsLab/test-datasets/refs/heads/main/samplesheets/test_sheet_2.csv"
```
```yaml
NFT_VER: ${{ env.NFT_VER }}
with:
  max_shards: 7
  tags: ${{ github.event_name == 'pull_request' && 'small' || '' }}
```

```yaml
profile: ${{ matrix.profile }}
shard: ${{ matrix.shard }}
total_shards: ${{ env.TOTAL_SHARDS }}
tags: ${{ github.event_name == 'pull_request' && 'small' || '' }}
```
Comment on lines 16 to +22

```diff
 // Small variant calling options
-germline_var_keep = ['deepvariant', 'clair']
-somatic_var_keep = ['deepsomatic', 'clair']
+germline_var_keep = ['clair']
+somatic_var_keep = ['clair']
 germline_var_combine = 'all'
 somatic_var_combine = 'all'
-prioritize_caller_germline = 'deepvariant'
-prioritize_caller_somatic = 'deepsomatic'
+prioritize_caller_germline = 'clair'
+prioritize_caller_somatic = 'clair'
```
PR checklist

- Lint check: `nf-core pipelines lint`.
- Test run: `nextflow run . -profile test,docker --outdir <OUTDIR>`.
- Debug test run: `nextflow run . -profile debug,test,docker --outdir <OUTDIR>`.
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).