add long_phase supplementary alignment tag to extended args#163
add long_phase supplementary alignment tag to extended args#163AmberVerhasselt wants to merge 7 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new pipeline parameter to control whether Longphase haplotag includes supplementary alignments, wiring it through config/schema/docs and into the module args assembly.
Changes:
- Add
params.longphase_tag_supplementary(defaultfalse) and pass--tagSupplementarytoLONGPHASE_HAPLOTAGwhen enabled. - Extend
nextflow_schema.jsonanddocs/usage.mdto expose/document the new option. - Update documented default for
--severus_minsupportto3(matching config/schema).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
nextflow_schema.json |
Adds Longphase option metadata, but currently introduces schema-structure and default-value issues. |
nextflow.config |
Defines new longphase_tag_supplementary param default. |
docs/usage.md |
Documents the new Longphase parameter and updates Severus default shown to users. |
conf/modules.config |
Conditionally appends --tagSupplementary to Longphase haplotag args. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "enum": ["deepvariant", "clair"] | ||
| }, | ||
| "minItems": 1 | ||
| "minItems": 1, | ||
| "default": "['deepvariant', 'clair']" | ||
| }, |
There was a problem hiding this comment.
germline_var_keep is typed as an array, but the schema default is currently a string (and not valid JSON). This will break downstream tooling that reads defaults from the schema. Set default to a JSON array (e.g., ["deepvariant","clair"]) instead of a quoted string.
| }, | ||
| "minItems": 1 | ||
| "minItems": 1, | ||
| "default": "['deepsomatic', 'clair']" |
There was a problem hiding this comment.
somatic_var_keep is typed as an array, but the schema default is currently a string (and not valid JSON). This will confuse validation/docs generation that expects an array default. Change default to a JSON array (e.g., ["deepsomatic","clair"]).
| "default": "['deepsomatic', 'clair']" | |
| "default": ["deepsomatic", "clair"] |
| "longphase_options": { | ||
| "title": "Longphase options", | ||
| "type": "object", | ||
| "description": "Options for Longphase phasing of small variants", | ||
| "properties": { |
There was a problem hiding this comment.
longphase_options is added under $defs, but it isn't referenced in the top-level allOf list. As a result, longphase_tag_supplementary defined there won't be part of the effective schema grouping/metadata. Add a $ref to #/$defs/longphase_options in allOf (and avoid duplicating the param elsewhere) so it shows up correctly in the schema-driven UI/docs.
| "properties": { | ||
| "generate_gvcf": { | ||
| "type": "boolean" | ||
| }, | ||
| "longphase_tag_supplementary": { | ||
| "type": "boolean" | ||
| }, | ||
| "skip_modkit": { | ||
| "type": "boolean" | ||
| } |
There was a problem hiding this comment.
The new top-level properties block defines generate_gvcf, longphase_tag_supplementary, and skip_modkit with only type, dropping their descriptions/defaults and bypassing the existing $defs grouping pattern used elsewhere in this schema. Instead, define these parameters in the appropriate $defs sections (e.g., generate_gvcf in small-variant options, skip_modkit in skip options, longphase_tag_supplementary in longphase options) and reference those groups from allOf, removing this partial top-level properties block.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "germline_var_keep": { | ||
| "type": "array", | ||
| "description": "List of germline variant callers to use. Must include at least one of [deepvariant, clair].", | ||
| "items": { | ||
| "type": "string", | ||
| "enum": ["deepvariant", "clair"] | ||
| }, | ||
| "minItems": 1 | ||
| "minItems": 1, | ||
| "default": "['deepvariant', 'clair']" | ||
| }, | ||
| "somatic_var_keep": { | ||
| "type": "array", | ||
| "description": "List of somatic variant callers to use. Must include at least one of [deepsomatic, clair].", | ||
| "items": { | ||
| "type": "string", | ||
| "enum": ["deepsomatic", "clair"] | ||
| }, | ||
| "minItems": 1 | ||
| "minItems": 1, | ||
| "default": "['deepsomatic', 'clair']" | ||
| }, |
There was a problem hiding this comment.
germline_var_keep and somatic_var_keep are typed as arrays, but their default values are currently JSON strings (e.g. "['deepvariant', 'clair']"). In JSON Schema, default should match the declared type; this will break schema validation / generated docs. Set these defaults to actual JSON arrays (e.g. ["deepvariant", "clair"]).
| "properties": { | ||
| "generate_gvcf": { | ||
| "type": "boolean" | ||
| }, | ||
| "skip_modkit": { | ||
| "type": "boolean" | ||
| } | ||
| } |
There was a problem hiding this comment.
The new top-level properties entries for generate_gvcf and skip_modkit are incomplete (no description/default) and also bypass the existing parameter grouping via $defs/allOf (e.g. skip_modkit should live under #/$defs/skip_options). This will make schema-generated docs/UI inconsistent with docs/usage.md and other params. Consider moving these definitions into the appropriate $defs sections and adding description (and default if the schema is intended to carry defaults).
| "longphase_options": { | ||
| "title": "Longphase options", | ||
| "type": "object", | ||
| "description": "Options for Longphase phasing of small variants", | ||
| "properties": { | ||
| "longphase_tag_supplementary": { | ||
| "type": "boolean", | ||
| "description": "Whether to include supplementary alignments in Longphase haplotype tagging.", | ||
| "default": false | ||
| } | ||
| } | ||
| }, |
There was a problem hiding this comment.
PR title/description indicate this is only about adding a Longphase supplementary-alignment haplotag option, but this PR also changes schema defaults for *_var_keep, adds new schema properties for generate_gvcf / skip_modkit, and updates the documented default for severus_minsupport. Please update the PR description (or split into separate PRs) so reviewers understand the full scope and rationale for these additional changes.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "germline_var_keep": { | ||
| "type": "array", | ||
| "type": "string", | ||
| "description": "List of germline variant callers to use. Must include at least one of [deepvariant, clair].", | ||
| "items": { | ||
| "type": "string", | ||
| "enum": ["deepvariant", "clair"] | ||
| }, | ||
| "minItems": 1 | ||
| "minItems": 1, | ||
| "default": "['deepvariant', 'clair']" | ||
| }, |
There was a problem hiding this comment.
germline_var_keep is used throughout the pipeline as a Groovy List (e.g., .contains(), .size(), and comparisons to ['clair'] / ['deepvariant']). Changing the schema type to string (and setting a string default like "['deepvariant', 'clair']") will encourage invalid user input and can break runtime logic; it also makes the schema invalid because items/minItems apply to arrays, not strings. Please revert this property to type: "array" with items.type: "string", keep minItems, and use a real JSON array for default (e.g. ["deepvariant","clair"]).
| "somatic_var_keep": { | ||
| "type": "array", | ||
| "type": "string", | ||
| "description": "List of somatic variant callers to use. Must include at least one of [deepsomatic, clair].", | ||
| "items": { | ||
| "type": "string", | ||
| "enum": ["deepsomatic", "clair"] | ||
| }, | ||
| "minItems": 1 | ||
| "minItems": 1, | ||
| "default": "['deepsomatic', 'clair']" |
There was a problem hiding this comment.
Same issue for somatic_var_keep: the workflow expects a List (uses .contains() / .size() patterns), but the schema now declares it as string with items/minItems and a stringified list default. This makes the schema inconsistent/invalid and can lead to runtime misbehavior if users supply a string. Please keep it as type: "array" with an enum-constrained items, and set default to a JSON array like ["deepsomatic","clair"].
| "skip_qc": { | ||
| "type": "boolean", | ||
| "default": false, | ||
| "description": "Skips all QC steps" | ||
| }, | ||
| "skip_cramino": { |
There was a problem hiding this comment.
A number of boolean parameters had their default: false removed from the schema (e.g. all skip_* flags, use_gpu, plus others elsewhere like igenomes_ignore, download_vep_cache, ascat_pdf_plots, and generic help/version flags). Even though JSON Schema default is informational, nf-core tooling/docs and UI generation typically rely on it to display correct defaults and keep the schema aligned with nextflow.config. Please restore the default values (at least for the params that are explicitly defaulted in nextflow.config).
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "generate_gvcf": { | ||
| "type": "boolean" | ||
| }, |
There was a problem hiding this comment.
Top-level properties.generate_gvcf is missing a description/help_text/default and is not grouped with the other small variant calling options in $defs. This makes the schema inconsistent and may cause nf-core schema docs / UI grouping to omit or misplace the parameter metadata.
| "generate_gvcf": { | |
| "type": "boolean" | |
| }, |
| }, | ||
| "skip_modkit": { | ||
| "type": "boolean" |
There was a problem hiding this comment.
Top-level properties.skip_modkit is defined without description/help_text/default and outside the existing skip_options group. To keep schema validation + rendered docs consistent, this should be defined alongside the other skip parameters (with matching metadata) rather than as a bare root property.
| }, | |
| "skip_modkit": { | |
| "type": "boolean" |
| "type": ["string", "array"], | ||
| "description": "List of germline variant callers to use. Must include at least one of [deepvariant, clair].", | ||
| "items": { | ||
| "type": "string", | ||
| "enum": ["deepvariant", "clair"] | ||
| }, |
There was a problem hiding this comment.
germline_var_keep now allows type: "string", but the allowed values are only constrained via items.enum (which applies only when the value is an array). As a result, any arbitrary string would validate successfully and bypass the intended [deepvariant, clair] restriction. Consider using oneOf/anyOf to separately validate the string form (e.g., enum/pattern) vs the array form, or keep this as an array-only parameter.
PR checklist
nf-core pipelines lint).nextflow run . -profile test,docker --outdir <OUTDIR>).nextflow run . -profile debug,test,docker --outdir <OUTDIR>).docs/usage.mdis updated.docs/output.mdis updated.CHANGELOG.mdis updated.README.mdis updated (including new tool citations and authors/contributors).