gatk4/postprocessgermlinecnvcalls incorrect file type extension #12169

@dkolbe

Have you checked the docs?

Description of the bug

The gatk4/postprocessgermlinecnvcalls module declares its outputs as:

    output:
    tuple val(meta), path("*_genotyped_intervals.vcf.gz"), emit: intervals, optional: true
    tuple val(meta), path("*_genotyped_segments.vcf.gz"), emit: segments, optional: true
    tuple val(meta), path("*_denoised.vcf.gz"), emit: denoised, optional: true

and the script section names the output files to match:

    gatk --java-options "-Xmx${avail_mem}M -XX:-UsePerfData" \\
        PostprocessGermlineCNVCalls \\
        ${calls_command} \\
        ${model_command} \\
        ${ploidy_command} \\
        ${args} \\
        --output-genotyped-intervals ${prefix}_genotyped_intervals.vcf.gz \\
        --output-genotyped-segments ${prefix}_genotyped_segments.vcf.gz \\
        --output-denoised-copy-ratios ${prefix}_denoised.vcf.gz

However, the file this tool writes for the --output-denoised-copy-ratios flag is not in VCF format; it is a tab-separated plain-text file (GATK's interval_list format), usually given a .tsv suffix (see the documentation). Additionally, GATK4 does not appear to support automatic compression of this file, so the .vcf.gz extension looks wrong on both counts: the content is not VCF and it is not gzip-compressed.
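For anyone who wants to confirm this on their own output, here is a rough sketch of a check. The helper name `describe_output` is hypothetical and not part of the module or of GATK. It only inspects the first two bytes for the gzip magic number and the first line for a `##fileformat=VCF` header, so treat it as a quick sanity check rather than a format validator.

```shell
# Hypothetical helper (not part of nf-core or GATK): print "<compression> <format>"
# for a file, e.g. "gzip vcf" or "plain not-vcf".
describe_output() {
    file="$1"
    # gzip streams always start with the two bytes 0x1f 0x8b
    magic=$(head -c 2 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        compression="gzip"
        first_line=$(gzip -dc "$file" | head -n 1)
    else
        compression="plain"
        first_line=$(head -n 1 "$file")
    fi
    # A VCF file must begin with a ##fileformat=VCF... header line
    case "$first_line" in
        "##fileformat=VCF"*) format="vcf" ;;
        *) format="not-vcf" ;;
    esac
    echo "$compression $format"
}
```

Running `describe_output s1_denoised.vcf.gz` in the process work directory should print `plain not-vcf` if the behaviour described above holds, while the genotyped intervals and segments files would be expected to report `vcf`.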

I encountered this during development and don't currently have a minimal example, but I can try to produce one if it is really needed. The Nextflow run itself completes cleanly and successfully.

Command used and terminal output

Relevant files

s1_denoised.tsv.gz

System information

No response
