Merge pull request #59 from wm75/ONT-Artic-wf-update

ONT-Artic workflow fixes and enhancements
galaxyproject · Sep 23, 2021 · cfe57c8
2 parents c9709ef + efed240

Showing 4 changed files with 863 additions and 360 deletions.
diff --git a/...ws/sars-cov-2-variant-calling/sars-cov-2-ont-artic-variant-calling/CHANGELOG.md b/...ws/sars-cov-2-variant-calling/sars-cov-2-ont-artic-variant-calling/CHANGELOG.md
@@ -1,5 +1,66 @@
 # Changelog
 
+## [0.3] 2021-09-22
+
+### Changed
+
+This version changes the way variants get called and how key call
+statistics are calculated:
+
+- Switch to medaka_variant version 1.3.2+galaxy1 for extracting variants from
+  medaka consensus data.
+
+  This new version of the tool is more robust against input data peculiarities
+  at the VCF annotation stage:
+
+  * it doesn't fail on empty BAM input
+  * it doesn't crash on variant calls of unusually high quality that previously
+    resulted in math domain errors when trying to calculate PHRED scores from
+    very small error probabilities.
+
+  This tool update also means that key INFO fields (DP, DP4, AF) are
+  now based on calculations carried out by medaka tools annotate instead of by
+  custom code using samtools mpileup. This has the following consequences:
+
+  * the tool can now emit variant calls at complex sites where both the REF
+    and the ALT allele are longer than one base; such calls were previously
+    dropped
+
+  * the workflow has become more complex: to work around shortcomings of
+    medaka tools annotate, the call statistics of regular variants and of
+    variants at primer binding sites have to be determined in separate runs of
+    the tool
+
+  * the values of all key INFO fields (DP, DP4, AF) will change slightly in
+    this version of the workflow
+
+This version also adopts some of the primer-trimming changes that were
+introduced earlier in version 0.3 of the PE Illumina workflow for amplicon
+data:
+
+- Update ivar trim to version 1.3.1
+
+- Run ivar trim as the last mapped reads processing step before variant
+  calling, i.e., after left-alignment of indels
+
+Other changes in this version:
+
+- Rename the output of the ivar trim step to "Fully processed reads for
+  variant calling (primer-trimmed, realigned reads)" like the corresponding
+  output of the PE Illumina workflow
+
+- Fix a typo in the allowed input formats for the collection of sequenced
+  reads, which caused fastqsanger.gz data to undergo an implicit and
+  unnecessary decompression step.
+
+### Added
+
+- Add a step to filter out failed datasets before flattening the Qualimap BamQC
+  data for use by MultiQC.
+
+  Qualimap BamQC fails on empty BAM input, and flattening a collection that
+  still contains the resulting failed datasets would cause the whole workflow
+  invocation to fail.
+
 ## [0.2.1] 2021-07-23
 
 ### Added

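The changelog attributes one of the old crashes to computing PHRED scores from very small error probabilities. Assuming the standard definition Q = -10 * log10(p), one way such a "math domain error" arises is that p underflows to exactly 0.0, and Python's `math.log10(0.0)` raises `ValueError`. The following is a hypothetical sketch of a guarded conversion; the function name and the cap of 99 are assumptions for illustration, not medaka's actual code.

```python
import math


def phred_from_error_prob(p_error: float, max_qual: float = 99.0) -> float:
    """Convert an error probability to a PHRED-scaled quality, Q = -10 * log10(p).

    Hypothetical helper: instead of letting math.log10(0.0) raise
    "ValueError: math domain error", the probability is clamped to the
    smallest value that still maps to max_qual (an assumed cap).
    """
    if not 0.0 <= p_error <= 1.0:
        raise ValueError(f"error probability out of range: {p_error}")
    floor = 10 ** (-max_qual / 10)  # probability corresponding to max_qual
    return -10 * math.log10(max(p_error, floor))


# 1e-400 underflows to 0.0 in double precision; the guard caps the score
# instead of raising.
extreme = phred_from_error_prob(1e-400)
```

Clamping keeps extremely confident calls representable instead of aborting the annotation step; whether to cap, and at which value, is a design choice that a real tool may make differently.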
diff --git a/...s-cov-2-variant-calling/sars-cov-2-ont-artic-variant-calling/ont-artic-variation-test.yml b/...s-cov-2-variant-calling/sars-cov-2-ont-artic-variant-calling/ont-artic-variation-test.yml
@@ -3,12 +3,9 @@
     NC_045512.2 FASTA sequence of SARS-CoV-2:
       class: File
       location: 'https://zenodo.org/record/4555735/files/NC_045512.2_reference.fasta?download=1'
-    ARTIC primer BED:
+    Primer binding sites info in BED format:
       class: File
       location: 'https://zenodo.org/record/4555735/files/ARTIC_nCoV-2019_v3.bed?download=1'
-    ARTIC primers to amplicon assignments:
-      class: File
-      location: 'https://zenodo.org/record/4555735/files/ARTIC_amplicon_info_v3.tsv?download=1'
     ONT-sequenced reads:
       class: Collection
       collection_type: 'list'
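The renamed test input now supplies the primer binding sites as a BED file, and the changelog notes that call statistics for variants at primer binding sites are determined separately from those for regular variants. As a rough illustration of that split only, here is a hypothetical Python sketch (all function names and the example coordinates are made up; the workflow itself does this with Galaxy tools) that partitions variants by whether their REF allele overlaps a BED interval, converting VCF's 1-based POS to BED's 0-based, half-open coordinates.

```python
from typing import Iterable, List, Tuple

# A primer binding site as (chrom, start, end), 0-based and half-open as in BED.
Interval = Tuple[str, int, int]
# A variant as (chrom, POS, REF), with VCF's 1-based POS.
Variant = Tuple[str, int, str]


def parse_primer_bed(lines: Iterable[str]) -> List[Interval]:
    """Read primer binding sites from BED text, keeping only the first three columns."""
    sites: List[Interval] = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        sites.append((chrom, int(start), int(end)))
    return sites


def overlaps_primer_site(variant: Variant, sites: List[Interval]) -> bool:
    """True if any base of the variant's REF allele lies inside a primer binding site."""
    chrom, pos, ref = variant
    start = pos - 1         # 1-based POS -> 0-based start
    end = start + len(ref)  # half-open end covering the whole REF allele
    return any(c == chrom and s < end and start < e for c, s, e in sites)


def split_variants(
    variants: Iterable[Variant], sites: List[Interval]
) -> Tuple[List[Variant], List[Variant]]:
    """Partition variants into (regular, primer_site) lists, keeping input order."""
    regular: List[Variant] = []
    primer_site: List[Variant] = []
    for v in variants:
        (primer_site if overlaps_primer_site(v, sites) else regular).append(v)
    return regular, primer_site


# Hypothetical one-primer BED record and a few variants around it.
bed = ["track name=primers\n", "NC_045512.2\t30\t54\tprimer_1_LEFT\t1\t+\n"]
sites = parse_primer_bed(bed)
regular, primer_site = split_variants(
    [("NC_045512.2", 31, "A"), ("NC_045512.2", 55, "C"), ("NC_045512.2", 30, "AT")],
    sites,
)
```

Checking the full REF span rather than only POS matters for the longer complex alleles that the updated medaka_variant can now emit: the example variant at POS 30 with REF `AT` touches the primer site only through its second base.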