Add the covid-19 consensus workflow #31
This is the last of the covid19.galaxyproject.org genomics workflows that still isn't deposited outside Galaxy.
The version in this PR actually contains a couple of bugs and quirks, which I fixed this week. This one should be its own version because it has been used, e.g., to construct some of the COG-UK tracking project consensus sequences, and we should have a proper release to refer to.
We need the
@@ -0,0 +1,1735 @@
{
@wm75 I think this needs to be called consensus-from-variation.ga for planemo to pick up the test (or rename the test file to consensus-test.yml).
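The pairing the reviewer describes can be sketched as a small Python check. It assumes only the `<workflow name>-test.yml` convention stated in the comment above; it is not planemo's actual discovery code, and `expected_test_file` and `find_unpaired` are hypothetical helpers.

```python
from pathlib import Path


def expected_test_file(workflow: str) -> str:
    """Return the test-file name paired with a workflow file, assuming the
    '<workflow stem>-test.yml' convention described in the review comment."""
    return f"{Path(workflow).stem}-test.yml"


def find_unpaired(workflows, test_files):
    """Return the workflows whose expected test file is not among test_files."""
    present = set(test_files)
    return [wf for wf in workflows if expected_test_file(wf) not in present]


print(expected_test_file("consensus-from-variation.ga"))  # consensus-from-variation-test.yml
print(find_unpaired(["consensus.ga"], ["consensus-from-variation-test.yml"]))  # ['consensus.ga']
```

Either renaming the workflow or renaming the test file makes the two stems match, which is why the reviewer offers both options.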
I see, thanks.
Alright, that looks better:
Applying linter structure... WARNING
.. WARNING: Workflow contained output without a label
Applying linter tests... CHECK
.. CHECK: Tests appear structurally correct for workflows/sars-cov-2-variant-calling/sars-cov-2-consensus-from-variation/consensus-from-variation.ga
Need to install the missing tools on main, which is one of the TODO items from #29 😆. I'm on it.
@wm75 this seems pretty complex... why not use ivar consensus?
@bwlang With the collection of sars-cov-2-genomics WFs we already handle variant calling (in a very reliable and sensitive way), and this WF builds a consensus sequence from such a list of called variants. ivar consensus is, of course, a much simpler solution, but it calls variants again internally from its BAM input using samtools mpileup. So this is simply a different use case: if you want a fast consensus, use ivar consensus; we want a consensus sequence that incorporates exactly the same set of variants that we called upstream. Most of the complexity in this WF comes from the intended behavior described in the README, while the core business of building the consensus FASTA is a single step (bcftools consensus). Some simplification will also follow in the first update of the WF.

The other important point is that we need proper releases of this particular WF because, just like the ARTIC PE variation WF and the reporting WF, it is used in our national viral genome surveillance tracking efforts (see https://usegalaxy-eu.github.io/posts/2021/04/29/sars-cov-2-monitoring/plain.html). The version in this PR has already been run on ~35,000 COG-UK samples and a few Estonian ones, and the updated version, which I would like to become iwc release 0.2, will be run on many more.

None of this is to say that we shouldn't have more (and possibly more lightweight) WFs for viral genomic analyses; such WFs are still on my to-do list. It's just that the ones we're already running on a lot of data are currently the priority for moving to iwc.
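As a rough illustration of the core step mentioned above (turning an already-called list of variants into a consensus sequence), here is a minimal Python sketch. It is not how bcftools consensus is implemented: `Variant` and `apply_variants` are hypothetical names, only a single reference sequence is handled, and a variant overlapping one already applied is simply skipped.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    pos: int  # 1-based position of the first REF base, as in VCF
    ref: str
    alt: str


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Build a consensus by replacing each variant's REF allele with its ALT.

    Variants are applied left to right; one that overlaps an already
    applied variant is skipped rather than merged.
    """
    out = []
    cursor = 0  # 0-based index of the next reference base not yet emitted
    for v in sorted(variants, key=lambda v: v.pos):
        start = v.pos - 1
        if start < cursor:
            continue  # overlaps a previously applied variant
        if reference[start:start + len(v.ref)] != v.ref:
            raise ValueError(f"REF mismatch at position {v.pos}")
        out.append(reference[cursor:start])  # unchanged stretch before the variant
        out.append(v.alt)                    # ALT allele replaces REF
        cursor = start + len(v.ref)
    out.append(reference[cursor:])           # unchanged tail
    return "".join(out)


# An SNV, a 1-bp deletion and a 2-bp insertion on a toy reference
calls = [Variant(2, "C", "T"), Variant(5, "AC", "A"), Variant(9, "A", "AGG")]
print(apply_variants("ACGTACGTAC", calls))  # ATGTAGTAGGC
```

Because the consensus is derived only from the supplied call list, it contains exactly the upstream variants and nothing re-called from the reads, which is the property that distinguishes this approach from ivar consensus.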