Add MSA workflow #11

Open
pjotrp opened this issue Apr 13, 2020 · 9 comments

Comments

@pjotrp
Collaborator

pjotrp commented Apr 13, 2020

@ekg is working on an MSA workflow

@pjotrp pjotrp added the enhancement New feature or request label Apr 13, 2020
@pjotrp pjotrp added this to the Later milestone Apr 13, 2020
@pjotrp
Collaborator Author

pjotrp commented Apr 13, 2020

MSA scales quadratically, so it becomes slow as the dataset grows. We need a way to split the data into smaller families. @ekg writes: you can script it out with mash and MinHash distances from the pangenome.
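
To make the splitting idea concrete, here is a minimal sketch of one way it could work. It is an assumption-laden illustration, not the workflow used in this project: it does not call mash itself but imitates the bottom-k MinHash idea in plain Python, uses `1 - Jaccard` as the distance rather than the Mash distance formula, and groups sequences greedily against the first member of each family. The function names (`sketch`, `distance`, `split_into_families`) and the parameters `k`, `size`, and `threshold` are hypothetical and would need tuning on real data.

```python
import hashlib


def kmers(seq, k=11):
    """Return the set of k-mers in a sequence (empty if the sequence is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=11, size=200):
    """Bottom-`size` MinHash sketch: the `size` smallest k-mer hash values, sorted."""
    hashes = {
        int.from_bytes(hashlib.blake2b(km.encode(), digest_size=8).digest(), "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def distance(a, b, size=200):
    """1 - estimated Jaccard similarity of two sketches (not the Mash distance formula)."""
    merged = sorted(set(a) | set(b))[:size]  # bottom-`size` of the union
    if not merged:
        return 1.0
    shared = set(a) & set(b)
    hits = sum(1 for h in merged if h in shared)
    return 1.0 - hits / len(merged)


def split_into_families(seqs, threshold=0.1, k=11, size=200):
    """Greedily group sequences (dict of name -> sequence) into families.

    A sequence joins the first family whose representative (its first member)
    is within `threshold`; otherwise it starts a new family.
    """
    families = []  # list of (representative_sketch, [member names])
    for name, seq in seqs.items():
        sk = sketch(seq, k, size)
        for rep, members in families:
            if distance(sk, rep, size) <= threshold:
                members.append(name)
                break
        else:
            families.append((sk, [name]))
    return [members for _, members in families]
```

Each resulting family could then be passed to the MSA step separately. Greedy assignment depends on input order, so a proper clustering of the pairwise distances may be preferable.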

@pjotrp
Collaborator Author

pjotrp commented Apr 25, 2020

@mr-c added a workflow at https://github.com/common-workflow-lab/2020-covid-19-bh/tree/master/msa. We still need to split the input into subsets, though: running MSA on all the data is too slow.

@ekg
Collaborator

ekg commented Apr 25, 2020 via email

@pjotrp
Collaborator Author

pjotrp commented Apr 25, 2020

We have not tried that yet on the GFA. Can you try and see if it works on
https://workbench.lugli.arvadosapi.com/collections/lugli-4zz18-z513nlpqm03hpca

@mr-c
Collaborator

mr-c commented Apr 25, 2020

https://workbench.lugli.arvadosapi.com/container_requests/lugli-xvhdp-a2w5l0y4sh6efd1 is the latest run of mafft+iqtree, but iqtree didn't converge

@mr-c
Collaborator

mr-c commented Apr 25, 2020

@pjotrp
Collaborator Author

pjotrp commented Apr 29, 2020

@ekg we may need a maffer example to split the population into smaller subsets to run MSA on.

@pjotrp
Collaborator Author

pjotrp commented Oct 29, 2020

Maybe never.

@pjotrp pjotrp closed this as completed Oct 29, 2020
@AndreaGuarracino
Collaborator

AndreaGuarracino commented Oct 30, 2020

maffer is currently deprecated.

@pjotrp, if we still want an MSA, the fastest way to get one in the current state of PubSeq is to take advantage of the fact that we already apply SPOA to the whole input sequences. This can already give us a huge MSA, from which I can create a huge MAF output too.

@AndreaGuarracino AndreaGuarracino self-assigned this Nov 4, 2020