Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expression Atlas Smartseq/Droplet downstream workflow #64

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

pcm32
Copy link
Member

@pcm32 pcm32 commented Oct 4, 2021

This PR aims to add the EMBL-EBI Single Cell Expression Atlas downstream analysis workflow as used to produce data with the resource, with a hopefully lightweight example.

I have a few concerns though:

1.- I would hope that the WorkflowHub.eu entry can be annotated to give credits to the person who wrote the workflow. I think that it will currently show as IWC, and while I of course value all the excellent feedback that the IWC community will provide, I still think that the main credit should go to the person writing the workflow (Jon Manning in this case).
2.- It would be great as well if, besides setting the author of the workflow to be passed to WorkflowHub.eu, additional organisations could be added (so that we could add, for instance, the EBI Gene Expression Team, next to IWC).
3.- Our workflow setup currently allows certains steps to fail. For instance, when it comes to the cluster phase, I think that the process relies on some convergence criteria, and for some resolutions value a dataset might not converge if I remember well. Will the testing here consider the workflow failed if one of those steps failed? On our setup (https://github.com/ebi-gene-expression-group/galaxy-workflow-executor) we allow certain steps to fail while still considering the workflow to be successful.
4.- We keep our SC workflow versioned controlled on another repo, and I figured that the best way to keep this in-sync would be as a hidden git submodule. Open to suggestions if you would like to avoid this. I suspect it might break tests initially if the checkout behaviour is not doing submodules. Happy to fix this of course.

I would be happy to PR needed changes to deal with 1 and 2 if given some directions on where you would do these changes and how you would like them.

@pcm32
Copy link
Member Author

pcm32 commented Oct 4, 2021

Workflow execution fails probably because our tools from the toolshed (user ebi-gxa) are not on the CVMFS setup. How can we add those there? Thanks!

@mvdbeek
Copy link
Member

mvdbeek commented Oct 4, 2021

Thanks @pcm32, those are really cool workflows!

1.- I would hope that the WorkflowHub.eu entry can be annotated to give credits to the person who wrote the workflow. I think that it will currently show as IWC, and while I of course value all the excellent feedback that the IWC community will provide, I still think that the main credit should go to the person writing the workflow (Jon Manning in this case).

You can set the author in the workflow file and it should appear in TRS interface ultimately (we have some work to do there, but it should be straightforward for dockstore at least).

2.- It would be great as well if, besides setting the author of the workflow to be passed to WorkflowHub.eu, additional organisations could be added (so that we could add, for instance, the EBI Gene Expression Team, next to IWC).

That is functionality workflowhub would need to expose, I'm not familiar enough to say if this is possible currently. This can be done on dockstore. You can definitely add multiple organizations in the workflow editor interface, and that will be the source of truth eventually.

3.- Our workflow setup currently allows certains steps to fail. For instance, when it comes to the cluster phase, I think that the process relies on some convergence criteria, and for some resolutions value a dataset might not converge if I remember well. Will the testing here consider the workflow failed if one of those steps failed? On our setup (https://github.com/ebi-gene-expression-group/galaxy-workflow-executor) we allow certain steps to fail while still considering the workflow to be successful.

No, I don't think that's something we'll allow in the intermediate future. We could think about re-running failed steps a couple of times though, would that help ? If we then still can't get the step to work reliably I would say that is a problem and not a workflow we should consider best-practice since it wouldn't work (without tuning the Galaxy instance) on any Galaxy instance.

4.- We keep our SC workflow versioned controlled on another repo, and I figured that the best way to keep this in-sync would be as a hidden git submodule. Open to suggestions if you would like to avoid this. I suspect it might break tests initially if the checkout behaviour is not doing submodules. Happy to fix this of course.

So the long term plan is that the infrastructure that we are developing for the iwc can be easily adopted by other groups (mostly through improving planemo-ci-action), and https://github.com/ebi-gene-expression-group/scxa-workflows looks like an excellent candidate for this approach. We could then still take up workflows that are published by ebi-gene-expression-group into a collection on dockstore.

I think the submodule approach would be a bit of a barrier both for contributions (although we wouldn't have to enforce this ...) and reviews, but I'm curious what other people in @galaxyproject/iwc think about this. It is certainly elegant.

@mvdbeek
Copy link
Member

mvdbeek commented Oct 4, 2021

Ah, and for installing the missing tools on cvmfs / usegalaxy.org, that would happen via a pull-request against https://github.com/galaxyproject/usegalaxy-tools. I will do this tomorrow, the plan is that we automate that.

@pcm32
Copy link
Member Author

pcm32 commented Oct 4, 2021

No, I don't think that's something we'll allow in the intermediate future. We could think about re-running failed steps a couple of times though, would that help ? If we then still can't get the step to work reliably I would say that is a problem and not a workflow we should consider best-practice since it wouldn't work (without tuning the Galaxy instance) on any Galaxy instance.

This is anyway on an intermediate step, which then gets a filtering of failed datasets on a collection, so the final outputs don't get affected anyway. But this is not due to a tool error, but more likely that the problem doesn't have a solution for that specific set of parameters: the workflow walks on a series of values for what is called the resolution of the clustering, and some combinations of datasets and resolution values won't have a solution, so some elements of that collection (related to the resolution parameter walking) will show up as errors, for some datasets.

@pcm32
Copy link
Member Author

pcm32 commented Oct 4, 2021

Anyway, we'll see once the tools are on the CVMFS.

@simleo
Copy link
Collaborator

simleo commented Oct 6, 2021

1.- I would hope that the WorkflowHub.eu entry can be annotated to give credits to the person who wrote the workflow. I think that it will currently show as IWC, and while I of course value all the excellent feedback that the IWC community will provide, I still think that the main credit should go to the person writing the workflow (Jon Manning in this case).

You can set "creator" in the workflow file, see this example

"creator": [
{
"class": "Person",
"identifier": "https://orcid.org/0000-0002-9464-6640",
"name": "Wolfgang Maier"
}
],

And it will show up in the WorkflowHub page for the workflow:

Screenshot from 2021-10-06 13-01-18

@simleo
Copy link
Collaborator

simleo commented Oct 6, 2021

2.- It would be great as well if, besides setting the author of the workflow to be passed to WorkflowHub.eu, additional organisations could be added (so that we could add, for instance, the EBI Gene Expression Team, next to IWC).

In this example, both Person and Organization are set in creator:

"creator": [
{
"class": "Person",
"identifier": "https://orcid.org/0000-0002-9676-7032",
"name": "Marius van den Beek"
},
{
"class": "Organization",
"name": "IWC",
"url": "https://github.com/galaxyproject/iwc"
}
],

But I think Organization is currently ignored by WorkflowHub. @fbacall can you confirm?

@mvdbeek
Copy link
Member

mvdbeek commented Oct 7, 2021

@pcm32 it seems there multiple versions of scanpy_parameter_iterator in use, can you update these tools to the latest version (in the editor -> workflow options -> upgrade workflow). The workflow itself should follow https://github.com/galaxyproject/iwc/blob/main/workflows/README.md#adding-workflows, namely you have to add an initial release key to the workflow and run planemo dockstore_init. We also need a README.md file in the folder and a CHANGELOG.md file (see https://github.com/galaxyproject/iwc/blob/main/workflows/sars-cov-2-variant-calling/sars-cov-2-consensus-from-variation/CHANGELOG.md for an example).

@pcm32
Copy link
Member Author

pcm32 commented Oct 13, 2021

I have addressed now the multiple versions of the parameter iterator, will address the other remarks shortly. Should I proceed with the PR that you mentioned for the tools @mvdbeek? since it is still failing due to the tools missing issue.

Thanks for the indications on the user credits, will try that.

@mvdbeek
Copy link
Member

mvdbeek commented Oct 13, 2021

Should I proceed with the PR

The PR is getting deployed once you write hit deploy this, but I may have missed a tool, or the update included a version that was not in the initial PR.

@mvdbeek
Copy link
Member

mvdbeek commented Oct 13, 2021

Is there a reason the workflow is not using the latest tools ?
Screenshot 2021-10-13 at 12 10 41

@pcm32
Copy link
Member Author

pcm32 commented Oct 13, 2021

Yes, this is the workflow we used to run a previous data release of Atlas (but the latest one publicly available). We have made improvements to the workflow, but no released data has used it yet. So while updating some inoffensive tools like the parameter iterators is fine, updating the others might have unintended consequences. Our public version of the workflow will almost always be using slightly older versions of the tools, as our dev workflows are normally pushing changes in tools forward.

@pcm32
Copy link
Member Author

pcm32 commented Oct 13, 2021

Yes, sorry, there might be some version changes because I was pointing to the wrong workflow version when first started the PR, sorry about that :-(.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants