-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expression Atlas Smartseq/Droplet downstream workflow #64
base: main
Are you sure you want to change the base?
Expression Atlas Smartseq/Droplet downstream workflow #64
Conversation
Workflow execution fails probably because our tools from the toolshed (user ebi-gxa) are not on the CVMFS setup. How can we add those there? Thanks! |
Thanks @pcm32, those are really cool workflows!
You can set the author in the workflow file and it should appear in TRS interface ultimately (we have some work to do there, but it should be straightforward for dockstore at least).
That is functionality workflowhub would need to expose, I'm not familiar enough to say if this is possible currently. This can be done on dockstore. You can definitely add multiple organizations in the workflow editor interface, and that will be the source of truth eventually.
No, I don't think that's something we'll allow in the intermediate future. We could think about re-running failed steps a couple of times though, would that help ? If we then still can't get the step to work reliably I would say that is a problem and not a workflow we should consider best-practice since it wouldn't work (without tuning the Galaxy instance) on any Galaxy instance.
So the long term plan is that the infrastructure that we are developing for the iwc can be easily adopted by other groups (mostly through improving planemo-ci-action), and https://github.com/ebi-gene-expression-group/scxa-workflows looks like an excellent candidate for this approach. We could then still take up workflows that are published by ebi-gene-expression-group into a collection on dockstore. I think the submodule approach would be a bit of a barrier both for contributions (although we wouldn't have to enforce this ...) and reviews, but I'm curious what other people in @galaxyproject/iwc think about this. It is certainly elegant. |
Ah, and for installing the missing tools on cvmfs / usegalaxy.org, that would happen via a pull-request against https://github.com/galaxyproject/usegalaxy-tools. I will do this tomorrow, the plan is that we automate that. |
This is anyway on an intermediate step, which then gets a filtering of failed datasets on a collection, so the final outputs don't get affected anyway. But this is not due to a tool error, but more likely that the problem doesn't have a solution for that specific set of parameters: the workflow walks on a series of values for what is called the resolution of the clustering, and some combinations of datasets and resolution values won't have a solution, so some elements of that collection (related to the resolution parameter walking) will show up as errors, for some datasets. |
Anyway, we'll see once the tools are on the CVMFS. |
You can set Lines 4 to 10 in cfe57c8
And it will show up in the WorkflowHub page for the workflow: |
In this example, both iwc/workflows/data-fetching/parallel-accession-download/parallel-accession-download.ga Lines 4 to 15 in cfe57c8
But I think |
@pcm32 it seems there multiple versions of scanpy_parameter_iterator in use, can you update these tools to the latest version (in the editor -> workflow options -> upgrade workflow). The workflow itself should follow https://github.com/galaxyproject/iwc/blob/main/workflows/README.md#adding-workflows, namely you have to add an initial release key to the workflow and run |
I have addressed now the multiple versions of the parameter iterator, will address the other remarks shortly. Should I proceed with the PR that you mentioned for the tools @mvdbeek? since it is still failing due to the tools missing issue. Thanks for the indications on the user credits, will try that. |
The PR is getting deployed once you write hit |
Yes, this is the workflow we used to run a previous data release of Atlas (but the latest one publicly available). We have made improvements to the workflow, but no released data has used it yet. So while updating some inoffensive tools like the parameter iterators is fine, updating the others might have unintended consequences. Our public version of the workflow will almost always be using slightly older versions of the tools, as our dev workflows are normally pushing changes in tools forward. |
Yes, sorry, there might be some version changes because I was pointing to the wrong workflow version when first started the PR, sorry about that :-(. |
This PR aims to add the EMBL-EBI Single Cell Expression Atlas downstream analysis workflow as used to produce data with the resource, with a hopefully lightweight example.
I have a few concerns though:
1.- I would hope that the WorkflowHub.eu entry can be annotated to give credits to the person who wrote the workflow. I think that it will currently show as IWC, and while I of course value all the excellent feedback that the IWC community will provide, I still think that the main credit should go to the person writing the workflow (Jon Manning in this case).
2.- It would be great as well if, besides setting the author of the workflow to be passed to WorkflowHub.eu, additional organisations could be added (so that we could add, for instance, the EBI Gene Expression Team, next to IWC).
3.- Our workflow setup currently allows certains steps to fail. For instance, when it comes to the cluster phase, I think that the process relies on some convergence criteria, and for some resolutions value a dataset might not converge if I remember well. Will the testing here consider the workflow failed if one of those steps failed? On our setup (https://github.com/ebi-gene-expression-group/galaxy-workflow-executor) we allow certain steps to fail while still considering the workflow to be successful.
4.- We keep our SC workflow versioned controlled on another repo, and I figured that the best way to keep this in-sync would be as a hidden git submodule. Open to suggestions if you would like to avoid this. I suspect it might break tests initially if the checkout behaviour is not doing submodules. Happy to fix this of course.
I would be happy to PR needed changes to deal with 1 and 2 if given some directions on where you would do these changes and how you would like them.