Expression Atlas Smartseq/Droplet downstream workflow #64

pcm32 · 2021-10-04T15:37:27Z

This PR aims to add the EMBL-EBI Single Cell Expression Atlas downstream analysis workflow, as used to produce the data in the resource, with a hopefully lightweight example.

I have a few concerns though:

1.- I would hope that the WorkflowHub.eu entry can be annotated to give credit to the person who wrote the workflow. I think it will currently show as IWC, and while I of course value all the excellent feedback that the IWC community will provide, I still think the main credit should go to the person who wrote the workflow (Jon Manning in this case).
2.- It would also be great if, besides passing the author of the workflow to WorkflowHub.eu, additional organisations could be added (so that we could list, for instance, the EBI Gene Expression Team next to IWC).
3.- Our workflow setup currently allows certain steps to fail. For instance, in the clustering phase, I think the process relies on some convergence criteria, and, if I remember correctly, a dataset might not converge for some resolution values. Will the testing here consider the workflow failed if one of those steps fails? On our setup (https://github.com/ebi-gene-expression-group/galaxy-workflow-executor) we allow certain steps to fail while still considering the workflow successful.
4.- We keep our SC workflow version-controlled in another repo, and I figured that the best way to keep it in sync would be a hidden git submodule. Open to suggestions if you would like to avoid this. I suspect it might break tests initially if the checkout does not include submodules. Happy to fix this, of course.
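As a rough illustration of point 4: a plain `git clone` leaves submodule directories empty, so whatever checkout the tests use has to fetch them explicitly. The sketch below only builds the standard git commands for the two usual cases (cloning with `--recurse-submodules`, or initialising submodules in an existing checkout). The helper names are hypothetical, and how the IWC CI actually performs its checkout is not described here.

```python
import subprocess
from typing import List


def clone_with_submodules_cmd(repo_url: str, dest: str) -> List[str]:
    """Clone a repository and check out its submodules in the same step."""
    return ["git", "clone", "--recurse-submodules", repo_url, dest]


def init_submodules_cmd() -> List[str]:
    """For an existing checkout made without submodules, fetch them afterwards."""
    return ["git", "submodule", "update", "--init", "--recursive"]


def run(cmd: List[str], cwd: str = ".") -> None:
    """Run a git command; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, cwd=cwd, check=True)


# Hypothetical usage (needs git and network access):
# run(clone_with_submodules_cmd("https://github.com/galaxyproject/iwc.git", "iwc"))
```

Cloning with `--recurse-submodules` is usually the simpler option in CI, since it avoids a second step that is easy to forget.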

I would be happy to open a PR with the changes needed for 1 and 2, given some directions on where these changes should go and how you would like them.

pcm32 · 2021-10-04T16:25:57Z

Workflow execution fails probably because our tools from the toolshed (user ebi-gxa) are not on the CVMFS setup. How can we add those there? Thanks!

mvdbeek · 2021-10-04T17:34:16Z

Thanks @pcm32, those are really cool workflows!

> 1.- I would hope that the WorkflowHub.eu entry can be annotated to give credit to the person who wrote the workflow. I think it will currently show as IWC, and while I of course value all the excellent feedback that the IWC community will provide, I still think the main credit should go to the person who wrote the workflow (Jon Manning in this case).

You can set the author in the workflow file, and it should ultimately appear in the TRS interface (we have some work to do there, but it should be straightforward for Dockstore at least).

> 2.- It would also be great if, besides passing the author of the workflow to WorkflowHub.eu, additional organisations could be added (so that we could list, for instance, the EBI Gene Expression Team next to IWC).

That is functionality WorkflowHub would need to expose; I'm not familiar enough with it to say whether this is currently possible. It can be done on Dockstore. You can definitely add multiple organizations in the workflow editor interface, and that will eventually be the source of truth.

> 3.- Our workflow setup currently allows certain steps to fail. For instance, in the clustering phase, I think the process relies on some convergence criteria, and, if I remember correctly, a dataset might not converge for some resolution values. Will the testing here consider the workflow failed if one of those steps fails? On our setup (https://github.com/ebi-gene-expression-group/galaxy-workflow-executor) we allow certain steps to fail while still considering the workflow successful.

No, I don't think that's something we'll allow in the near future. We could think about re-running failed steps a couple of times, though; would that help? If we still can't get the step to work reliably after that, I would say that is a problem, and not a workflow we should consider best practice, since it wouldn't work on an arbitrary Galaxy instance without tuning that instance.
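A minimal sketch of the retry idea, assuming a workflow step can be modelled as a Python callable that raises on failure. This is a simplification: real job reruns would be handled by Galaxy or the test harness, not by a function like this, and `run_with_retries` is a hypothetical name. A step that still fails after the last attempt still fails the run, matching the position above.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retries(step: Callable[[], T], attempts: int = 3, delay_s: float = 0.0) -> T:
    """Call `step` up to `attempts` times and return its first successful result.

    If every attempt raises, the last exception is re-raised, so a step that
    never succeeds still counts as a failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as err:  # any exception is treated as a failed attempt
            last_error = err
            if attempt < attempts and delay_s > 0:
                time.sleep(delay_s)  # optional pause before the next attempt
    assert last_error is not None
    raise last_error
```

Retries only help with transient failures; a step that has no solution for its parameters (as described in the reply below) fails on every attempt.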

> 4.- We keep our SC workflow version-controlled in another repo, and I figured that the best way to keep it in sync would be a hidden git submodule. Open to suggestions if you would like to avoid this. I suspect it might break tests initially if the checkout does not include submodules. Happy to fix this, of course.

So the long term plan is that the infrastructure that we are developing for the iwc can be easily adopted by other groups (mostly through improving planemo-ci-action), and https://github.com/ebi-gene-expression-group/scxa-workflows looks like an excellent candidate for this approach. We could then still take up workflows that are published by ebi-gene-expression-group into a collection on dockstore.

I think the submodule approach would be a bit of a barrier both for contributions (although we wouldn't have to enforce this ...) and reviews, but I'm curious what other people in @galaxyproject/iwc think about this. It is certainly elegant.

mvdbeek · 2021-10-04T17:36:13Z

Ah, and installing the missing tools on CVMFS / usegalaxy.org would happen via a pull request against https://github.com/galaxyproject/usegalaxy-tools. I will do this tomorrow; the plan is to automate that.

pcm32 · 2021-10-04T20:31:43Z

> No, I don't think that's something we'll allow in the near future. We could think about re-running failed steps a couple of times, though; would that help? If we still can't get the step to work reliably after that, I would say that is a problem, and not a workflow we should consider best practice, since it wouldn't work on an arbitrary Galaxy instance without tuning that instance.

In any case, this happens in an intermediate step, after which failed datasets are filtered out of the collection, so the final outputs are not affected. Also, this is not really a tool error: more likely, the problem has no solution for that specific set of parameters. The workflow walks through a series of values for the clustering resolution, and some combinations of dataset and resolution value have no solution, so for some datasets, some elements of that collection (one per resolution value) show up as errors.
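To make the filtering step concrete, here is a hypothetical sketch in plain Python, not the Galaxy implementation: a collection is modelled as a mapping from element identifier to job state, with one element per resolution value in the sweep, and errored elements are dropped before anything downstream sees them. The identifiers and resolution values are made up for illustration.

```python
from typing import Dict, List, Tuple


def filter_failed(states: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Split a collection (element identifier -> job state) into kept and failed parts.

    Only elements whose state is "error" are dropped; everything else passes
    through in its original order, so downstream steps never see a failed element.
    """
    kept = {eid: state for eid, state in states.items() if state != "error"}
    failed = [eid for eid, state in states.items() if state == "error"]
    return kept, failed


# Hypothetical sweep: one element per clustering resolution value.
resolutions = [0.1, 0.3, 0.5, 0.7, 1.0]
states = {f"resolution_{r}": "ok" for r in resolutions}
states["resolution_1.0"] = "error"  # e.g. no solution for this dataset/resolution pair

kept, failed = filter_failed(states)
print(failed)      # ['resolution_1.0']
print(list(kept))  # ['resolution_0.1', 'resolution_0.3', 'resolution_0.5', 'resolution_0.7']
```

The errored element still exists upstream, which is why a test harness that checks every job state would flag the run even though the final outputs are unaffected.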

pcm32 · 2021-10-04T20:32:52Z

Anyway, we'll see once the tools are on the CVMFS.

simleo · 2021-10-06T10:58:21Z

> 1.- I would hope that the WorkflowHub.eu entry can be annotated to give credit to the person who wrote the workflow. I think it will currently show as IWC, and while I of course value all the excellent feedback that the IWC community will provide, I still think the main credit should go to the person who wrote the workflow (Jon Manning in this case).

You can set "creator" in the workflow file; see this example:

iwc/workflows/sars-cov-2-variant-calling/sars-cov-2-pe-illumina-artic-variant-calling/pe-artic-variation.ga (lines 4 to 10 in cfe57c8):

    "creator": [
        {
            "class": "Person",
            "identifier": "https://orcid.org/0000-0002-9464-6640",
            "name": "Wolfgang Maier"
        }
    ],
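For the case in point 1, a sketch of adding such an entry programmatically, assuming the .ga file is plain JSON with a top-level "creator" list as in the snippet above. The ORCID used below is a placeholder, not Jon Manning's real identifier, and the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def add_person_creator(
    workflow: Dict[str, Any], name: str, orcid: Optional[str] = None
) -> Dict[str, Any]:
    """Append a Person entry (same keys as the snippet above) to "creator", once."""
    person: Dict[str, Any] = {"class": "Person", "name": name}
    if orcid:
        person["identifier"] = orcid
    creators = workflow.setdefault("creator", [])
    if person not in creators:  # dict equality ignores key order
        creators.append(person)
    return workflow


def update_ga_file(path: Path, name: str, orcid: Optional[str] = None) -> None:
    """Load a .ga workflow, add the creator and write it back as indented JSON."""
    workflow = json.loads(path.read_text(encoding="utf-8"))
    add_person_creator(workflow, name, orcid)
    path.write_text(json.dumps(workflow, indent=4) + "\n", encoding="utf-8")


# Hypothetical usage; the ORCID is a placeholder, not a real identifier.
# update_ga_file(Path("workflow.ga"), "Jon Manning", "https://orcid.org/0000-0000-0000-0000")
```

Re-serialising with `json.dumps` may change whitespace relative to the file Galaxy exported, so the resulting diff can be larger than the one added entry.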

And it will show up on the WorkflowHub page for the workflow.

simleo · 2021-10-06T11:09:11Z

> 2.- It would also be great if, besides passing the author of the workflow to WorkflowHub.eu, additional organisations could be added (so that we could list, for instance, the EBI Gene Expression Team next to IWC).

In this example, both Person and Organization are set in creator:

iwc/workflows/data-fetching/parallel-accession-download/parallel-accession-download.ga (lines 4 to 15 in cfe57c8):

    "creator": [
        {
            "class": "Person",
            "identifier": "https://orcid.org/0000-0002-9676-7032",
            "name": "Marius van den Beek"
        },
        {
            "class": "Organization",
            "name": "IWC",
            "url": "https://github.com/galaxyproject/iwc"
        }
    ],
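A sketch of adding a second organization next to IWC, assuming the same "creator" layout as the snippet above. The helper names are hypothetical, and no URL is set for the EBI Gene Expression Team because none is given in this thread. Listing names by class makes it easy to see what a consumer that reads only the Person entries would pick up, which matters if, as suggested below, WorkflowHub currently ignores Organization.

```python
from typing import Any, Dict, List, Optional


def add_organization_creator(
    workflow: Dict[str, Any], name: str, url: Optional[str] = None
) -> Dict[str, Any]:
    """Append an Organization entry to the top-level "creator" list, once."""
    creators = workflow.setdefault("creator", [])
    already_listed = any(
        c.get("class") == "Organization" and c.get("name") == name for c in creators
    )
    if not already_listed:
        org: Dict[str, Any] = {"class": "Organization", "name": name}
        if url:
            org["url"] = url
        creators.append(org)
    return workflow


def creator_names(workflow: Dict[str, Any], cls: str) -> List[str]:
    """Names of creator entries of one class ("Person" or "Organization")."""
    return [c["name"] for c in workflow.get("creator", []) if c.get("class") == cls]


# Start from the creators shown above and add a second organization.
workflow = {
    "creator": [
        {"class": "Person", "identifier": "https://orcid.org/0000-0002-9676-7032",
         "name": "Marius van den Beek"},
        {"class": "Organization", "name": "IWC", "url": "https://github.com/galaxyproject/iwc"},
    ]
}
add_organization_creator(workflow, "EBI Gene Expression Team")
print(creator_names(workflow, "Organization"))  # ['IWC', 'EBI Gene Expression Team']
```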

But I think Organization is currently ignored by WorkflowHub. @fbacall can you confirm?

mvdbeek · 2021-10-07T09:05:35Z

@pcm32 it seems there are multiple versions of scanpy_parameter_iterator in use; can you update these tools to the latest version (in the editor -> workflow options -> upgrade workflow)? The workflow itself should follow https://github.com/galaxyproject/iwc/blob/main/workflows/README.md#adding-workflows; namely, you have to add an initial release key to the workflow and run planemo dockstore_init. We also need a README.md file and a CHANGELOG.md file in the folder (see https://github.com/galaxyproject/iwc/blob/main/workflows/sars-cov-2-variant-calling/sars-cov-2-consensus-from-variation/CHANGELOG.md for an example).
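A rough pre-submission check covering the points above, sketched under assumptions: steps live in the .ga file's top-level "steps" mapping with "tool_id" and "tool_version" fields, Tool Shed tool ids end in "/<version>", and `planemo dockstore_init` leaves a `.dockstore.yml` next to the workflow. Nested subworkflow steps are not inspected, the function names are hypothetical, and this does not replace the checks run in the IWC CI.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Any, Dict, List, Set


def tool_versions(workflow: Dict[str, Any]) -> Dict[str, Set[str]]:
    """Map each tool (tool_id without its trailing version) to the versions used."""
    versions: Dict[str, Set[str]] = defaultdict(set)
    for step in workflow.get("steps", {}).values():
        tool_id = step.get("tool_id")
        version = step.get("tool_version")
        if not tool_id or not version:
            continue  # inputs and subworkflow steps have no tool_id
        # Tool Shed ids end in "/<version>"; strip it so versions can be compared.
        suffix = "/" + version
        base = tool_id[: -len(suffix)] if tool_id.endswith(suffix) else tool_id
        versions[base].add(version)
    return dict(versions)


def check_workflow_folder(folder: Path) -> List[str]:
    """Return human-readable problems found in one workflow folder."""
    problems: List[str] = []
    for required in ("README.md", "CHANGELOG.md", ".dockstore.yml"):
        if not (folder / required).is_file():
            problems.append(f"missing {required}")
    for ga in sorted(folder.glob("*.ga")):
        workflow = json.loads(ga.read_text(encoding="utf-8"))
        if "release" not in workflow:
            problems.append(f"{ga.name}: no release key")
        for tool, used in sorted(tool_versions(workflow).items()):
            if len(used) > 1:
                problems.append(f"{ga.name}: {tool} used in versions {sorted(used)}")
    return problems
```

Flagging every tool used in more than one version is exactly the situation described above for scanpy_parameter_iterator; the release value itself (for example "0.1") is left to the author.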


pcm32 · 2021-10-13T09:48:40Z

I have now addressed the multiple versions of the parameter iterator and will address the other remarks shortly. Should I proceed with the PR that you mentioned for the tools, @mvdbeek? It is still failing because of the missing tools.

Thanks for the pointers on user credits; I will try that.

mvdbeek · 2021-10-13T09:52:36Z

> Should I proceed with the PR

The PR gets deployed once "deploy this" is written on it, but I may have missed a tool, or the update may have included a version that was not in the initial PR.

mvdbeek · 2021-10-13T10:11:27Z

Is there a reason the workflow is not using the latest tools?

pcm32 · 2021-10-13T10:49:17Z

Yes: this is the workflow we used to run a previous data release of Atlas (though it is the latest one publicly available). We have made improvements to the workflow, but no released data has used them yet. So while updating innocuous tools like the parameter iterators is fine, updating the others might have unintended consequences. Our public version of the workflow will almost always use slightly older versions of the tools, since our development workflows are normally where tool changes are pushed forward first.

pcm32 · 2021-10-13T10:51:11Z

Yes, there might be some version changes because I was pointing to the wrong workflow version when I first started the PR; sorry about that :-(

pcm32 added 3 commits October 4, 2021 16:27

Expression Atlas smartseq plus tests (130bf5a)

Missing dockstore file (3bef624)

Checkout with submodules (bbd9742)

mvdbeek mentioned this pull request Oct 7, 2021: Update and Install missing ebi_gxa tools galaxyproject/usegalaxy-tools#428 (Merged)

Point to wf with corrected param iterator and param box for batch correction (47d6e85)
