New Evaluating Reference Data for Bulk RNA Deconvolution tutorial #5549

Open
wants to merge 96 commits into
base: main

hexhowells
Collaborator

New tutorial on evaluating reference data for bulk RNA deconvolution tools, evaluating both MuSiC and NNLS deconvolution tools within Galaxy.

hexhowells and others added 30 commits October 5, 2024 08:18
> - {% icon param-collection %} *"Expression Data"*: `Expression Data`
>
> {% snippet faqs/galaxy/workflows_run.md %}
> 3. Add a tag labelled `#A` to the first "Actual cell proportions" and "Pseudobulk" collections

I am not completely sure, but I feel like the term "actual cell proportions" might be a little misleading. The cell proportions, as indicated by proportional representation in the single-cell data, often differ from the true in vivo cell type proportions because of systematic dropout biases during data collection. This might be worth mentioning, or a different term that doesn't use "actual" could be substituted.

Collaborator Author

It's the actual cell proportions for the single-cell data, which is the closest we can get to knowing the true cell proportions for any data. I think it's probably the cleanest name to use, but I will add a section mentioning that these won't be a perfect representation of real cell proportions in vivo.
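
For illustration, here is a minimal sketch of what "actual cell proportions for the single-cell data" means in practice: the fraction of annotated cells belonging to each cell type in a sample. The function name and the example labels are hypothetical and are not taken from the tutorial or its workflows.

```python
from collections import Counter


def cell_type_proportions(labels):
    """Return the fraction of cells assigned to each cell type.

    `labels` holds one cell-type annotation per cell in a single sample.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cells given")
    return {cell_type: n / total for cell_type, n in counts.items()}


# Hypothetical annotations for a tiny single-cell sample (assumption).
labels = ["alpha", "alpha", "beta", "delta", "alpha", "beta", "alpha", "beta"]
proportions = cell_type_proportions(labels)
print(proportions)
```

These fractions describe the captured cells only. If a cell type is under-sampled during dissociation or capture, the same bias carries through to the proportions, which is the caveat raised above.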

> >
> > ![Scatter plot comparison](../../images/bulk-deconvolution-evaluate/scatterplot-compare.png "Scatter plot comparison between Music and NNLS")
> >
> > 1. Comparing scatter plots, the MuSiC tool has the most accurate results since the points fall closer onto the x=y line

Imagine the case where the NNLS deconvolution more closely resembled the cell proportions in the real, biological context, while MuSiC more accurately recapitulated the proportions from the single-cell data. Which of these two methods is really more accurate, then?

Collaborator Author

At their core, deconvolution tools are trying to determine the cell proportions of some bulk RNA data, which would ideally represent the biological sample accurately. So the best that any tool can do is measure the data it's been given, since any errors in the sequencing won't be known.

In that scenario, NNLS would realistically be the more accurate tool, but without knowing the true cell proportions of the biological sample (knowing them would kind of render deconvolution tools useless), the best we can do is assume that pseudobulks from single-cell data are a good representation of actual bulk data. In that case, it's probably safe to assume that MuSiC would be more accurate.
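
As a rough illustration of how "points fall closer to the x=y line" can be put into numbers, the sketch below compares inferred proportions against the pseudobulk's known proportions using root-mean-square error and Pearson correlation. The metric choice, function names, and all values are assumptions for illustration; they are not results from the tutorial and not necessarily the metrics its workflows compute.

```python
import math


def rmse(expected, inferred):
    """Root-mean-square distance of the points from the x=y line."""
    if len(expected) != len(inferred):
        raise ValueError("inputs must have the same length")
    squared = [(e - i) ** 2 for e, i in zip(expected, inferred)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(expected, inferred):
    """Pearson correlation between expected and inferred proportions."""
    n = len(expected)
    mean_e = sum(expected) / n
    mean_i = sum(inferred) / n
    cov = sum((e - mean_e) * (i - mean_i) for e, i in zip(expected, inferred))
    var_e = sum((e - mean_e) ** 2 for e in expected)
    var_i = sum((i - mean_i) ** 2 for i in inferred)
    return cov / math.sqrt(var_e * var_i)


# Hypothetical proportions for four cell types (made-up values).
expected = [0.50, 0.30, 0.15, 0.05]  # known from the pseudobulk construction
tool_a = [0.48, 0.31, 0.16, 0.05]    # output of one hypothetical tool
tool_b = [0.35, 0.40, 0.10, 0.15]    # output of another hypothetical tool

for name, inferred in (("tool_a", tool_a), ("tool_b", tool_b)):
    print(name, round(rmse(expected, inferred), 4), round(pearson(expected, inferred), 4))
```

A lower RMSE means the points sit closer to the x=y line. Both numbers only measure agreement with the single-cell-derived proportions, so they share any bias present in that reference.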

@@ -507,6 +507,10 @@ Camila-goclowski:
    email: [email protected]
    linkedin: camila-goclowski

carloscheemendonca:
    name: Carlos Chee Mendonça
    joined: 2025-01
Member

@shiltemann shiltemann Jan 16, 2025

@carloscheemendonca please feel free to edit or add more information about yourself to this entry as you see fit

Member

The tests are still failing because the name of this file is expected to end in -test.yml, so renaming it to something like deconv-eval-stage-1-create-data-test.yml should fix that. And thanks for adding the testing!

Collaborator Author

Ah yeah I renamed them before uploading and forgot to add that back in.

Also, the deconv-eval-stage-1-create-data_child workflow is a sub-workflow used in the deconv-eval-stage-1-create-data workflow. I'm not sure how I should add testing for that or if it's even needed here since I would guess it's part of the parent workflow?

Member

@hexhowells That shouldn't be necessary for the subworkflow, no.

Also, my bad: it should be -tests.yml (with the s).
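
A small sketch of the naming rule discussed here, assuming the workflow file uses Galaxy's `.ga` extension and that the test file simply swaps that extension for `-tests.yml`. The helper names are hypothetical and are not part of any GTN tooling.

```python
TESTS_SUFFIX = "-tests.yml"
WORKFLOW_SUFFIX = ".ga"  # assumption: workflows are stored as .ga files


def expected_tests_filename(workflow_filename):
    """Derive the test file name expected alongside a workflow file."""
    if not workflow_filename.endswith(WORKFLOW_SUFFIX):
        raise ValueError(f"not a {WORKFLOW_SUFFIX} workflow: {workflow_filename}")
    return workflow_filename[: -len(WORKFLOW_SUFFIX)] + TESTS_SUFFIX


def is_valid_tests_filename(filename):
    """Check that a test file name ends in -tests.yml (with the s)."""
    return filename.endswith(TESTS_SUFFIX)


print(expected_tests_filename("deconv-eval-stage-1-create-data.ga"))
print(is_valid_tests_filename("deconv-eval-stage-1-create-data-test.yml"))
```

The second call returns `False`, since the singular `-test.yml` ending does not match the expected suffix.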

4 participants