Add method to calculate embeddings for variable by distance aggregation #807
base: main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #807 +/- ##
==========================================
- Coverage 69.99% 69.75% -0.24%
==========================================
Files 39 40 +1
Lines 5525 5561 +36
Branches 1029 1037 +8
==========================================
+ Hits 3867 3879 +12
- Misses 1363 1387 +24
Partials 295 295
Hi @LLehner, thank you for this! Would you mind elaborating a bit on when this would be used? Also, what if the embeddings are pre-calculated, or the user would like to use something other than UMAP? Should that be an option? Finally, I think a test would be required before we get this in, thanks!
Hey @giovp, this feature came out of a discussion with @maiiashulman. We ran into a situation in which the "literature-curated" signature for hypoxia was either 20 or 4000 genes, the latter obviously being useless. So we wondered which other genes might show the same spatially variable pattern as a function of distance to a certain cell type (e.g. epithelial). This is essentially a graphical method to check whether a given set of genes (e.g. the 20-gene signature) actually varies in a similar pattern. But I agree with your points; if we see that it's actually doing something useful, we should make it a bit more flexible.
@timtreis this function now returns an Additionally the question is whether a The function call would change from:
Description
Adds a method in tools to calculate embeddings of variables by their counts aggregated by distance.
Example usage
import squidpy as sq
# load example data set
adata = sq.datasets.seqfish()
Calculate distances of each observation to a specified anchor point (e.g. cell type or tissue location). Here we use cell type "Endothelium" in the annotation column "celltype_mapped_refined":
sq.tl.var_by_distance(adata, groups="Endothelium", cluster_key="celltype_mapped_refined")
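To make the distance step concrete, here is a minimal NumPy sketch of one plausible definition: the Euclidean distance from every observation to its nearest observation of the anchor group. This is an assumption for illustration, not squidpy's implementation; `sq.tl.var_by_distance` may normalize distances or store additional columns in the design matrix, and the function name `distance_to_anchor` is hypothetical. Note that anchor cells themselves get distance 0, which is the "bin of distance 0" referred to further down.

```python
import numpy as np


def distance_to_anchor(coords, labels, group):
    """Hypothetical helper: distance from each observation to the nearest `group` observation.

    coords: (n_obs, 2) array of spatial coordinates (assumed Euclidean space)
    labels: length-n_obs array of cluster labels
    group:  label of the anchor observations, e.g. "Endothelium"
    """
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    anchors = coords[labels == group]
    if anchors.shape[0] == 0:
        raise ValueError(f"no observations with label {group!r}")
    # pairwise differences have shape (n_obs, n_anchors, 2); fine for small data
    diff = coords[:, None, :] - anchors[None, :, :]
    # Euclidean distance to every anchor, then keep the nearest one per observation
    return np.sqrt((diff**2).sum(axis=-1)).min(axis=1)
```

The brute-force pairwise approach keeps the sketch dependency-free; for large datasets, `scipy.spatial.cKDTree(anchors).query(coords)` returns the same nearest-anchor distances without materializing the full pairwise array.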
The distances computed by sq.tl.var_by_distance are stored in adata.obsm["design_matrix"]. Now we can calculate the embeddings, which are returned as a new AnnData object:
adata_new = sq.tl.var_embeddings(adata, group="Endothelium", design_matrix_key="design_matrix")
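The var x distance_bin matrix that ends up in adata_new.X can be sketched as follows. This is a hedged illustration under stated assumptions, not the PR's code: counts are summed (not averaged) per bin, positive distances are split into `n_bins` equal-width bins, and observations at distance exactly 0 (the anchor) form their own bin that is dropped unless `include_anchor=True`. The function name `aggregate_by_distance` and the `n_bins` parameter are hypothetical; only `include_anchor` mirrors the option of sq.tl.var_embeddings().

```python
import numpy as np


def aggregate_by_distance(X, distances, n_bins=10, include_anchor=False):
    """Hypothetical helper: sum an (n_obs, n_vars) count matrix into (n_vars, n_bins) by distance.

    Bin 0 holds observations at distance 0 (the anchor); bins 1..n_bins are
    equal-width, right-closed intervals over (0, max distance].
    """
    X = np.asarray(X, dtype=float)
    distances = np.asarray(distances, dtype=float)
    # equal-width edges from 0 to the largest observed distance
    edges = np.linspace(0.0, distances.max(), n_bins + 1)
    # digitize against the interior edges gives 0..n_bins-1; shift to 1..n_bins
    bin_idx = np.digitize(distances, edges[1:-1], right=True) + 1
    # observations exactly at the anchor get their own bin 0
    bin_idx[distances == 0] = 0
    # one-hot of shape (n_obs, n_bins + 1); X.T @ onehot sums counts per variable and bin
    onehot = np.eye(n_bins + 1)[bin_idx]
    agg = X.T @ onehot
    return agg if include_anchor else agg[:, 1:]
```

Using a one-hot matrix turns the per-bin sum into a single matrix product; swapping the sum for a per-bin mean would only require dividing by `onehot.sum(axis=0)` where it is nonzero.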
Note that by default the distance-0 bin, i.e. the counts that belong to the anchor point, is excluded. This can be changed by setting include_anchor=True in sq.tl.var_embeddings().
adata_new.X contains the aggregated var x distance_bin count matrix.
adata_new.obs contains the variables as a categorical annotation, which is required to highlight them in plots.
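Because the rows of adata_new.X are variables, embedding them places genes with similar distance profiles close together. The review comment in this thread indicates the PR uses UMAP; the sketch below swaps in PCA (via `numpy.linalg.svd`) to stay dependency-free, so it shows the idea rather than the PR's exact method. The row normalization (each variable scaled to sum to 1, so profiles are compared by shape rather than total counts) and the function name `embed_variables` are assumptions.

```python
import numpy as np


def embed_variables(agg, n_components=2):
    """Hypothetical helper: PCA embedding of the rows of a var x distance_bin matrix."""
    agg = np.asarray(agg, dtype=float)
    totals = agg.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave all-zero rows as zeros instead of dividing by 0
    # compare variables by the shape of their distance profile, not their magnitude
    profiles = agg / totals
    centered = profiles - profiles.mean(axis=0)
    # rows of U * S are the principal-component scores of each variable
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    k = min(n_components, s.size)
    return u[:, :k] * s[:k]
```

With scanpy installed, `sc.pp.neighbors(adata_new)` followed by `sc.tl.umap(adata_new)` is one way to get a UMAP of the same matrix instead; whether that matches what `var_embeddings` does internally is not confirmed here.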
TODO