Commit 39aa426

Merge pull request #8 from Malvikakh/edits_vignette_s3
Updated vignette and solutions
2 parents 014c5a5 + b73084c commit 39aa426

2 files changed: +72 -30 lines changed


vignettes/Session_3_imaging_assays.Rmd

Lines changed: 48 additions & 30 deletions
````diff
@@ -2,7 +2,7 @@
 title: "Imaging assays (tidy)"
 author:
 - Stefano Mangiola, South Australian immunoGENomics Cancer Institute^[<[email protected]>], Walter and Eliza Hall Institute^[<mangiola.s at wehi.edu.au>]
-- Malvica Kharbanda, South Australian immunoGENomics Cancer Institute^[<[email protected]>]
+- Malvika Kharbanda, South Australian immunoGENomics Cancer Institute^[<[email protected]>]
 output: rmarkdown::html_vignette
 # bibliography: "`r file.path(system.file(package='tidySpatialWorkshop', 'vignettes'), 'tidyomics.bib')`"
 vignette: >
````
````diff
@@ -75,6 +75,10 @@ library(scico)
 
 This [data package](https://bioconductor.org/packages/release/data/experiment/html/SubcellularSpatialData.html) contains annotated datasets localized at the sub-cellular level from the STOmics, Xenium, and CosMx platforms, as analyzed in the publication by [Bhuva et al., 2025](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-024-03241-7). It includes raw transcript detections and provides functions to convert these into `SpatialExperiment` objects.
 
+The data in this workshop we will be analyzing is the Xenium Mouse Brain dataset. The dataset has 3 serial sections of fresh frozen mouse brain. Raw transcript level data is provided with region annotations for each detection.
+
+The data is stored in the `ExperimentHub` package, and can be downloaded using the following queries:
+
 ```{r, eval=FALSE}
 
 # To avoid error for SPE loading
````
````diff
@@ -92,11 +96,14 @@ tx |> filter(sample_id=="Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs") |> nrow(
 
 #### An overview of the data
 
+The data is however very large and thus we will work with a small subset of the data.
 
 ```{r, fig.width=7, fig.height=8, eval=FALSE}
 tx_small = tx[sample(seq_len(nrow(tx)), size = nrow(tx)/500),]
 ```
 
+However, since the data is very large, for the convenience of the workshop, we will directly download the small subset of the data.
+
 ```{r, echo=FALSE}
 # To avoid error for SPE loading
 # https://support.bioconductor.org/p/9161859/#9161863
````
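
The `tx_small` chunk in the hunk above draws a random 1-in-500 subset with `sample()`. A minimal base R sketch of that step, using a hypothetical toy table `tx_toy` instead of the workshop data, shows two details that make such a subset easier to reproduce: fixing the seed with `set.seed()` (the seed values here are arbitrary) and requesting an explicit integer size with `floor()`.

```r
# Hypothetical toy transcript table: one row per detected molecule
set.seed(42)
n_molecules <- 10000
tx_toy <- data.frame(
  x = runif(n_molecules, 0, 8000),
  y = runif(n_molecules, 0, 8000),
  gene = sample(c("Gad1", "Slc17a7", "Mbp"), n_molecules, replace = TRUE),
  stringsAsFactors = FALSE
)

# Keep 1 in 500 molecules; set.seed() makes the random draw reproducible
set.seed(2024)
keep <- sample(seq_len(nrow(tx_toy)), size = floor(nrow(tx_toy) / 500))
tx_toy_small <- tx_toy[keep, ]

# Re-running with the same seed selects exactly the same rows
set.seed(2024)
keep_again <- sample(seq_len(nrow(tx_toy)), size = floor(nrow(tx_toy) / 500))
identical(keep, keep_again)  # TRUE
nrow(tx_toy_small)           # 20
```

Sampling is without replacement by default, so no molecule appears twice in the subset.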
````diff
@@ -120,7 +127,7 @@ tx_small |>
 )
 ```
 
-We can appreciate how, even subsampling the data 1 in 500, we still have a vast amount of data to visualise.
+This dataset have been annotated for regions. Here we plot the regions in the sample. We can appreciate how, even subsampling the data 1 in 500, we still have a vast amount of data to visualise.
 
 ```{r, fig.width=7, fig.height=8}
 tx_small |>
````
````diff
@@ -132,36 +139,36 @@ tx_small |>
 theme(legend.position = "none")
 ```
 
-This dataset have been annotated for regions. Let's have a look how many regions have been annotated
+Let's have a look how many regions have been annotated
 
 ```{r}
 tx_small |>
 distinct(region)
 ```
 
-From this large dataset, we select a small reagion for illustrative purposes
+From this large dataset, we select a small reagion for illustrative purposes.
 
 ```{r, eval=FALSE}
 tx_small_region =
 tx |>
 filter(x |> between(3700, 4200), y |> between(5000, 5500))
 ```
 
-Load the pre-saved data
+If you do not have tx loaded from before, load the pre-saved data:
 
-```{r, echo=FALSE}
+```{r}
 tx_small_region_file = tempfile()
 utils::download.file("https://zenodo.org/records/11213155/files/tx_small_region.rda?download=1", destfile = tx_small_region_file)
 load(tx_small_region_file)
 ```
 
 ### 2. MoleculeExperiment
 
-The R package MoleculeExperiment includes functions to create and manipulate objects from the newly introduced MoleculeExperiment class, designed for analyzing molecule-based spatial transcriptomics data from platforms such as Xenium by 10X, CosMx SMI by Nanostring, and Merscope by Vizgen, among others.
+The R package MoleculeExperiment includes functions to create and manipulate objects from the newly introduced MoleculeExperiment class, designed for analysing molecule-based spatial transcriptomics data from platforms such as Xenium by 10X, CosMx SMI by Nanostring, and Merscope by Vizgen, among others.
 
-Although in this session we will not use `MoleculeExperiment` class, because of the lack of segmentation boundary information (we rather have cell identifiers), we briefly introduce this package because as an important part of Bioconductor.
+`MoleculeExperiment` class uses cell boundary information instead of cell identifiers. And thus we won't use `MoleculeExperiment` directly. However, as it is an important part of bioconductor we briefly introduce this package.
 
-We show how we would import our table of probe location into a `MoleculeExperiment`. At the end of the Session, for knowledge, we will navigate the example code given in the [vignette material](https://www.bioconductor.org/packages/release/bioc/vignettes/MoleculeExperiment/inst/doc/MoleculeExperiment.html).
+We show how we would import our table of probe location into a `MoleculeExperiment`. For this section, we will go through the example code given in the [vignette material](https://www.bioconductor.org/packages/release/bioc/vignettes/MoleculeExperiment/inst/doc/MoleculeExperiment.html).
 
 ```{r, fig.width=7, fig.height=8}
 
````
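
The region selection in the hunk above keeps molecules whose coordinates fall inside a rectangle; `dplyr::between(x, left, right)` includes both endpoints. A base R sketch on hypothetical toy coordinates (the values are illustrative only) makes the inclusive comparison explicit and uses `unique()` as the vector counterpart of `distinct(region)`.

```r
# Hypothetical toy molecules with coordinates and a region annotation
tx_toy <- data.frame(
  x = c(3650, 3700, 3900, 4200, 4250, 4000),
  y = c(5100, 5000, 5250, 5500, 5300, 5600),
  region = c("Cortex", "Cortex", "Hippocampus", "Hippocampus", "Thalamus", "Cortex"),
  stringsAsFactors = FALSE
)

# Inclusive bounding box, matching between(3700, 4200) and between(5000, 5500)
in_box <- tx_toy$x >= 3700 & tx_toy$x <= 4200 &
  tx_toy$y >= 5000 & tx_toy$y <= 5500
tx_toy_region <- tx_toy[in_box, ]

# Molecules exactly on the boundary (x = 3700, x = 4200, y = 5500) are kept
tx_toy_region

# Base R counterpart of distinct(region) for a single column
unique(tx_toy$region)
```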
````diff
@@ -173,7 +180,7 @@ repoDir = paste0(repoDir, "/xenium_V1_FF_Mouse_Brain")
 me = readXenium(repoDir, keepCols = "essential")
 me
 ```
-In this object, besides the single molecule location, we have cell segmentation boundaries. We can use these boudaries to understand subcellular localisation of molecules and to aggregate molecules in cells.
+In this object, besides the single molecule location, we have cell segmentation boundaries. We can use these boundaries to understand subcellular localisation of molecules and to aggregate molecules in cells.
 
 ```{r, fig.width=7, fig.height=8}
 ggplot_me() +
````
````diff
@@ -186,7 +193,7 @@ ggplot_me() +
 )
 ```
 
-In this object we don't only have the cell segmentation but the nucleous segmentation as well.
+In this object we don't only have the cell segmentation but the nucleus segmentation as well.
 
 ```{r, fig.width=7, fig.height=8}
 boundaries(me, "nucleus") = readBoundaries(
````
````diff
@@ -224,7 +231,7 @@ rm(me)
 gc()
 ```
 
-We can organise our large data frame containing single molecules into a more efficient `MoleculeExperiment`.
+`MoleculeExperiment` also has functions such as `dataframeToMEList()` and then `MoleculeExperiment()` where we can organise our large data frame containing single molecules into a more efficient `MoleculeExperiment` object.
 
 ```{r}
 library(MoleculeExperiment)
````
````diff
@@ -269,7 +276,7 @@ gc()
 
 #### A preview of a zoomed in section of the tissue
 
-Now let's try to visualise just a small section. You can appreciate, coloured by cell, single molecules. You cqn also appreciate the difference in density between regions. An aspect to note, is that not all probes are withiin cells. This depends on the segmentation process.
+Now let's try to visualise just a small section, `tx_small_region`, that we downloaded earlier. You can appreciate, single molecules are coloured by cell. You can also appreciate the difference in density between regions. An aspect to note, is that not all probes are within cells. This depends on the segmentation process.
 
 ```{r, fig.width=10, fig.height=10}
 brewer.pal(7, "Set1")
````
````diff
@@ -296,43 +303,43 @@ tx_small_region |>
 filter(is.na(cell)) |>
 ggplot(aes(x, y, colour = factor(cell))) +
 geom_point(shape=".") +
-
 facet_wrap(~sample_id, ncol = 2) +
 scale_color_manual(values = sample(colorRampPalette(brewer.pal(8, "Set2"))(1800))) +
 coord_fixed() +
 theme_minimal() +
 theme(legend.position = "none")
 ```
 
-```{r}
-rm(tx_small_region)
-gc()
-```
-
 ::: {.note}
 **Exercise 3.1**
 
 We want to understand how much data we are discarding, that does not have a cell identity.
 
 - Using base R grammar calculate what is the ratio of outside-cell vs within-cell, probes
 - Reproduce the same calculation with `tidyverse`
+- Calculate the percentage of probes are within the cytoplasm but outside the nucleus
 
 :::
 
+```{r}
+rm(tx_small_region)
+gc()
+```
+
 ### 3. Aggregation and analysis
 
 We will convert our cell by gene count to a `SpatialExperiment`. This object stores a cell by gene matrix with relative XY coordinates.
 
-`SubcellularSpatialData` has a utility function that aggregated the single molecules in cells, where these cell ID have been identified with segmentation.
+`SubcellularSpatialData` package has a utility function that aggregated the single molecules in cells, where these cell ID have been identified with segmentation.
 
 
 ```{r, eval=FALSE}
 tx_spe = SubcellularSpatialData::tx2spe(tx)
 
 tx_spe = tx_spe |> mutate(in_tissue = TRUE)
 ```
-
-```{r, echo=FALSE}
+If you do not have tx loaded from before, load the pre-saved data converted to `SpatialExperiment`:
+```{r}
 tx_spe_file = tempfile()
 utils::download.file("https://zenodo.org/records/11213166/files/tx_spe.rda?download=1", destfile = tx_spe_file)
 # load("~/Downloads/tx_spe.rda")
````
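
The hunk above relies on `SubcellularSpatialData::tx2spe()` to aggregate single molecules into cells. The sketch below is not that function's implementation; it is a base R illustration of the same idea on a hypothetical toy table: molecules without a cell identifier are dropped, and the rest are cross-tabulated into a gene-by-cell count matrix (genes as rows, cells as columns, the orientation a `SpatialExperiment` assay uses).

```r
# Hypothetical toy transcript table: one row per detected molecule
tx_toy <- data.frame(
  cell = c("c1", "c1", "c2", NA, "c2", "c2"),
  gene = c("Gad1", "Slc17a7", "Gad1", "Gad1", "Gad1", "Slc17a7"),
  stringsAsFactors = FALSE
)

# Molecules outside segmented cells have no cell identity and cannot be aggregated
in_cell <- tx_toy[!is.na(tx_toy$cell), ]

# Cross-tabulate into a gene x cell count matrix; unclass() turns the table into a plain matrix
counts <- unclass(table(gene = in_cell$gene, cell = in_cell$cell))
counts
```

Here the molecule with `cell = NA` is lost, which is exactly the data loss quantified in Exercise 3.1.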
````diff
@@ -351,7 +358,7 @@ Let have a look to the `SpatialExperiment`.
 tx_spe
 ```
 
-A trivial edit to work with `ggspavis.`
+Here we introduce the `ggspavis` package to visualize spatial transcriptomics data. This package requires a column called `in_tissue` to be present in the `SpatialExperiment` object. Here we edit our data include this column.
 
 ```{r}
 tx_spe = tx_spe |> mutate(in_tissue = TRUE)
````
````diff
@@ -379,7 +386,6 @@ We normalise the `SpatialExperiment` using `scater`.
 ```{r}
 tx_spe =
 tx_spe |>
-
 # Scaling and tranformation
 scater::logNormCounts()
 ```
````
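
`scater::logNormCounts()` in the hunk above adds log-transformed normalised values to the object. As we understand its default behaviour, it uses library-size factors centred to a mean of one and computes log2(count / size factor + 1). The base R sketch below reproduces that arithmetic on a hypothetical toy count matrix for intuition only; it is not a replacement for the `scater` call.

```r
# Hypothetical gene x cell count matrix
counts <- matrix(
  c(0, 2, 4,
    1, 0, 3,
    3, 2, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3))
)

# Library-size factors, centred so that their mean is 1
lib_size <- colSums(counts)
size_factors <- lib_size / mean(lib_size)

# Divide each cell (column) by its size factor, add a pseudo-count of 1, then log2
logcounts <- log2(sweep(counts, 2, size_factors, "/") + 1)
round(logcounts, 3)
```

The pseudo-count keeps zero counts at exactly zero after the transform, so the matrix stays sparse-friendly.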
````diff
@@ -479,7 +485,7 @@ tx_spe_sample_1 =
 tx_spe_sample_1 |> select(.cell, clusters)
 ```
 
-As we have done before, we caculate UMAPs for visualisation purposes.
+As we have done before, we caclculate UMAPs for visualisation purposes.
 
 ::: {.note}
 This step takes long time.
````
````diff
@@ -519,7 +525,7 @@ In the previous sections we have seen how to do gene marker selection for sequen
 :::
 
 
-Too understand whether the cell clusters explain morphology as opposed to merely cell identity, we can color cells according to annotated region. As we can see we have a lot of regions. We have more regions that cell clusters.
+To understand whether the cell clusters explain morphology as opposed to merely cell identity, we can color cells according to annotated region. As we can see we have a lot of regions. We have more regions that cell clusters.
 
 ```{r, fig.width=7, fig.height=8}
 
````
````diff
@@ -572,7 +578,7 @@ Algorithm:
 
 - Cell-level annotations provided by users are used to construct a cell annotation matrix
 
-- Identify cellular neighborhoods uses the SoftMax function, enhanced by a "shape" parameter that governs the "influence radious". This measures probability of a cell type to be found in a neighbour.
+- Identify cellular neighborhoods uses the SoftMax function, enhanced by a "shape" parameter that governs the "influence radius". This measures probability of a cell type to be found in a neighbour.
 
 - The K-means clustering algorithm finds recurring neighbours
 
````
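
The algorithm outline in the hunk above mentions a SoftMax with a "shape" parameter. The sketch below is a generic illustration under our own assumptions, not `hoodscanR`'s code: we assume each neighbour is weighted by `exp(-distance / tau)`, the weights are normalised with a row-wise SoftMax so they sum to one, and the resulting probabilities are summed per neighbouring cell type. Smaller `tau` concentrates probability on the closest neighbours; larger `tau` spreads it out. All distances, cell types and names are hypothetical.

```r
# Hypothetical distances from two focal cells (rows) to their k = 3 nearest neighbours
dist_mat <- matrix(
  c(5, 10, 20,
    8,  8, 30),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("cellA", "cellB"), NULL)
)
# Hypothetical cell type of each of those neighbours (same layout as dist_mat)
nb_type <- matrix(
  c("Astro",  "Neuron", "Neuron",
    "Neuron", "Oligo",  "Astro"),
  nrow = 2, byrow = TRUE
)

# Row-wise SoftMax of negative distances; tau stands in for the "shape" parameter
softmax_rows <- function(d, tau) {
  w <- exp(-d / tau)
  w / rowSums(w)
}
p_nb <- softmax_rows(dist_mat, tau = 10)

# Sum neighbour probabilities per cell type to get a cell x cell-type matrix
cell_types <- sort(unique(as.vector(nb_type)))
hood <- matrix(0, nrow = nrow(p_nb), ncol = length(cell_types),
               dimnames = list(rownames(dist_mat), cell_types))
for (i in seq_len(nrow(p_nb))) {
  for (j in seq_len(ncol(p_nb))) {
    hood[i, nb_type[i, j]] <- hood[i, nb_type[i, j]] + p_nb[i, j]
  }
}

round(hood, 3)
rowSums(hood)  # each row sums to 1, up to floating-point error
```

Because the SoftMax normalises each row, the per-cell-type probabilities also sum to one, which matches the property of the probability matrix described in the next hunk.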

````diff
@@ -610,9 +616,17 @@ hoods[1:2, 1:10]
 
 We plot randomly plot 50 cells to see the output of neighborhood scanning using plotHoodMat. In this plot, each value represent the probability of the each cell (each row) located in each cell type neighborhood. The rowSums of the probability maxtrix will always be 1.
 
-```{r, fig.width=7, fig.height=8}
+```{r, fig.width=6, fig.height=8}
 hoods |>
-plotHoodMat(n = 50)
+as.data.frame() |>
+rownames_to_column(var = "cell") |>
+mutate(
+cell = str_replace(cell, ".*outs_(\\d+)$", "Xenium_\\1")
+) |>
+column_to_rownames(var = "cell") |>
+as.matrix() |>
+plotHoodMat(n = 50)
+
 ```
 
 We can then merge the neighborhood results with the `SpatialExperiment` object using `mergeHoodSpe` so that we can conduct more neighborhood-related analysis.
````
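
The added `mutate()` step in the hunk above shortens the row names of `hoods` before plotting by keeping only the digits after `outs_`. The same substitution can be written with base R's `sub()`. In the sketch below the cell names are hypothetical examples of the assumed `<sample>_outs_<id>` pattern; a name that does not match is returned unchanged.

```r
# Hypothetical row names following the assumed "<sample>_outs_<cell id>" pattern
cell_names <- c(
  "Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_101",
  "Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_2047",
  "cell_7"
)

# Same pattern and replacement as the str_replace() call: keep the trailing digits
short_names <- sub(".*outs_(\\d+)$", "Xenium_\\1", cell_names, perl = TRUE)
short_names  # "Xenium_101" "Xenium_2047" "cell_7"

# Row names must stay unique; this would fail if two sections shared a cell id
anyDuplicated(short_names) == 0  # TRUE
```

The uniqueness check matters because `column_to_rownames()` needs unique values; dropping the sample prefix is only safe when the matrix comes from a single section.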
````diff
@@ -623,7 +637,7 @@ tx_spe_sample_1 = tx_spe_sample_1 |> mergeHoodSpe(hoods)
 tx_spe_sample_1
 ```
 
-We can see what are the neighborhood distributions look like in each cluster using `plotProbDist.`
+We can see what are the neighborhood distributions look like in each cluster using `plotProbDist`. Here we only plot 10 clusters
 
 ```{r, fig.width=10, fig.height=10}
 tx_spe_sample_1 |>
````
````diff
@@ -635,6 +649,10 @@ tx_spe_sample_1 |>
 )
 ```
 
+The clusters can then be plot on the tissue using `plotissue`
+```{r fig.width=7, fig.height=5}
+tx_spe_sample_1 |> plotTissue(color = clusters)
+```
 
 
 
````

vignettes/Solutions.Rmd

Lines changed: 24 additions & 0 deletions
````diff
@@ -376,6 +376,30 @@ tx_spe_sample_1 |>
 
 ```
 
+::: {.note}
+**Exercise 3.1**
+:::
+
+```{r, eval=FALSE}
+# Base R
+out_in_table <- table(is.na(tx_small_region$cell))
+ratio_out_in <- out_in_table[2] / out_in_table[1]
+ratio_out_in
+
+# Tidyverse
+tx_small_region |>
+summarise(
+ratio_out_in = sum(is.na(cell)) / sum(!is.na(cell))
+)
+
+tx_small_region |>
+summarise(
+cytoplasm = count(overlaps_nucleus == "0" & !is.na(cell)),
+nucleus = count(overlaps_nucleus == "1" & !is.na(cell)),
+cytoplasm_ptc = cytoplasm / (cytoplasm + nucleus) * 100
+)
+```
+
 
 
 
````