update content and solutions

stemangiola · stemangiola · commit 7f6c20346269 · 2025-05-09T20:50:25.000+02:00
diff --git a/vignettes/Introduction.Rmd b/vignettes/Introduction.Rmd
@@ -54,7 +54,7 @@ knitr::include_graphics(here("inst/images/physalia-min.png"))
 
 You can view the material at the workshop webpage
 
-[here](https://tidyomics.github.io/tidySpatialWorkshop/articles/main.html).
+[here](https://tidyomics.github.io/tidySpatialWorkshop/index.html).
 
 ## Workshop package installation 
 
@@ -90,6 +90,7 @@ Alternatively download the [git zipped package](https://github.com/tidyomics/tid
 
 # Announcements
 
+Tidyomics is now published in [Nature Methods](https://www.nature.com/articles/s41592-024-02299-2) and is available for free [here](https://www.biorxiv.org/content/10.1101/2023.09.10.557072v3).
 
 # Introduction to Spatial Omics
 
@@ -100,35 +101,65 @@ sequencing in experimental and analytical contexts.
 
 ### Workshop Structure
 
-#### 1. Welcome and Introduction
+#### Day 1
 
--   Overview of the workshop.
--   Goals for Day 1.
+##### 1. Welcome and Introduction
 
-#### 2. What is Spatial Omics?
+-   Introduction of the instructor.
+-   Introduction of the participants.
+-   Overview and goals of the workshop.
+
+##### 2. What is Spatial Omics?
 
 -   Definition and significance in modern biology.
 -   Key applications and impact.
-
-#### 3. Technologies in Spatial Omics
-
 -   Overview of different spatial omics technologies.
 -   Comparison of imaging-based vs sequencing-based approaches.
 
-#### 4. Sequencing Spatial Omics
+##### 3. Sequencing Spatial Omics
 
 -   Detailed comparison of methodologies.
 -   Experimental design considerations.
 -   Data analysis challenges and solutions.
 
-#### 5. Overview of Analysis Frameworks
+##### 4. Analysis of sequencing-based spatial data
+
+-   Getting Started with SpatialExperiment.
+-   Data Visualisation and Manipulation.
+-   Quality control and filtering.
+-   Dimensionality reduction.
+-   Spatial Clustering.
+-   Deconvolution of pixel-based spatial data.
+
+#### Day 2
+
+##### 1. Introduction to tidyomics
+
+-   Use tidyverse on spatial, single-cell, pseudobulk and bulk genomic data
 
--   Introduction to various analysis frameworks.
--   Brief mention of 'tidy' data principles in spatial omics.
+##### 2. Working with tidySpatialExperiment
+
+-   tidySpatialExperiment package
+-   Tidyverse commands
+-   Advanced filtering/gating and pseudobulk
+-   Work with features
+-   Summarisation/aggregation
+-   Tidying your workflow
+-   Visualisation
+
+#### Day 3
+
+##### 1. Imaging Spatial Omics
+
+-   Detailed comparison of methodologies.
+-   Experimental design considerations.
+-   Data analysis challenges and solutions.
 
-#### 6. Wrap-Up and Q&A
+##### 2. Spatial analyses of imaging data
 
--   Summarize key takeaways.
--   Open floor for questions and discussions.
+-   Working with imaging-based data in Bioconductor with MoleculeExperiment
+-   Aggregation and analysis
+-   Clustering
+-   Neighborhood analyses
 
 
diff --git a/vignettes/Session_1_sequencing_assays.Rmd b/vignettes/Session_1_sequencing_assays.Rmd
@@ -17,8 +17,6 @@ knitr::opts_chunk$set(echo = TRUE, cache = FALSE)
 
 # Session 1: Spatial Analysis of Sequencing Data
 
-Web rendering: https://rpubs.com/mangiolas/1186971
-
 ## Overview
 
 This workshop introduces spatial transcriptomics analysis using the Bioconductor framework, with a particular focus on the `SpatialExperiment` package. Participants will learn essential concepts and practical skills for analyzing spatially-resolved genomic data.
@@ -221,7 +219,7 @@ ggspavis::plotSpots(
 Explore additional visualisation features offered by the Visium platform, exposing the H&E (hematoxylin and eosin) image.
 
 ```{r, fig.width=6, fig.height=6}
-ggspavis::plotVisium(spatial_data)
+ggspavis::plotVisium(spatial_data, point_size = 0.5)
 ```
 
 This visualisation focuses on specific tissue features within the dataset, emphasising areas of interest.
@@ -233,7 +231,8 @@ This visualisation focuses on specific tissue features within the dataset, empha
 ggspavis::plotVisium(
   spatial_data, 
   annotate = "spatialLIBD", 
-  highlight = "in_tissue"
+  highlight = "in_tissue", 
+  point_size = 0.5
 ) + 
   facet_wrap(~sample_id)
 
@@ -763,19 +762,7 @@ SPOTlight uses a seeded non-negative matrix factorization regression, initialize
 
 #### Producing the reference for single-cell databases
 
-[cellNexus](https://stemangiola.github.io/cellNexus/) is a query interface that allow the programmatic exploration and retrieval of the harmonised, curated and reannotated CELLxGENE single-cell human cell atlas. Data can be retrieved at cell, sample, or dataset levels based on filtering criteria.
-
-Harmonised data is stored in the ARDC Nectar Research Cloud, and most cellNexus functions interact with Nectar via web requests, so a network connection is required for most functionality.
-
-Mangiola et al., 2025 doi [doi.org/10.1101/2023.06.08.542671](https://www.biorxiv.org/content/10.1101/2023.06.08.542671v3)
-
-```{r, echo=FALSE, out.width="700px"}
-knitr::include_graphics(here("inst/images/curated_atlas_query.png"))
-```
-
-
-
-
+Here, we retrieve and prepare a single-cell RNA-seq reference. The dataset, `zhong-prefrontal-2018`, comes from Zhong et al. (2018), a single-cell transcriptomic survey of the human prefrontal cortex during development. We fetch it with the `scRNAseq` package and aggregate counts across cells that share the same sample and cell type, which reduces data complexity. We then remove empty columns and entries with missing cell type annotations. Finally, we apply `logNormCounts` from the `scuttle` package to log-normalise the counts, which mitigates technical variability and prepares the data for downstream comparison.
 
 ```{r, message=FALSE, warning=FALSE,  fig.width=6, fig.height=6}
 # Get reference
@@ -942,6 +929,9 @@ No, let's look at the correlation matrices to see which cell type are most often
 
 plotCorrelationMatrix(res$mat)
 ```
+```{r}
+mat_df = as.data.frame(res$mat)
+```
 
 #### Excercise
 
@@ -954,36 +944,112 @@ Rather than looking at the correlation matrix, overall, let's observe whether th
 :::
 
 
-```{r, fig.width=6, fig.height=6}
-res_spatialLIBD = split(data.frame(res$mat), colData(spatial_data_gene_name)$spatialLIBD ) 
+::: {.note}
+**Exercise 1.5**
+
+
+
+Some of the most positive correlations in the new matrix are seen between:
+
+- **Microglia** and **Neurons**  
+- **Astrocytes** and **Stem.cells**
+
+> **Microglia** are the resident immune cells of the central nervous system, constantly surveying the parenchyma and clearing debris.  
+> **Neurons** are the electrically excitable cells that transmit and process information via synaptic connections.  
+> **Astrocytes** are star-shaped glia that support neuronal metabolism, regulate extracellular ions and neurotransmitter uptake.  
+> **Stem.cells** denote undifferentiated progenitors capable of self-renewal and differentiation into multiple neural lineages.
+
+Let us now **visualise** where these pairs of cell types most co-occur in your spatial map. For **each** pair, carry out the following:
+
+1. **Label** any pixel where both cell types exceed 10% abundance (i.e. > 0.1).
+2. **Label** any pixel where the _sum_ of their abundances exceeds 40% (i.e. > 0.4).
+3. **Plot** the spatial coordinates of all pixels, **colouring** them by this new label, for example:
+   - `0` = neither condition met  
+   - `1` = both abundances > 0.1  
+   - `2` = summed abundance > 0.4  
+
+You should end up with two analogous visualisations:
+
+- **Microglia + Neurons**  
+- **Astrocytes + Stem.cells**
+
+Feel free to reuse your previous code, simply substituting the cell-type columns and updating the thresholds as above.
+
+:::
+
+
+
+#### Bonus - Alternative reference from the Human Cell Atlas - using cellNexus
 
-lapply(res_spatialLIBD, function(x) plotCorrelationMatrix(as.matrix(x[,-10]))) 
+[cellNexus](https://stemangiola.github.io/cellNexus/) is a query interface that allows the programmatic exploration and retrieval of the harmonised, curated and reannotated CELLxGENE single-cell human cell atlas. Data can be retrieved at cell, sample, or dataset levels based on filtering criteria.
+
+Harmonised data is stored in the ARDC Nectar Research Cloud, and most cellNexus functions interact with Nectar via web requests, so a network connection is required for most functionality.
+
+Mangiola et al., 2025 doi [doi.org/10.1101/2023.06.08.542671](https://www.biorxiv.org/content/10.1101/2023.06.08.542671v3)
+
+```{r, echo=FALSE, out.width="700px"}
+knitr::include_graphics(here("inst/images/curated_atlas_query.png"))
 ```
 
 
+```{r, eval = FALSE, message=FALSE, warning=FALSE,  fig.width=3, fig.height=3}
+# Get reference
+library(cellNexus)
+library(HDF5Array)
 
-::: {.note}
-**Exercise 1.5**
+tmp_file_path = tempfile()
+
+brain_reference =
+  
+  # Query metadata across 30M cells
+  get_metadata() |>
+  
+  # Filter your data of interest
+  dplyr::filter(tissue_groups=="cerebral lobes and cortical areas", disease == "Normal") |> 
+  
+  # Collect pseudobulk as SummarizedExperiment
+  get_pseudobulk() |> 
+  
+  # Normalise for SPOTlight
+  scuttle::logNormCounts() |> 
+  
+  # Save for fast reading
+  HDF5Array::saveHDF5SummarizedExperiment(tmp_file_path, replace = TRUE)
+```
 
-Some of the most positive correlations involve the endothelial cells with Oligodendrocytes and Leptomeningeal cells.
+```{r, eval = FALSE, message=FALSE}
+library(HDF5Array)
 
-Leptomeningeal cells refer to the cells that make up the leptomeninges, which consist of two of the three layers olet's meninges surrounding the brain and spinal cord: the arachnoid mater and the pia mater. These layers play a critical role in protecting the central nervous system and assisting in various physiological processes.
+brain_reference = HDF5Array::loadHDF5SummarizedExperiment(tmp_file_path)
 
-Oligodendrocytes are a type of glial cell in the central nervous system (CNS) of vertebrates, including humans and mouse. These cells are crucial for the formation and maintenance of the myelin sheath, a fatty layer that encases the axons of many neurons.
+my_metadata = colData(brain_reference)
 
-Let's try to visualise the pixel where these cell types most occur.
+knitr::kable(head(my_metadata), format = "html")
+```
 
+These are the cell types included in our reference, and the number of pseudobulk samples we have for each cell type.
 
-- Label pixel that have > 10% (> 0.1) endothelial_cell and leptomeningeal_cell
-- Label pixels that have > 40% (> 0.4) across these two cells
-- Plot pixels colouring by the new label
+```{r, eval = FALSE}
 
-:::
+table(brain_reference$cell_type_harmonised)
 
-```{r}
-mat_df = as.data.frame(res$mat)
 ```
 
+This is the number of samples we have for each of the three datasets.
+
+```{r, eval = FALSE}
+
+table(brain_reference$dataset_id)
+```
+
+The `collection_id` can be used to look up the source collection in the cell database, e.g. `https://cellxgene.cziscience.com/collections/<collection_id>`.
+
+```{r, eval = FALSE}
+table(brain_reference$collection_id)
+```
+
+
+
 **Session Information**
 
 ```{r}
diff --git a/vignettes/Solutions.Rmd b/vignettes/Solutions.Rmd
@@ -5,7 +5,7 @@ author:
 output: rmarkdown::html_vignette
 # bibliography: "`r file.path(system.file(package='tidySpatialWorkshop', 'vignettes'), 'tidyomics.bib')`"
 vignette: >
-  %\VignetteIndexEntry{Sequencing assays}
+  %\VignetteIndexEntry{Solutions to exercises}
   %\VignetteEncoding{UTF-8}
   %\VignetteEngine{knitr::rmarkdown}
 ---
@@ -96,31 +96,31 @@ lapply(res_spatialLIBD, function(x) plotCorrelationMatrix(as.matrix(x[,-10])))
 
 ```{r, fig.width=7, fig.height=8, eval=FALSE}
 
-
-
-is_endothelial_leptomeningeal = mat_df$endothelial_cell >0.1 & mat_df$leptomeningeal_cell>0.1 & mat_df$endothelial_cell + mat_df$leptomeningeal_cell > 0.4 
-
-spatial_data$is_endothelial_leptomeningeal = is_endothelial_leptomeningeal
-
-ggspavis::plotSpots(spatial_data, annotate = "is_endothelial_leptomeningeal") +
-    facet_wrap(~sample_id) +
-  scale_color_manual(values = c("TRUE"= "red", "FALSE" = "grey"))
-theme(legend.position = "none") +
-  labs(title = "endothelial + leptomeningeal")
-
-
-
-
-is_endothelial_oligodendrocytes = mat_df$endothelial_cell >0.1 & mat_df$oligodendrocyte>0.05 & mat_df$endothelial_cell  + mat_df$oligodendrocyte > 0.4 
-
-spatial_data$is_endothelial_oligodendrocyte = is_endothelial_oligodendrocytes
-
-ggspavis::plotSpots(spatial_data, annotate = "is_endothelial_oligodendrocyte") +
-    facet_wrap(~sample_id) +
-  scale_color_manual(values = c("TRUE"= "blue", "FALSE" = "grey"))
-theme(legend.position = "none") +
-  labs(title = "endothelial + oligodendrocyte")
-
+# 1. Microglia + Neurons
+is_microglia_neuron <- mat_df$Microglia > 0.1 &
+                       mat_df$Neurons   > 0.1 &
+                       (mat_df$Microglia + mat_df$Neurons) > 0.4
+spatial_data$is_microglia_neuron <- is_microglia_neuron
+
+ggspavis::plotSpots(spatial_data, annotate = "is_microglia_neuron") +
+  facet_wrap(~sample_id) +
+  scale_color_manual(values = c("TRUE" = "red", "FALSE" = "grey")) +
+  theme(legend.position = "none") +
+  labs(title = "Microglia + Neurons")
+
+
+# 2. Astrocytes + Stem cells
+# Note the space in the column name: use backticks
+is_astrocyte_stem <- mat_df$Astrocytes     > 0.1 &
+                     mat_df$`Stem cells`   > 0.1 &
+                     (mat_df$Astrocytes + mat_df$`Stem cells`) > 0.4
+spatial_data$is_astrocyte_stem <- is_astrocyte_stem
+
+ggspavis::plotSpots(spatial_data, annotate = "is_astrocyte_stem") +
+  facet_wrap(~sample_id) +
+  scale_color_manual(values = c("TRUE" = "blue", "FALSE" = "grey")) +
+  theme(legend.position = "none") +
+  labs(title = "Astrocytes + Stem cells")
 ```
 
 **Excercise 1.6**