Analysis
- A) Raw image processing
- (A.1) Image extraction
- (A.2) Image normalisation
- (A.3) Image thresholding and masking
- B) Pixel-based analysis
- C) Cell-based analysis
- (C.1) Cell segmentation
- (C.2) Cell masking
- (C.3) Cell masking visualisation
- (C.4A.1) Unsupervised clustering
- (C.4A.2) Unsupervised clustering visualisation
- (C.4B.1) Expression thresholding
- (C.4B.2) Expression thresholding visualisation
- (C.5A.1) Homotypic spatial analysis
- (C.5A.2) Homotypic spatial analysis visualisation
- (C.5B.1) Heterotypic spatial analysis
- (C.5B.2) Heterotypic spatial analysis visualisation
The first step in the SIMPLI analysis workflow is the preprocessing of raw images. It consists of three processes:
- (A.1) Image extraction
- (A.2) Image normalisation
- (A.3) Image thresholding and masking
In this process, tiff files are extracted from the raw acquisition data of imaging mass cytometry (IMC) experiments. This process should be skipped if the input data are not raw IMC data. See the Input page for more details.
Inputs and parameters:
- `raw_metadata_file` with the ROI metadata.
- `channel_metadata_file` with the [metal and channel metadata](https://github.com/ciccalab/SIMPLI/wiki/Input#channel-metadata-file).
- `tiff_type`: type of tiff output (`"single"` or `"ome"`).
Outputs:
- Images: images (uncompressed 16 bit tiff) can be output in two different formats:
  - single-channel tiff files, one for each of the selected channels (`$output_folder/Images/Raw/sample_name/sample_name-label-raw.tiff`)
  - .ome.tiff files, one per sample, with the channels in the same order as in the `channel_metadata_file` (`$output_folder/Images/Raw/sample_name/sample_name-all_raw.ome.tiff`)
- Metadata:
  - Metadata for all images from all samples: `$output_folder/Images/Raw/raw_tiff_metadata.csv`
  - Per-sample metadata for the raw images is also output at: `$output_folder/Images/Raw/sample_name/sample_name-raw_tiff_metadata.csv`
The output of this process is located at: $output_folder/Images/Raw/
This process can be skipped by setting the `skip_conversion` parameter to `true`.
This process performs 99th percentile normalisation of the raw tiff images, which are either generated in the image extraction process or, if the image extraction process is skipped, specified by the user with the `raw_metadata_file`.
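As a rough illustration of what 99th percentile normalisation does, the sketch below clips a single-channel image at its 99th percentile intensity and rescales the result to the 16 bit range. The nearest-rank percentile, the clipping step and the rescaling to 0-65535 are assumptions made for this example; SIMPLI's own implementation may compute the percentile or scale the output differently. The function names are hypothetical, and images are plain lists of rows so that no imaging library is needed.

```python
import math


def percentile_nearest_rank(values, q):
    """Return the q-th percentile (0 < q <= 100) using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def normalise_99th(image, max_value=65535):
    """Clip an image at its 99th percentile and rescale it to 0..max_value.

    `image` is a list of rows of non-negative pixel intensities; the result has
    the same shape and contains integers suitable for a 16 bit tiff.
    """
    flat = [v for row in image for v in row]
    p99 = percentile_nearest_rank(flat, 99)
    if p99 <= 0:
        # The 99th percentile is zero: there is nothing to rescale.
        return [[0 for _ in row] for row in image]
    # Values above the 99th percentile saturate at max_value.
    return [[round(min(v, p99) / p99 * max_value) for v in row] for row in image]


if __name__ == "__main__":
    image = [[r * 10 + c for c in range(10)] for r in range(10)]  # values 0..99
    normalised = normalise_99th(image)
    print(normalised[9][9], normalised[0][0])
```

Clipping before rescaling keeps a few very bright pixels (a common feature of IMC data) from compressing the rest of the dynamic range.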
Inputs and parameters:
- `raw_metadata_file` with the raw tiff image metadata.
- `tiff_type`: type of tiff output (`"single"` or `"ome"`).
Outputs:
- Normalised images: images (uncompressed 16 bit tiff) can be output in two different formats:
  - single-channel tiff files, one for each of the selected channels (`$output_folder/Images/Normalized/sample_name/sample_name-label-normalized.tiff`)
  - .ome.tiff files, one per sample, with the channels in the same order as in the `channel_metadata_file` (`$output_folder/Images/Normalized/sample_name/sample_name-ALL-normalized.ome.tiff`)
- Metadata:
  - Metadata for all images from all samples: `$output_folder/Images/Normalized/normalized_tiff_metadata.csv`
  - Per-sample metadata for the normalised images is also output at:
    - `$output_folder/Images/Raw/sample_name/sample_name-normalized_tiff_metadata.csv` in long format.
    - `$output_folder/Images/Raw/sample_name/sample_name-normalized_tiff_metadata.csv` in CellProfiler4-compatible wide format.
The output of this process is located at: $output_folder/Images/Normalized/
This process can be skipped by setting the `skip_normalization` parameter to `true`.
This process is used to perform the image preprocessing that will generate the final images, which can then be used as input for the pixel-based or the cell-based analysis. The input images for this process can be derived from:
- images generated in the Image normalisation process.
- images generated in the Image extraction process if the Image normalisation process is skipped.
- images specified by the user with the `normalized_metadata_file` file if the image extraction and the image normalisation processes are skipped.
Inputs and parameters:
- `cp4_preprocessing_cppipe`: path to the CellProfiler4 pipeline file used for image preprocessing. See the CellProfiler4 pipeline page for its requirements.
- `normalized_metadata_file` with the metadata of the input tiff images.
Outputs:
- Preprocessed images (uncompressed 16 bit single-channel tiff): `$output_folder/Images/Preprocessed/sample_name/sample_name-label-Preprocessed.tiff`
- Metadata:
  - Metadata for all images from all samples: `$output_folder/Images/Preprocessed/preprocessed_tiff_metadata.csv`
  - Per-sample metadata for the preprocessed images is also output at:
    - `$output_folder/Images/Preprocessed/sample_name/sample_name-preprocessed_metadata.csv` in long format.
    - `$output_folder/Images/Preprocessed/sample_name-cp4-preprocessed_metadata.csv` in CellProfiler4-compatible wide format.
The output of this process is located at: $output_folder/Images/Preprocessed/
This process can be skipped by setting the `skip_preprocessing` parameter to `true`.
The pixel-based approach implemented in SIMPLI enables the quantification of pixels that are positive for a specific marker or combination of markers. These marker-positive areas can be normalised over the area of the whole image, or over the area of an image mask defined by a combination of any of the input images with logical operators.
This process measures the areas of interest and normalises them on the selected image masks according to the input metadata. The input images for this process can be derived from:
- images generated in the image thresholding and masking process.
- images specified by the user with the `preprocessed_metadata_file` file if the image thresholding and masking process is skipped.
Inputs and parameters:
- `preprocessed_metadata_file` with the tiff image metadata.
- `area_measurements_metadata`: path to the `area_measurements_metadata` file. It has two columns:
  - `marker` = marker or combination of markers whose area should be measured.
  - `main_marker` = marker or combination of markers whose area should be used to normalise the area of `marker`. If `main_marker` is the same as `marker`, the whole area of the image is used for normalisation.
  - The `marker` and `main_marker` values should be either a value from the `label` column of the `preprocessed_metadata_file` or a combination of those values with logical operators (AND = `&`, OR = `|`, NOT = `!`, round brackets = `()`).
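The logical-operator syntax used for `marker` and `main_marker` can be illustrated with a small recursive-descent evaluator over thresholded (binary) masks. This is a hypothetical sketch, not SIMPLI's parser: it assumes each mask is a flattened list of booleans keyed by its `label`, and it assumes the usual precedence (`!` binds tighter than `&`, which binds tighter than `|`); round brackets make the grouping explicit either way. The labels `CD3`, `CD8` and `Tumour` are made up for the example.

```python
import re

# One token: an operator, a bracket, or a label (any run of other non-space characters).
TOKEN_RE = re.compile(r"\s*([&|!()]|[^\s&|!()]+)")


def tokenize(expression):
    """Split e.g. "(CD3 | CD8) & !Tumour" into ['(', 'CD3', '|', 'CD8', ')', '&', '!', 'Tumour']."""
    expression = expression.strip()
    tokens, pos = [], 0
    while pos < len(expression):
        match = TOKEN_RE.match(expression, pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expression, masks):
    """Evaluate a marker expression into a flattened boolean mask (one value per pixel)."""
    tokens = tokenize(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError(f"Unexpected end of expression {expression!r}")
        pos += 1
        return tokens[pos - 1]

    def parse_or():  # lowest precedence: a | b
        result = parse_and()
        while peek() == "|":
            advance()
            rhs = parse_and()
            result = [a or b for a, b in zip(result, rhs)]
        return result

    def parse_and():  # a & b
        result = parse_not()
        while peek() == "&":
            advance()
            rhs = parse_not()
            result = [a and b for a, b in zip(result, rhs)]
        return result

    def parse_not():  # highest precedence: !a, (expr) or a label
        token = advance()
        if token == "!":
            return [not v for v in parse_not()]
        if token == "(":
            result = parse_or()
            if advance() != ")":
                raise ValueError(f"Missing ')' in {expression!r}")
            return result
        if token in ("&", "|", ")"):
            raise ValueError(f"Unexpected {token!r} in {expression!r}")
        return [bool(v) for v in masks[token]]  # KeyError for an unknown label

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"Trailing tokens in {expression!r}")
    return result
```

For example, `sum(evaluate("CD3 & !CD8", masks))` counts the pixels positive for `CD3` and negative for `CD8`, i.e. their area in pixels².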
Outputs:
The area measurements are saved in `$output_folder/area_measurements.csv`. The file has the following columns:
- `sample_name` = sample name.
- `main_marker` = combination of markers used to normalise the `marker` area.
- `marker` = marker or combination of markers measured.
- `area` = area positive for the `marker` combination of markers.
- `main_marker_area` = area positive for the `main_marker` combination of markers.
- `total_ROI_area` = total image area for this sample.
- `percentage` = area of the marker (`area`) / area of the main marker (`main_marker_area`) * 100.

All areas are in pixels².
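The columns above can be reproduced from two binary masks with a few lines of code. In this hypothetical sketch the masks are assumed to be already evaluated from their marker expressions (flattened lists of booleans), `area` is taken over the whole image exactly as the column list defines it, and, when `marker` equals `main_marker`, `main_marker_area` is assumed to be reported as the whole image area; SIMPLI may fill that column differently.

```python
import csv
import io

# Column order of area_measurements.csv as listed above.
FIELDS = ["sample_name", "main_marker", "marker", "area",
          "main_marker_area", "total_ROI_area", "percentage"]


def area_row(sample_name, marker, main_marker, marker_mask, main_mask):
    """Build one area_measurements.csv row from two flattened boolean masks."""
    total_roi_area = len(marker_mask)
    area = sum(1 for v in marker_mask if v)
    if marker == main_marker:
        # Same expression in both columns: normalise on the whole image.
        main_marker_area = total_roi_area
    else:
        main_marker_area = sum(1 for v in main_mask if v)
    percentage = area / main_marker_area * 100 if main_marker_area else float("nan")
    return dict(zip(FIELDS, [sample_name, main_marker, marker, area,
                             main_marker_area, total_roi_area, percentage]))


def write_area_csv(rows, handle):
    """Write the rows with a header line to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    cd8 = [True, True, False, False, False, False, False, False]
    cd3 = [True, True, True, True, False, False, False, False]
    buffer = io.StringIO()
    write_area_csv([area_row("sample_1", "CD8", "CD3", cd8, cd3),
                    area_row("sample_1", "CD8", "CD8", cd8, cd8)], buffer)
    print(buffer.getvalue())
```

With these toy masks, CD8 covers 2 of the 4 CD3-positive pixels (50%) and 2 of the 8 image pixels (25%).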
This process can be skipped by setting the `skip_area` parameter to `true`.
This process generates boxplots comparing the distributions of normalised marker-positive areas between two categories of samples. The input data for this process can be derived from:
- images generated in the image thresholding and masking process.
- images specified by the user with the `preprocessed_metadata_file` file if the image thresholding and masking process is skipped.
Inputs and parameters:
- `sample_metadata_file` with the metadata of all samples used in the analysis.
- `area_measurements_file`: path to the `area_measurements_file`. It should have the following columns:
  - `sample_name` = sample name; it should match a value in the `sample_metadata_file`.
  - `main_marker` = marker or combination of markers used for normalisation.
  - `marker` = marker or combination of markers used to calculate the area.
  - `percentage` = area of the marker / area of the main marker * 100.

The FDR is calculated using the number of different `marker` values for each value of `main_marker`.
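The page does not name the statistical test or the correction used for the area boxplots. Since Benjamini-Hochberg is the procedure this page names for the cell type and cluster boxplots, the sketch below assumes it and applies it separately within each `main_marker`, so the number of tests equals the number of different `marker` values for that `main_marker`. The `(main_marker, marker, p_value)` input layout and the function names are hypothetical.

```python
from collections import defaultdict


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fdr_by_main_marker(results):
    """results: iterable of (main_marker, marker, p_value) tuples.

    Returns {(main_marker, marker): fdr}, correcting separately within each main_marker.
    """
    groups = defaultdict(list)
    for main_marker, marker, p_value in results:
        groups[main_marker].append((marker, p_value))
    fdr = {}
    for main_marker, items in groups.items():
        adjusted = benjamini_hochberg([p for _, p in items])
        for (marker, _), q in zip(items, adjusted):
            fdr[(main_marker, marker)] = q
    return fdr
```

Grouping by `main_marker` matches the output layout, where each `main_marker` gets its own folder and pdf of boxplots.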
Outputs:
The boxplots are saved in `$output_folder/Plots/Area_Plots/Boxplots/`, with a separate folder for each `main_marker`.
For each `main_marker`, a pdf file (`$output_folder/Plots/Area_Plots/Boxplots/main_marker/main_marker_area_boxplots.pdf`) contains a boxplot for each value of `marker` associated with that `main_marker`.
The output of this process is located at: $output_folder/Plots/Area_Plots/Boxplots/
This process can be skipped by setting the `skip_area_visualization` parameter to `true`.
The cell-based analysis aims to investigate the qualitative and quantitative cell representation within the imaged tissue through cell segmentation, cell phenotyping by unsupervised clustering or expression thresholding, and spatial analysis of cell densities (homotypic spatial analysis) and distances (heterotypic spatial analysis). The steps of the cell-based analysis are:
- Single-cell data extraction:
- (C.1) Cell segmentation
- (C.2) Cell masking
- (C.3) Cell masking visualisation
- Cell phenotyping:
- (C.4A.1) Unsupervised clustering
- (C.4A.2) Unsupervised clustering visualisation
- (C.4B.1) Expression thresholding
- (C.4B.2) Expression thresholding visualisation
- Spatial analysis:
- (C.5A.1) Homotypic spatial analysis
- (C.5A.2) Homotypic spatial analysis visualisation
- (C.5B.1) Heterotypic spatial analysis
- (C.5B.2) Heterotypic spatial analysis visualisation
This process generates the single-cell data in .csv format and the cell masks in tiff format. The input data for this process can be derived from:
- images generated in the image thresholding and masking process.
- images specified by the user with the `preprocessed_metadata_file` file if the image thresholding and masking process is skipped.
Inputs and parameters:
- `preprocessed_metadata_file` with the tiff image metadata.
- `cp4_segmentation_cppipe`: path to the CellProfiler4 pipeline file used for cell segmentation. See the CellProfiler4 pipeline page for its requirements.
Outputs:
- Single cell data:
  - Single cell data for all samples: `$output_folder/Segmentation/unannotated_cells.csv`
  - Single cell data for each sample separately: `$output_folder/Segmentation/sample_name/sample_name-Cells.csv`

  The single-cell data is a .csv table with a row for each cell and the following annotations:
  - `ImageNumber`: CellProfiler4-specific image identifier.
  - `ObjectNumber`: unique identity number from 1 to 2^16 - 1 (65,535) that matches the corresponding pixels in the cell masks.
  - `Metadata_sample_name`: matches the `sample_name` values in the `preprocessed_metadata_file`.
  - `Location_Center_X` and `Location_Center_Y`: location of the cell centroid in the image, in pixels; used for both the homotypic and heterotypic spatial analyses.
  - CellProfiler4 marker intensity measurements: used for cell phenotyping by unsupervised clustering or by expression thresholding.

  The exact set of fields and their order depend on the CellProfiler4 pipeline used in the analysis.
- Cell masks: cell masks in uint16 tiff format: `$output_folder/Segmentation/sample_name/sample_name-Cell_Mask.tiff`

  Each cell is associated with a unique identity number from 1 to 2^16 - 1 (65,535). All the pixels belonging to a given cell have their value set to its identity number; pixels not belonging to any cell are set to 0.
  These images are compatible with several other tools for downstream analysis, including:
  - CellProfiler4: the cells can be imported as objects from the image.
  - histoCAT
  - cytomapper
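To make the cell mask encoding concrete, the hypothetical sketch below walks a labelled mask (a plain list of rows here; reading the actual tiff would need an imaging library) and recovers each cell's pixel area and centroid from its identity number. The centroid is taken as the mean of the pixel coordinates, which approximates `Location_Center_X` and `Location_Center_Y`; CellProfiler4's exact convention may differ.

```python
from collections import defaultdict

MAX_OBJECT_NUMBER = 2**16 - 1  # largest identity number a uint16 mask can hold


def summarise_cell_mask(mask):
    """Return {ObjectNumber: (area, center_x, center_y)} for a labelled cell mask.

    `mask` is a list of rows; 0 marks background and every other value is the
    identity number of the cell that owns the pixel.
    """
    count = defaultdict(int)
    sum_x = defaultdict(int)
    sum_y = defaultdict(int)
    for y, row in enumerate(mask):
        for x, label in enumerate(row):
            if label == 0:
                continue  # pixel not belonging to any cell
            if not 1 <= label <= MAX_OBJECT_NUMBER:
                raise ValueError(f"Invalid identity number {label} at ({x}, {y})")
            count[label] += 1
            sum_x[label] += x
            sum_y[label] += y
    return {
        label: (count[label], sum_x[label] / count[label], sum_y[label] / count[label])
        for label in sorted(count)
    }


if __name__ == "__main__":
    mask = [
        [0, 1, 1],
        [0, 1, 1],
        [2, 0, 0],
    ]
    for object_number, (area, cx, cy) in summarise_cell_mask(mask).items():
        print(object_number, area, cx, cy)
```

Because every pixel carries its cell's `ObjectNumber`, per-cell values can be joined back to the single-cell table on `ObjectNumber` and `Metadata_sample_name`.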
The output of this process is located at: $output_folder/Segmentation/
This process can be skipped by setting the `skip_segmentation` parameter to `true`.
This process identifies cells belonging to different populations or tissue compartments according to the overlap of their areas with those of specific masks.
The input images for this process can be derived from:
- images generated in the image thresholding and masking process.
- images specified by the user with the `preprocessed_metadata_file` file if the image thresholding and masking process is skipped.
The input cell masks for this process can be derived from:
- cell masks generated in the cell segmentation process.
- cell masks specified by the user with the `single_cell_masks_metadata` file if the cell segmentation process is skipped.
The input cell data for this process can be derived from:
- cell data generated in the cell segmentation process.
- cell data specified by the user with the `single_cell_data_file` file if the cell segmentation process is skipped.
Inputs and parameters:
- `preprocessed_metadata_file` with the tiff image metadata.
- `single_cell_masks_metadata` with the following columns:
  - `sample_name` = sample name matching a value in the `preprocessed_metadata_file` file.
  - `label` = `"Cell_Mask"`
  - `file_name` = path to a cell mask in uint16 tiff format.
- `single_cell_data_file` = a .csv file with the following columns:
  - `Metadata_sample_name` = sample name matching a value in the `preprocessed_metadata_file` file.
  - `ObjectNumber` = unique number identifying the pixels belonging to the cell in the cell mask.
- `cell_masking_metadata` = a .csv file indicating which masks to use and which thresholds of overlap to apply. It should have the following columns:
  - `cell_type` = name of the cell type being identified.
  - `threshold_marker` = marker to use as mask. It should match a value in the `label` column of the `preprocessed_metadata_file`. It can be a combination of markers specified with logical operators (AND = `&`, OR = `|`, NOT = `!`, round brackets = `()`).
  - `threshold_value` = fraction (between 0 and 1) of area overlap between the cell and the mask. Cells whose area overlaps the mask by a fraction higher than `threshold_value` are considered positive.

If a cell is positive for more than one cell type, it is assigned to the cell type defined first (by row order) in the `cell_masking_metadata` file. Cells negative for all cell types are marked as `UNASSIGNED`.
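The assignment rule above (overlap fraction compared with `threshold_value`, first matching row wins, otherwise `UNASSIGNED`) can be sketched as follows. This is a hypothetical illustration, not SIMPLI's code: it assumes the marker masks have already been thresholded and combined into boolean images of the same size as the cell mask, reads `threshold_value` as a 0-1 fraction, and uses a strict "higher than" comparison.

```python
from collections import defaultdict


def assign_cell_types(cell_mask, rules, marker_masks):
    """Assign each cell in a labelled mask to the first matching cell type.

    cell_mask:    list of rows of identity numbers (0 = background).
    rules:        (cell_type, threshold_marker, threshold_value) tuples in the
                  same row order as cell_masking_metadata.
    marker_masks: {threshold_marker: list of rows of booleans}.
    """
    # Collect the pixel coordinates of every cell.
    pixels = defaultdict(list)
    for y, row in enumerate(cell_mask):
        for x, label in enumerate(row):
            if label:
                pixels[label].append((x, y))

    assignment = {}
    for label, coords in sorted(pixels.items()):
        assignment[label] = "UNASSIGNED"
        for cell_type, threshold_marker, threshold_value in rules:
            mask = marker_masks[threshold_marker]
            inside = sum(1 for x, y in coords if mask[y][x])
            if inside / len(coords) > threshold_value:
                assignment[label] = cell_type
                break  # earlier rows take priority
    return assignment


def cell_name(sample_name, object_number):
    """CellName in the form Metadata_sample_name_ObjectNumber."""
    return f"{sample_name}_{object_number}"
```

Row order matters: swapping two rules can move cells that pass both thresholds from one cell type to the other.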
Outputs:
The annotated cell table is a .csv table with the same columns as the input single-cell data table plus the following annotations:
- `cell_type`: name used to identify the cell type during the analysis.
- `CellName`: unique cell identity string in the form `Metadata_sample_name_ObjectNumber`.

The cell type level table is saved at: `$output_folder/annotated_cells.csv`
This process can be skipped by setting the `skip_cell_type_identification` parameter to `true`.
This process plots the results of the cell masking process. The input cell masks for this process can be derived from:
- cell masks generated in the cell segmentation process.
- cell masks specified by the user with the `single_cell_masks_metadata` file if the cell segmentation process is skipped.
The input cell data for this process can be derived from:
- cell data generated in the cell segmentation process.
- cell data specified by the user with the `preprocessed_metadata_file` file if the cell segmentation process is skipped.
Inputs and parameters:
- `sample_metadata_file` with the metadata of all samples used in the analysis.
- `single_cell_masks_metadata` with the following columns:
  - `sample_name` = sample name matching a value in the `sample_metadata_file` file.
  - `label` = `"Cell_Mask"`
  - `file_name` = path to a cell mask in uint16 tiff format.
- `annotated_cell_data_file` = a .csv file with the following columns:
  - `Metadata_sample_name` = sample name matching a value in the `sample_metadata_file` file.
  - `cell_type` = name of the cell type being identified.
- `cell_masking_metadata` = a .csv file indicating which masks to use and which thresholds of overlap to apply. It should have the following columns:
  - `cell_type` = name of the cell type being identified.
  - `threshold_marker` = marker to use as mask. It should match a value in the `label` column of the `preprocessed_metadata_file`. It can be a combination of markers specified with logical operators (AND = `&`, OR = `|`, NOT = `!`, round brackets = `()`).
  - `threshold_value` = fraction (between 0 and 1) of area overlap between the cell and the mask. Cells whose area overlaps the mask by a fraction higher than `threshold_value` are considered positive.
  - `color` = color used to represent this cell type. Accepted values are color names or hexadecimal RGB or RGBA values (`"#RRGGBB"` or `"#RRGGBBAA"`). Cells of `cell_type` = `"UNASSIGNED"` are automatically assigned the color `"#888888"`.
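The `color` column rules (hexadecimal `#RRGGBB` or `#RRGGBBAA`, or a color name, with `UNASSIGNED` fixed to `#888888`) amount to building a palette like the one in this hypothetical sketch. It only validates the hexadecimal form; whether a name such as `steelblue` is accepted is up to the plotting library, so names are passed through unchecked. The example rows and marker names are made up.

```python
import csv
import io
import re

# "#RRGGBB" or "#RRGGBBAA", as accepted for the color column.
HEX_COLOR_RE = re.compile(r"#(?:[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})")
UNASSIGNED_COLOR = "#888888"


def build_palette(handle):
    """Read cell_masking_metadata rows and map each cell_type to its color."""
    palette = {}
    for row in csv.DictReader(handle):
        cell_type, color = row["cell_type"].strip(), row["color"].strip()
        if color.startswith("#") and not HEX_COLOR_RE.fullmatch(color):
            raise ValueError(f"Invalid hex color {color!r} for cell type {cell_type!r}")
        palette[cell_type] = color  # non-hex values are treated as color names
    palette["UNASSIGNED"] = UNASSIGNED_COLOR
    return palette


if __name__ == "__main__":
    metadata = io.StringIO(
        "cell_type,threshold_marker,threshold_value,color\n"
        "T_cell,CD3,0.5,#E41A1C\n"
        "Tumour_cell,Keratin,0.5,steelblue\n"
    )
    print(build_palette(metadata))
```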
Outputs:
The cell type level plots are saved in `$output_folder/Plots/Cell_Type_Plots/` and are divided into:
- Barplots: `$output_folder/Plots/Cell_Type_Plots/Barplots`
  .pdf files with barplots of the proportions of all cell types + unassigned cells in:
  - Each sample: one bar per sample.
  - Each category (optional): one bar per category, if the comparison column in the `sample_metadata_file` file contains 2 categories.

  The barplots are divided into the following .pdf files:
  - `dodged_barplots.pdf` = dodged barplots including `"UNASSIGNED"` cells.
  - `dodged_assigned_ony_barplots.pdf` = dodged barplots excluding `"UNASSIGNED"` cells.
  - `stacked_barplots.pdf` = stacked barplots including `"UNASSIGNED"` cells.
  - `stacked_assigned_only_barplots.pdf` = stacked barplots excluding `"UNASSIGNED"` cells.
- Overlays: `$output_folder/Plots/Cell_Type_Plots/Overlays/`
  - One `overlay-sample_name.tiff` image per sample. Each cell is coloured by cell type according to the color specified in the `cell_masking_metadata` file.
  - `overlay_legend.pdf`: legend mapping each cell type to its color.
- Boxplots (optional): `$output_folder/Plots/Cell_Type_Plots/Boxplots/`
  If the comparison column in the `sample_metadata_file` file contains 2 categories, a .pdf file is produced with one boxplot for each cell type + unassigned cells. The FDR is calculated with the Benjamini-Hochberg procedure.
This process can be skipped by setting the `skip_type_visualization` parameter to `true`.
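The barplots show, for each sample, the proportion of cells in each cell type, either including or excluding the `UNASSIGNED` cells. A minimal sketch of that calculation follows, assuming the input is a list of `(Metadata_sample_name, cell_type)` pairs read from `annotated_cells.csv`; the function name is hypothetical.

```python
from collections import Counter, defaultdict


def cell_type_proportions(cells, assigned_only=False):
    """Proportion of cells of each cell type within each sample.

    cells: iterable of (Metadata_sample_name, cell_type) pairs.
    assigned_only: drop "UNASSIGNED" cells first, as in the barplots that exclude them.
    """
    counts = defaultdict(Counter)
    for sample_name, cell_type in cells:
        if assigned_only and cell_type == "UNASSIGNED":
            continue
        counts[sample_name][cell_type] += 1
    proportions = {}
    for sample_name, per_type in counts.items():
        total = sum(per_type.values())
        proportions[sample_name] = {t: n / total for t, n in per_type.items()}
    return proportions
```

Excluding `UNASSIGNED` cells changes the denominator, so the assigned-only proportions of the remaining cell types are larger than in the plots that include them.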
This process performs unsupervised clustering on one or more sets of cells. The input cell data for this process can be derived from:
- cell data annotated in the cell masking process.
- cell data specified by the user with the `annotated_cell_data_file` file if the cell masking process is skipped.
Inputs and parameters:
- `sample_metadata_file` with the metadata of all samples used in the analysis. If the value of the `comparison` column for a sample is `"NA"`, all cells from that sample are excluded from the clustering.
- `annotated_cell_data_file` = a .csv file with the following columns:
  - `Metadata_sample_name` = sample name matching a value in the `sample_metadata_file` file.
  - `cell_type` = name of the cell type being identified.
  - Columns with the expression values of the markers used for clustering; their names should match the values in the `clustering_markers` column of the `cell_clustering_metadata` file.
- `cell_clustering_metadata`: metadata file with the parameters for cell phenotyping by unsupervised clustering. It contains the following columns:
  - `cell_type` = name of the cell type to use for phenotyping. Set to `"NA"` to use all cells in the sample.
  - `clustering_markers` = `@`-separated list of markers to use for clustering. The markers must match column names in the `annotated_cell_data_file`.
  - `clustering_resolutions` = `@`-separated list of resolutions used to extract the clusters from the graph; use a value above (below) 1.0 to obtain a larger (smaller) number of clusters. See the original Seurat function for details.
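The `@`-separated columns of `cell_clustering_metadata` can be read as sketched below. This is a hypothetical helper, assuming a comma-separated file with a header row; it maps `cell_type` = `"NA"` to `None` to mean "all cells in the sample", and it does not perform the clustering itself.

```python
import csv
import io


def parse_clustering_metadata(handle):
    """Read cell_clustering_metadata and split its "@"-separated columns."""
    jobs = []
    for row in csv.DictReader(handle):
        cell_type = row["cell_type"].strip()
        markers = [m.strip() for m in row["clustering_markers"].split("@") if m.strip()]
        resolutions = [float(r) for r in row["clustering_resolutions"].split("@") if r.strip()]
        jobs.append({
            "cell_type": None if cell_type == "NA" else cell_type,  # None = all cells
            "markers": markers,
            "resolutions": resolutions,
        })
    return jobs


if __name__ == "__main__":
    metadata = io.StringIO(
        "cell_type,clustering_markers,clustering_resolutions\n"
        "T_cell,CD3@CD4@CD8,0.5@1.0\n"
        "NA,CD45@CD68,0.8\n"
    )
    for job in parse_clustering_metadata(metadata):
        print(job)
```

Each resolution in a row yields its own set of clusters, which is why the output table has one resolution column per clustered cell type and resolution.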
The cell clustering level table is a .csv table with a row for each cell in the cell types that underwent clustering and the following annotations:
- `CellName`: cell identity string in the form `Metadata_sample_name_ObjectNumber`.
- `Metadata_sample_name`: sample name.
- Clustering resolution columns: `res-RESOLUTION-ids` for each clustered cell type. Clusters are numbered from 0; the same numbering is used in the plots.
- `ImageNumber`: CellProfiler4-specific image identifier.
- `ObjectNumber`: unique identity number from 1 to 2^16 - 1 (65,535) that matches the corresponding pixels in the cell masks.
- CellProfiler4 area shape measurements (optional): can be included if the user plans to use them for downstream analysis.
- CellProfiler4 marker intensity measurements.
- `cell_type`: name used to identify the cell type during the analysis.

The exact set of fields and their order depend on the CellProfiler4 pipeline (`params.cp4_segmentation_cppipe`).
The clustered cell table is saved at: `$output_folder/Cell_Clusters/clustered_cells.csv`
The same data, in .csv and .RData format (Seurat object), is saved separately for each cell type in: `$output_folder/Cell_Clusters/CELL_TYPE`
The cell cluster level plots are saved in `$output_folder/Plots/Cell_Cluster_Plots/` and are divided into:
- UMAPs: `$output_folder/Plots/Cell_Cluster_Plots/CELL_TYPE/UMAPs/`
  For each clustering resolution, a .pdf file with UMAP plots colored by:
  - Sample
  - Cluster: clusters at this level of resolution.
  - Marker: markers used for the clustering.
- Boxplots (optional): `$output_folder/Plots/Cell_Type_Plots/Boxplots/`
  For each comparison metadata column with 2 categories and for each level of resolution, a .pdf file is produced containing:
  - Heatmap: the expression of the markers used for the clustering in each cluster.
  - Boxplots: one for each cluster, with the percentage of cells belonging to that cluster out of the total cells in the clustered cell type. The FDR is calculated using the Benjamini-Hochberg procedure over all clusters.
- Heatmaps (optional): if there is no comparison metadata column with 2 categories, a .pdf file is produced for each level of resolution, containing a heatmap of the expression of the markers used for the clustering in each cluster.
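The values behind the cluster boxplots (percentage of cells in each cluster out of the clustered cell type, per sample) can be computed as in this hypothetical sketch. It assumes the input is a list of `(Metadata_sample_name, cluster_id)` pairs for one cell type at one resolution, and it assumes clusters absent from a sample count as 0%, so every sample contributes a value to every boxplot; SIMPLI may handle absent clusters differently.

```python
from collections import Counter, defaultdict


def cluster_percentages(cells):
    """Percentage of each cluster within the clustered cell type, per sample.

    cells: (Metadata_sample_name, cluster_id) pairs for one cell type and resolution.
    Returns {sample: {cluster_id: percentage}}, with 0 for clusters absent from a sample.
    """
    per_sample = defaultdict(Counter)
    clusters = set()
    for sample_name, cluster_id in cells:
        per_sample[sample_name][cluster_id] += 1
        clusters.add(cluster_id)
    result = {}
    for sample_name, counts in per_sample.items():
        total = sum(counts.values())
        # Counter returns 0 for clusters that never occur in this sample.
        result[sample_name] = {c: 100 * counts[c] / total for c in sorted(clusters)}
    return result
```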