
Analysis

MicheleBortol edited this page Apr 8, 2021 · 30 revisions

SIMPLI Analysis Steps

A) Raw image processing

The first step in the SIMPLI analysis workflow is the preprocessing of raw images, which consists of 3 processes:

A.1) Image Extraction

In this process tiff files are extracted from the raw acquisition data from imaging mass cytometry (IMC) experiments. This process should be skipped if the input data does not consist of raw IMC data. See the input page for more details.

Inputs and parameters:

Outputs:

  • Images: Images (uncompressed 16 bit tiff) can be output in two different formats:
    • single channel tiff files (one for each of the selected channels) ($output_folder/Images/Raw/sample_name/sample_name-label-raw.tiff)
    • .ome.tiff files (one per sample, the order of channels is the same as in the channel_metadata file) ($output_folder/Images/Raw/sample_name/sample_name-all_raw.ome.tiff)
  • Metadata:
    • Metadata for all images from all samples: $output_folder/Images/Raw/raw_tiff_metadata.csv
    • Per-sample metadata for the raw images is also output at:
      $output_folder/Images/Raw/sample_name/sample_name-raw_tiff_metadata.csv

The output of this process is located at: $output_folder/Images/Raw/

This process can be skipped by setting the skip_conversion parameter to true.

A.2) Image normalisation

This process performs 99th percentile normalisation of the raw tiff images, which are either generated in the image extraction process or specified by the user if the image extraction process is skipped.

Inputs and parameters:

Outputs:

  • Normalised Images: Images (uncompressed 16 bit tiff) can be output in two different formats:
    • single channel tiff files (one for each of the selected channels) ($output_folder/Images/Normalized/sample_name/sample_name-label-normalized.tiff)
    • .ome.tiff files (one per sample, the order of channels is the same as in the channel_metadata file) ($output_folder/Images/Normalized/sample_name/sample_name-ALL-normalized.ome.tiff)
  • Metadata:
    • Metadata for all images from all samples: $output_folder/Images/Normalized/normalized_tiff_metadata.csv
    • Per-sample metadata for the normalised images is also output at:
      • $output_folder/Images/Raw/sample_name/sample_name-normalized_tiff_metadata.csv in long format.
      • $output_folder/Images/Raw/sample_name/sample_name-normalized_tiff_metadata.csv in CellProfiler4 compatible wide format.

The output of this process is located at: $output_folder/Images/Normalized/

This process can be skipped by setting the skip_normalization parameter to true.
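
The normalisation described above can be illustrated with a minimal sketch. It assumes that each channel is rescaled so that its 99th percentile maps to the maximum 16 bit value and that brighter pixels are clipped; the exact rescaling and clipping used by SIMPLI are not stated on this page, so both are assumptions. The function names `percentile` and `normalise_99th` are hypothetical, and the image is a plain nested list to keep the example dependency-free.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) of values using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("cannot compute a percentile of an empty image")
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def normalise_99th(image, max_value=65535):
    """Rescale a 2D image so that its 99th percentile maps to max_value.

    Pixels above the 99th percentile are clipped to max_value
    (an assumption, not confirmed by the SIMPLI documentation).
    """
    flat = [pixel for row in image for pixel in row]
    p99 = percentile(flat, 99)
    if p99 == 0:
        # An all-background channel stays empty.
        return [[0 for _ in row] for row in image]
    return [
        [min(max_value, round(pixel / p99 * max_value)) for pixel in row]
        for row in image
    ]


normalised = normalise_99th([[0, 10], [20, 1000]])
```

In this toy image the single very bright pixel is clipped to 65535, while the dimmer pixels are stretched relative to the 99th percentile rather than to the maximum.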

A.3) Image thresholding and masking

This process is used to perform the image preprocessing that will generate the final images, which can then be used as input for the pixel-based or the cell-based analysis. The input images for this process can be derived from:

Inputs and parameters:

Outputs:

  • Preprocessed Images: (uncompressed 16 bit single-channel tiff)
    $output_folder/Images/Preprocessed/sample_name/sample_name-label-Preprocessed.tiff
  • Metadata:
    • Metadata for all images from all samples $output_folder/Images/Preprocessed/preprocessed_tiff_metadata.csv
    • Per-sample metadata for the preprocessed images is also output at:
      • $output_folder/Images/Preprocessed/sample_name/sample_name-preprocessed_metadata.csv in long format.
      • $output_folder/Images/Preprocessed/sample_name-cp4-preprocessed_metadata.csv in CellProfiler4 compatible wide format.

The output of this process is located at: $output_folder/Images/Preprocessed/

This process can be skipped by setting the skip_preprocessing parameter to true.

B) Pixel-based analysis

The pixel-based approach implemented in SIMPLI quantifies the pixels that are positive for a specific marker or combination of markers. These marker-positive areas can be normalised over the area of the whole image, or over the area of an image mask defined by combining any of the input images with logical operators.

B.1) Measurement of marker-positive areas

This process measures the areas of interest and normalises them on the selected image masks according to the input metadata. The input images for this process can be derived from:

Inputs and parameters:

  • preprocessed_metadata_file with the tiff image metadata.
  • area_measurements_metadata = Path to the area_measurements_metadata file. It has two columns:
    • marker = Marker or combination of markers whose area should be measured.
    • main_marker = Marker or combination of markers whose area should be used to normalise the area of marker. If main_marker is the same as marker then the whole area of the image is used for normalisation.

The marker and main_marker values should each be either a value from the label column of the preprocessed_metadata_file or a combination of those values with logical operators (AND = &, OR = |, NOT = !, () = round brackets).

Outputs: The area measurements are saved in $output_folder/area_measurements.csv. The file has the following columns:

  • sample_name = Sample name.
  • main_marker = Combination of markers used to normalize the marker area.
  • marker = Main combination of markers measured.
  • area = Area positive for the marker combination of markers.
  • main_marker_area = Area positive for the main_marker combination of markers.
  • total_ROI_area = Total image area for this sample.
  • percentage = Area of the marker (area) / area of the main marker (main_marker_area) * 100.

All areas are in pixel².

This process can be skipped by setting the skip_area parameter to true.
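
The measurement above can be sketched as follows. This is a minimal illustration, not the SIMPLI implementation: masks are plain nested lists of booleans, the helper names (`mask_and`, `mask_not`, `measure_area`) and the marker names CD3 and CD8 are hypothetical, and the handling of the case where main_marker equals marker (normalising on the whole image) follows the description of the main_marker column.

```python
def mask_area(mask):
    """Count the positive pixels of a 2D boolean mask."""
    return sum(1 for row in mask for pixel in row if pixel)


def mask_and(a, b):
    """Pixel-wise AND (the & operator in the metadata file)."""
    return [[p and q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def mask_not(a):
    """Pixel-wise NOT (the ! operator in the metadata file)."""
    return [[not p for p in row] for row in a]


def measure_area(marker_mask, main_marker_mask, same_marker=False):
    """Return one row of the area measurements table for a single sample."""
    total_roi_area = sum(len(row) for row in marker_mask)
    area = mask_area(marker_mask)
    main_marker_area = mask_area(main_marker_mask)
    # If marker and main_marker are identical, normalise on the whole image.
    denominator = total_roi_area if same_marker else main_marker_area
    percentage = area / denominator * 100 if denominator else float("nan")
    return {
        "area": area,
        "main_marker_area": main_marker_area,
        "total_ROI_area": total_roi_area,
        "percentage": percentage,
    }


cd3 = [[True, True], [False, False]]
cd8 = [[True, False], [True, False]]

# marker = "CD3 & !CD8", main_marker = "CD3"
row = measure_area(mask_and(cd3, mask_not(cd8)), cd3)
# marker = main_marker = "CD3": normalised on the whole image
row_whole = measure_area(cd3, cd3, same_marker=True)
```

Here one of the two CD3-positive pixels is CD8-negative, so `row["percentage"]` is 50.0; the CD3 area normalised on the 4-pixel image is also 50.0.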

B.2) Pixel-based analysis visualisation

Generate boxplots showing the comparisons of the distributions of normalised marker-positive areas between 2 categories of samples. The input data for this process can be derived from:

Inputs and parameters:

  • sample_metadata_file with the metadata of all samples used in the analysis.
  • area_measurements_file = Path to the area_measurements_file. It should have the following columns:
    • sample_name = Sample name, should match a value in the sample_metadata_file.
    • main_marker = Marker or combination of markers used for normalisation.
    • marker = Marker or combination of markers used to calculate the area.
    • percentage = Area of the marker / area of the main marker * 100.

FDR is calculated using the number of different marker values for each value of main_marker.

Outputs: The boxplots are saved in $output_folder/Plots/Area_Plots/Boxplots/, with a separate folder for each main_marker. For each main_marker, a pdf file ($output_folder/Plots/Area_Plots/Boxplots/main_marker/main_marker_area_boxplots.pdf) is produced, containing a boxplot for each value of marker associated with that main_marker.

The output of this process is located at: $output_folder/Plots/Area_Plots/Boxplots/

This process can be skipped by setting the skip_area_visualization parameter to true.

C) Cell-based analysis

The cell-based analysis investigates the qualitative and quantitative cell representation within the imaged tissue through cell segmentation, cell phenotyping (by unsupervised clustering or expression thresholding), and spatial analysis of cell densities (homotypic spatial analysis) and distances (heterotypic spatial analysis). The steps of the cell-based analysis are:

C.1) Cell segmentation

Generate the single-cell data in .csv format and the cell masks in tiff format. The input data for this process can be derived from:

Inputs and parameters:

Outputs:

  • Single cell data:

    • Single cell data for all samples: $output_folder/Segmentation/unannotated_cells.csv
    • Single cell data for each sample separately: $output_folder/Segmentation/sample_name/sample_name-Cells.csv

    The single-cell data is a .csv table with a row for each cell and the following annotations:

    • ImageNumber: CellProfiler4 specific image identifier.
    • ObjectNumber: Unique identity number from 1 to 2^16-1 (65535), matches the corresponding pixels in the cell masks.
    • Metadata_sample_name: Matching the sample_name values in the preprocessed_metadata_file.
    • Location_Center_X and Location_Center_Y: Location of the cell centroid in the image in pixels, used for both the homotypic and heterotypic spatial analyses.
    • CellProfiler4 marker intensity measurements: Used for cell phenotyping by unsupervised clustering or by expression thresholding.

    The exact set of fields and their order depends on the CellProfiler4 pipeline used in the analysis.

  • Cell masks:
    Cell masks in uint16 tiff format: $output_folder/Segmentation/sample_name/sample_name-Cell_Mask.tiff
    Each cell is assigned a unique identity number from 1 to 2^16-1 (65535). All the pixels belonging to a given cell have their value set to its identity number. Pixels not belonging to any cell are set to 0.
    These images are compatible with several other tools for downstream analysis including:

The output of this process is located at: $output_folder/Segmentation/

This process can be skipped by setting the skip_segmentation parameter to true.
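
The relationship between a cell mask and the single-cell table can be illustrated with a short sketch. It assumes the mask has already been loaded as a 2D grid of integers (here a nested list) and computes, for each identity number, the cell area and the mean pixel coordinates. The function name `cell_stats` is hypothetical, and the centroid convention used by CellProfiler4 for Location_Center_X and Location_Center_Y may differ from this simple mean.

```python
from collections import defaultdict


def cell_stats(cell_mask):
    """Return {ObjectNumber: (area, centroid_x, centroid_y)} for a labelled mask.

    Pixels with value 0 are background and are ignored.
    """
    sums = defaultdict(lambda: [0, 0, 0])  # pixel count, sum of x, sum of y
    for y, row in enumerate(cell_mask):
        for x, label in enumerate(row):
            if label == 0:
                continue
            entry = sums[label]
            entry[0] += 1
            entry[1] += x
            entry[2] += y
    return {label: (n, sx / n, sy / n) for label, (n, sx, sy) in sums.items()}


mask = [
    [1, 1, 0],
    [0, 2, 2],
    [0, 2, 0],
]
stats = cell_stats(mask)
```

In this toy mask, cell 1 covers 2 pixels centred at x = 0.5, y = 0, and cell 2 covers 3 pixels.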

C.2) Cell masking

This process identifies cells belonging to different populations or tissue compartments according to the overlap of their areas with those of specific masks.

The input images for this process can be derived from:

The input cell masks for this process can be derived from:

  • cell masks generated in the cell segmentation process.
  • cell masks specified by the user with the single_cell_masks_metadata file if the cell segmentation process is skipped.

The input cell data for this process can be derived from:

  • cell data generated in the cell segmentation process.
  • cell data specified by the user with the preprocessed_metadata_file file if the cell segmentation process is skipped.

Inputs and parameters:

  • preprocessed_metadata_file with the tiff image metadata.
  • single_cell_masks_metadata with the following columns:
    • sample_name = Sample name matching a value in the preprocessed_metadata_file file
    • label = "Cell_Mask"
    • file_name = path to a cell mask in uint16 tiff format
  • single_cell_data_file = A .csv file with the following columns:
    • Metadata_sample_name = Sample name matching a value in the preprocessed_metadata_file file
    • ObjectNumber = Unique number identifying the pixels belonging to the cell in the cell mask.
  • cell_masking_metadata = A .csv file indicating which masks to use and which thresholds of overlap to apply. It should have the following columns:
    • cell_type = name of the cell type being identified.
    • threshold_marker = marker to use as mask. It should match a value in the label column of the preprocessed_metadata_file. It can be a combination of markers specified with logical operators (AND = &, OR = |, NOT = !, () = round brackets).
    • threshold_value = 1 - fraction of area overlap between the cell and the mask. Cells whose area overlaps the mask by a fraction higher than threshold_value are considered positive.

If a cell is positive for more than one cell type, then it is assigned to the cell type defined first (by row order) in the cell_masking_metadata file. Cells negative for all cell types are marked as UNASSIGNED.

Outputs:
The annotated cell table is a .csv table with the same columns as the input cell data table plus the following annotations:

  • cell_type: Name used to identify the cell type during the analysis.
  • CellName: Unique cell identity string in the form: Metadata_sample_name_ObjectNumber

The cell type level table is saved at: $output_folder/annotated_cells.csv

This process can be skipped by setting the skip_cell_type_identification parameter to true.
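
The assignment rule of the cell masking process can be sketched as below. This is an illustration under stated assumptions, not the SIMPLI code: the threshold is treated as a minimum overlap fraction that a cell must exceed (the exact definition of threshold_value in SIMPLI may differ), masks are nested lists, and the function names and the cell type names Tumour and Stroma are hypothetical.

```python
def overlap_fraction(cell_mask, marker_mask, object_number):
    """Fraction of the pixels of one cell that fall inside the marker mask."""
    cell_pixels = 0
    overlapping = 0
    for label_row, marker_row in zip(cell_mask, marker_mask):
        for label, positive in zip(label_row, marker_row):
            if label == object_number:
                cell_pixels += 1
                if positive:
                    overlapping += 1
    return overlapping / cell_pixels if cell_pixels else 0.0


def assign_cell_type(cell_mask, object_number, rules):
    """Return the first matching cell type, following the row order of rules.

    rules is a list of (cell_type, marker_mask, min_overlap) tuples.
    """
    for cell_type, marker_mask, min_overlap in rules:
        if overlap_fraction(cell_mask, marker_mask, object_number) > min_overlap:
            return cell_type
    return "UNASSIGNED"


def cell_name(sample_name, object_number):
    """Build the CellName string: Metadata_sample_name_ObjectNumber."""
    return f"{sample_name}_{object_number}"


cells = [
    [1, 1, 2],
    [1, 2, 2],
    [0, 0, 3],
]
tumour = [[True, True, False], [True, False, False], [False, False, False]]
stroma = [[True, True, True], [True, True, True], [False, False, False]]
rules = [("Tumour", tumour, 0.5), ("Stroma", stroma, 0.5)]

types = {n: assign_cell_type(cells, n, rules) for n in (1, 2, 3)}
```

Cell 1 overlaps both masks but is assigned to Tumour because that rule comes first; cell 3 overlaps neither mask and is left UNASSIGNED.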

C.3) Cell masking visualisation

This process plots the results of the cell masking process. The input cell masks for this process can be derived from:

  • cell masks generated in the cell segmentation process.
  • cell masks specified by the user with the single_cell_masks_metadata file if the cell segmentation process is skipped.

The input cell data for this process can be derived from:

  • cell data generated in the cell segmentation process.
  • cell data specified by the user with the preprocessed_metadata_file file if the cell segmentation process is skipped.

Inputs and parameters:

  • sample_metadata_file with the metadata of all samples used in the analysis.
  • single_cell_masks_metadata with the following columns:
    • sample_name = Sample name matching a value in the sample_metadata_file file
    • label = "Cell_Mask"
    • file_name = path to a cell mask in uint16 tiff format
  • annotated_cell_data_file = A .csv file with the following columns:
    • Metadata_sample_name = Sample name matching a value in the sample_metadata_file file
    • cell_type = name of the cell type being identified.
  • cell_masking_metadata = A .csv file indicating which masks to use and which thresholds of overlap to apply. It should have the following columns:
    • cell_type = name of the cell type being identified.
    • threshold_marker = marker to use as mask. It should match a value in the label column of the preprocessed_metadata_file. It can be a combination of markers specified with logical operators (AND = &, OR = |, NOT = !, () = round brackets).
    • threshold_value = 1 - fraction of area overlap between the cell and the mask. Cells whose area overlaps the mask by a fraction higher than threshold_value are considered positive.
    • color = Color used to represent this cell type. Accepted values are color names or hexadecimal #RGB or #RGBA format ("#RRGGBB" or "#RRGGBBAA"). Cells of cell_type = "UNASSIGNED" are automatically assigned the color "#888888".

The annotated cell table is a .csv table with the same columns as the input cell data table plus the following annotations:

  • cell_type: Name used to identify the cell type during the analysis.
  • CellName: Unique cell identity string in the form: Metadata_sample_name_ObjectNumber

The cell type level table is saved at: $output_folder/annotated_cells.csv

Outputs:
The cell type level plots are saved in $output_folder/Plots/Cell_Type_Plots/ and they are divided in:

  • Barplots: $output_folder/Plots/Cell_Type_Plots/Barplots
    .pdf files with barplots showing the proportions of all cell types + unassigned cells in:

    • Each sample: one bar per sample.
    • Category (optional): one bar per category, if the comparison column in the sample_metadata_file contains 2 categories.

    The barplots are divided into the following .pdf files:

      • dodged_barplots.pdf = dodged barplots including "UNASSIGNED" cells.
      • dodged_assigned_ony_barplots.pdf = dodged barplots excluding "UNASSIGNED" cells.
      • stacked_barplots.pdf = stacked barplots including "UNASSIGNED" cells.
      • stacked_assigned_only_barplots.pdf = stacked barplots excluding "UNASSIGNED" cells.
  • Overlays: $output_folder/Plots/Cell_Type_Plots/Overlays/

    • One overlay-sample_name.tiff image per sample. Each cell is coloured by cell type according to the color specified in the cell_masking_metadata file.
    • overlay_legend.pdf: legend mapping each cell type to its color.
  • Boxplots (Optional): $output_folder/Plots/Cell_Type_Plots/Boxplots/
    If the comparison column in the sample_metadata_file contains 2 categories, a .pdf file is produced with one boxplot for each cell type + unassigned cells. The FDR is calculated with the Benjamini-Hochberg procedure.

This process can be skipped by setting the skip_type_visualization parameter to true.
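
For reference, the Benjamini-Hochberg adjustment mentioned for the boxplots can be written in a few lines. This is a generic sketch of the standard procedure, not code taken from SIMPLI; the function name `benjamini_hochberg` and the example p-values are illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# One p-value per cell type (made-up numbers for illustration).
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each p-value is multiplied by the number of tests divided by its rank, and the running minimum guarantees that a smaller raw p-value never receives a larger adjusted value.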

C.4A.1) Unsupervised clustering

This process performs unsupervised clustering on one or more sets of cells. The input cell data for this process can be derived from:

  • cell data annotated in the cell masking process.
  • cell data specified by the user with the annotated_cell_data_file file if the cell masking process is skipped.

Inputs and parameters:

  • sample_metadata_file with the metadata of all samples used in the analysis. If the value of the comparison column for a sample is "NA", all cells from that sample are excluded from the clustering.
  • annotated_cell_data_file = A .csv file with the following columns:
    • Metadata_sample_name = Sample name matching a value in the sample_metadata_file.
    • cell_type = name of the cell type being identified.
    • Columns with the expression values of the markers used for clustering. The column names should match the values in the clustering_markers column of the cell_clustering_metadata file.
  • cell_clustering_metadata = metadata file with the parameters for the cell phenotyping by unsupervised clustering. It contains the following columns:
    • cell_type = name of the cell type to use for phenotyping. Set to "NA" to use all cells in the sample.
    • clustering_markers = @ separated list of markers to use for clustering. The markers must match a column name from the annotated_cell_data_file.
    • clustering_resolutions = @ separated list of resolutions used to extract the clusters from the graph. Use a value above (below) 1.0 to obtain a larger (smaller) number of clusters.

See the original Seurat function for details.
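
As an illustration of the @ separated fields, the sketch below parses one row of a cell_clustering_metadata file into Python values. It only shows the file convention described above; the function name `parse_clustering_row`, the marker names and the cell type name are hypothetical, and the clustering itself is not reproduced here.

```python
def parse_clustering_row(row):
    """Split the @ separated fields of one cell_clustering_metadata row.

    Returns (cell_type, markers, resolutions); cell_type is None when the
    row requests all cells ("NA").
    """
    cell_type = None if row["cell_type"] == "NA" else row["cell_type"]
    markers = [m.strip() for m in row["clustering_markers"].split("@") if m.strip()]
    resolutions = [
        float(r) for r in row["clustering_resolutions"].split("@") if r.strip()
    ]
    return cell_type, markers, resolutions


example_row = {
    "cell_type": "T_cells",
    "clustering_markers": "CD3@CD4@CD8a",
    "clustering_resolutions": "0.5@1.0@1.5",
}
cell_type, markers, resolutions = parse_clustering_row(example_row)
```

With this row, three markers are used and three clusterings are requested, one per resolution.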

The Cell clustering level table is a .csv table with a row for each cell in the cell types that underwent clustering and the following annotations:

  • CellName: Cell identity string in the form: Metadata_sample_name_ObjectNumber
  • Metadata_sample_name
  • Clustering resolution columns: res-RESOLUTION-ids for each clustered cell type. Clusters are numbered from 0, the same numbering is used in the plots.
  • ImageNumber: CellProfiler4 specific image identifier.
  • ObjectNumber: Unique identity number from 1 to 2^16-1 (65535), matches the corresponding pixels in the cell masks.
  • CellProfiler4 area shape measurements (optional): Can be included if the user plans to use them for downstream analysis.
  • CellProfiler4 marker intensity measurements: Used for cell phenotyping.
  • cell_type: Name used to identify the cell type during the analysis.

The exact set of fields and their order depends on the CellProfiler4 pipeline specified by params.cp4_segmentation_cppipe.
The cell clustering level table is saved at: $output_folder/Cell_Clusters/clustered_cells.csv
The same data in .csv and .RData format (Seurat object) is saved separately by cell type in: $output_folder/Cell_Clusters/CELL_TYPE

C.4A.2) Unsupervised clustering visualisation

The cell cluster level plots are saved in $output_folder/Plots/Cell_Cluster_Plots/ and they are divided in:

  • UMAPs: $output_folder/Plots/Cell_Cluster_Plots/CELL_TYPE/UMAPs/
    For each clustering resolution a .pdf file with UMAP plots colored by:

  • Boxplots (Optional): $output_folder/Plots/Cell_Type_Plots/Boxplots/
    For each comparison metadata column with 2 categories and for each level of resolution, a .pdf file is produced. The file contains:
      • Heatmap: showing, for each cluster, the expression of the markers used for the clustering.
      • Boxplots: one for each cluster, with the percentage of cells belonging to that cluster out of the total cells in the clustered cell type. The FDR is calculated using the Benjamini-Hochberg procedure for all clusters.

  • Heatmaps (Optional): If there is no comparison metadata column with 2 categories:
    For each level of resolution a .pdf file is produced containing a heatmap showing, for each cluster, the expression of the markers used for the clustering.
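
The per-cluster percentages shown in the boxplots can be sketched as follows. This assumes that percentages are computed per sample from the rows of the clustered cell table for one cell type, and that a clustering column is named like "res-0.5-ids" (following the res-RESOLUTION-ids pattern). The function name `cluster_percentages` and the sample names are hypothetical.

```python
from collections import Counter


def cluster_percentages(rows, cluster_column):
    """Return {(sample, cluster): percentage of the sample's clustered cells}."""
    totals = Counter()
    counts = Counter()
    for row in rows:
        sample = row["Metadata_sample_name"]
        totals[sample] += 1
        counts[(sample, row[cluster_column])] += 1
    return {
        (sample, cluster): n / totals[sample] * 100
        for (sample, cluster), n in counts.items()
    }


rows = [
    {"Metadata_sample_name": "S1", "res-0.5-ids": "0"},
    {"Metadata_sample_name": "S1", "res-0.5-ids": "0"},
    {"Metadata_sample_name": "S1", "res-0.5-ids": "1"},
    {"Metadata_sample_name": "S2", "res-0.5-ids": "1"},
]
percentages = cluster_percentages(rows, "res-0.5-ids")
```

Each sample then contributes one point per cluster to the corresponding boxplot.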

C.4B.1) Expression thresholding

C.4B.2) Expression thresholding visualisation

C.5A.1) Homotypic spatial analysis

C.5A.2) Homotypic spatial analysis visualisation

C.5B.1) Heterotypic spatial analysis

C.5B.2) Heterotypic spatial analysis visualisation
