MicheleBortol edited this page Jan 27, 2021 · 16 revisions

Input

SIMPLI works on highly multiplexed imaging data and performs analysis at the cell and pixel level, according to parameters specified in four metadata files.

Multiplexed Images

SIMPLI currently operates on input images from Imaging Mass Cytometry experiments in .txt or .mcd format. Support for other techniques (multiplex immunofluorescence, ...) and file types (.ome.tiff, ...) is planned.

Metadata files

SIMPLI execution is controlled through 4 main configuration files in .csv tabular format, which can be edited with Excel or any plain text editor. For examples, see the test dataset folder at: metadata

SIMPLI requires the following configuration files:

Sample Metadata file

This file provides the metadata for all the samples. Each ROI is considered a sample and is associated with a row with the following required fields:

  • sample_name: Identifier to be used to refer to this sample (ROI) in the analysis.
  • color: Color used to represent this sample in plots. It can be a color name or a hexadecimal value in "#RRGGBB" or "#RRGGBBAA" format.
  • comparison column: Category name assigned to the sample. To exclude a sample from the comparison, set this field to "NA". Pairwise comparisons are made only if the column contains two category names ("NA" excluded).
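
As an illustration of the comparison rule, the sketch below reads a small sample metadata table and collects its category names. The column name `comparison`, the sample names, the category names, and the helper `comparison_categories` are hypothetical; the page only states that a comparison column exists. This is not SIMPLI's own code.

```python
import csv
import io

# Hypothetical sample metadata; "comparison" is an assumed column name.
SAMPLE_METADATA = """sample_name,color,comparison
ROI_1,#FF0000,tumour
ROI_2,#0000FF,normal
ROI_3,gray,NA
"""

def comparison_categories(csv_text, column="comparison"):
    """Return the sorted category names found in `column`, ignoring "NA"."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return sorted({row[column] for row in rows if row[column] != "NA"})

categories = comparison_categories(SAMPLE_METADATA)
# The page describes pairwise comparisons as requiring two category names.
can_compare = len(categories) == 2
print(categories, can_compare)
```

Here ROI_3 is excluded from the comparison because its field is "NA", leaving two categories.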

Raw Metadata file

This file provides the metadata for all the images that need to be extracted from the raw imaging mass cytometry data (.mcd or .txt format). Each ROI is associated with a row with the following required fields:

  • sample_name: Identifier to be used to refer to this sample (ROI) in the analysis.
  • roi_name: Name of the ROI in the .mcd file. This field can be left blank if the input is in .txt format.
  • file_name: Path to the .mcd or .txt input file. It can be an absolute path or a relative path from SIMPLI's execution directory.
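
A minimal sketch of a sanity check for this file follows. It assumes that roi_name is needed whenever the input is an .mcd file, which is an inference from the statement that the field may be blank for .txt input. The file paths, ROI names, and the helper `rows_missing_roi_name` are hypothetical.

```python
import csv
import io

# Hypothetical raw metadata; paths and ROI names are made up for illustration.
RAW_METADATA = """sample_name,roi_name,file_name
ROI_1,ROI001,data/slide_A.mcd
ROI_2,,data/roi_2.txt
"""

def rows_missing_roi_name(csv_text):
    """Return the sample names of .mcd rows whose roi_name is blank."""
    missing = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        is_mcd = row["file_name"].lower().endswith(".mcd")
        if is_mcd and not row["roi_name"].strip():
            missing.append(row["sample_name"])
    return missing

print(rows_missing_roi_name(RAW_METADATA))
```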

Channel Metadata file

This file provides the metadata for all the channels that need to be extracted from the raw data. These channels must be present in all the samples included in the analysis. Each channel is associated with a row with two required fields:

  • channel_metal: Metal associated with the channel. It must match the metal name used during acquisition of the raw data.
  • channel_label: Label used to name the channel in the analysis.
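
The requirement that every listed channel be present in every sample can be checked along these lines. The metal names, labels, and the helper `missing_channels` are hypothetical, and the way the set of acquired metals is obtained from a raw file is left out of this sketch.

```python
import csv
import io

# Hypothetical channel metadata; metal names and labels are illustrative only.
CHANNEL_METADATA = """channel_metal,channel_label
Ir191,DNA1
Er170,CD3
Dy162,CD8
"""

def missing_channels(csv_text, acquired_metals):
    """Return the channel_metal values not found in `acquired_metals`."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row["channel_metal"] for row in rows
            if row["channel_metal"] not in acquired_metals]

# Metals assumed to have been acquired for one sample.
print(missing_channels(CHANNEL_METADATA, {"Ir191", "Er170"}))
```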

Area Measurements Metadata file

This file lists all the areas to be measured in the pixel level analysis stage of the pipeline. The main_marker area is used to normalize the marker area as a percentage.

  • marker: Marker or combination of markers whose area is measured.
  • main_marker: Marker or combination of markers whose area is used for normalization.

The areas are measured for markers or combinations of markers expressed with the following operators: ! (NOT), & (AND), | (OR), () (parentheses). For example, the line `(CD8 & CD3) & !CD4, CD3` means: measure the CD3+/CD8+/CD4- area over the CD3+ area.
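
To make the operator semantics concrete, the sketch below evaluates such a line over a handful of pixels, each described by per-marker booleans. It is not SIMPLI's implementation: it simply translates !, & and | into Python's `not`, `and` and `or`, so operator precedence follows Python's rules, and it only works for marker names that are valid Python identifiers. The helper names and the pixel data are hypothetical.

```python
def to_python(expression):
    """Translate the !, & and | operators into Python boolean operators."""
    expr = expression.replace("!", " not ")
    expr = expr.replace("&", " and ").replace("|", " or ")
    return expr.strip()

def area(expression, pixels):
    """Count the pixels for which the marker expression is true."""
    code = compile(to_python(expression), "<marker expression>", "eval")
    # Builtins are disabled; marker names are resolved from each pixel dict.
    return sum(bool(eval(code, {"__builtins__": {}}, dict(p))) for p in pixels)

def area_percentage(line, pixels):
    """Return the marker area as a percentage of the main_marker area."""
    marker, main_marker = (part.strip() for part in line.split(","))
    main_area = area(main_marker, pixels)
    if main_area == 0:
        return float("nan")
    return 100.0 * area(marker, pixels) / main_area

# Hypothetical thresholded pixels: True means the marker is positive there.
pixels = [
    {"CD3": True, "CD8": True, "CD4": False},
    {"CD3": True, "CD8": False, "CD4": True},
    {"CD3": True, "CD8": True, "CD4": True},
    {"CD3": True, "CD8": False, "CD4": False},
    {"CD3": False, "CD8": False, "CD4": False},
]
result = area_percentage("(CD8 & CD3) & !CD4, CD3", pixels)
print(result)
```

In this toy data, one pixel is CD3+/CD8+/CD4- and four pixels are CD3+, so the normalized area is 25%.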

Cell Types Metadata file

This file is used to specify the cell types and markers used for:

  • Cell Type identification.
  • Cell Type clustering.

Each cell type is represented by a row with the following fields:

  • cell_type: Name used to identify the cell type during the analysis.
  • threshold_marker: Marker whose expression is thresholded to identify this cell type. It must match a column in the output of the CellProfiler3 cell segmentation pipeline.
  • threshold_value: Value to use as a threshold to select cells of this type.
  • color: Color associated with this cell type for plotting. Accepted values are color names or hexadecimal values in "#RRGGBB" or "#RRGGBBAA" format.
  • clustering_markers: Markers to be used for unsupervised clustering of this cell type. "@" separated list of names matching columns in the output of the CellProfiler3 cell segmentation pipeline.
  • clustering_resolutions: "@" separated list of values for the resolution parameter used to cut the nearest neighbour graph during unsupervised clustering. Clustering is performed once for each resolution value. Use a value above 1.0 to obtain a larger number of clusters, or a value below 1.0 to obtain a smaller number.

To exclude a cell type from unsupervised clustering, fill the clustering_markers and clustering_resolutions fields with "NA".
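
The "@" separated fields and the "NA" convention can be read as in the sketch below. The cell type names, marker names, threshold values, and the helpers `split_at_list` and `clustering_plan` are hypothetical; in a real file the marker names must match columns in the CellProfiler3 output.

```python
import csv
import io

# Hypothetical cell types metadata; all names and values are illustrative.
CELL_TYPES = """cell_type,threshold_marker,threshold_value,color,clustering_markers,clustering_resolutions
T_cells,CD3,0.1,#00FF00,CD4@CD8@CD45RA,0.5@1.0@1.5
Macrophages,CD68,0.15,orange,NA,NA
"""

def split_at_list(value, convert=str):
    """Split an "@" separated field; "NA" yields an empty list."""
    if value.strip() == "NA":
        return []
    return [convert(item) for item in value.split("@")]

def clustering_plan(csv_text):
    """Map each clustered cell type to its (markers, resolutions)."""
    plan = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        markers = split_at_list(row["clustering_markers"])
        resolutions = split_at_list(row["clustering_resolutions"], float)
        if markers and resolutions:  # "NA" rows are skipped
            plan[row["cell_type"]] = (markers, resolutions)
    return plan

plan = clustering_plan(CELL_TYPES)
print(plan)
```

With this input, T_cells would be clustered three times (once per resolution value), while Macrophages is excluded from clustering.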
