Input
SIMPLI works on highly multiplexed imaging data and performs analysis at the cell and pixel level, according to parameters specified in four metadata files.
SIMPLI currently operates on input images from Imaging Mass Cytometry experiments in .txt or .mcd format. Support for other techniques (multiplex Immunofluorescence, ...) and file types (.ome.tiff, ...) will be added in future versions.
SIMPLI execution is controlled through 4 main configuration files in .csv tabular format, which can be edited with Excel or any plain-text editor. For examples, see the test dataset folder at: metadata
SIMPLI requires the following configuration files:
This file provides the metadata for all the samples. Each ROI is considered a sample and is associated with a row containing the following required fields:
- `sample_name`: Identifier used to refer to this sample (ROI) in the analysis.
- `color`: Color used to represent this sample in plots. It can be a color name or a hexadecimal code in "#RRGGBB" or "#RRGGBBAA" format.
- comparison column: Each sample is associated with a category name. To exclude a sample from the comparison, set its field to "NA". Pairwise comparisons are made only if the column contains exactly two category names ("NA" excluded).
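To make the rules for the samples file concrete, here is a minimal Python sketch using only the standard library. It checks the required fields, validates hexadecimal colors, and applies the "NA" and two-category rules for pairwise comparisons. It assumes the comparison column is literally named `comparison` (made a parameter here); the sample names and categories are hypothetical, and this is an illustration, not SIMPLI's own validation code.

```python
import csv
import io
import re

# Accepts the two hexadecimal forms listed above; color names are passed
# through unchecked in this sketch.
HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}(?:[0-9A-Fa-f]{2})?$")
REQUIRED = ("sample_name", "color")


def load_samples(text, comparison_column="comparison"):
    """Parse the samples CSV and check required fields and hex colors."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        for field in REQUIRED + (comparison_column,):
            if not row.get(field):
                raise ValueError(f"missing required field: {field}")
        color = row["color"]
        if color.startswith("#") and not HEX_COLOR.match(color):
            raise ValueError(f"invalid hex color for {row['sample_name']}: {color}")
    return rows


def comparison_groups(rows, comparison_column="comparison"):
    """Group sample names by category, skipping samples marked "NA"."""
    groups = {}
    for row in rows:
        category = row[comparison_column]
        if category == "NA":
            continue
        groups.setdefault(category, []).append(row["sample_name"])
    return groups


def pairwise_comparison_enabled(groups):
    """Pairwise comparisons only apply when exactly two categories remain."""
    return len(groups) == 2


# Hypothetical samples file: ROI_3 is excluded from the comparison.
SAMPLES_CSV = """sample_name,color,comparison
ROI_1,#1B9E77,tumour
ROI_2,red,normal
ROI_3,#D95F02,NA
"""
rows = load_samples(SAMPLES_CSV)
groups = comparison_groups(rows)
print(groups)                               # {'tumour': ['ROI_1'], 'normal': ['ROI_2']}
print(pairwise_comparison_enabled(groups))  # True
```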
This file provides the metadata for all the images that need to be extracted from the raw Imaging Mass Cytometry data (.mcd or .txt format). Each ROI is associated with a row containing the following required fields:
- `sample_name`: Identifier used to refer to this sample (ROI) in the analysis.
- `roi_name`: Name of the ROI in the .mcd file. This field can be left blank if the input is in .txt format.
- `file_name`: Path to the .mcd or .txt input file. It can be an absolute path or a path relative to SIMPLI's execution directory.
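The path and `roi_name` rules above can be sketched in Python as follows. It is a minimal illustration, assuming that a non-empty `roi_name` is mandatory for .mcd inputs (our reading of "can be left blank if the input is in .txt format"); the execution directory and file paths are hypothetical, and the sketch does not check that the files exist.

```python
import csv
import io
from pathlib import Path


def resolve_raw_inputs(text, run_dir):
    """Return (sample_name, roi_name or None, resolved path) for each row."""
    inputs = []
    for row in csv.DictReader(io.StringIO(text)):
        path = Path(row["file_name"])
        if not path.is_absolute():
            # Relative paths are interpreted from the execution directory.
            path = Path(run_dir) / path
        suffix = path.suffix.lower()
        roi = (row.get("roi_name") or "").strip()
        if suffix not in (".mcd", ".txt"):
            raise ValueError(f"{row['sample_name']}: unsupported file type {suffix!r}")
        if suffix == ".mcd" and not roi:
            # Assumption: an .mcd file can hold several ROIs, so one must be named.
            raise ValueError(f"{row['sample_name']}: roi_name is required for .mcd input")
        inputs.append((row["sample_name"], roi or None, path))
    return inputs


# Hypothetical raw-data file: one .mcd ROI and one .txt export.
RAW_CSV = """sample_name,roi_name,file_name
ROI_1,ROI_001,data/slide1.mcd
ROI_2,,/data/imc/ROI_2.txt
"""
for entry in resolve_raw_inputs(RAW_CSV, "/home/user/simpli_run"):
    print(entry)
```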
This file provides the metadata for all the channels that need to be extracted from the raw data. These channels must be present in all the samples included in the analysis. Each channel is associated with a row containing two required fields:
- `channel_metal`: Metal associated with the channel. It must match the metal name used during acquisition in the raw data.
- `channel_label`: Label used to name the channel in the analysis.
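A short Python sketch of how the channels file could be read and checked against the requirement that every channel is present in every sample. The metal names, labels, and the per-sample lists of acquired metals are hypothetical; in practice those lists would come from the raw .mcd/.txt files, which this sketch does not read.

```python
import csv
import io


def load_channels(text):
    """Map each channel_metal to its channel_label, preserving file order."""
    return {
        row["channel_metal"].strip(): row["channel_label"].strip()
        for row in csv.DictReader(io.StringIO(text))
    }


def missing_channels(channels, metals_by_sample):
    """Return {sample_name: [missing metals]} for samples lacking a requested channel."""
    missing = {}
    for sample, metals in metals_by_sample.items():
        absent = sorted(set(channels) - set(metals))
        if absent:
            missing[sample] = absent
    return missing


CHANNELS_CSV = """channel_metal,channel_label
Dy162,CD8
Er170,CD3
"""
channels = load_channels(CHANNELS_CSV)
# Hypothetical metals found in each sample's raw data.
acquired = {"ROI_1": ["Dy162", "Er170", "Ir191"], "ROI_2": ["Dy162", "Ir191"]}
print(channels)                              # {'Dy162': 'CD8', 'Er170': 'CD3'}
print(missing_channels(channels, acquired))  # {'ROI_2': ['Er170']}
```

Here ROI_2 would have to be fixed or dropped, since all listed channels must be present in every sample.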
This file indicates all the areas to be measured in the pixel-level analysis stage of the pipeline. Each row has two fields:
- `marker`: the area to be measured.
- `main_marker`: the area used to normalize the `marker` area as a percentage.
Areas are measured for markers or combinations of markers expressed with the following operators: `!` (NOT), `&` (AND), `|` (OR), `()` (parentheses). For example, the line `(CD8 & CD3) & !CD4, CD3` means: measure the CD3+/CD8+/CD4- area as a percentage of the CD3+ area.
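To show how such an expression maps onto pixel masks, here is a minimal Python sketch of a recursive-descent evaluator for `!`, `&`, `|` and `()`. It is an illustration, not SIMPLI's implementation, and rests on several assumptions: masks are flat lists of booleans (one per pixel) rather than images, `!` binds tighter than `&`, which binds tighter than `|` (the conventional precedence; the text above does not specify one), and the percentage is simply the `marker` area divided by the `main_marker` area. The 6-pixel masks are hypothetical.

```python
import csv
import re

# A token is an operator/parenthesis or a run of other non-space characters.
TOKEN = re.compile(r"\s*([!&|()]|[^\s!&|()]+)")
OPERATORS = ("!", "&", "|", "(", ")")


def tokenize(expr):
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class _Parser:
    def __init__(self, tokens, masks):
        self.tokens, self.pos, self.masks = tokens, 0, masks

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse_or(self):  # lowest precedence: a | b
        left = self.parse_and()
        while self.peek() == "|":
            self.take()
            right = self.parse_and()
            left = [a or b for a, b in zip(left, right)]
        return left

    def parse_and(self):  # a & b
        left = self.parse_not()
        while self.peek() == "&":
            self.take()
            right = self.parse_not()
            left = [a and b for a, b in zip(left, right)]
        return left

    def parse_not(self):  # highest precedence: !a
        if self.peek() == "!":
            self.take()
            return [not a for a in self.parse_not()]
        return self.parse_atom()

    def parse_atom(self):  # marker name or parenthesised sub-expression
        token = self.take()
        if token == "(":
            value = self.parse_or()
            if self.take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if token is None or token in OPERATORS:
            raise ValueError(f"unexpected token: {token!r}")
        return list(self.masks[token])


def evaluate(expr, masks):
    parser = _Parser(tokenize(expr), masks)
    result = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing token: {parser.peek()!r}")
    return result


def area_percentage(marker_expr, main_marker_expr, masks):
    marker_area = sum(evaluate(marker_expr, masks))
    main_area = sum(evaluate(main_marker_expr, masks))
    return 100.0 * marker_area / main_area if main_area else float("nan")


def mask(bits):
    return [b == "1" for b in bits]


masks = {"CD3": mask("111100"), "CD8": mask("110010"), "CD4": mask("011000")}
# One row of the areas file; skipinitialspace drops the blank after the comma.
marker, main_marker = next(csv.reader(["(CD8 & CD3) & !CD4, CD3"], skipinitialspace=True))
print(area_percentage(marker, main_marker, masks))  # 25.0
```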
This file is used to specify the cell types and markers used for:
- Cell Type identification.
- Cell Type clustering.
Each cell type is represented by a row with the following fields:
- `cell_type`: Name used to identify the cell type during the analysis.
- `threshold_marker`: Marker whose expression is thresholded to identify this cell type. It must match a column in the output of the CellProfiler3 cell segmentation pipeline.
- `threshold_value`: Value used as a threshold to select cells of this type.
- `color`: Color associated with this cell type in plots. Accepted values are color names or hexadecimal codes in "#RRGGBB" or "#RRGGBBAA" format.
- `clustering_markers`: Markers to be used for unsupervised clustering of this cell type, given as an "@"-separated list of names matching columns in the output of the CellProfiler3 cell segmentation pipeline.
- `clustering_resolutions`: "@"-separated list of values for the resolution parameter used to cut the nearest-neighbour graph during unsupervised clustering. Clustering is performed once for each resolution value; use a value above (below) 1.0 to obtain a larger (smaller) number of clusters.
To exclude a cell type from unsupervised clustering, set its `clustering_markers` and `clustering_resolutions` fields to "NA".
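The cell-type file combines thresholds with "@"-separated lists, so a small Python sketch of how its rows could be interpreted may help. It splits the "@" lists, treats "NA" as "no clustering", expands one clustering job per resolution value, and selects cells above the threshold. Assumptions: cells are plain dicts keyed by hypothetical CellProfiler column names, selection uses a strict `>` comparison (the text does not say whether the threshold is inclusive), and the jobs are only listed here; no clustering is run.

```python
import csv
import io


def parse_at_list(value, cast=str):
    """Split an "@"-separated field; "NA" (or blank) means no values."""
    value = (value or "").strip()
    if value in ("", "NA"):
        return []
    return [cast(item.strip()) for item in value.split("@")]


def load_cell_types(text):
    cell_types = []
    for row in csv.DictReader(io.StringIO(text)):
        cell_types.append({
            "cell_type": row["cell_type"],
            "threshold_marker": row["threshold_marker"],
            "threshold_value": float(row["threshold_value"]),
            "color": row["color"],
            "clustering_markers": parse_at_list(row["clustering_markers"]),
            "clustering_resolutions": parse_at_list(row["clustering_resolutions"], float),
        })
    return cell_types


def clustering_jobs(cell_types):
    """One (cell_type, markers, resolution) job per resolution; "NA" rows are skipped."""
    return [
        (ct["cell_type"], ct["clustering_markers"], resolution)
        for ct in cell_types
        if ct["clustering_markers"]
        for resolution in ct["clustering_resolutions"]
    ]


def select_cells(cells, cell_type):
    """Keep cells whose threshold_marker value is above threshold_value."""
    marker = cell_type["threshold_marker"]
    return [cell for cell in cells if cell[marker] > cell_type["threshold_value"]]


# Hypothetical cell-type file: B cells are excluded from clustering.
CELLS_CSV = """cell_type,threshold_marker,threshold_value,color,clustering_markers,clustering_resolutions
T_cells,CD3,0.5,#1B9E77,CD3@CD4@CD8,0.5@1.0@2.0
B_cells,CD20,0.4,blue,NA,NA
"""
cell_types = load_cell_types(CELLS_CSV)
jobs = clustering_jobs(cell_types)
# Hypothetical per-cell measurements keyed by segmentation output column.
cells = [{"CD3": 0.9, "CD20": 0.1}, {"CD3": 0.2, "CD20": 0.7}]
print(jobs)
print(select_cells(cells, cell_types[0]))  # [{'CD3': 0.9, 'CD20': 0.1}]
```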