Code used to generate the PIDgeon pipeline.
An automated computational cytometry pipeline was designed. This work comprised the development and validation of a FlowSOM-based model for automated population identification, followed by the training and validation of predictive models for lymphoid-PID diagnosis. Finally, the pipeline can generate interpretable reports for patients with new PID screening requests in a clinical context.
The computational pipeline was designed, trained and validated using 4 independent patient cohorts, including a training dataset and 3 multi-center validation data sets.
The first part of the PIDgeon pipeline involved the development and validation of a reference FlowSOM tree using healthy control blood samples.
The validation of this reference FlowSOM tree included:
- the comparison of FlowSOM-based cell counts from the healthy control files with well-established, age-matched healthy control reference ranges
- a correlation analysis on the training data: the patient samples were preprocessed and mapped onto the reference FlowSOM tree, and the features were extracted. The FlowSOM-based features of these patients were then correlated with the cell counts obtained from conventional analysis through manual gating.
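The correlation step can be sketched as follows. This is a minimal illustration, not the repository's code: the text does not state which correlation coefficient was used, so Spearman rank correlation is an assumption here, the function names are hypothetical, and the counts are made-up placeholder values.

```python
from math import sqrt


def rank(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# Placeholder counts for one cell population in five samples (not real data).
flowsom_counts = [120.0, 340.0, 95.0, 410.0, 230.0]
manual_counts = [110.0, 360.0, 100.0, 395.0, 250.0]

rho = spearman(flowsom_counts, manual_counts)
print(f"Spearman rho: {rho:.3f}")
```

In practice one such coefficient would be computed per cell population, comparing the FlowSOM-derived feature with its manually gated counterpart across all training samples.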
The second part of the PIDgeon pipeline involved the design and optimization of a diagnostic model tailored to identify lymphoid-PID during the early diagnostic PID workup and to categorize lymphoid-PID based on the IUIS classification. Both a non-hierarchical 6-class model and a 3-step 6-class hierarchical model were trained using the training dataset.
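To illustrate the difference from a non-hierarchical model, the sketch below routes a sample through three successive decision steps until it reaches one of six classes. It is only a structural illustration under stated assumptions: the class labels (`class_1` to `class_6`), the feature names (`f1` to `f3`), the threshold rules and the way the six classes are distributed over the three steps are all hypothetical. In the real pipeline each step would be a trained classifier.

```python
from typing import Callable, Dict, List, Tuple, Union

Features = Dict[str, float]


class Node:
    """One step of the hierarchy: a decision function that names a child."""

    def __init__(self, decide: Callable[[Features], str],
                 children: Dict[str, Union["Node", str]]):
        self.decide = decide
        self.children = children


def predict(tree: Union[Node, str], features: Features) -> Tuple[str, List[str]]:
    """Walk down the tree; return the final class and the branches taken."""
    path: List[str] = []
    node = tree
    while isinstance(node, Node):
        branch = node.decide(features)
        path.append(branch)
        node = node.children[branch]
    return node, path


# Hypothetical step 3: chooses among four classes based on feature f3.
step3_labels = ["class_3", "class_4", "class_5", "class_6"]
step3 = Node(
    decide=lambda f: step3_labels[min(int(f["f3"] * 4), 3)],
    children={label: label for label in step3_labels},
)
# Hypothetical step 2: either stops at class_2 or defers to step 3.
step2 = Node(
    decide=lambda f: "stop" if f["f2"] < 0.5 else "continue",
    children={"stop": "class_2", "continue": step3},
)
# Hypothetical step 1: either stops at class_1 or defers to step 2.
step1 = Node(
    decide=lambda f: "stop" if f["f1"] < 0.5 else "continue",
    children={"stop": "class_1", "continue": step2},
)

label, path = predict(step1, {"f1": 0.9, "f2": 0.8, "f3": 0.3})
print(label, path)
```

A non-hierarchical model would instead map the feature vector to one of the six classes in a single decision. The hierarchical layout makes it possible to inspect each step separately.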
The clinical utility of PIDgeon as a flow-based PID screening tool was validated using independent multi-center datasets collected in 4 EuroFlow centers (Salamanca, Prague, Leiden and Ghent), all following the EuroFlow standard operating procedures.
To allow for in-depth interpretation of the hierarchical model and to gain insight into which features had the most impact on the prediction in the different steps of the model, SHAP values were computed for the Ghent validation dataset. These SHAP values indicate the importance of individual features for a patient's prediction.
The diagnoses made by the predictive model can also be inspected at the patient level using force plots, which display the features that push the model towards a certain diagnosis and thereby provide immunophenotypical information on the patient sample.
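SHAP values are based on Shapley values from cooperative game theory. The sketch below computes exact Shapley values for a small toy model by enumerating all feature subsets, to show what such a value represents. It is not the pipeline's implementation: the toy model, the input and the all-zero baseline are hypothetical, and practical SHAP tooling relies on faster, model-specific or approximate algorithms rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every subset of features.

    Features outside a subset are replaced by their baseline value.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for subsets of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical score with one interaction term; not the PIDgeon model.
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 3) for p in phi])
```

The contributions always sum to the difference between the model output for the sample and the output for the baseline, which is what makes them usable as a per-feature breakdown of a single prediction.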
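The idea behind a force plot can be sketched in plain text: starting from a base value, each feature contribution pushes the prediction up or down. The function below is a hypothetical text rendering for illustration only, with made-up feature names and contribution values. It is not the plotting code used in the pipeline.

```python
def text_force_plot(base_value, contributions, width=20):
    """Render per-feature contributions as signed text bars.

    Features are listed from largest to smallest absolute contribution.
    """
    largest = max(abs(v) for v in contributions.values()) or 1.0
    lines = [f"base value: {base_value:+.2f}"]
    ordered = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
    for name, v in ordered:
        symbol = "+" if v >= 0 else "-"
        # Bar length is proportional to the contribution, at least one symbol.
        bar = symbol * max(1, round(abs(v) / largest * width))
        lines.append(f"{name:<12} {v:+.2f} {bar}")
    total = base_value + sum(contributions.values())
    lines.append(f"prediction: {total:+.2f}")
    return "\n".join(lines)


# Placeholder contributions for one sample (not real data).
example = {"feature_a": 0.30, "feature_b": -0.15, "feature_c": 0.05}
report = text_force_plot(0.10, example)
print(report)
```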
The final aim of PIDgeon is to generate a fast and interpretable report for new patients with suspected PID for whom flow-based PID screening is requested. To this end, an easily adaptable and fast module was built within the pipeline that allows new patients' data files to be uploaded and a patient-centred report to be generated.