XAInyPredictor is a Shiny for Python research prototype for interpretable stratification support. It lets users select a configured use case, build a working input set, confirm that set for analysis, inspect feature context, review model-based stratification outputs, compare profiles, and export results for review.
The app supports both patient-level cohorts and candidate-level sets. Labels such as patient, cohort, candidate, candidate set, and reference candidates are configured per use case in config.yml.
Research prototype notice: outputs support exploratory clinical or translational research workflows. Workflow utility, model interpretation, and clinical relevance should be validated with appropriate domain collaborators before operational use.
From the repository root:
xainypredictor
The default URL is:
http://127.0.0.1:8001
Alternative command:
shiny run src\XAInyPredictor\app.py --port 8001
If port 8001 is already in use, choose another port:
shiny run src\XAInyPredictor\app.py --port 8002
Use cases live under src/XAInyPredictor/data/.
- rai: RAI-R Predictor for thyroid cancer patient stratification.
- mock: Diabetes Risk Predictor demo use case.
- neoag: Neoantigen Candidate Prioritizer for peptide-HLA candidate prioritization.
The startup modal selects the initial use case. The top navigation selector can switch use case at runtime; switching clears the current working set and shows a loading state while the selected model context is prepared.
- Start the app and choose a use case.
- In Data Input, review the selected use case summary.
- Build the current working set using one or more sources:
  - Manual Entry
  - Upload File
  - Example Cohort or Example Candidate Set
- Review the table at the top of Data Input. The table remains the working set even when switching between input-source tabs.
- Use Delete Selected or Reset if needed.
- Click Confirm to lock the current working set for downstream analysis.
- Continue to Cohort/Candidate Context and Patient/Candidate Stratification.
Steps 2 and 3 (Cohort/Candidate Context and Patient/Candidate Stratification) are blocked until the current input set is confirmed. If the working set changes after confirmation, confirm it again so that downstream plots, tables, and exports use the latest data.
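The confirm-before-analysis rule can be pictured with a small state holder. This is a minimal sketch and not the app's actual implementation: the `WorkingSet` class and all of its method names are hypothetical, and rejecting an empty set on confirm is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingSet:
    """Hypothetical container for the rows shown at the top of Data Input."""

    rows: list = field(default_factory=list)
    confirmed: bool = False

    def add_rows(self, new_rows):
        # Any change to the working set invalidates a previous confirmation.
        self.rows.extend(new_rows)
        self.confirmed = False

    def delete_rows(self, indices):
        drop = set(indices)
        self.rows = [r for i, r in enumerate(self.rows) if i not in drop]
        self.confirmed = False

    def reset(self):
        self.rows = []
        self.confirmed = False

    def confirm(self):
        # Assumption for this sketch: an empty set cannot be confirmed.
        if not self.rows:
            raise ValueError("Nothing to confirm: the working set is empty.")
        self.confirmed = True

    def rows_for_analysis(self):
        # Downstream steps read data only through this gate.
        if not self.confirmed:
            raise RuntimeError("Confirm the current working set first.")
        return list(self.rows)
```

Routing every downstream read through one gate such as `rows_for_analysis()` is one way to guarantee that plots, tables, and exports never see an unconfirmed set.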
Uploaded CSV, TSV, or Excel files must contain the feature columns defined by the active use case. The safest option is to download the CSV template and data dictionary from Data Input after selecting the use case.
General format:
ID,Feature 1,Feature 2,Feature 3
1,value,value,value
Notes:
- ID is optional. If it is missing, the app creates sequential IDs.
- Numeric fields accept comma or dot decimal separators in manual entry.
- Numeric fields must respect configured min/max ranges.
- Categorical fields must use the allowed values from the active use case configuration.
- Text fields can be used for identifiers such as peptide sequences or HLA alleles.
- If required columns are missing, categories are invalid, or values are out of range, the app shows validation feedback before the set can be confirmed.
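The notes above can be sketched as a small validator. This is an illustration under stated assumptions and not the app's own code: the `FEATURES` dictionary, its keys, and the function names are hypothetical stand-ins for whatever the active use case configuration defines, and only the standard library is used.

```python
import csv
import io

# Hypothetical feature definitions; real ones come from the active use case.
FEATURES = {
    "Age": {"type": "numeric", "min": 0, "max": 120},
    "Sex": {"type": "categorical", "allowed": ["F", "M"]},
}


def parse_number(text):
    """Parse a number written with a comma or dot decimal separator."""
    return float(text.strip().replace(",", "."))


def validate_csv(text, features=FEATURES):
    """Return (valid_rows, errors) for CSV text checked against `features`."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    missing = [name for name in features if name not in columns]
    if missing:
        return [], ["Missing required columns: " + ", ".join(missing)]

    rows, errors = [], []
    for number, record in enumerate(reader, start=1):
        # ID is optional; fall back to a sequential ID when it is absent.
        row = {"ID": record.get("ID") or str(number)}
        problems = []
        for name, spec in features.items():
            raw = (record.get(name) or "").strip()
            if spec["type"] == "numeric":
                try:
                    value = parse_number(raw)
                except ValueError:
                    problems.append(f"Row {number}: {name} is not a number ({raw!r})")
                    continue
                if not spec["min"] <= value <= spec["max"]:
                    problems.append(
                        f"Row {number}: {name}={value} is outside "
                        f"{spec['min']}-{spec['max']}"
                    )
                row[name] = value
            elif spec["type"] == "categorical":
                if raw not in spec["allowed"]:
                    problems.append(f"Row {number}: {name}={raw!r} is not allowed")
                row[name] = raw
            else:
                # Text fields (for example peptide sequences) are kept as-is.
                row[name] = raw
        if problems:
            errors.extend(problems)
        else:
            rows.append(row)
    return rows, errors
```

Collecting every problem instead of stopping at the first one matches the idea of showing validation feedback for the whole set before it can be confirmed.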
The labels below adapt to each use case.
- Cohort/Candidate Context: feature distributions, selected record highlighting, reference population selection, and feature summary statistics.
- Patient/Candidate Stratification: selected record score, decision threshold, assigned class/group, stratification table, and profile comparison controls.
- Cohort/Candidate Set: set-level summary and score distribution.
- Model: model card, intended use, validation status, limitations, and global feature importance.
- Reference Patients/Candidates: closest reference-neighbor context when a reference set is configured.
For the neoantigen use case, closest reference candidates come from the model development reference dataset included with the package. The app uses that dataset internally for scaling, thresholding, and similarity calculations, but the reference view reports anonymized neighbor ranks, distances, scores, and classes rather than row-level reference features.
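One way to picture an anonymized neighbor report is shown below. This is a sketch only: the source does not state which scaling or distance metric the app uses, so z-score scaling and Euclidean distance are assumptions, and the function name and the `features`, `score`, and `label` keys are hypothetical.

```python
import math
from statistics import mean, pstdev


def closest_references(query, reference, k=3):
    """Rank reference records by distance to `query` in standardized space.

    `reference` is a list of dicts with "features" (list of floats),
    "score", and "label" keys; these names are hypothetical.
    """
    columns = list(zip(*(r["features"] for r in reference)))
    means = [mean(c) for c in columns]
    # Fall back to 1.0 so constant features do not cause division by zero.
    stds = [pstdev(c) or 1.0 for c in columns]

    def scale(values):
        return [(v - m) / s for v, m, s in zip(values, means, stds)]

    q = scale(query)
    ranked = sorted(
        (math.dist(q, scale(r["features"])), r["score"], r["label"])
        for r in reference
    )
    # Report only rank, distance, score, and class; no reference feature values.
    return [
        {"rank": i, "distance": round(d, 3), "score": s, "class": c}
        for i, (d, s, c) in enumerate(ranked[:k], start=1)
    ]
```

The returned dictionaries deliberately omit the reference feature values, which mirrors the statement that the reference view does not expose row-level reference features.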
The stratification view includes a Download outputs menu:
- Report package
- Stratification results
- Closest reference records
- Cohort/candidate set summary
Each use case folder requires:
config.yml
model.pkl
feature_order.txt
example_data.csv
Use cases may also include additional files such as reference_data.csv.
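Before adding a new use case, the folder can be checked against the required file list. The helper below is a hypothetical sketch and not part of the package; only the four required file names are taken from this document.

```python
from pathlib import Path

REQUIRED_FILES = ("config.yml", "model.pkl", "feature_order.txt", "example_data.csv")


def missing_files(use_case_dir):
    """Return the required file names that are absent from a use case folder."""
    folder = Path(use_case_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


def complete_use_cases(data_dir):
    """Map each complete use case folder name to its path."""
    return {
        p.name: p
        for p in sorted(Path(data_dir).iterdir())
        if p.is_dir() and not missing_files(p)
    }
```

For example, `missing_files("src/XAInyPredictor/data/rai")` would return an empty list if all four files are present in that folder.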
config.yml controls:
- model name, description, and metadata;
- feature definitions, labels, defaults, ranges, allowed values, and roles;
- target column and positive/negative classes;
- entity labels used across the UI;
- tab titles, help text, validation copy, and download labels;
- model-specific options such as false-negative-rate defaults and reference-set limits.
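To illustrate how per-use-case entity labels could drive UI text, the sketch below fills title templates from a label dictionary. Everything here is an assumption made for illustration: the real config.yml schema is not shown in this document, so the keys (`entity`, `entities`, `set`, `context`), the template names, and the `ui_titles` helper are hypothetical. The dictionaries stand in for values that would be read from each use case's config.yml.

```python
# Hypothetical entity labels, as they might look after config.yml is parsed.
USE_CASE_LABELS = {
    "rai": {
        "entity": "Patient",
        "entities": "Patients",
        "set": "Cohort",
        "context": "Cohort",
    },
    "neoag": {
        "entity": "Candidate",
        "entities": "Candidates",
        "set": "Candidate Set",
        "context": "Candidate",
    },
}

# Hypothetical title templates shared by all use cases.
TITLE_TEMPLATES = {
    "context_tab": "{context} Context",
    "stratification_tab": "{entity} Stratification",
    "example_source": "Example {set}",
    "reference_tab": "Reference {entities}",
}


def ui_titles(use_case):
    """Fill every title template with the labels of the given use case."""
    labels = USE_CASE_LABELS[use_case]
    return {key: template.format(**labels) for key, template in TITLE_TEMPLATES.items()}
```

With these assumed values, `ui_titles("rai")["context_tab"]` yields "Cohort Context" and `ui_titles("neoag")["example_source"]` yields "Example Candidate Set", matching the label variants used throughout this document.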
When a new use case folder is added under src/XAInyPredictor/data/, restart or refresh the app to make it available in the startup modal and navigation selector. The GitHub Actions build packages the full data/ directory, so committed use cases are included in generated desktop artifacts.
Install development dependencies:
pip install -e ".[dev]"
Run pre-commit:
pre-commit run --all-files
Build the package:
python -m build
Generate docs:
doxygen .\docs\Doxyfile
The repository includes a Build XAInyPredictor workflow in .github/workflows/build.yml.
It runs on every push and can also be started manually from GitHub:
- Open the repository on GitHub.
- Go to Actions.
- Select Build XAInyPredictor.
- Click Run workflow.
- When the run finishes, download the generated artifacts from the workflow run summary.
The workflow builds:
- a Windows installer;
- a Linux tarball;
- a macOS zip package.
No automated test suite is currently configured.