This repository was archived by the owner on Sep 22, 2025. It is now read-only.
Large diffs are not rendered by default.

1,447 changes: 1,447 additions & 0 deletions AI Guardian/AlertSystemTask/.ipynb_checkpoints/Untitled-checkpoint.ipynb

Large diffs are not rendered by default.

808 changes: 808 additions & 0 deletions AI Guardian/AlertSystemTask/AlertSystemScript.ipynb

Large diffs are not rendered by default.

1,921 changes: 1,921 additions & 0 deletions AI Guardian/AlertSystemTask/New AI spreadsheet - Sheet1.csv

Large diffs are not rendered by default.

98 changes: 98 additions & 0 deletions AI Guardian/AlertSystemTask/README.md
@@ -0,0 +1,98 @@
Guardian Alerts (Sprint 2) – Monitoring Pipeline Report
1) Executive summary
I built an end-to-end notebook that flags patient risk using two complementary signals:
1. a time-series anomaly model (LSTM autoencoder; IsolationForest fallback) that looks for unusual behaviour over recent days, and
2. a behavioural-anomaly classifier (Random Forest or MLP) trained on engineered features (deltas, rolling means, z-scores).
On top of the model scores, I added a clinically anchored vitals overlay (SpO₂, temperature, blood pressure, activity, meals skipped). The final risk_level is the maximum of the model view and the vitals view, and every Medium/High row includes a clear reason.
The pipeline writes artifacts/alerts.csv with:
user_id, timestamp, anom_score, clf_prob, risk_level, reason.

2) Data ingestion & normalization
• Auto-discovery: The notebook finds the dataset in the same folder (prefers New AI spreadsheet - Sheet1.csv).
• Schema mapping (→ canonical names):
o patientId → user_id
o observationStart → timestamp (uses observationEnd if Start is absent)
o Behaviour: stepsTaken → steps, calorieIntake → calorie_intake, sleepHours → sleep_hours,
waterIntakeMl → water_intake (mL→L), bathroomVisits → bathroom_visits
o Vitals/context: heartRate → heart_rate, spo2 → spo2, temperature → temperature,
bloodPressure "120/80" → bp_sys, bp_dia, mealsSkipped → meals_skipped, exerciseMinutes → exercise_minutes
• Time handling: Parse timestamps and sort by user_id, timestamp.
• Missing values: Per-user interpolate → back/forward fill → remaining NaNs to 0.
• Scaling: StandardScaler fit on all model features; saved as artifacts/scaler.pkl.
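The ingestion steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's loader: RENAME_MAP is a hypothetical subset of the mapping, normalize is an assumed helper name, and rows are assumed to be one observation per user per timestamp.

```python
import pandas as pd

# Hypothetical subset of the schema mapping described above.
RENAME_MAP = {
    "patientId": "user_id",
    "stepsTaken": "steps",
    "sleepHours": "sleep_hours",
    "waterIntakeMl": "water_intake",
    "heartRate": "heart_rate",
    "bloodPressure": "blood_pressure",
}

def normalize(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw.copy()
    # Prefer observationStart; fall back to observationEnd if Start is absent.
    ts_col = "observationStart" if "observationStart" in df.columns else "observationEnd"
    df["timestamp"] = pd.to_datetime(df[ts_col], errors="coerce")
    df = df.rename(columns=RENAME_MAP)
    # Convert water intake from mL to L.
    if "water_intake" in df.columns:
        df["water_intake"] = df["water_intake"] / 1000.0
    # Split a "120/80" string into systolic and diastolic columns.
    if "blood_pressure" in df.columns:
        bp = df["blood_pressure"].astype(str).str.extract(r"(\d+)\s*/\s*(\d+)")
        df["bp_sys"] = pd.to_numeric(bp[0], errors="coerce")
        df["bp_dia"] = pd.to_numeric(bp[1], errors="coerce")
        df = df.drop(columns="blood_pressure")
    df = df.sort_values(["user_id", "timestamp"]).reset_index(drop=True)
    # Per-user interpolate, then back/forward fill, then remaining NaNs to 0.
    num_cols = [c for c in df.select_dtypes("number").columns if c != "user_id"]
    df[num_cols] = (
        df.groupby("user_id")[num_cols]
        .transform(lambda s: s.interpolate().bfill().ffill())
        .fillna(0)
    )
    return df
```

Filling per user keeps one patient's values from leaking into another's gaps; the final fill with 0 only applies when a user has no valid value at all for a column.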

3) Models & features
3.1 Sequence anomaly (LSTM AE; IF fallback)
• Architecture: LSTM autoencoder (hidden=32, latent=16), teacher-forced reconstruction.
• Window: SEQ_LEN = 14 days.
• Training: 30 epochs, Adam(1e-3).
• Score: Mean-squared reconstruction error per window → aligned to timestamps.
• Calibration: Compute the dataset’s 80th and 95th percentiles of the raw error (err_p80, err_p95) to avoid “everything = High”.
If PyTorch isn’t available, IsolationForest (contamination 0.05) is used and we take -score_samples as an error-like measure.
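As a rough sketch of the fallback path and the percentile calibration described above, assuming X is the scaled feature matrix. The function names and dictionary keys here are illustrative and may not match the notebook or thresholds.json.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def iforest_errors(X, contamination=0.05, seed=0):
    """Error-like score from IsolationForest (higher = more anomalous)."""
    model = IsolationForest(contamination=contamination, random_state=seed).fit(X)
    # score_samples is higher for normal points, so negate it.
    return -model.score_samples(X)

def calibrate(errors):
    """Dataset-level bounds and percentile thresholds of the raw error."""
    errors = np.asarray(errors, dtype=float)
    return {
        "err_min": float(errors.min()),
        "err_max": float(errors.max()),
        "err_p80": float(np.percentile(errors, 80)),
        "err_p95": float(np.percentile(errors, 95)),
    }
```

Because the thresholds are percentiles of the same dataset, roughly 20% of rows sit at or above err_p80 and roughly 5% at or above err_p95, whatever the absolute scale of the error.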
3.2 Behavioural anomaly classifier (RF/MLP)
• Models: rf (default) or mlp.
• Engineered features (for each of 12 inputs: 5 behaviour + 7 vitals):
value, delta, 7-day rolling mean, 7-day rolling z-score.
• Labels: If no ground truth exists, create weak labels: top 5% by reconstruction error = anomalous.
• Output: clf_prob (0–1). The cutoffs 0.65 and 0.85 mark Medium and High in the model-risk rule (section 4.2) and appear in the reason text.
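The engineered features and weak labels above could be computed along these lines. This is a sketch under assumptions: the frame is already sorted by user_id and timestamp, each row is one day (so a 7-row window stands in for 7 days), and the column suffixes are made up for illustration.

```python
import numpy as np
import pandas as pd

def engineer(df: pd.DataFrame, cols, window=7) -> pd.DataFrame:
    out = df[["user_id", "timestamp"]].copy()
    g = df.groupby("user_id", sort=False)
    for c in cols:
        out[c] = df[c]
        # Day-over-day change within each user; first row has no predecessor.
        out[f"{c}_delta"] = g[c].diff().fillna(0.0)
        roll_mean = g[c].transform(lambda s: s.rolling(window, min_periods=1).mean())
        roll_std = g[c].transform(lambda s: s.rolling(window, min_periods=2).std())
        out[f"{c}_rmean"] = roll_mean
        # Rolling z-score; undefined values (no spread yet) are set to 0.
        z = (df[c] - roll_mean) / roll_std.replace(0.0, np.nan)
        out[f"{c}_rz"] = z.fillna(0.0)
    return out

def weak_labels(recon_error, top_frac=0.05):
    """Mark the top `top_frac` of rows by reconstruction error as anomalous (1)."""
    err = np.asarray(recon_error, dtype=float)
    cutoff = np.quantile(err, 1.0 - top_frac)
    return (err >= cutoff).astype(int)
```

With 12 inputs this yields 48 feature columns (value, delta, rolling mean, rolling z-score per input).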

4) Alert logic (how risk_level is decided)
This is the exact mapping the notebook implements.
4.1 Signals computed first
• anom_score (0–1): min–max normalization of the reconstruction error (for visibility and plots).
• recon_error: the unnormalized error used to compare against percentiles err_p80 / err_p95.
• clf_prob (0–1): classifier probability (or scaled decision value).
• Vitals snapshot: spo2, temperature (°C), bp_sys/bp_dia, exercise_minutes/day, meals_skipped/day.
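For reference, the min–max step that turns recon_error into anom_score can be sketched as below. The handling of a constant error vector (all zeros) is an assumption; the notebook may treat that case differently.

```python
import numpy as np

def minmax(errors):
    """Scale raw reconstruction errors into the 0-1 range."""
    errors = np.asarray(errors, dtype=float)
    lo, hi = errors.min(), errors.max()
    if hi == lo:
        # No spread in the errors: assumed here to map everything to 0.
        return np.zeros_like(errors)
    return (errors - lo) / (hi - lo)
```

Since anom_score depends on the dataset's min and max, it is useful for plots but is not comparable across runs; the risk rule compares the unnormalized recon_error against err_p80 / err_p95 instead.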
4.2 Model risk (based on anomaly + classifier)
High if recon_error ≥ err_p95 OR clf_prob ≥ 0.85
Medium if recon_error ≥ err_p80 OR clf_prob ≥ 0.65 (and not High)
Low otherwise
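The rule above translates almost directly into code. A minimal sketch, with a hypothetical function name and the cutoffs exposed as keyword arguments:

```python
def model_risk(recon_error, clf_prob, err_p80, err_p95,
               med_cut=0.65, high_cut=0.85):
    """Map the two model signals to Low / Medium / High."""
    if recon_error >= err_p95 or clf_prob >= high_cut:
        return "High"
    if recon_error >= err_p80 or clf_prob >= med_cut:
        return "Medium"
    return "Low"
```

Checking High first is what gives Medium its "(and not High)" condition.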
4.3 Vitals risk (direction-aware clinical thresholds)
• SpO₂: Low ≥95% | Medium 90–94% | High <90%
• Temperature (°C): Low <38.0 | Medium 38.0–39.3 | High ≥39.4
• Blood pressure (mmHg, systolic/diastolic): Low <130 and <80 | Medium 130–139 or 80–89 | High ≥140 or ≥90 (escalate internally if ≥180/120)
• Exercise minutes/day: Low ≥20 | Medium 10–19 | High <10
• Meals skipped/day: Low 0–1 | Medium 2 | High ≥3
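One way to encode the table above is a small function per vital plus a "worst level wins" step. This is an illustrative sketch, not the notebook's vital_risk_and_reasons() helper; the function names are assumptions, values that fall between the listed bands (for example SpO₂ 94.5) are assigned to the more severe neighbouring band, and the internal ≥180/120 escalation is left out.

```python
RANK = {"Low": 0, "Medium": 1, "High": 2}

def spo2_level(v):
    return "High" if v < 90 else "Medium" if v < 95 else "Low"

def temp_level(c):
    return "High" if c >= 39.4 else "Medium" if c >= 38.0 else "Low"

def bp_level(sys, dia):
    if sys >= 140 or dia >= 90:
        return "High"
    if sys >= 130 or dia >= 80:
        return "Medium"
    return "Low"

def exercise_level(minutes):
    # Direction-aware: less activity means more risk.
    return "High" if minutes < 10 else "Medium" if minutes < 20 else "Low"

def meals_level(skipped):
    return "High" if skipped >= 3 else "Medium" if skipped >= 2 else "Low"

def vitals_risk(row):
    """Return (worst level, per-vital levels) for one observation."""
    levels = {
        "spo2": spo2_level(row["spo2"]),
        "temperature": temp_level(row["temperature"]),
        "bp": bp_level(row["bp_sys"], row["bp_dia"]),
        "exercise": exercise_level(row["exercise_minutes"]),
        "meals": meals_level(row["meals_skipped"]),
    }
    worst = max(levels.values(), key=RANK.__getitem__)
    return worst, levels
```

Keeping the per-vital levels alongside the overall level makes it straightforward to build the reason string from whichever vitals fired.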
4.4 Final risk & reasons
risk_level = max(model_risk, vital_risk) # High > Medium > Low
• reason (string): for Medium/High, we list every trigger that fired, e.g.
o “Strong sequence anomaly (≥95th percentile)”
o “Classifier: strong behavioural anomaly (≥0.85)”
o “SpO₂ 89% (<90)”, “High fever 39.5 °C (≥39.4)”, “Stage 2 HTN 165/102”, “Very low activity (6 min)”, “Meals skipped: 3”
• For Low, we leave reason blank (as requested) to keep the CSV clean.
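Combining the two views can be sketched as follows. The function name and the "; " separator between triggers are assumptions; the source only states that every trigger is listed and that Low rows have a blank reason.

```python
RANK = {"Low": 0, "Medium": 1, "High": 2}

def final_risk(model_level, vital_level, triggers):
    """Take the more severe of the two levels and build the reason string."""
    level = max(model_level, vital_level, key=RANK.__getitem__)
    # Low rows keep an empty reason so the CSV stays clean.
    reason = "; ".join(triggers) if level != "Low" else ""
    return level, reason
```

For example, final_risk("Low", "High", ["SpO₂ 89% (<90)"]) returns ("High", "SpO₂ 89% (<90)").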

5) Outputs & visualizations
• Primary CSV: artifacts/alerts.csv — entire dataset, columns:
user_id, timestamp, anom_score, clf_prob, risk_level, reason
• Saved models: lstm.pt (if LSTM) or iforest.pkl, plus clf.pkl and scaler.pkl.
• Thresholds meta: thresholds.json (min/max error, p80, p95).
• Notebook visuals:
1. Anomaly score distribution with p80/p95 markers
2. anom_score vs clf_prob (scatter) colored by final risk
3. Risk counts (bar)
4. Example patient timeline (key features + risk overlay)

6) Values and sources (provenance)
• Exercise target: WHO adults’ guideline 150–300 min/week moderate (≈21–43 min/day).
• Blood pressure categories: American Heart Association (Normal/Elevated/Stage 1/Stage 2; crisis ≥180/120).
• Temperature: NHS fever in adults ≥38.0 °C; we treat ≥39.4 °C as high fever.
• SpO₂: Typical “normal” ~95–100%; <90% concerning at rest (Cleveland Clinic style guidance).
• Meals skipped: heuristic (no formal standard).
(In the code, these are embedded as comments near the thresholds for auditability.)

7) Configuration knobs
• Engine: ENGINE = "lstm" | "iforest"; Classifier: CLF = "rf" | "mlp".
• Window length: SEQ_LEN (default 14).
• Sensitivity: tweak err_p80/err_p95 or classifier cutoffs (0.65/0.85).
• Vitals thresholds: adjust SpO₂/Temp/BP/activity/meals to match clinical guidance or site policy.

8) Limitations & next steps
• Not a diagnostic tool: Alerts are for monitoring/triage.
• Weak labels: Until we have ground truth, the classifier learns from anomaly tails.
• Context variance: Vitals can depend on altitude, chronic conditions, or orders; future work: per-patient baselines & clinician-tuned thresholds.
• Enrichment: Add NLP on nursingNote, behaviourTags, emotionTags to improve reasons.

9) TL;DR
• We map anomalies → Low/Medium/High via calibrated sequence error and a behavioural classifier, then apply clear clinical cutoffs for vitals.
• Output is one final risk_level plus a readable reason explaining what fired.
• Thresholds are transparent, tunable, and sourced (WHO/AHA/NHS/Cleveland Clinic where applicable).

125 changes: 125 additions & 0 deletions AI Guardian/AlertSystemTask/RUNBOOK_GM_ALERTS.md
@@ -0,0 +1,125 @@
RUNBOOK_Guardian_Alerts
1) Purpose
This runbook explains how to execute the Guardian Alerts notebook end-to-end to generate patient risk alerts (Low/Medium/High) with reasons, using:
• a sequence anomaly model (LSTM autoencoder; IsolationForest fallback),
• a behavioural anomaly classifier (Random Forest or MLP),
• a clinically anchored vitals overlay (SpO₂, temperature °C, blood pressure, activity, meals skipped).
The final output is artifacts/alerts.csv with:
user_id, timestamp, anom_score, clf_prob, risk_level, reason.



2) What you need (once per machine)
• Python 3.9+
• Packages:
o Required: pandas, numpy, scikit-learn, joblib, matplotlib
o Optional (enables LSTM): torch
• Jupyter Notebook (or JupyterLab)
Install with pip:
pip install pandas numpy scikit-learn joblib matplotlib
# Optional, for LSTM:
pip install torch
# Jupyter (if not already installed):
pip install notebook
(Conda users can run conda install pandas numpy scikit-learn matplotlib and conda install pytorch -c pytorch.)



3) Folder setup (keep it simple)
Place the notebook and dataset side-by-side in the same folder (e.g., Alerts/):

Alerts/
├─ Guardian_Alerts.ipynb # the notebook you’ll run
├─ New AI spreadsheet - Sheet1.csv # the dataset (CSV)
└─ (auto-created after running)
└─ artifacts/
├─ alerts.csv
├─ scaler.pkl
├─ lstm.pt OR iforest.pkl
├─ clf.pkl
└─ thresholds.json

The notebook auto-detects the dataset in the current folder (prefers New AI spreadsheet - Sheet1.csv). If your CSV has a different name, keep it in the same folder — auto-discovery will still find it.
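The exact discovery rules live in the notebook; the following is only a plausible sketch of that behaviour. The function name, the accepted extensions, and the fallback order (preferred name, then a name containing a known hint, then any CSV/Excel file) are assumptions.

```python
from pathlib import Path

PREFERRED = "New AI spreadsheet - Sheet1.csv"
HINTS = ("sheet1", "new ai spreadsheet")   # assumed name hints
EXTENSIONS = {".csv", ".xlsx", ".xls"}     # assumed accepted formats

def find_dataset(folder="."):
    folder = Path(folder)
    preferred = folder / PREFERRED
    if preferred.is_file():
        return preferred
    # Only look at files directly in the folder, so artifacts/alerts.csv is ignored.
    candidates = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in EXTENSIONS
    )
    hinted = [p for p in candidates if any(h in p.name.lower() for h in HINTS)]
    if hinted:
        return hinted[0]
    if candidates:
        return candidates[0]
    raise FileNotFoundError("No CSV/Excel found in current directory")
```

If several data files share the folder, the first match in alphabetical order wins in this sketch, so keeping a single dataset next to the notebook avoids surprises.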



4) How to run
1. Open a terminal in the Alerts/ folder and launch Jupyter:
jupyter notebook
2. Open Guardian_Alerts.ipynb.
3. Run all cells top → bottom (Kernel → Restart & Run All is fine).
That’s it — the pipeline reads the entire dataset, trains the models, applies the alert logic, saves artifacts, and shows visualizations.



5) What you should see
• Console prints like:
o “Detected dataset: …”
o “PyTorch available: True/False”
o LSTM training progress (if PyTorch is installed)
o Classifier evaluation summary (computed against weak labels, or against ground-truth labels if provided)
o “Saved alerts to: artifacts/alerts.csv (rows=####)”
• Plots:
1. Anomaly score distribution with p80/p95 markers
2. anom_score vs clf_prob (scatter) coloured by final risk
3. Risk counts (bar chart)
4. Example patient timeline (key features + risk overlay)



6) Outputs (where to look)
• Primary file: artifacts/alerts.csv with columns
user_id, timestamp, anom_score, clf_prob, risk_level, reason
• Models & metadata:
o scaler.pkl — feature scaler
o lstm.pt (if LSTM used) or iforest.pkl (fallback)
o clf.pkl — behavioural anomaly classifier
o thresholds.json — calibrated anomaly thresholds (p80, p95) and bounds


Quick sanity checks in a new cell:
import pandas as pd
alerts = pd.read_csv("artifacts/alerts.csv")
# A notebook cell only displays its last expression, so print both results.
print(alerts["risk_level"].value_counts())
print(alerts.sample(5))




7) Configuration knobs (tune without editing much)
At the top of the notebook you can change:
• ENGINE = "lstm" or "iforest"
(If torch isn’t installed, the code auto-falls back to IsolationForest.)
• CLF = "rf" or "mlp"
• SEQ_LEN = 14 (sequence window in days; shorter = more reactive, longer = smoother)
• Classifier cutoffs used in reasons: 0.65 (Medium), 0.85 (High)
• Vital thresholds (SpO₂, Temp °C, BP, exercise/day, meals skipped) are in the vital_risk_and_reasons() helper — edit if your clinicians prefer different limits.
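The automatic fallback from LSTM to IsolationForest can be pictured like this. It is a sketch with a hypothetical resolve_engine helper; the notebook may detect PyTorch differently (for example with a try/except around the import).

```python
import importlib.util

ENGINE = "lstm"   # or "iforest"

def resolve_engine(requested):
    """Return the engine that will actually run."""
    if requested not in {"lstm", "iforest"}:
        raise ValueError(f"Unknown ENGINE: {requested!r}")
    # find_spec returns None when the torch package is not installed.
    if requested == "lstm" and importlib.util.find_spec("torch") is None:
        return "iforest"
    return requested

print("Engine in use:", resolve_engine(ENGINE))
```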




8) How the alert logic works (so you can verify)
• Model risk from anomaly + classifier:
o High if recon_error ≥ p95 or clf_prob ≥ 0.85
o Medium if recon_error ≥ p80 or clf_prob ≥ 0.65 (and not High)
o Low otherwise
• Vitals risk from clinical thresholds (direction-aware):
o SpO₂ (Low ≥95, Med 90–94, High <90), Temp °C (Low <38.0, Med 38.0–39.3, High ≥39.4),
BP mmHg (Low <130 and <80, Med 130–139 or 80–89, High ≥140 or ≥90), Exercise/day (Low ≥20, Med 10–19, High <10), Meals skipped (Low 0–1, Med 2, High ≥3)
• Final risk_level = max(model_risk, vital_risk)
• reason lists all triggers for Medium/High rows (model + vitals).
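To verify the output against these rules, a few invariants can be checked on the loaded CSV. The helper below is a hypothetical sketch (check_alerts is not part of the notebook); it only relies on the documented columns.

```python
import pandas as pd

VALID_LEVELS = {"Low", "Medium", "High"}

def check_alerts(alerts: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # Blank reasons are read back from CSV as NaN, so normalise them to "".
    reason = alerts["reason"].fillna("").astype(str).str.strip()
    if not set(alerts["risk_level"]).issubset(VALID_LEVELS):
        problems.append("unexpected risk_level values")
    for col in ("anom_score", "clf_prob"):
        if not alerts[col].between(0.0, 1.0).all():
            problems.append(f"{col} outside 0-1")
    if (reason[alerts["risk_level"] == "Low"] != "").any():
        problems.append("Low rows with a non-empty reason")
    if (reason[alerts["risk_level"] != "Low"] == "").any():
        problems.append("Medium/High rows without a reason")
    return problems
```

Usage in a new cell: check_alerts(pd.read_csv("artifacts/alerts.csv")) should return an empty list if the file follows the documented logic.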



9) Troubleshooting (fast fixes)
• “No CSV/Excel found in current directory”
→ Ensure the dataset is in the same folder as the notebook, or rename the file so its name includes “sheet1” or “new ai spreadsheet”.
• Timestamp parse errors
→ The code prefers observationStart (falls back to observationEnd). Make sure one of them exists and is a valid date/time.
• Missing columns / KeyError
→ The loader maps your schema to canonical names. If you changed header names, update the rename_map in the loader cell.
• Everything shows as High risk
→ This is usually thresholding. The notebook calibrates model thresholds from your dataset (p80/p95). Re-run after confirming the dataset isn’t trivially small or all-zeros; adjust thresholds if needed.
• No PyTorch
→ The notebook will automatically use IsolationForest. If you want LSTM, install torch.
