SS2LD: Self-Supervised Distillation of Legacy Rule-Based Methods for Enhanced EEG-Based Decision-Making
Yipeng Zhang¹, Yuanyi Ding¹, Chenda Duan¹, Atsuro Daida², Hiroki Nariai², Vwani Roychowdhury¹
¹Department of Electrical and Computer Engineering, University of California, Los Angeles
²Mattel Children’s Hospital, David Geffen School of Medicine, University of California, Los Angeles
MICCAI 2025
A Python package for Signal to Seizure Learning and Detection using VAE-based models and self-supervised learning on HFO (High Frequency Oscillation) data.
For development installation:
pip install -e .
For development with optional dependencies:
pip install -e ".[dev]"
Important: Before using SS2LD, configure your data paths in config/param.py. This file contains all the settings for training and data processing.
- Edit config/param.py
- Update the DATA_CONFIG section:
DATA_CONFIG = {
"data_dir": "/path/to/your/processed/data", # Path to patient folders (sub-001, sub-002, etc.)
"meta_fn": "/path/to/your/metadata.csv", # Path to metadata CSV file
}
- Update the FEATURE_EXTRACTION_CONFIG section:
FEATURE_EXTRACTION_CONFIG = {
"resample_rate": 1000,
"window_size": 1000,
"default_bids_folder": "/path/to/your/raw/BIDS/data", # 🚨 Update this path
"default_output_folder": os.path.join(PROJECT_ROOT, "data"), # Should match data_dir above
}
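Before running extraction or training, it can help to sanity-check the configured paths. The following is a minimal sketch and not part of SS2LD: `check_data_config` is a hypothetical helper, and it assumes only what the comments above state, namely that `data_dir` holds per-patient folders named `sub-*` and that `meta_fn` points to a CSV file.

```python
from pathlib import Path


def check_data_config(cfg: dict) -> list[str]:
    """Return human-readable problems found in a DATA_CONFIG-style dict.

    Hypothetical helper (not part of SS2LD); an empty list means no
    problem was found.
    """
    problems = []
    data_dir = Path(cfg["data_dir"])
    meta_fn = Path(cfg["meta_fn"])

    if not data_dir.is_dir():
        problems.append(f"data_dir does not exist: {data_dir}")
    elif not any(p.is_dir() for p in data_dir.glob("sub-*")):
        # Assumption: patient folders follow the sub-001, sub-002, ... naming.
        problems.append(f"no sub-* patient folders found in: {data_dir}")

    if not meta_fn.is_file():
        problems.append(f"meta_fn does not exist: {meta_fn}")
    return problems


if __name__ == "__main__":
    example = {
        "data_dir": "/path/to/your/processed/data",
        "meta_fn": "/path/to/your/metadata.csv",
    }
    for problem in check_data_config(example):
        print(problem)
```

With the placeholder paths left unedited, the script would report both paths as missing, which is a quick way to catch a forgotten configuration step.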
Extract HFO waveforms from raw EDF files:
# Use default paths from config/param.py
ss2ld-extract-features
# Or override paths
ss2ld-extract-features \
--bids-folder /path/to/BIDS/data \
--output-folder /path/to/processed/data \
--resample-rate 1000 \
--skip-existing
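The `resample_rate` and `window_size` settings together determine how much signal each extracted waveform covers. The sketch below illustrates one plausible way to cut a fixed-length window around a detected event. It is not the package's actual extraction code: `centered_window` is a hypothetical function, and it assumes that event times are given in seconds, that `window_size` is counted in samples, and that windows are shifted rather than padded at the edges of a recording.

```python
def centered_window(start_s: float, end_s: float, n_samples: int,
                    resample_rate: int = 1000,
                    window_size: int = 1000) -> tuple[int, int]:
    """Return [lo, hi) sample indices of a fixed-length window around an event.

    Hypothetical helper; the conventions (seconds in, samples out, shift at
    the edges) are assumptions, not confirmed SS2LD behavior.
    """
    if window_size > n_samples:
        raise ValueError("recording is shorter than one window")
    # Event midpoint, converted from seconds to a sample index.
    center = int(round((start_s + end_s) / 2 * resample_rate))
    lo = center - window_size // 2
    # Shift the window back inside the recording if it overhangs an edge.
    lo = max(0, min(lo, n_samples - window_size))
    return lo, lo + window_size


if __name__ == "__main__":
    # A 50 ms event at t = 10 s in a 60 s recording resampled to 1000 Hz.
    print(centered_window(10.0, 10.05, n_samples=60_000))  # (9525, 10525)
```

Under these assumptions, the default values of 1000 for both settings would give each window a duration of one second.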
Train VAE models for HFO analysis on a single GPU:
# Train with default settings from config/param.py
ss2ld-train --mode train --fold_num 0
# Train on specific GPU with custom suffix
ss2ld-train \
--mode train \
--fold_num 0 \
--device cuda:1 \
--suffix my_experiment
Use the automated multi-GPU training runner for cross-validation:
# Run training across multiple GPUs automatically
ss2ld-run-train \
--num-folds 5 \
--gpu-types A6000 \
--max-gpus 4 \
--memory-threshold 1600
# Test trained model
ss2ld-train --mode test --fold_num 0 --epoch best
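To make the relationship between the single-GPU and multi-GPU commands concrete, the sketch below builds one `ss2ld-train` invocation per fold and cycles through a list of GPU ids. It uses only the flags shown above (`--mode`, `--fold_num`, `--device`). The round-robin assignment and the `plan_fold_commands` name are assumptions for illustration: the real `ss2ld-run-train` selects GPUs by type and free memory, which this sketch does not attempt, and nothing is launched here.

```python
import shlex


def plan_fold_commands(num_folds: int, gpu_ids: list[int]) -> list[list[str]]:
    """Build one `ss2ld-train` command per fold, cycling through the GPUs.

    Hypothetical planner; it only constructs argument lists.
    """
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    commands = []
    for fold in range(num_folds):
        # Round-robin assignment: fold k goes to GPU k mod len(gpu_ids).
        device = f"cuda:{gpu_ids[fold % len(gpu_ids)]}"
        commands.append(["ss2ld-train", "--mode", "train",
                         "--fold_num", str(fold), "--device", device])
    return commands


if __name__ == "__main__":
    for cmd in plan_fold_commands(num_folds=5, gpu_ids=[0, 1, 2, 3]):
        print(shlex.join(cmd))
```

With five folds and four GPUs, the fifth fold wraps around to the first GPU, so a real scheduler would have to wait for that device to become free.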
After training, perform clustering analysis on the learned representations:
# Run clustering on training results
ss2ld-run-cluster \
--results-suffix "2025-06-27_3" \
--epoch "best" \
--artifact-samples 5000 \
--pathology-samples 2500
Clustering Configuration:
# In config/param.py
CLUSTERING_CONFIG = {
"epoch": "best", # Which epoch to analyze ("best" or epoch number)
"artifact_samples": 5000, # Number of artifact samples for clustering
"pathology_samples": 2500, # Number of pathology samples for clustering
"cluster_algo": "gmm", # Clustering algorithm ("gmm" or "kmeans")
"n_clusters": 2, # Number of clusters
"seed": 42, # Random seed for clustering
}
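The `artifact_samples`, `pathology_samples`, and `seed` settings suggest that clustering runs on a reproducible, capped subset of events from two pools. The sketch below shows what such a draw could look like using only the standard library. It is an illustration under stated assumptions, not the package's implementation: `sample_for_clustering` is hypothetical, and it assumes every event already carries a coarse `"artifact"` or `"pathology"` tag.

```python
import random


def sample_for_clustering(labels: list[str], artifact_samples: int = 5000,
                          pathology_samples: int = 2500,
                          seed: int = 42) -> list[int]:
    """Pick event indices to cluster: a capped random subset of each pool.

    Assumption: each event is tagged "artifact" or "pathology"; how SS2LD
    actually forms these pools is not specified in this document.
    """
    rng = random.Random(seed)  # local generator, so the draw is reproducible
    chosen = []
    for tag, cap in (("artifact", artifact_samples),
                     ("pathology", pathology_samples)):
        pool = [i for i, label in enumerate(labels) if label == tag]
        # Never request more samples than the pool contains.
        chosen.extend(rng.sample(pool, min(cap, len(pool))))
    return sorted(chosen)


if __name__ == "__main__":
    tags = ["artifact"] * 10 + ["pathology"] * 6
    print(len(sample_for_clustering(tags, artifact_samples=4, pathology_samples=3)))
```

Fixing the seed means that repeated runs select the same events, which keeps cluster assignments comparable across experiments.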
Use clustering results as teacher labels for supervised fine-tuning:
# Single fold fine-tuning
ss2ld-finetune \
--pretrain-checkpoint "res/2025-06-27_3" \
--teacher-suffix "5000_2500" \
--pretrain-epoch "best" \
--mode train \
--fold_num 0
# Multi-GPU fine-tuning pipeline
ss2ld-run-finetune \
--pretrain-checkpoint "res/2025-06-27_3" \
--teacher-suffix "5000_2500" \
--pretrain-epoch "best" \
--epochs 20 \
--lr 1e-4
Fine-tuning Configuration:
# In config/param.py
FINETUNE_CONFIG = {
"epochs": 20, # Number of fine-tuning epochs
"teacher_suffix": "5000_2500", # Suffix matching clustering output
"pretrain_epoch": 100, # Epoch from base training to use
"pretrain_checkpoint": "./res/2025-02-17_2", # Base checkpoint directory
"vae_augment": True, # Whether to use VAE augmentation during fine-tuning
"lr": 1e-4, # Learning rate for fine-tuning (typically lower than base training)
"batch_size": 32, # Batch size for fine-tuning
}
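Because the same settings appear both as command-line flags and as `FINETUNE_CONFIG` entries, it is worth being explicit about precedence. The sketch below shows the common pattern in which a flag overrides the config default only when it is actually passed. That precedence is an assumption here (suggested by the "Or override paths" comment in the feature extraction example), and `resolve_finetune_settings` is a hypothetical function, not the package's parser.

```python
import argparse

# Subset of the FINETUNE_CONFIG example above, used as defaults.
FINETUNE_DEFAULTS = {"epochs": 20, "lr": 1e-4, "teacher_suffix": "5000_2500"}


def resolve_finetune_settings(argv: list[str]) -> dict:
    """Merge command-line overrides onto config defaults (CLI wins if given)."""
    parser = argparse.ArgumentParser()
    # default=None lets us tell "flag not passed" apart from a real value.
    parser.add_argument("--epochs", type=int, default=None)
    parser.add_argument("--lr", type=float, default=None)
    parser.add_argument("--teacher-suffix", default=None)
    args = vars(parser.parse_args(argv))
    overrides = {key: value for key, value in args.items() if value is not None}
    return {**FINETUNE_DEFAULTS, **overrides}


if __name__ == "__main__":
    print(resolve_finetune_settings(["--lr", "5e-5"]))
```

In this pattern, editing config/param.py changes the defaults for every run, while flags adjust a single invocation.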
The SS2LD pipeline consists of three main stages:
- Self-Supervised Training:
# Train VAE on HFO data
ss2ld-run-train --num-folds 5
- Clustering Analysis:
# Generate cluster-based labels
ss2ld-run-cluster --results-suffix "2025-06-27_3"
- Supervised Fine-tuning:
# Fine-tune using cluster labels as supervision
ss2ld-run-finetune --pretrain-checkpoint "res/2025-06-27_3"
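The three stages above must run in order, and the later stages refer to the results directory produced by the first. The sketch below chains the three commands from Python. It is a hypothetical driver, not part of SS2LD, and it assumes the results suffix (for example `2025-06-27_3`) is already known. In practice that name appears to be assigned during training, so it would have to be read from the `res/` directory after the first stage.

```python
import shlex
import subprocess


def pipeline_commands(results_suffix: str, num_folds: int = 5) -> list[list[str]]:
    """Return the three stage commands in the order they must run."""
    return [
        ["ss2ld-run-train", "--num-folds", str(num_folds)],
        ["ss2ld-run-cluster", "--results-suffix", results_suffix],
        ["ss2ld-run-finetune", "--pretrain-checkpoint", f"res/{results_suffix}"],
    ]


def run_pipeline(results_suffix: str, dry_run: bool = True) -> None:
    """Print each stage command; unless dry_run is set, run it and stop on failure."""
    for cmd in pipeline_commands(results_suffix):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline("2025-06-27_3")  # dry run: only prints the commands
```

Keeping `dry_run=True` as the default makes it safe to inspect the planned commands before committing GPU time.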
The package expects data in this structure:
BIDS_folder/
├── sub-001/
│   └── ses-*/
│       └── ieeg/
│           └── *.edf
├── derivatives/
│   └── hfo_detection/
│       └── sub-001/
│           └── detections.csv
processed_data/
├── sub-001/
│   ├── hfo_waveforms.npz
│   └── hfo_info.csv
├── sub-002/
│   ├── hfo_waveforms.npz
│   └── hfo_info.csv
└── meta.csv
res/
└── 2025-06-27_3/
    └── fold_0/
        ├── ckpt/
        │   ├── model_best.pth
        │   └── model_100.pth
        ├── train_best.npz        # For clustering input
        ├── test_best.npz         # For clustering input
        └── 5000_2500_best/       # Clustering output (for fine-tuning)
            ├── train_overall_.npz
            └── test_overall_.npz
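The results layout above links the clustering settings to the fine-tuning flags: the folder `5000_2500_best` appears to combine the artifact sample count, the pathology sample count, and the epoch. The sketch below encodes that reading as a small path builder. The naming rule is inferred from this single example tree and is an assumption, and `teacher_label_path` is a hypothetical helper.

```python
from pathlib import Path


def teacher_label_path(results_dir: str, fold: int,
                       artifact_samples: int = 5000,
                       pathology_samples: int = 2500,
                       epoch: str = "best", split: str = "train") -> Path:
    """Build the path where clustering output for one fold is expected.

    Assumed convention (inferred from the example tree only):
    <results_dir>/fold_<k>/<artifact>_<pathology>_<epoch>/<split>_overall_.npz
    """
    teacher_suffix = f"{artifact_samples}_{pathology_samples}"
    return (Path(results_dir) / f"fold_{fold}"
            / f"{teacher_suffix}_{epoch}" / f"{split}_overall_.npz")


if __name__ == "__main__":
    path = teacher_label_path("res/2025-06-27_3", fold=0)
    print(path.as_posix(), "exists" if path.is_file() else "is missing")
```

If the convention holds, a missing file at this location would indicate that clustering has not yet been run for that fold with matching sample counts.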
- config/param.py - Main configuration file (edit this!)
- src/ss2ld/ - Core package with utilities and models
- pipeline/ - Higher-level scripts for data processing and training
Control logging behavior across all SS2LD tools:
LOGGING_CONFIG = {
"log_level": "INFO", # DEBUG, INFO, WARNING, ERROR
"log_format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
"log_to_file": True, # Whether to save logs to files
"log_dir": "logs", # Directory for log files
"console_log_level": "INFO", # Log level for console output
"file_log_level": "DEBUG", # Log level for file output (can be more detailed)
}
Features:
- Dual output: Console + file logging with different levels
- Automatic log files: feature_extraction.log, trainer.log, run_train.log
- Structured format: Timestamps, component names, log levels
- Configurable verbosity: Set DEBUG for development, INFO for production
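The dual-output behavior described above maps directly onto Python's standard `logging` module. The sketch below shows how a `LOGGING_CONFIG`-style dict could be turned into a logger with separate console and file levels. It is an illustration rather than the package's own setup code: `setup_logger` is a hypothetical function, and it ignores the `log_level` key because how SS2LD combines it with the two handler levels is not stated.

```python
import logging
import os


def setup_logger(name: str, cfg: dict, filename: str) -> logging.Logger:
    """Create a logger with a console handler and, optionally, a file handler."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let the handlers do the filtering
    # Remove handlers from earlier calls so messages are not duplicated.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    formatter = logging.Formatter(cfg["log_format"])

    console = logging.StreamHandler()
    console.setLevel(cfg["console_log_level"])
    console.setFormatter(formatter)
    logger.addHandler(console)

    if cfg["log_to_file"]:
        os.makedirs(cfg["log_dir"], exist_ok=True)
        file_handler = logging.FileHandler(os.path.join(cfg["log_dir"], filename))
        file_handler.setLevel(cfg["file_log_level"])
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    return logger


if __name__ == "__main__":
    config = {
        "log_format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        "log_to_file": True,
        "log_dir": "logs",
        "console_log_level": "INFO",
        "file_log_level": "DEBUG",
    }
    log = setup_logger("trainer", config, "trainer.log")
    log.debug("written to logs/trainer.log only")
    log.info("written to both the console and logs/trainer.log")
```

Setting the logger itself to DEBUG and filtering at the handlers is what allows the file to be more detailed than the console.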
After installation, these commands are available:
- ss2ld-extract-features - Extract features from EDF files
- ss2ld-train - Train/test VAE models
- ss2ld-run-train - Multi-GPU training pipeline
- ss2ld-run-cluster - Clustering analysis on trained models
- ss2ld-finetune - Fine-tune models with cluster labels
- ss2ld-run-finetune - Multi-GPU fine-tuning pipeline
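A quick way to confirm that the installation registered these commands is to look them up on the current `PATH`. The following sketch uses only the standard library; `missing_commands` is a hypothetical helper, and the command names are taken from the list above.

```python
import shutil

SS2LD_COMMANDS = [
    "ss2ld-extract-features", "ss2ld-train", "ss2ld-run-train",
    "ss2ld-run-cluster", "ss2ld-finetune", "ss2ld-run-finetune",
]


def missing_commands(commands: list[str]) -> list[str]:
    """Return the commands that cannot be found on the current PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_commands(SS2LD_COMMANDS)
    print("all commands found" if not missing else f"missing: {missing}")
```

If any command is reported missing, the most likely cause is that the package was installed into a different environment than the one currently active.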
If you use SS2LD in your research, please cite our paper:
@inproceedings{zhang2025ss2ld,
title={Self-Supervised Distillation of Legacy Rule-Based Methods for Enhanced EEG-Based Decision-Making},
author={Zhang, Yipeng and Ding, Yuanyi and Duan, Chenda and Daida, Atsuro and Nariai, Hiroki and Roychowdhury, Vwani},
booktitle={Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year={2025},
organization={Springer}
}