SS2LD: Self-Supervised Distillation of Legacy Rule-Based Methods for Enhanced EEG-Based Decision-Making

Yipeng Zhang¹, Yuanyi Ding¹, Chenda Duan¹, Atsuro Daida², Hiroki Nariai², Vwani Roychowdhury¹
¹Department of Electrical and Computer Engineering, University of California, Los Angeles
²Mattel Children’s Hospital, David Geffen School of Medicine, University of California, Los Angeles
MICCAI 2025

A Python package for Signal to Seizure Learning and Detection using VAE-based models and self-supervised learning on HFO (High Frequency Oscillation) data.

Installation

Install the package in editable (development) mode:

pip install -e .

To also install the optional development dependencies:

pip install -e ".[dev]"

Configuration

Important: Before using SS2LD, configure your data paths in config/param.py. This file contains all the settings for training and data processing.

Setting up your data paths:

Edit config/param.py
Update the DATA_CONFIG section:

DATA_CONFIG = {
    "data_dir": "/path/to/your/processed/data",  # Path to patient folders (sub-001, sub-002, etc.)
    "meta_fn": "/path/to/your/metadata.csv",    # Path to metadata CSV file
}

Update the FEATURE_EXTRACTION_CONFIG section:

FEATURE_EXTRACTION_CONFIG = {
    "resample_rate": 1000,
    "window_size": 1000,
    "default_bids_folder": "/path/to/your/raw/BIDS/data",  # 🚨 Update this path
    "default_output_folder": os.path.join(PROJECT_ROOT, "data"),  # Should match data_dir above
}

Usage

1. Feature Extraction (Raw EDF → Processed Data)

Extract HFO waveforms from raw EDF files:

# Use default paths from config/param.py
ss2ld-extract-features

# Or override paths
ss2ld-extract-features \
    --bids-folder /path/to/BIDS/data \
    --output-folder /path/to/processed/data \
    --resample-rate 1000 \
    --skip-existing

2. Training VAE Models

Single GPU Training

Train VAE models for HFO analysis on a single GPU:

# Train with default settings from config/param.py
ss2ld-train --mode train --fold_num 0

# Train on specific GPU with custom suffix
ss2ld-train \
    --mode train \
    --fold_num 0 \
    --device cuda:1 \
    --suffix my_experiment

Multi-GPU Training Pipeline

Use the automated multi-GPU training runner for cross-validation:

# Run training across multiple GPUs automatically
ss2ld-run-train \
    --num-folds 5 \
    --gpu-types A6000 \
    --max-gpus 4 \
    --memory-threshold 1600

3. Testing/Inference

# Test trained model
ss2ld-train --mode test --fold_num 0 --epoch best

4. Clustering Analysis

After training, perform clustering analysis on the learned representations:

# Run clustering on training results
ss2ld-run-cluster \
    --results-suffix "2025-06-27_3" \
    --epoch "best" \
    --artifact-samples 5000 \
    --pathology-samples 2500

Clustering Configuration:

# In config/param.py
CLUSTERING_CONFIG = {
    "epoch": "best",  # Which epoch to analyze ("best" or epoch number)
    "artifact_samples": 5000,  # Number of artifact samples for clustering
    "pathology_samples": 2500,  # Number of pathology samples for clustering
    "cluster_algo": "gmm",  # Clustering algorithm ("gmm" or "kmeans")
    "n_clusters": 2,  # Number of clusters
    "seed": 42,  # Random seed for clustering
}
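
The `artifact_samples` and `pathology_samples` settings cap how many events of each kind enter clustering. A minimal sketch of seeded per-class subsampling follows. It assumes sampling without replacement and uses hypothetical label names (`"artifact"`, `"pathology"`) and a hypothetical helper `subsample_per_class`; the package's own sampling may differ.

```python
import random


def subsample_per_class(labels, n_per_class, seed=42):
    """Pick up to n_per_class[label] indices per label, without replacement.

    Hypothetical helper illustrating the sample caps; not SS2LD's own code.
    """
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    by_label = {}
    for idx, label in enumerate(labels):
        by_label.setdefault(label, []).append(idx)

    chosen = []
    for label, quota in n_per_class.items():
        pool = by_label.get(label, [])
        # If a class has fewer events than its quota, take all of them.
        chosen.extend(rng.sample(pool, min(quota, len(pool))))
    return sorted(chosen)
```

Fixing the seed makes the selection repeatable, which matters here because the cluster assignments are reused later as teacher labels.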

5. Fine-tuning with Cluster Labels

Use clustering results as teacher labels for supervised fine-tuning:

# Single fold fine-tuning
ss2ld-finetune \
    --pretrain-checkpoint "res/2025-06-27_3" \
    --teacher-suffix "5000_2500" \
    --pretrain-epoch "best" \
    --mode train \
    --fold_num 0

# Multi-GPU fine-tuning pipeline
ss2ld-run-finetune \
    --pretrain-checkpoint "res/2025-06-27_3" \
    --teacher-suffix "5000_2500" \
    --pretrain-epoch "best" \
    --epochs 20 \
    --lr 1e-4

Fine-tuning Configuration:

# In config/param.py
FINETUNE_CONFIG = {
    "epochs": 20,  # Number of fine-tuning epochs
    "teacher_suffix": "5000_2500",  # Suffix matching clustering output
    "pretrain_epoch": 100,  # Epoch from base training to use
    "pretrain_checkpoint": "./res/2025-02-17_2",  # Base checkpoint directory
    "vae_augment": True,  # Whether to use VAE augmentation during fine-tuning
    "lr": 1e-4,  # Learning rate for fine-tuning (typically lower than base training)
    "batch_size": 32,  # Batch size for fine-tuning
}
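
Judging from the examples in this README, the teacher suffix `5000_2500` looks like the artifact and pathology sample counts joined by an underscore, and the clustering output folder `5000_2500_best/` looks like that suffix plus the epoch tag. The sketch below encodes that naming. It is inferred from the examples only, and the helper names are hypothetical.

```python
from pathlib import Path


def teacher_suffix(artifact_samples, pathology_samples):
    """Build the suffix as '<artifact>_<pathology>' (naming inferred from the README)."""
    return f"{artifact_samples}_{pathology_samples}"


def cluster_output_dir(results_dir, fold_num, suffix, epoch):
    """Locate the assumed clustering output folder for one fold."""
    return Path(results_dir) / f"fold_{fold_num}" / f"{suffix}_{epoch}"
```

If fine-tuning cannot find its teacher labels, comparing `--teacher-suffix` against the folder names under each `fold_*` directory is a reasonable first check.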

Complete Pipeline Workflow

The SS2LD pipeline consists of three main stages:

Self-Supervised Training:

# Train VAE on HFO data
ss2ld-run-train --num-folds 5

Clustering Analysis:

# Generate cluster-based labels
ss2ld-run-cluster --results-suffix "2025-06-27_3"

Supervised Fine-tuning:

# Fine-tune using cluster labels as supervision
ss2ld-run-finetune --pretrain-checkpoint "res/2025-06-27_3"
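
The three stages can also be chained from a small Python driver. This sketch uses only the commands and flags shown above and defaults to a dry run that prints the commands. It assumes the results directory name is known in advance and that `--results-suffix` is the last component of that directory, as the examples suggest; if the training runner generates a dated name, run stage 1 first and fill the name in. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def pipeline_commands(results_dir, num_folds=5):
    """Return the three stage commands in execution order."""
    results_suffix = Path(results_dir).name  # e.g. "2025-06-27_3"
    return [
        ["ss2ld-run-train", "--num-folds", str(num_folds)],
        ["ss2ld-run-cluster", "--results-suffix", results_suffix],
        ["ss2ld-run-finetune", "--pretrain-checkpoint", results_dir],
    ]


def run_pipeline(results_dir, num_folds=5, dry_run=True):
    """Print each stage command; execute it only when dry_run is False."""
    for cmd in pipeline_commands(results_dir, num_folds):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing stage.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline("res/2025-06-27_3")` prints the three commands without running them.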

Data Structure

The package expects data in this structure:

For Feature Extraction (Input):

BIDS_folder/
├── sub-001/
│   └── ses-*/
│       └── ieeg/
│           └── *.edf
├── derivatives/
│   └── hfo_detection/
│       └── sub-001/
│           └── detections.csv
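
Before running feature extraction, it can be useful to check which subjects have both recordings and detections. The sketch below walks the layout shown above with `pathlib`. The helper name `find_recordings` is hypothetical, and the code relies only on the folder and file names from the tree.

```python
from pathlib import Path


def find_recordings(bids_folder):
    """Map each subject ID to its EDF files and detection CSV (None if missing)."""
    root = Path(bids_folder)
    subjects = {}
    # Only top-level sub-* folders are subjects; derivatives/ is handled below.
    for sub_dir in sorted(root.glob("sub-*")):
        if not sub_dir.is_dir():
            continue
        edf_files = sorted(sub_dir.glob("ses-*/ieeg/*.edf"))
        detections = (
            root / "derivatives" / "hfo_detection" / sub_dir.name / "detections.csv"
        )
        subjects[sub_dir.name] = {
            "edf_files": edf_files,
            "detections": detections if detections.is_file() else None,
        }
    return subjects
```

Subjects whose `detections` entry is `None` have no detection file in the expected location.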

For Training (Output of feature extraction):

processed_data/
├── sub-001/
│   ├── hfo_waveforms.npz
│   └── hfo_info.csv
├── sub-002/
│   ├── hfo_waveforms.npz
│   └── hfo_info.csv
└── meta.csv
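
A quick completeness check on the processed folder can catch missing files before training starts. This is a minimal sketch based on the file names in the tree above; `missing_processed_files` is a hypothetical helper, and it checks only that the files exist, not their contents.

```python
from pathlib import Path

# Per-subject files expected after feature extraction (names from the tree above).
EXPECTED_FILES = ("hfo_waveforms.npz", "hfo_info.csv")


def missing_processed_files(data_dir):
    """Return relative paths of expected files that are absent under data_dir."""
    root = Path(data_dir)
    missing = []
    for sub_dir in sorted(root.glob("sub-*")):
        if not sub_dir.is_dir():
            continue
        for name in EXPECTED_FILES:
            if not (sub_dir / name).is_file():
                missing.append(f"{sub_dir.name}/{name}")
    if not (root / "meta.csv").is_file():
        missing.append("meta.csv")
    return missing
```

An empty list means every subject folder has both files and `meta.csv` is present.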

Training Results:

res/
└── 2025-06-27_3/
    └── fold_0/
        ├── ckpt/
        │   ├── model_best.pth
        │   └── model_100.pth
        ├── train_best.npz      # For clustering input
        ├── test_best.npz       # For clustering input
        └── 5000_2500_best/     # Clustering output (for fine-tuning)
            ├── train_overall_.npz
            └── test_overall_.npz
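
To see which checkpoints a run produced, the results tree can be scanned for `model_<epoch>.pth` files. The sketch below assumes the `fold_<n>/ckpt/model_<epoch>.pth` naming shown above; `list_checkpoints` is a hypothetical helper.

```python
from pathlib import Path


def list_checkpoints(results_dir):
    """Return {fold_number: sorted epoch tags} found under results_dir."""
    found = {}
    for ckpt in Path(results_dir).glob("fold_*/ckpt/model_*.pth"):
        fold = int(ckpt.parent.parent.name.split("_", 1)[1])  # "fold_0" -> 0
        epoch = ckpt.stem.split("_", 1)[1]                    # "model_best" -> "best"
        found.setdefault(fold, []).append(epoch)
    return {fold: sorted(tags) for fold, tags in sorted(found.items())}
```

The epoch tags are kept as strings because both numeric epochs and `best` appear in the tree.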

Package Structure

config/param.py - Main configuration file (edit this!)
src/ss2ld/ - Core package with utilities and models
pipeline/ - Higher-level scripts for data processing and training

Logging Configuration

Control logging behavior across all SS2LD tools:

LOGGING_CONFIG = {
    "log_level": "INFO",  # DEBUG, INFO, WARNING, ERROR
    "log_format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    "log_to_file": True,  # Whether to save logs to files
    "log_dir": "logs",  # Directory for log files
    "console_log_level": "INFO",  # Log level for console output
    "file_log_level": "DEBUG",  # Log level for file output (can be more detailed)
}

Features:

Dual output: Console + file logging with different levels
Automatic log files: feature_extraction.log, trainer.log, run_train.log
Structured format: Timestamps, component names, log levels
Configurable verbosity: Set DEBUG for development, INFO for production
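
The dual-output behavior described above can be reproduced with the standard `logging` module. This is a sketch of one way to wire it up, not the package's own setup code: `build_logger` is a hypothetical helper, and it sets the logger itself to DEBUG so that the two handlers do the filtering, which leaves the `log_level` key unused.

```python
import logging
from pathlib import Path

LOGGING_CONFIG = {
    "log_level": "INFO",
    "log_format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    "log_to_file": True,
    "log_dir": "logs",
    "console_log_level": "INFO",
    "file_log_level": "DEBUG",
}


def build_logger(name, config, log_filename):
    """Create a logger with a console handler and an optional file handler."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # handlers apply the real thresholds
    logger.handlers.clear()         # avoid duplicate handlers on repeated calls
    formatter = logging.Formatter(config["log_format"])

    console = logging.StreamHandler()
    console.setLevel(config["console_log_level"])
    console.setFormatter(formatter)
    logger.addHandler(console)

    if config["log_to_file"]:
        log_dir = Path(config["log_dir"])
        log_dir.mkdir(parents=True, exist_ok=True)
        file_handler = logging.FileHandler(log_dir / log_filename)
        file_handler.setLevel(config["file_log_level"])
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    return logger
```

With the values above, a `logger.debug(...)` call is written to the log file but not shown on the console.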

Command Line Tools

After installation, these commands are available:

ss2ld-extract-features - Extract features from EDF files
ss2ld-train - Train/test VAE models
ss2ld-run-train - Multi-GPU training pipeline
ss2ld-run-cluster - Clustering analysis on trained models
ss2ld-finetune - Fine-tune models with cluster labels
ss2ld-run-finetune - Multi-GPU fine-tuning pipeline

Citation

If you use SS2LD in your research, please cite our paper:

@inproceedings{zhang2025ss2ld,
    title={Self-Supervised Distillation of Legacy Rule-Based Methods for Enhanced EEG-Based Decision-Making},
    author={Zhang, Yipeng and Ding, Yuanyi and Duan, Chenda and Daida, Atsuro and Nariai, Hiroki and Roychowdhury, Vwani},
    booktitle={Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
    year={2025},
    organization={Springer}
}

Authors: Yipeng Zhang¹, Yuanyi Ding¹, Chenda Duan¹, Atsuro Daida², Hiroki Nariai², Vwani Roychowdhury¹
Affiliations: ¹Department of Electrical and Computer Engineering, University of California, Los Angeles; ²Mattel Children’s Hospital, David Geffen School of Medicine, University of California, Los Angeles
Conference: MICCAI 2025
