
# SSPO (Semi-Supervised Preference Optimization)

This repository contains the implementation of SSPO and baselines (DPO, ORPO, SimPO, KTO, SSRM, SPA).

## Installation

1. Create a virtual environment named `sspo` with Python 3.10 or higher:

   ```bash
   conda create -n sspo python==3.10.0
   conda activate sspo
   ```

2. Install the required packages:

   ```bash
   cd SSPO
   pip install -r requirements.txt
   ```

## Execution

### SSPO Training

1. Preprocess the data:

   ```bash
   python preprocessing_data/preprocessing_ultrachat.py --fb [feedback_ratio] --ch [chat_ratio]
   python preprocessing_data/preprocessing_medical.py --fb [feedback_ratio] --ch [chat_ratio]
   python preprocessing_data/preprocessing_business.py --fb [feedback_ratio] --ch [chat_ratio]
   ```

2. Generate the YAML configuration and training command:

   ```bash
   python examples/train/make_yaml.py
   python examples/train/make_yaml_medical.py
   python examples/train/make_yaml_business.py
   ```

3. Execute training:

   ```bash
   # Copy the generated command from the make_yaml.py output
   # and paste it into examples/train/train.sh
   bash examples/train/train.sh
   ```
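Step 1 above splits each corpus into a labeled feedback portion and an unlabeled chat portion. As a rough illustration only, here is a minimal sketch of ratio-based splitting, assuming `--fb` and `--ch` are fractions of the dataset; the function name and details are ours, not the repository's (see the preprocessing scripts for the actual logic):

```python
import random

def split_by_ratio(examples, feedback_ratio, chat_ratio, seed=0):
    """Partition a dataset into a labeled feedback subset and an
    unlabeled chat subset by the given fractions (hypothetical sketch)."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_fb = int(len(shuffled) * feedback_ratio)
    n_ch = int(len(shuffled) * chat_ratio)
    feedback = shuffled[:n_fb]           # kept with preference labels
    chat = shuffled[n_fb:n_fb + n_ch]    # labels discarded downstream
    return feedback, chat

fb, ch = split_by_ratio(list(range(100)), feedback_ratio=0.1, chat_ratio=0.5)
print(len(fb), len(ch))  # 10 50
```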

### DPO, ORPO, SimPO, KTO Training

Follow the same steps as for SSPO, but change the method in `examples/train/train.sh`.

### SPA Training

We follow the implementation from the SPA repository; please refer to it for detailed instructions.

### SSRM Training

1. Generate additional unlabeled responses:

   ```bash
   python examples/SSRM/generate_responses.py
   ```

2. Perform pseudo-labeling with a pre-trained reward model:

   ```bash
   python examples/SSRM/pseudo_label.py
   ```

3. Filter the data by confidence threshold:

   ```bash
   python examples/SSRM/conf_threshold.py
   ```

4. Merge the feedback data:

   ```bash
   python examples/SSRM/merge_json.py
   ```

5. Execute the complete SSRM training pipeline:

   ```bash
   # Configure the number of iterations in examples/SSRM/train-ssrm.sh;
   # the script runs steps 1-4 for the specified number of iterations.
   bash examples/SSRM/train-ssrm.sh
   ```
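Steps 2 and 3 above pseudo-label unlabeled response pairs with a reward model and keep only the confident ones. A toy sketch of that core idea, assuming a Bradley-Terry-style preference probability over reward differences; the reward function, threshold value, and field names here are placeholders, not values from the repository:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pseudo_label(pairs, reward_fn, conf_threshold):
    """Label response pairs with a reward model and keep only pairs
    the model is confident about (hypothetical sketch)."""
    kept = []
    for prompt, resp_a, resp_b in pairs:
        r_a, r_b = reward_fn(prompt, resp_a), reward_fn(prompt, resp_b)
        # Probability that resp_a is preferred, Bradley-Terry style.
        p_a = sigmoid(r_a - r_b)
        conf = max(p_a, 1.0 - p_a)
        if conf >= conf_threshold:
            chosen, rejected = (resp_a, resp_b) if p_a >= 0.5 else (resp_b, resp_a)
            kept.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return kept

# Toy reward model for demonstration: longer responses score higher.
toy_reward = lambda prompt, resp: float(len(resp))
pairs = [("q1", "short", "a much longer answer"), ("q2", "tie", "tie2")]
print(len(pseudo_label(pairs, toy_reward, conf_threshold=0.9)))  # 1
```

The second pair is dropped because the reward gap is small, so the preference probability stays near 0.5 and the confidence falls below the threshold.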

## Notes

- Make sure to adjust the hyperparameters in the YAML configuration file generated by `make_yaml.py`.
- For SSRM, you can control the number of iterations by modifying the commands in `train-ssrm.sh`.
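For scripted sweeps, a hyperparameter in the generated YAML can also be edited with a plain text substitution rather than by hand. The file name and field names below are placeholders; check the actual output of `make_yaml.py`:

```python
import re
from pathlib import Path

# Placeholder config; make_yaml.py's real output will differ.
cfg = Path("sspo_config.yaml")
cfg.write_text("model_name: example\nlearning_rate: 5.0e-7\nnum_train_epochs: 1\n")

# Bump the learning rate in place with a line-anchored substitution.
text = re.sub(r"^learning_rate: .*$", "learning_rate: 1.0e-6",
              cfg.read_text(), flags=re.M)
cfg.write_text(text)
```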

## About

[ICLR 2026] Semi-Supervised Preference Optimization with Limited Feedback
