This repository contains the implementation of SSPO and baselines (DPO, ORPO, SimPO, KTO, SSRM, SPA).
## Installation

- Create a virtual environment named `sspo` with Python 3.10 or higher:

  ```bash
  conda create -n sspo python==3.10.0
  conda activate sspo
  ```

- Install required packages:

  ```bash
  cd SSPO
  pip install -r requirements.txt
  ```

## Data Preprocessing

- Preprocess the data:
  ```bash
  python preprocessing_data/preprocessing_ultrachat.py --fb [feedback_ratio] --ch [chat_ratio]
  python preprocessing_data/preprocessing_medical.py --fb [feedback_ratio] --ch [chat_ratio]
  python preprocessing_data/preprocessing_business.py --fb [feedback_ratio] --ch [chat_ratio]
  ```

## SSPO Training

- Generate the YAML configuration and training command:
  ```bash
  python examples/train/make_yaml.py
  python examples/train/make_yaml_medical.py
  python examples/train/make_yaml_business.py
  ```

- Execute training:

  ```bash
  # Copy the generated command from the make_yaml.py output
  # and paste it into examples/train/train.sh, then run:
  bash examples/train/train.sh
  ```

## Baselines (DPO, ORPO, SimPO, KTO)

Follow the same steps as for SSPO, but change the training method in examples/train/train.sh.
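For orientation, a make_yaml.py-style script reduces to writing a config file and printing the matching launch command. The sketch below is a minimal illustration of that pattern only; the config fields, base model, and launch command are assumptions for this example, not the repository's actual schema.

```python
# Minimal sketch of a config-generating script in the spirit of make_yaml.py.
# All field names, the base model, and the launch command are illustrative
# assumptions, not the repository's actual schema.
from pathlib import Path


def make_yaml(method: str, beta: float, out_dir: str = ".") -> str:
    """Write a small training config and return the command that would run it."""
    config = "\n".join([
        "model_name_or_path: meta-llama/Meta-Llama-3-8B",  # assumed base model
        f"method: {method}",  # e.g. sspo, dpo, orpo, simpo, kto
        f"beta: {beta}",      # preference-loss temperature
        "learning_rate: 5.0e-7",
        "num_train_epochs: 1",
    ])
    path = Path(out_dir) / f"{method}.yaml"
    path.write_text(config + "\n")
    # In the real workflow, the printed command would be pasted into train.sh.
    return f"accelerate launch examples/train/run.py {path}"


if __name__ == "__main__":
    print(make_yaml("sspo", beta=0.1))
```

Swapping the `method` argument (e.g. `dpo` instead of `sspo`) is the kind of change the baselines above would make in train.sh.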
## SPA

We follow the implementation from the SPA repository; please refer to that repository for detailed instructions.
## SSRM

- Generate additional unlabeled responses:

  ```bash
  python examples/SSRM/generate_responses.py
  ```

- Perform pseudo-labeling using a pre-trained reward model:

  ```bash
  python examples/SSRM/pseudo_label.py
  ```

- Filter the data based on a confidence threshold:

  ```bash
  python examples/SSRM/conf_threshold.py
  ```

- Merge the feedback data:

  ```bash
  python examples/SSRM/merge_json.py
  ```

- Execute the complete SSRM training pipeline:

  ```bash
  # Configure the number of iterations in examples/SSRM/train-ssrm.sh;
  # the script runs steps 1-4 for the specified number of iterations.
  bash examples/SSRM/train-ssrm.sh
  ```

## Notes

- Make sure to adjust the hyperparameters in the YAML configuration file generated by make_yaml.py.
- For SSRM, you can control the number of iterations by modifying the commands in train-ssrm.sh.
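The SSRM pseudo-labeling and confidence-filtering steps above can be sketched as follows. This is an illustration only: the reward model is replaced by a toy scoring function, and the confidence rule (a Bradley-Terry-style sigmoid over the score gap) and the threshold value are assumptions standing in for whatever pseudo_label.py and conf_threshold.py actually implement.

```python
# Illustrative sketch of pseudo-labeling response pairs with a reward model
# and keeping only confident labels. The scorer below is a toy stand-in.
import math


def reward_score(response: str) -> float:
    """Stand-in for a reward model; real code would query a trained RM."""
    return float(len(response)) / 100.0  # toy heuristic, assumption


def pseudo_label(pairs, threshold=0.8):
    """Label (prompt, resp_a, resp_b) triples; keep only confident pairs.

    Confidence is modeled as the Bradley-Terry probability that the
    higher-scored response wins: sigmoid(|r_a - r_b|).
    """
    kept = []
    for prompt, resp_a, resp_b in pairs:
        r_a, r_b = reward_score(resp_a), reward_score(resp_b)
        confidence = 1.0 / (1.0 + math.exp(-abs(r_a - r_b)))
        if confidence >= threshold:  # conf_threshold.py-style filtering
            chosen, rejected = (resp_a, resp_b) if r_a >= r_b else (resp_b, resp_a)
            kept.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return kept
```

The surviving records would then be merged with the human-labeled feedback data (the merge_json.py step) before the next training iteration.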