The official repository for the paper "Authorship Style Transfer with Policy Optimization".
Commands for environment setup with conda:
```shell
conda create --name astrapop python=3.8
conda activate astrapop
pip install -U pip
pip install -r requirements.txt
```
Please download the original Reddit Million User Dataset (MUD) from here and the original ETS Corpus of Non-Native Written English from here. We will publish the data preprocessing code soon.
To reproduce the results on the Reddit dataset, please run the scripts in `scripts/reddit` following the procedure below.

- Train the paraphrase model and the reference SFT model by running `00_train_paraphraser.sh` and `00_train_sft.sh`.
- Generate the data for DPO and CPO training by running `01_generate_dpo_cpo_data.sh`.
- Train the PO models using PPO/DPO/CPO by running `02_train_ppo.sh`/`02_train_dpo.sh`/`02_train_cpo.sh`.
- Transfer the texts in the test set by running `03_generate.sh`.
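The steps above can be chained into a single run. This is only a sketch: it assumes the scripts are launched from the repository root and need no additional arguments, which may not match your setup.

```shell
# Hypothetical end-to-end run of the Reddit pipeline; adjust paths and
# arguments to your environment.
cd scripts/reddit
bash 00_train_paraphraser.sh        # paraphrase model
bash 00_train_sft.sh                # reference SFT model
bash 01_generate_dpo_cpo_data.sh    # preference data for DPO/CPO
bash 02_train_dpo.sh                # or 02_train_ppo.sh / 02_train_cpo.sh
bash 03_generate.sh                 # transfer the test-set texts
```

Only one of the `02_*` scripts is needed per run, depending on which policy-optimization method you want to reproduce.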
To reproduce the results on the ETS dataset, please run the scripts in `scripts/ets` following the procedure below.

- Train the style reward model, the paraphrase model, and the reference SFT model by running `00_train_cls.sh`, `00_train_paraphraser.sh`, and `00_train_sft.sh`.
- Generate the data for DPO and CPO training by running `01_generate_dpo_cpo_data.sh`.
- Train the PO models using PPO/DPO/CPO by running `02_train_ppo.sh`/`02_train_dpo.sh`/`02_train_cpo.sh`.
- Transfer the texts in the test set by running `03_generate.sh`.