trusthlt/private-synthetic-text-generation

Private Synthetic Text Generation with Diffusion Models

Arxiv License Python 3.8 Anaconda 24.3.0 PyTorch 2.3.0

Description

This repository contains the source code to replicate the experimental results in our paper.

Installation

We use Anaconda 24.3.0 to set up our virtual environment in Python.

conda create -n private-synthetic-text-generation python=3.8
conda activate private-synthetic-text-generation

We install the remaining requirements with pip.

pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
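
After installation, it can be useful to confirm that the pinned versions from the commands above actually ended up in the environment. The following is a minimal sketch and not part of the repository: the helper name `version_mismatches` is hypothetical, and the expected versions are simply copied from the `pip install` line above.

```python
from importlib import metadata

# Versions copied from the pip install command above.
EXPECTED = {"torch": "2.3.0", "torchvision": "0.18.0", "torchaudio": "2.3.0"}


def version_mismatches(expected):
    """Return {package: (expected, found)} for packages that are missing or differ.

    Hypothetical helper for illustration; `found` is None if the package
    is not installed.
    """
    problems = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Wheels from the PyTorch index may carry a local suffix such as "+cu118".
        base = found.split("+")[0] if found else None
        if base != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    for name, (wanted, found) in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

If the script prints nothing, the three packages match the versions listed above.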

Data

Please download the respective datasets and put the csv files in the destination folders (SWMH access needs to be granted by its creators).

Dataset   | Source                                   | Manually move to
Drugs.com | Already in repository                    | not needed
SPAM      | 🔗                                       | data/spam/ 📂
SWMH      | 🔗                                       | data/swmh/ 📂
Thumbs-Up | Already available on huggingface datasets | not needed
WebMD     | 🔗                                       | data/webmd/ 📂

Then you can run the three preprocessing scripts:

python preprocessing.py
python create_samples.py
python create_val_sets.py
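
Before running the scripts above, it may help to check that the manually downloaded files are where the dataset table says they should be. The sketch below is not part of the repository: `dirs_without_csv` is a hypothetical helper, the folder names are taken from the table above, and it assumes the files use a lowercase `.csv` extension.

```python
from pathlib import Path

# Destination folders taken from the dataset table above.
DATA_DIRS = ["data/spam", "data/swmh", "data/webmd"]


def dirs_without_csv(root, dirs=DATA_DIRS):
    """Return the folders under `root` that are missing or contain no .csv file.

    Hypothetical helper for illustration; assumes a lowercase ".csv" extension.
    """
    missing = []
    for d in dirs:
        folder = Path(root) / d
        if not folder.is_dir() or not any(folder.glob("*.csv")):
            missing.append(d)
    return missing


if __name__ == "__main__":
    missing = dirs_without_csv(".")
    if missing:
        print("No csv files found in:", ", ".join(missing))
    else:
        print("All dataset folders contain csv files.")
```

Run it from the repository root so that the relative `data/` paths resolve as intended.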

Pretrained Models

Our code relies on some publicly available text diffusion model checkpoints, which you can download here:

Model       | Source | Manually move to
GENIE       | 🔗     | GENIE/ 📂
DiffuSeq    | 🔗     | DiffuSeq/ 📂
SeqDiffuSeq | t.b.d. | SeqDiffuSeq/ 📂

Cite

Please use the following citation:

@misc{ochs2024privatesynthetictextgeneration,
      title={Private Synthetic Text Generation with Diffusion Models}, 
      author={Sebastian Ochs and Ivan Habernal},
      year={2024},
      eprint={2410.22971},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.22971}, 
}

Disclaimer

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
