Code for "Calibrating Generative Models" by Henry D. Smith, Nathaniel L. Diamant, and Brian L. Trippe
We propose two lightweight, general-purpose algorithms, CGM-relax and CGM-reward, for fine-tuning generative models to match distribution-level constraints. We demonstrate that these algorithms apply to diverse model classes, data, and constraint types. Across all experiments, we find that CGM significantly reduces the constraint violation of the base model, while maintaining the fidelity of samples generated by the model.
Calibrating the Genie2 protein structure diffusion model to secondary structure statistics of natural proteins (CATH domains).
You can try out the cgm codebase by opening our demo notebook gmm_example.ipynb in Google Colab [link]. Alternatively, you can clone the cgm GitHub repository and follow our installation instructions.
We recommend using conda or mamba to install the cgm requirements.
mamba can be installed by following these instructions, which amount to the following:
curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
chmod +x Miniforge3-Linux-x86_64.sh
./Miniforge3-Linux-x86_64.sh
The cgm environment can be installed from the environment file:
mamba env create -f env.yml
Once you have activated the cgm environment, install the cgm package (from the root directory of this repository):
python -m pip install -e .
To use cgm in the demo notebook, you will also need to register the cgm environment as an ipykernel:
python -m ipykernel install --user --name=cgm
You can verify that your installation is correct by running the tests, or by running the demo notebook gmm_example.ipynb.
To perform fine-tuning with CGM-relax or CGM-reward, you will first need to implement a subclass MyModel of Model, which is contained in cgm/model.py. Model is an abstract base class that represents the generative model to be calibrated. It has two methods that must be overridden:
- sample: draws samples from the generative model
- log_p: evaluates the log probability of samples from the generative model
An example implementation for continuous-time diffusion models, NeuralSDE, is given in neural_sde/neural_sde.py.
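As a minimal sketch of what such a subclass might look like, here is a toy 1-D Gaussian model with the two required methods. The exact signatures of Model in cgm/model.py may differ, and in practice MyModel would inherit from cgm.model.Model; this standalone Gaussian is purely illustrative.

```python
import math
import numpy as np

class GaussianModel:
    """Illustrative stand-in for a Model subclass: a 1-D Gaussian."""

    def __init__(self, mean=0.0, std=1.0):
        self.mean = mean
        self.std = std

    def sample(self, n, rng=None):
        # Draw n samples from the generative model.
        rng = np.random.default_rng() if rng is None else rng
        return rng.normal(self.mean, self.std, size=n)

    def log_p(self, x):
        # Evaluate the log probability of samples under the model.
        x = np.asarray(x, dtype=float)
        return (-0.5 * ((x - self.mean) / self.std) ** 2
                - math.log(self.std)
                - 0.5 * math.log(2.0 * math.pi))
```

Any model that can both generate samples and score them with exact log probabilities fits this interface.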
Once you have implemented MyModel, you will then need to load or train your base model base_model as an instance of MyModel. You are then prepared to calibrate base_model using CGM-relax
from cgm.cgm import calibrate_relaxed
relax_model = calibrate_relaxed(
    base_model,
    h,
    hstar,
    lambd,
)
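The arguments h and hstar encode the distribution-level constraint: h maps samples to a statistic, and hstar is its target value (lambd presumably weights the relaxation, though that is an assumption about the API). As a hypothetical, self-contained illustration (the constraint function and target below are invented, not part of cgm):

```python
import numpy as np

# Hypothetical constraint: h maps a batch of samples to per-sample
# statistics whose expectation under the model should match hstar.
def h(samples):
    # e.g., the fraction of samples with a positive first coordinate
    return (samples[:, 0] > 0).astype(float)

hstar = 0.5  # target expectation of h under the calibrated model

# Monte Carlo estimate of a (toy) base model's constraint violation
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, size=(10_000, 2))
violation = abs(float(h(samples).mean()) - hstar)
```

Calibration drives this violation toward zero while keeping the model close to base_model.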
or CGM-reward
from cgm.cgm import calibrate_reward
reward_model = calibrate_reward(
    base_model,
    h,
    hstar,
    N_samps,
)
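N_samps presumably controls how many model samples are drawn when estimating the constraint expectation; this is an assumption about the API rather than documented behavior. As a generic statistical note independent of cgm's internals, the Monte Carlo error of such an estimate shrinks like 1/sqrt(N):

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_mean(n_samps):
    # Monte Carlo estimate of E[h(x)] with h(x) = x, x ~ N(0, 1)
    return rng.normal(size=n_samps).mean()

# Repeating the estimate shows that larger sample sizes give
# lower-variance estimates of the constraint value.
small = float(np.std([estimate_mean(100) for _ in range(200)]))
large = float(np.std([estimate_mean(10_000) for _ in range(200)]))
```

Larger N_samps thus trades compute for a more stable calibration signal.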
For a full demonstration of the package functionality, see our example reweighting mixture proportions in a GMM.
Make sure the cgm environment is activated.
Then run
python -m pytest tests