Code for "Calibrating Generative Models" by Henry D. Smith, Nathaniel L. Diamant, and Brian L. Trippe
We propose two lightweight, general-purpose algorithms, CGM-relax and CGM-reward, for fine-tuning generative models to match distribution-level constraints. We demonstrate that these algorithms apply to diverse model classes, data, and constraint types. Across all experiments, we find that CGM significantly reduces the constraint violation of the base model, while maintaining the fidelity of samples generated by the model.
Calibrating the Genie2 protein structure diffusion model to secondary structure statistics of natural proteins (CATH domains).
You can try out the cgm codebase by opening our demo notebook gmm_example.ipynb in Google Colab [link]. Alternatively, you can clone the cgm GitHub repository and follow our installation instructions.
We recommend using conda or mamba to install the cgm requirements.
mamba can be installed by following these instructions, which amount to the following:
curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
chmod +x Miniforge3-Linux-x86_64.sh
./Miniforge3-Linux-x86_64.sh
The cgm environment can be installed from the environment file:
mamba env create -f env.yml
Once you have activated the cgm environment, install the cgm package (from the root directory of this repository):
python -m pip install -e .
To use cgm in the demo notebook, you will also need to register the cgm environment as an ipykernel:
python -m ipykernel install --user --name=cgm
You can verify that your installation is correct by running the tests, or by running the demo notebook gmm_example.ipynb.
To perform fine-tuning with CGM-relax or CGM-reward, you will first need to implement a subclass MyModel of Model, which is contained in cgm/model.py. Model is an abstract base class that represents the generative model to be calibrated. It has two methods that must be overridden:
- sample: draws samples from the generative model
- log_p: evaluates the log probability of samples from the generative model
An example implementation for continuous-time diffusion models, NeuralSDE, is given in neural_sde/neural_sde.py.
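As a minimal sketch of what such a subclass might look like, here is a toy 1-D Gaussian model with the two required methods. The exact signatures of Model in cgm/model.py may differ, and in practice MyModel would inherit from cgm.model.Model; this standalone Gaussian is purely illustrative.

```python
import math
import numpy as np

class GaussianModel:
    """Illustrative stand-in for a Model subclass: a 1-D Gaussian."""

    def __init__(self, mean=0.0, std=1.0):
        self.mean = mean
        self.std = std

    def sample(self, n, rng=None):
        # Draw n samples from the generative model.
        rng = np.random.default_rng() if rng is None else rng
        return rng.normal(self.mean, self.std, size=n)

    def log_p(self, x):
        # Evaluate the log probability of samples under the model.
        x = np.asarray(x, dtype=float)
        return (-0.5 * ((x - self.mean) / self.std) ** 2
                - math.log(self.std)
                - 0.5 * math.log(2.0 * math.pi))
```

Any model that can both generate samples and score them with exact log probabilities fits this interface.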
Once you have implemented MyModel, you will then need to load or train your base model base_model as an instance of MyModel. You are then prepared to calibrate base_model using CGM-relax
from cgm.cgm import calibrate_relaxed
relax_model = calibrate_relaxed(
    base_model,
    h,
    hstar,
    lambd,
)
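The arguments h and hstar encode the distribution-level constraint: h maps samples to a statistic, and hstar is its target value (lambd presumably weights the relaxation, though that is an assumption about the API). As a hypothetical, self-contained illustration (the constraint function and target below are invented, not part of cgm):

```python
import numpy as np

# Hypothetical constraint: h maps a batch of samples to per-sample
# statistics whose expectation under the model should match hstar.
def h(samples):
    # e.g., the fraction of samples with a positive first coordinate
    return (samples[:, 0] > 0).astype(float)

hstar = 0.5  # target expectation of h under the calibrated model

# Monte Carlo estimate of a (toy) base model's constraint violation
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, size=(10_000, 2))
violation = abs(float(h(samples).mean()) - hstar)
```

Calibration drives this violation toward zero while keeping the model close to base_model.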
or CGM-reward
from cgm.cgm import calibrate_reward
reward_model = calibrate_reward(
    base_model,
    h,
    hstar,
    N_samps,
)
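N_samps presumably controls how many model samples are drawn when estimating the constraint expectation; this is an assumption about the API rather than documented behavior. As a generic statistical note independent of cgm's internals, the Monte Carlo error of such an estimate shrinks like 1/sqrt(N):

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_mean(n_samps):
    # Monte Carlo estimate of E[h(x)] with h(x) = x, x ~ N(0, 1)
    return rng.normal(size=n_samps).mean()

# Repeating the estimate shows that larger sample sizes give
# lower-variance estimates of the constraint value.
small = float(np.std([estimate_mean(100) for _ in range(200)]))
large = float(np.std([estimate_mean(10_000) for _ in range(200)]))
```

Larger N_samps thus trades compute for a more stable calibration signal.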
For a full demonstration of the package functionality, see our example reweighting mixture proportions in a GMM.
Make sure the cgm environment is activated.
Then run
python -m pytest tests