update code for image generation and evaluation
swyoon committed Oct 27, 2024
1 parent 98d5a72 commit 723f227
Showing 43 changed files with 9,402 additions and 693 deletions.
4 changes: 3 additions & 1 deletion .gitignore
cython_debug/
#.idea/

datasets/*
pretrained/*
.datasets_*
results/*
143 changes: 129 additions & 14 deletions README.md
The official code release of
**Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models**

Sangwoong Yoon, Himchan Hwang, Dohyun Kwon, Yung-Kyun Noh, Frank C. Park
**NeurIPS 2024 Oral Presentation**
arxiv: https://arxiv.org/abs/2407.00626

![DxMI](figure/DxMI_figure_crop.jpg)
* pytorch >= 2.0
* cuda >= 11.6

We recommend using Conda for setting up the environment.

```
conda create -n dxmi python=3.8
conda activate dxmi
# install your version of PyTorch
pip install ...
# install other dependencies
pip install -r requirements.txt
```
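
A quick sanity check after installation (a small convenience snippet, not part of the repository) confirms that the installed PyTorch and CUDA build match the requirements above:

```
import torch

print(torch.__version__)          # expect >= 2.0
print(torch.version.cuda)         # expect >= 11.6
print(torch.cuda.is_available())  # should be True when a GPU is visible
```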


## Unit tests

```
python -m pytest tests/
```

## TODO & Status

- [x] CIFAR-10 DDPM
- [x] CIFAR-10 DDGAN
- [x] ImageNet64
- [x] LSUN Bedroom
- [x] FID Evaluation
- [ ] 2D
- [ ] Anomaly Detection


## Datasets
datasets
├── cifar10_train_png
├── cifar10_train_fid_stats.pt
├── imagenet   # corresponds to ILSVRC/Data/CLS-LOC/train
│   ├── n01734418
│   ├── ...
├── lsun_bedroom_train
└── mvtec
    ├── train_data.pth
    └── val_data.pth
Dataset files are released in [dropbox link](https://www.dropbox.com/scl/fo/kk65

**CIFAR-10**

We use the CIFAR-10 dataset downloaded via PyTorch (`torchvision.datasets.CIFAR10`).
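
For reference, the download can be triggered with a one-off snippet like the following (a sketch; the `datasets` root matches the directory tree above, but the exact root used by the training scripts may differ):

```
from torchvision.datasets import CIFAR10

# downloads and extracts CIFAR-10 into ./datasets
CIFAR10(root="datasets", train=True, download=True)
```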

**ImageNet 64x64**

Currently, ImageNet is hosted by Kaggle.
Please download the dataset from [Kaggle](https://www.kaggle.com/c/imagenet-object-localization-challenge/data).

Directories of training images should be placed under `datasets/imagenet`. You may create a symbolic link as follows:

```
ln -s <PATH_TO_DOWNLOADED_IMAGENET>/ILSVRC/Data/CLS-LOC/train datasets/imagenet
```

`datasets/imagenet` should have directories like `n01440764`, `n01734418`, ... as subdirectories.
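
A quick way to verify the link is a small sanity-check snippet (not part of the repository); the ImageNet-1k training set should contain 1000 synset folders:

```
import os

classes = [d for d in os.listdir("datasets/imagenet") if d.startswith("n")]
print(len(classes))  # expected: 1000
```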

**LSUN Bedroom**

The LSUN Bedroom dataset is prepared following the protocol of the [Consistency Models](https://github.com/openai/consistency_models/tree/main/datasets) repository. Extracted images are placed under `datasets/lsun_bedroom_train`.
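
The authoritative preparation steps are in the linked repository; the rough idea is to read the LSUN bedroom LMDB archive and write each encoded image out as an individual file. A minimal sketch of that idea (assuming an extracted `bedroom_train_lmdb` directory; filenames and any resizing or re-encoding should follow the Consistency Models protocol):

```
import os
import lmdb

os.makedirs("datasets/lsun_bedroom_train", exist_ok=True)
env = lmdb.open("bedroom_train_lmdb", readonly=True, lock=False)
with env.begin(write=False) as txn:
    for i, (key, val) in enumerate(txn.cursor()):
        # each value is an encoded (webp) image; dump the raw bytes to a file
        with open(f"datasets/lsun_bedroom_train/{i:07d}.webp", "wb") as f:
            f.write(val)
```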

## Model Checkpoints

Model checkpoint files can be found in [dropbox link](https://www.dropbox.com/scl/fo/ax2xaua6xpuvtfprwu1z8/AG9X-AJi7Fg9U17Ua16tq70?rlkey=5xhjrroyndjqm7ox2kd8fod37&dl=0).

Model checkpoints should be placed under the `pretrained` directory. When fully populated, the directory structure should look like the following:

```
pretrained
├── cifar10_ddpm # DDPM checkpoint provided by FastDPM
├── cifar10_ddgan # DDGAN checkpoint provided by DDGAN
├── cifar10_ddpm_dxmi_T10
├── cifar10_ddgan_dxmi_T4
├── imagenet64_edm # EDM checkpoint for ImageNet 64x64 provided by Consistency Models
├── imagenet64_edm_dxmi_T10
├── imagenet64_edm_dxmi_T4
├── lsun_bedroom_edm # EDM checkpoint for LSUN Bedroom provided by Consistency Models
└── lsun_bedroom_edm_dxmi_T4
```

## Training

The training scripts are invoked with the `torchrun` command and support multi-GPU training.

**CIFAR-10**

* Training T=10 with DDPM backbone
```
$ CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 train_cifar10.py \
--config configs/cifar10/T10.yaml --dataset configs/cifar10/cifar10.yaml
```
The number of GPUs can be changed by modifying `CUDA_VISIBLE_DEVICES` and `--nproc_per_node` arguments, as in the following example.

* Training T=4 with DDGAN backbone.
```
$ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 train_cifar10.py \
--config configs/cifar10/T4_ddgan.yaml --dataset configs/cifar10/cifar10.yaml
```


**ImageNet 64x64 and LSUN Bedroom**

* Training T=10 on ImageNet 64x64 with EDM backbone
```
$ CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 train_image_large.py \
--config configs/imagenet64/T10.yaml --dataset configs/imagenet64/imagenet64.yaml
```

* Training T=4 on ImageNet 64x64 with EDM backbone
```
$ CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 train_image_large.py \
--config configs/imagenet64/T4.yaml --dataset configs/imagenet64/imagenet64.yaml
```

* Training T=4 on LSUN Bedroom with EDM backbone
```
$ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 train_image_large.py \
--config configs/lsun/T4.yaml --dataset configs/lsun/bedroom.yaml
```

## Generation

```
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 generate_cifar10.py --l
--stat datasets/cifar10_train_fid_stats.pt -n 50000
```

This script saves 50,000 images in PNG format under the `pretrained/cifar10_ddpm_dxmi_T10/generated` directory. The generated images can be compressed into npz format as follows:

```
python make_npz.py --dir pretrained/cifar10_ddpm_dxmi_T10/generated --out pretrained/cifar10_ddpm_dxmi_T10/generated.npz
```
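
Conceptually, this conversion stacks the PNGs into a single uint8 array; a minimal sketch of the same operation (an assumption about the format, with the provided `make_npz.py` as the reference; the Consistency Models evaluator is assumed to expect an `(N, H, W, 3)` uint8 array stored as the first array in the `.npz`):

```
import glob
import numpy as np
from PIL import Image

files = sorted(glob.glob("pretrained/cifar10_ddpm_dxmi_T10/generated/*.png"))
arr = np.stack([np.array(Image.open(f).convert("RGB")) for f in files])  # (N, H, W, 3), uint8
np.savez("pretrained/cifar10_ddpm_dxmi_T10/generated.npz", arr)
```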

**ImageNet 64x64 and LSUN Bedroom**

```
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 generate_large.py --log_dir pretrained/imagenet64_edm_dxmi_T10 --n_sample 50000 --batchsize 100
```

For LSUN Bedroom, the images are too large to store in GPU memory, so the `--skip_fid` flag needs to be set.

```
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 generate_large.py --log_dir pretrained/imagenet64_edm_dxmi_T10 --n_sample 50000 --batchsize 100 --skip_fid
```

## Evaluation for Image Generation

We use the evaluation code provided by [Consistency Models](https://github.com/openai/consistency_models/tree/main/evaluations).

We recommend creating a separate conda environment for evaluation.

```
conda create -n eval python=3.8
conda activate eval
pip install tensorflow==2.XX # install your version of TensorFlow
pip install -r evaluations/requirements.txt
```

**Caution**: In order to utilize the GPU for evaluation, you need to install a version of TensorFlow appropriate for your environment (Python version, cuDNN version, CUDA version, etc.). See https://www.tensorflow.org/install/source#gpu for more details. In our case (Python 3.8, Tesla V100 GPU, CUDA 11.6), we installed TensorFlow 2.4.0.
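
Once TensorFlow is installed, a quick check (a convenience snippet, not part of the evaluation code) confirms that the GPU is visible:

```
import tensorflow as tf

print(tf.__version__)
print(tf.config.list_physical_devices("GPU"))  # should list at least one GPU
```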


```
cd evaluations
python evaluator.py ../pretrained/cifar10_ddpm_dxmi_T10/generated.npz \
../datasets/cifar10_train_png.npz
```
62 changes: 62 additions & 0 deletions configs/cifar10/T10.yaml
sampler_net:
  _target_: models.DxMI.unet_small.Model
  resolution: 32
  in_channels: 3
  out_ch: 3
  ch: 128
  ch_mult: [1,2,2,2]
  num_res_blocks: 2
  attn_resolutions: [16,]
  dropout: 0.1

sampler:
  _target_: models.DxMI.var_sampler.VARSampler
  n_timesteps: 10
  sample_shape: [3, 32, 32]
  trainable_beta: fix_last

energy: Null

value:
  _target_: models.value.TimeIndependentValue
  net:
    _target_: models.modules.IGEBMEncoderV2
    in_chan: 3
    out_chan: 1
    use_spectral_norm: False
    keepdim: False
    out_activation: linear
    avg_pool_dim: 1
    learn_out_scale: True
    nh: 128

trainer:
  _target_: models.DxMI.trainer.DxMI_Trainer
  tau1: 0.1
  tau2: 0.01
  gamma: 1
  use_sampler_beta: True
  time_cost: 0
  adavelreg: 0.99
  entropy_in_value: Null
  velocity_in_value: Null
  time_cost_sig: True
  # skip_running_last: 2

training:
  sampler_ckpt: pretrained/cifar10_ddpm/model.ckpt.pth
  value_ckpt: Null
  fid_epoch: 1  # compute FID every this many epochs; None disables FID computation
  n_epochs: 200
  batchsize: 128
  sampling_batchsize: 100
  n_fid_samples: 10000
  n_critic: 1
  n_generator: 1
  lr: 1e-7
  v_lr: 1e-5
  seed: 112233
  log_every: 50
  beta_lr: 1e-5

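This config follows the Hydra-style `_target_` convention, where each block names the class to construct and the remaining keys are its arguments. As an illustration, such a block can be instantiated along the following lines (a sketch assuming `omegaconf`/`hydra` are available; the repository's own loading code is authoritative):

```
from omegaconf import OmegaConf
from hydra.utils import instantiate

cfg = OmegaConf.load("configs/cifar10/T10.yaml")
sampler_net = instantiate(cfg.sampler_net)  # builds models.DxMI.unet_small.Model(...)
value_fn = instantiate(cfg.value)           # nested _target_ blocks are built recursively
```
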
57 changes: 57 additions & 0 deletions configs/cifar10/T4_ddgan.yaml
sampler_net:
  _target_: models.ddgan.models.ncsnpp_generator_adagn.NCSNpp
  config:
    _target_: models.ddgan.NCSNppArgs

sampler:
  _target_: models.ddgan.DDGANSampler
  n_timesteps: 4
  sample_shape: [3, 32, 32]
  trainable_beta: fix_last
  use_z: True

energy: Null

value:
  _target_: models.value.TimeIndependentValue
  net:
    _target_: models.modules.IGEBMEncoderV2
    in_chan: 3
    out_chan: 1
    use_spectral_norm: False
    keepdim: False
    out_activation: linear
    avg_pool_dim: 1
    learn_out_scale: True
    nh: 128

trainer:
  _target_: models.DxMI.trainer.DxMI_Trainer
  tau1: 0.1
  tau2: 0.01
  gamma: 1
  use_sampler_beta: True
  n_timesteps: 4
  time_cost: 0
  time_cost_sig: 1
  entropy_in_value: Null
  velocity_in_value: Null
  value_resample: True
  adavelreg: 0.99

training:
  sampler_ckpt: ddgan_checkpoints/cifar10/ddgan_cifar10_exp1/netG_1200.pth
  value_ckpt: Null
  fid_epoch: 1  # compute FID every this many epochs; None disables FID computation
  n_epochs: 100
  batchsize: 128
  sampling_batchsize: 100
  n_fid_samples: 10000
  n_critic: 1
  n_generator: 1
  lr: 1e-7
  v_lr: 1e-5
  beta_lr: 1e-5
  seed: 112233
  log_every: 50

8 changes: 8 additions & 0 deletions configs/cifar10/cifar10.yaml
# @package _global_
data:
  name: cifar10
  data_dir: datasets
# dataset:
# _target_: loader.synthetic.sample2d
# data: 8gaussians
# scale_factor: 1
