update code for image generation and evaluation
swyoon committed Oct 27, 2024
1 parent 98d5a72 commit 723f227
Showing 43 changed files with 9,402 additions and 693 deletions.
4 changes: 3 additions & 1 deletion .gitignore
cython_debug/
#.idea/

datasets/*
pretrained/*
.datasets_*
results/*
143 changes: 129 additions & 14 deletions README.md
The official code release of
**Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models**

Sangwoong Yoon, Himchan Hwang, Dohyun Kwon, Yung-Kyun Noh, Frank C. Park
**NeurIPS 2024 Oral Presentation**
arxiv: https://arxiv.org/abs/2407.00626

![DxMI](figure/DxMI_figure_crop.jpg)
* pytorch >= 2.0
* cuda >= 11.6

We recommend using Conda for setting up the environment.

```
conda create -n dxmi python=3.8
conda activate dxmi
# install your version of PyTorch
pip install ...
# install other dependencies
pip install -r requirements.txt
```
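
A quick sanity check after installation (a small convenience snippet, not part of the repository) confirms that the installed PyTorch and CUDA build match the requirements above:

```
import torch

print(torch.__version__)          # expect >= 2.0
print(torch.version.cuda)         # expect >= 11.6
print(torch.cuda.is_available())  # should be True when a GPU is visible
```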


## Unit tests

```
python -m pytest tests/
```

## TODO & Status

- [x] CIFAR-10 DDPM
- [x] CIFAR-10 DDGAN
- [x] ImageNet64
- [x] LSUN Bedroom
- [x] FID Evaluation
- [ ] 2D
- [ ] Anomaly Detection


## Datasets
datasets
├── cifar10_train_png
├── cifar10_train_fid_stats.pt
├── imagenet   # corresponds to ILSVRC/Data/CLS-LOC/train
│   ├── n01734418
│   ├── ...
├── lsun_bedroom_train
└── mvtec
    ├── train_data.pth
    └── val_data.pth
Dataset files are released in [dropbox link](https://www.dropbox.com/scl/fo/kk65

**CIFAR-10**

We use the CIFAR-10 dataset downloaded via PyTorch (`torchvision.datasets.CIFAR10`).
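
For reference, the download can be triggered with a one-off snippet like the following (a sketch; the `datasets` root matches the directory tree above, but the exact root used by the training scripts may differ):

```
from torchvision.datasets import CIFAR10

# downloads and extracts CIFAR-10 into ./datasets
CIFAR10(root="datasets", train=True, download=True)
```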

**ImageNet 64x64**

Currently, ImageNet is hosted by Kaggle.
Please download the dataset from [Kaggle](https://www.kaggle.com/c/imagenet-object-localization-challenge/data).

Directories of training images should be placed under `datasets/imagenet`. You may create a symbolic link as follows:

```
ln -s <PATH_TO_DOWNLOADED_IMAGENET>/ILSVRC/Data/CLS-LOC/train datasets/imagenet
```

`datasets/imagenet` should have directories like `n01440764`, `n01734418`, ... as subdirectories.
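
A quick way to verify the link is a small sanity-check snippet (not part of the repository); the ImageNet-1k training set should contain 1000 synset folders:

```
import os

classes = [d for d in os.listdir("datasets/imagenet") if d.startswith("n")]
print(len(classes))  # expected: 1000
```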

**LSUN Bedroom**

The LSUN Bedroom dataset is prepared following the protocol of the [Consistency Models](https://github.com/openai/consistency_models/tree/main/datasets) repository. Extracted images are placed under `datasets/lsun_bedroom_train`.
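
The authoritative preparation steps are in the linked repository; the rough idea is to read the LSUN bedroom LMDB archive and write each encoded image out as an individual file. A minimal sketch of that idea (assuming an extracted `bedroom_train_lmdb` directory; filenames and any resizing or re-encoding should follow the Consistency Models protocol):

```
import os
import lmdb

os.makedirs("datasets/lsun_bedroom_train", exist_ok=True)
env = lmdb.open("bedroom_train_lmdb", readonly=True, lock=False)
with env.begin(write=False) as txn:
    for i, (key, val) in enumerate(txn.cursor()):
        # each value is an encoded (webp) image; dump the raw bytes to a file
        with open(f"datasets/lsun_bedroom_train/{i:07d}.webp", "wb") as f:
            f.write(val)
```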

## Model Checkpoints

Model checkpoint files can be found in [dropbox link](https://www.dropbox.com/scl/fo/ax2xaua6xpuvtfprwu1z8/AG9X-AJi7Fg9U17Ua16tq70?rlkey=5xhjrroyndjqm7ox2kd8fod37&dl=0).

Model checkpoints should be placed under the `pretrained` directory. When fully populated, the directory structure should look like the following:

```
pretrained
├── cifar10_ddpm # DDPM checkpoint provided by FastDPM
├── cifar10_ddgan # DDGAN checkpoint provided by DDGAN
├── cifar10_ddpm_dxmi_T10
├── cifar10_ddgan_dxmi_T4
├── imagenet64_edm # EDM checkpoint for ImageNet 64x64 provided by Consistency Models
├── imagenet64_edm_dxmi_T10
├── imagenet64_edm_dxmi_T4
├── lsun_bedroom_edm # EDM checkpoint for LSUN Bedroom provided by Consistency Models
└── lsun_bedroom_edm_dxmi_T4
```

## Training

The training scripts are invoked with the `torchrun` command and support multi-GPU training.

**CIFAR-10**

* Training T=10 with DDPM backbone
```
$ CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 train_cifar10.py \
--config configs/cifar10/T10.yaml --dataset configs/cifar10/cifar10.yaml
```
The number of GPUs can be changed by modifying `CUDA_VISIBLE_DEVICES` and `--nproc_per_node` arguments, as in the following example.

* Training T=4 with DDGAN backbone.
```
$ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 train_cifar10.py \
--config configs/cifar10/T4_ddgan.yaml --dataset configs/cifar10/cifar10.yaml
```


**ImageNet 64x64 and LSUN Bedroom**

* Training T=10 on ImageNet 64x64 with EDM backbone
```
$ CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 train_image_large.py \
--config configs/imagenet64/T10.yaml --dataset configs/imagenet64/imagenet64.yaml
```

* Training T=4 on ImageNet 64x64 with EDM backbone
```
$ CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 train_image_large.py \
--config configs/imagenet64/T4.yaml --dataset configs/imagenet64/imagenet64.yaml
```

* Training T=4 on LSUN Bedroom with EDM backbone
```
$ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 train_image_large.py \
--config configs/lsun/T4.yaml --dataset configs/lsun/bedroom.yaml
```

## Generation

```
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 generate_cifar10.py --l
--stat datasets/cifar10_train_fid_stats.pt -n 50000
```

This script saves 50,000 images in PNG format under the `pretrained/cifar10_ddpm_dxmi_T10/generated` directory. The generated images can be compressed into npz format as follows:

```
python make_npz.py --dir pretrained/cifar10_ddpm_dxmi_T10/generated --out pretrained/cifar10_ddpm_dxmi_T10/generated.npz
```
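
Conceptually, this conversion stacks the PNGs into a single uint8 array; a minimal sketch of the same operation (an assumption about the format, with the provided `make_npz.py` as the reference; the Consistency Models evaluator is assumed to expect an `(N, H, W, 3)` uint8 array stored as the first array in the `.npz`):

```
import glob
import numpy as np
from PIL import Image

files = sorted(glob.glob("pretrained/cifar10_ddpm_dxmi_T10/generated/*.png"))
arr = np.stack([np.array(Image.open(f).convert("RGB")) for f in files])  # (N, H, W, 3), uint8
np.savez("pretrained/cifar10_ddpm_dxmi_T10/generated.npz", arr)
```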

**ImageNet 64x64 and LSUN Bedroom**

```
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 generate_large.py --log_dir pretrained/imagenet64_edm_dxmi_T10 --n_sample 50000 --batchsize 100
```

For LSUN Bedroom, the images are too large to store in GPU memory, so the `--skip_fid` flag needs to be set.

```
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 generate_large.py --log_dir pretrained/imagenet64_edm_dxmi_T10 --n_sample 50000 --batchsize 100 --skip_fid
```

## Evaluation for Image Generation

We use the evaluation code provided by [Consistency Models](https://github.com/openai/consistency_models/tree/main/evaluations).

We recommend creating a separate conda environment for evaluation.

```
conda create -n eval python=3.8
conda activate eval
pip install tensorflow==2.XX # install your version of TensorFlow
pip install -r evaluations/requirements.txt
```

**Caution**: In order to utilize the GPU for evaluation, you need to install a version of TensorFlow appropriate for your environment (Python version, cuDNN version, CUDA version, etc.). See https://www.tensorflow.org/install/source#gpu for more details. In our case (Python 3.8, Tesla V100 GPU, CUDA 11.6), we installed TensorFlow 2.4.0.
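
Once TensorFlow is installed, a quick check (a convenience snippet, not part of the evaluation code) confirms that the GPU is visible:

```
import tensorflow as tf

print(tf.__version__)
print(tf.config.list_physical_devices("GPU"))  # should list at least one GPU
```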


```
cd evaluations
python evaluator.py ../pretrained/cifar10_ddpm_dxmi_T10/generated.npz \
../datasets/cifar10_train_png.npz
```
62 changes: 62 additions & 0 deletions configs/cifar10/T10.yaml
sampler_net:
  _target_: models.DxMI.unet_small.Model
  resolution: 32
  in_channels: 3
  out_ch: 3
  ch: 128
  ch_mult: [1,2,2,2]
  num_res_blocks: 2
  attn_resolutions: [16,]
  dropout: 0.1

sampler:
  _target_: models.DxMI.var_sampler.VARSampler
  n_timesteps: 10
  sample_shape: [3, 32, 32]
  trainable_beta: fix_last

energy: Null

value:
  _target_: models.value.TimeIndependentValue
  net:
    _target_: models.modules.IGEBMEncoderV2
    in_chan: 3
    out_chan: 1
    use_spectral_norm: False
    keepdim: False
    out_activation: linear
    avg_pool_dim: 1
    learn_out_scale: True
    nh: 128

trainer:
  _target_: models.DxMI.trainer.DxMI_Trainer
  tau1: 0.1
  tau2: 0.01
  gamma: 1
  use_sampler_beta: True
  time_cost: 0
  adavelreg: 0.99
  entropy_in_value: Null
  velocity_in_value: Null
  time_cost_sig: True
  # skip_running_last: 2

training:
  sampler_ckpt: pretrained/cifar10_ddpm/model.ckpt.pth
  value_ckpt: Null
  fid_epoch: 1  # compute FID every this many epochs; None disables FID computation
  n_epochs: 200
  batchsize: 128
  sampling_batchsize: 100
  n_fid_samples: 10000
  n_critic: 1
  n_generator: 1
  lr: 1e-7
  v_lr: 1e-5
  seed: 112233
  log_every: 50
  beta_lr: 1e-5

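This config follows the Hydra-style `_target_` convention, where each block names the class to construct and the remaining keys are its arguments. As an illustration, such a block can be instantiated along the following lines (a sketch assuming `omegaconf`/`hydra` are available; the repository's own loading code is authoritative):

```
from omegaconf import OmegaConf
from hydra.utils import instantiate

cfg = OmegaConf.load("configs/cifar10/T10.yaml")
sampler_net = instantiate(cfg.sampler_net)  # builds models.DxMI.unet_small.Model(...)
value_fn = instantiate(cfg.value)           # nested _target_ blocks are built recursively
```
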
57 changes: 57 additions & 0 deletions configs/cifar10/T4_ddgan.yaml
sampler_net:
  _target_: models.ddgan.models.ncsnpp_generator_adagn.NCSNpp
  config:
    _target_: models.ddgan.NCSNppArgs

sampler:
  _target_: models.ddgan.DDGANSampler
  n_timesteps: 4
  sample_shape: [3, 32, 32]
  trainable_beta: fix_last
  use_z: True

energy: Null

value:
  _target_: models.value.TimeIndependentValue
  net:
    _target_: models.modules.IGEBMEncoderV2
    in_chan: 3
    out_chan: 1
    use_spectral_norm: False
    keepdim: False
    out_activation: linear
    avg_pool_dim: 1
    learn_out_scale: True
    nh: 128

trainer:
  _target_: models.DxMI.trainer.DxMI_Trainer
  tau1: 0.1
  tau2: 0.01
  gamma: 1
  use_sampler_beta: True
  n_timesteps: 4
  time_cost: 0
  time_cost_sig: 1
  entropy_in_value: Null
  velocity_in_value: Null
  value_resample: True
  adavelreg: 0.99

training:
  sampler_ckpt: ddgan_checkpoints/cifar10/ddgan_cifar10_exp1/netG_1200.pth
  value_ckpt: Null
  fid_epoch: 1  # compute FID every this many epochs; None disables FID computation
  n_epochs: 100
  batchsize: 128
  sampling_batchsize: 100
  n_fid_samples: 10000
  n_critic: 1
  n_generator: 1
  lr: 1e-7
  v_lr: 1e-5
  beta_lr: 1e-5
  seed: 112233
  log_every: 50

8 changes: 8 additions & 0 deletions configs/cifar10/cifar10.yaml
# @package _global_
data:
  name: cifar10
  data_dir: datasets
# dataset:
# _target_: loader.synthetic.sample2d
# data: 8gaussians
# scale_factor: 1
