This is the official implementation of the ICCV 2025 paper "Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation".
- Code & checkpoint upload completed
- FlexAttention finetuning option enabled
- Infinity-8B checkpoint finetuning enabled
We introduce a method for fine-tuning visual autoregressive (VAR) models tailored for subject-driven generation tasks. Our approach efficiently customizes VAR models, enabling high-quality personalized image generation.
Our experiments were conducted on a single NVIDIA A6000 GPU. Please ensure your hardware meets the following minimum specification:
- GPU Memory: ≥ 40GB
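As a quick sanity check (assuming the NVIDIA driver and `nvidia-smi` are available), you can confirm the memory of each visible GPU before training:
```bash
# Report the name and total memory of each GPU; expect at least ~40 GB (an A6000 reports ~48 GB).
nvidia-smi --query-gpu=name,memory.total --format=csv
```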
Clone the repository and install dependencies:
```bash
git clone https://github.com/jiwoogit/ARBooth.git
cd ARBooth
pip install -r requirements.txt
```
Alternatively, use our pre-configured Docker image (published on Docker Hub as `wldn0202/arbooth`):
```bash
docker pull wldn0202/arbooth:latest
```
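A minimal way to start a container from the image is sketched below; `--gpus all` assumes the NVIDIA Container Toolkit is installed, and the `/workspace` mount path is an assumption for illustration, not a documented entrypoint:
```bash
# Hypothetical invocation: mount the current checkout and open a shell inside the container.
docker run --gpus all -it --rm -v "$(pwd)":/workspace -w /workspace wldn0202/arbooth:latest bash
```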
Please download the official pretrained VAR checkpoints from Infinity's repository and organize them as follows:
```
weights/
├── infinity_2b_reg.pth
└── infinity_vae_d32_reg.pth
```
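Once downloaded, a quick shell check (no assumptions beyond the layout shown above) confirms both checkpoints are in place:
```bash
# Both files must exist under weights/ before fine-tuning.
ls -lh weights/infinity_2b_reg.pth weights/infinity_vae_d32_reg.pth
```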
You can download our fine-tuned checkpoints from Hugging Face (wldn0202/ARBooth).
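For example, they can be fetched with the Hugging Face CLI (provided by the `huggingface_hub` package); the target directory below is only an illustration, not a path the training or inference scripts require:
```bash
# Download the fine-tuned checkpoints from the wldn0202/ARBooth repository on Hugging Face.
huggingface-cli download wldn0202/ARBooth --local-dir weights/arbooth
```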
We adopt the preprocessing pipeline of DreamMatcher. Please follow their instructions for the detailed steps, or refer to the `inputs` directory.
Customize the training parameters by modifying `exp_name` and `cls_name` in the provided script, then run:
```bash
bash scripts/train_arbooth.sh
```
All training results and logs will be saved under the `LOCAL_OUT` directory.
For detailed configuration options and parameters for fine-tuning, please refer to `infinity/utils/arg_util.py`.
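For reference, the two variables edited in `scripts/train_arbooth.sh` might look like the following; the values are purely illustrative and should be replaced with your own subject and class:
```bash
# Hypothetical values inside scripts/train_arbooth.sh.
exp_name="backpack_dog"   # identifier for this run's outputs and logs
cls_name="dog"            # broad class noun describing the subject
```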
We evaluate performance with the DINO, CLIP, PRES, and DIV metrics. Update the paths in `scripts/eval_arbooth.sh` to match your training setup, then run:
```bash
bash scripts/eval_arbooth.sh
```
To generate images from your own prompts with the fine-tuned checkpoints, run:
```bash
bash scripts/infer_arbooth.sh
```
- Iteration Settings:
  - For the 2-batch configuration, 500 iterations are recommended.
  - For the 1-batch configuration, 100-150 iterations are recommended.
  - Adjust these values based on your specific input data and requirements (see the search snippet after this list).
- Class Prompt Selection:
  - The choice of class prompt (e.g., "dog", "cat") significantly impacts the final generation quality.
  - Use general, broad category nouns for optimal results.
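If you need to change the iteration count, one non-authoritative way to locate the relevant option, without assuming its exact name, is to search the argument definitions mentioned above:
```bash
# Look for iteration-related settings in the fine-tuning argument parser.
grep -n -i "iter" infinity/utils/arg_util.py
```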
This repository is built upon the following projects:
- VAR (FoundationVision/VAR)
- Infinity (FoundationVision/Infinity)
- DreamMatcher (cvlab-kaist/DreamMatcher)
- diffusers (huggingface/diffusers)
We sincerely appreciate their invaluable contributions.
If our paper or repository is helpful for your research, please cite us:
```bibtex
@article{chung2025fine,
  title={Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation},
  author={Chung, Jiwoo and Hyun, Sangeek and Kim, Hyunjun and Koh, Eunseo and Lee, MinKyu and Heo, Jae-Pil},
  journal={arXiv preprint arXiv:2504.02612},
  year={2025}
}
```
For any questions, please reach out to:
- Jiwoo Chung ([email protected])
