SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding

Qianqian Sun^1*, Jixiang Luo^2†, Dell Zhang², Xuelong Li²

¹ The University of Hongkong, ² Institute of Artificial Intelligence (TeleAI)

We propose SmartFreeEdit to address the challenge of reasoning instructions and segmentations in image editing, thereby enhancing the practicality of AI editing. Our method effectively handles some semantic editing operations, including adding, removing, changing objects, background changing and global editing.

⏱️ Update News

[2025.7.07] Webpage has been released!
[2025.7.05] Our paper has been accpeted by ACMMM'2025 and Code for image editing is released!
[2025.4.17] Our paper has been released on arxiv Papers and is currently under review for ACM Multimedia 2025 (ACMMM 2025).

📖 Pipeline

Our SmartFreeEdit consists of three key components: 1) An MLLM-driven Promptist that decomposes instructions into Editing Objects, Category, and Target Prompt. 2) Reasoning segmentation converts the prompt into an inference query and generates reasoning masks. 3) An Inpainting-based Image Editor using the hypergraph computation module to enhance global image structure understanding for more accurate edits.

🚀 Getting Started

Environment Requirement 🌍

Clone the repo:

git clone https://github.com/smileformylove/SmartFreeEdit

We recommend you first use conda to create virtual environment, then run:

conda create -n smartfreeedit python=3.10.6 -y
conda activate smartfreeedit
python -m pip install --upgrade pip
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu117

Then, you can install diffusers (implemented in this repo) with:

pip install -e .

Finally, you should install the remaining environment:

pip install -r requirements.txt
pip install flash-attn --no-build-isolation

Downloading Checkpoints

Checkpoints of SmartFreeEdit can be downloaded using the following command:

python SmartFreeEdit/download.py

Running Gradio demo

We provide a demo scripts for different hardware configurations. For users with server access and sufficient CPU/GPU memory ( >40/24 GB), we recommend you use:

export PYTHONPATH=.:$PYTHONPATH

export CUDA_VISIBLE_DEVICES=0

python SmartFreeEdit/src/smartfreeedit_app.py

Training

To train SmartFreeEdit, you need to download the BrushData here and Checkpoints of BrushNet here.

The data strcture should be like:

|-- data
    |-- BrushData
    |-- BrushDench
    |-- EditBench
    |-- ckpt
        |-- realisticVisionV60B1_v51VAE
            |-- model_index.json
            |-- vae
            |-- ...
        |-- segmentation_mask_brushnet_ckpt
        |-- segmentation_mask_brushnet_ckpt_sdxl_v0
        |-- random_mask_brushnet_ckpt
        |-- random_mask_brushnet_ckpt_sdxl_v0
        |-- ...

Train with segmentation mask using the script:

accelerate launch train/brushnet/train_brushnet.py \
--pretrained_model_name_or_path data/base_model/realisticVisionV60B1_v51VAE \
--output_dir examples/hyper \
--train_data_dir data/BrushData \
--resolution 512 \
--learning_rate 1e-5 \
--train_batch_size 8 \
--tracker_project_name brushnet \
--report_to tensorboard \
--validation_steps 300

You can inference with the script:

python train/brushnet/test_brushnet.py

You can evaluate using the script:

python train/brushnet/evaluate_brushnet.py \
--brushnet_ckpt_path data/ckpt/segmentation_mask_brushnet_ckpt \
--image_save_path runs/evaluation_result/BrushBench/brushnet_segmask/inside \
--mapping_file data/BrushBench/mapping_file.json \
--base_dir data/BrushBench \
--mask_key inpainting_mask

Inference

Please download Reason-Edit evaluation benchmark from SmartEdit and put it in file dataset.

Use the script to inference on understanding and reasoning scenes:


python test/ReasonEdit_test.py --save_dir /SmartFreeEdit/edited_images --ReasonEdit_benchmark_dir /dataset/Reasonedit

🖋️ Citation

If you find our work helpful, please star ⭐this repo and cite 📑 our paper. Thanks for your support!

@article{sun2025smartfreeedit,
  title={SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding},
  author={Sun, Qianqian and Luo, Jixiang and Zhang, Dell and Li, Xuelong},
  journal={arXiv preprint arXiv:2504.12704},
  year={2025}
}

📧 Contact

This repository is currently under active development and restructuring. The codebase is being optimized for better stability and reproducibility. While we strive to maintain code quality, you may encounter temporary issues during this transition period. For any questions or technical discussions, feel free to open an issue or contact us via email at [email protected].

👍🏻 Acknowledgements

Our code is modified based on BrushNet, thanks to all the contributors!

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
SmartFreeEdit		SmartFreeEdit
assets		assets
docs		docs
src/diffusers		src/diffusers
test		test
train/brushnet		train/brushnet
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding

⏱️ Update News

📖 Pipeline

🚀 Getting Started

Environment Requirement 🌍

Downloading Checkpoints

Running Gradio demo

Training

Inference

🖋️ Citation

📧 Contact

👍🏻 Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding

⏱️ Update News

📖 Pipeline

🚀 Getting Started

Environment Requirement 🌍

Downloading Checkpoints

Running Gradio demo

Training

Inference

🖋️ Citation

📧 Contact

👍🏻 Acknowledgements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages