Skip to content
forked from SkyworkAI/UniPic

Unified Autoregressive Modeling for Visual Understanding and Generation

License

Notifications You must be signed in to change notification settings

xiaodiaoke/UniPic

 
 

Repository files navigation

SkyworkUniPic

Welcome to the Skywork-UniPic repository! This repository contains the model weights and implementation of our unified 1.5B-parameter autoregressive model that natively integrates image understanding, text-to-image generation, and image editing capabilities within a single architecture.

Skywork UniPic Teaser

Evaluation

Performance Overview

Main comparison

GenEval

Model Single Two Count Color Position Attr Overall
Diffusion Models
SDv2.1 0.98 0.51 0.44 0.85 0.07 0.17 0.50
SDXL 0.98 0.74 0.39 0.85 0.15 0.23 0.55
IF‑XL 0.97 0.74 0.66 0.81 0.13 0.35 0.61
LUMINA‑Next 0.92 0.46 0.48 0.70 0.09 0.13 0.46
SD3‑medium 0.99 0.94 0.72 0.89 0.33 0.60 0.74
FLUX.1‑dev 0.99 0.81 0.79 0.74 0.20 0.47 0.67
NOVA 0.99 0.91 0.62 0.85 0.33 0.56 0.71
Autoregressive Models
TokenFlow‑XL 0.95 0.60 0.41 0.81 0.16 0.24 0.55
Janus 0.97 0.68 0.30 0.84 0.46 0.42 0.61
Janus Pro 0.99 0.89 0.59 0.90 0.79 0.66 0.80
Emu3‑Gen 0.99 0.81 0.42 0.80 0.49 0.45 0.66
Show‑o 0.98 0.80 0.66 0.84 0.31 0.50 0.68
Unified Models
OmniGen 0.98 0.84 0.66 0.74 0.40 0.43 0.68
OmniGen2 1.00 0.95 0.64 0.88 0.55 0.76 0.80
OmniGen2† 0.99 0.96 0.74 0.98 0.71 0.75 0.86
MetaQuery‑XL† - - - - - - 0.80
BLIP3‑o† 4B - - - - - - 0.81
BLIP3‑o† 8B - - - - - - 0.84
BAGEL 0.99 0.94 0.81 0.88 0.64 0.63 0.82
BAGEL† 0.98 0.95 0.84 0.95 0.78 0.77 0.88
UniWorld‑V1 0.99 0.93 0.79 0.89 0.49 0.70 0.80
UniWorld‑V1† 0.98 0.93 0.81 0.89 0.74 0.71 0.84
Ovis‑U1 0.98 0.98 0.90 0.92 0.79 0.75 0.89
Proprietary Models
GPT‑4o 0.99 0.92 0.85 0.92 0.75 0.61 0.84
Skywork UniPic 0.98 0.92 0.74 0.91 0.89 0.72 0.86

DPG‑Bench

Model Global Entity Attribute Relation Other Overall
Diffusion Models
LUMINA‑Next 82.82 88.65 86.44 80.53 81.82 74.63
SDXL 83.27 82.43 80.91 86.76 80.41 74.65
PlayGroundv2.5 83.06 82.59 81.20 84.08 83.50 75.47
Hunyuan‑DiT 84.59 80.59 88.01 74.36 86.41 78.87
PixArt‑Σ 86.89 82.89 88.94 86.59 87.68 80.54
DALLE3 90.97 89.61 88.39 90.58 89.83 83.50
SD3‑medium 87.90 91.01 88.83 80.70 88.68 84.08
FLUX.1‑dev 82.10 89.50 88.70 91.10 89.40 84.00
Autoregressive Models
Show‑o 79.33 75.44 78.02 84.45 60.80 67.27
EMU3 85.21 86.68 86.84 90.22 83.15 80.60
TokenFlow‑XL 78.72 79.22 81.29 85.22 71.20 73.38
Janus 82.33 87.38 87.70 85.46 86.41 79.68
Janus Pro 86.90 88.90 89.40 89.32 89.48 84.19
BLIP3‑o 4B - - - - - 79.36
BLIP3‑o 8B - - - - - 81.60
Unified Models
OmniGen 87.90 88.97 88.47 87.95 83.56 81.16
OmniGen2 88.81 88.83 90.18 89.37 90.27 83.57
BAGEL 88.94 90.37 91.29 90.82 88.67 85.07
UniWorld‑V1 83.64 88.39 88.44 89.27 87.22 81.38
Ovis‑U1 82.37 90.08 88.68 93.35 85.20 83.72
Skywork UniPic 89.65 87.78 90.84 91.89 91.95 85.50

GEdit‑Bench‑EN

Model  SC ↑  PQ ↑ Overall ↑
Proprietary Models
Gemini‑2.0‑flash 6.73 6.61 6.32
GPT‑4o 7.85 7.62 7.53
Specialized Editing Models
Instruct‑Pix2Pix 3.58 5.49 3.68
MagicBrush 4.68 5.66 4.52
AnyEdit 3.18 5.82 3.21
ICEdit 5.11 6.85 4.84
Step1X‑Edit 7.09 6.76 6.70
Unified Models
OmniGen 5.96 5.89 5.06
OmniGen2 7.16 6.77 6.41
BAGEL 7.36 6.83 6.52
UniWorld‑V1 4.93 7.43 4.85
Ovis‑U1 - - 6.42
Skywork UniPic 6.72 6.18 5.83

Usage

📦 Required Packages

Create virtual environment and install dependencies with pip:

conda create -n unipic python==3.10.14
conda activate unipic
pip install -r requirements.txt

📥 Checkpoints

Download the model checkpoints from [🤗 SkyworkUniPic], It is recommended to use the following command to download the checkpoints

# pip install -U "huggingface_hub[cli]"
huggingface-cli download Skywork/Skywork-UniPic-1.5B  --local-dir checkpoint --repo-type model

🖼️ Text-to-image Generation

You can generate images from text prompts using the following command:

# only support 1024 image_size
export PYTHONPATH=./:$PYTHONPATH
python scripts/text2image.py configs/models/qwen2_5_1_5b_kl16_mar_h.py \
         --checkpoint checkpoint/pytorch_model.bin  --image_size 1024 \
         --prompt 'A glossy-coated golden retriever stands on the park lawn beside a life-sized penguin statue.'  --output output.jpg

To generate a list of images based on prompts in a json file.

export PYTHONPATH=./:$PYTHONPATH
accelerate launch scripts/batch_text2image.py configs/models/qwen2_5_1_5b_kl16_mar_h.py \
       --checkpoint checkpoint/pytorch_model.bin  --image_size 1024 \
       --data data/batch_t2i.json --output output --batch_size 2 --grid_size 2

The json file should look like:

[
  {
   "prompt": "A glossy-coated golden retriever stands on the park lawn beside a life-sized penguin statue."
  },
  {
   "prompt": "Digital portrait of a girl with rainbow hair."
  }
]

✏️ Image Editing

The image editing feature within this unified model is an exploratory module at the forefront of research. It currently faces challenges in edit precision, control granularity, and output consistency, and is not yet production-ready.

# only support 1024 image_size
export PYTHONPATH=./:$PYTHONPATH
python scripts/image_edit.py configs/models/qwen2_5_1_5b_kl16_mar_h.py \
         --checkpoint checkpoint/pytorch_model.bin  --image_size 1024 \
         --image data/sample.png --prompt "Replace the stars with the candle." \
         --output output.jpg 

📖 Image-to-text Generation

export PYTHONPATH=./:$PYTHONPATH
python scripts/image2text.py configs/models/qwen2_5_1_5b_kl16_mar_h.py \
         --checkpoint checkpoint/pytorch_model.bin  --image_size 1024 \
         --image data/sample.png --prompt "Describe the image in detail."

📜 License

This project is licensed under MIT License.

Citation

If you use Skywork-UniPic in your research, please cite:

@misc{skywork2025unipic,
  title={Skywork-UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation},
  author={Skywork Multimodality Team},
  year={2025},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/SkyworkAI/UniPic}},
  note={MIT License}
}

About

Unified Autoregressive Modeling for Visual Understanding and Generation

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%