Commit d043f7f

Author: fzilan (committed)
Merge remote-tracking branch 'upstream/master'
2 parents: 9e3d2cc + 4505764


4 files changed (+83, −41 lines)


README.md

Lines changed: 55 additions & 30 deletions
````diff
@@ -5,12 +5,13 @@ This repository contains SoTA algorithms, models, and interesting projects in th
 ONE is short for "ONE for all"
 
 ## News
-- [2025.02.21] We support DeepSeek [Janus-Pro](https://huggingface.co/deepseek-ai/Janus-Pro-7B), a SoTA multimodal understanding and generation model. See [here](examples/janus) 🔥
+- [2025.04.10] We release MindONE [v0.3.0](https://github.com/mindspore-lab/mindone/releases/tag/v0.3.0). More than 15 SoTA generative models are added, including Flux, CogView4, OpenSora 2.0, Movie Gen 30B, and CogVideoX 5B~30B. Have fun!
+- [2025.02.21] We support DeepSeek [Janus-Pro](https://huggingface.co/deepseek-ai/Janus-Pro-7B), a SoTA multimodal understanding and generation model. See [here](examples/janus)
 - [2024.11.06] MindONE [v0.2.0](https://github.com/mindspore-lab/mindone/releases/tag/v0.2.0) is released
 
 ## Quick tour
 
-To install MindONE v0.2.0, please install [MindSpore 2.3.1](https://www.mindspore.cn/install) and run `pip install mindone`
+To install MindONE v0.3.0, please install [MindSpore 2.5.0](https://www.mindspore.cn/install) and run `pip install mindone`
 
 Alternatively, to install the latest version from the `master` branch, please run:
 ```
@@ -39,35 +40,59 @@ prompt = "A cat holding a sign that says 'Hello MindSpore'"
 image = pipe(prompt)[0][0]
 image.save("sd3.png")
 ```
-
-### supported models under mindone/examples
-| model | features
-| :--- | :-- |
-| [cambrian](https://github.com/mindspore-lab/mindone/blob/master/examples/cambrain) | working on it |
-| [minicpm-v](https://github.com/mindspore-lab/mindone/blob/master/examples/minicpm_v) | working on v2.6 |
-| [internvl](https://github.com/mindspore-lab/mindone/blob/master/examples/internvl) | working on v1.0 v1.5 v2.0 |
-| [llava](https://github.com/mindspore-lab/mindone/blob/master/examples/llava) | working on llava 1.5 & 1.6 |
-| [vila](https://github.com/mindspore-lab/mindone/blob/master/examples/vila) | working on it |
-| [pllava](https://github.com/mindspore-lab/mindone/blob/master/examples/pllava) | working on it |
-| [hpcai open sora](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_hpcai) | support v1.0/1.1/1.2 large scale training with dp/sp/zero |
-| [open sora plan](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_pku) | support v1.0/1.1/1.2 large scale training with dp/sp/zero |
-| [stable diffusion](https://github.com/mindspore-lab/mindone/blob/master/examples/stable_diffusion_v2) | support sd 1.5/2.0/2.1, vanilla fine-tune, lora, dreambooth, text inversion |
-| [stable diffusion xl](https://github.com/mindspore-lab/mindone/blob/master/examples/stable_diffusion_xl) | support sai style (Stability AI) vanilla fine-tune, lora, dreambooth |
-| [dit](https://github.com/mindspore-lab/mindone/blob/master/examples/dit) | support text to image fine-tune |
-| [latte](https://github.com/mindspore-lab/mindone/blob/master/examples/latte) | support unconditional text to image fine-tune |
-| [animate diff](https://github.com/mindspore-lab/mindone/blob/master/examples/animatediff) | support motion module and lora training |
-| [video composer](https://github.com/mindspore-lab/mindone/tree/master/examples/videocomposer) | support conditional video generation with motion transfer etc. |
-| [ip adapter](https://github.com/mindspore-lab/mindone/blob/master/examples/ip_adapter) | refactoring |
-| [t2i-adapter](https://github.com/mindspore-lab/mindone/blob/master/examples/t2i_adapter) | refactoring |
-| [dynamicrafter](https://github.com/mindspore-lab/mindone/blob/master/examples/dynamicrafter) | support image to video generation |
-| [hunyuan_dit](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuan_dit) | support text to image fine-tune |
-| [pixart_sigma](https://github.com/mindspore-lab/mindone/blob/master/examples/pixart_sigma) | support text to image fine-tune at different aspect ratio |
 ### run hf diffusers on mindspore
-mindone diffusers is under active development, most tasks were tested with mindspore 2.3.1 and ascend 910 hardware.
+- mindone diffusers is under active development; most tasks were tested with mindspore 2.5.0 on Ascend Atlas 800T A2 machines.
+- compatible with hf diffusers 0.32.2
 
 | component | features
 | :--- | :--
-| [pipeline](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/pipelines) | support text2image, text2video, text2audio tasks 30+
-| [models](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/models) | support autoencoder & transformer base models same as hf diffusers
-| [schedulers](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/schedulers) | support ddpm & dpm solver 10+ schedulers same as hf diffusers
+| [pipeline](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/pipelines) | support text-to-image, text-to-video, and text-to-audio tasks, 160+
+| [models](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/models) | support autoencoder & transformer base models same as hf diffusers, 50+
+| [schedulers](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/schedulers) | support diffusion schedulers (e.g., ddpm and dpm solver) same as hf diffusers, 35+
+
+### supported models under mindone/examples
+
+| task | model | inference | finetune | pretrain | institute |
+| :--- | :--- | :---: | :---: | :---: | :-- |
+| Image-to-Video | [hunyuanvideo-i2v](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuanvideo-i2v) 🔥🔥 | ✅ | ✖️ | ✖️ | Tencent |
+| Text/Image-to-Video | [wan2.1](https://github.com/mindspore-lab/mindone/blob/master/examples/wan2_1) 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
+| Text-to-Image | [cogview4](https://github.com/mindspore-lab/mindone/blob/master/examples/cogview) 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Zhipuai |
+| Text-to-Video | [step_video_t2v](https://github.com/mindspore-lab/mindone/blob/master/examples/step_video_t2v) 🔥🔥 | ✅ | ✖️ | ✖️ | StepFun |
+| Image-Text-to-Text | [qwen2_vl](https://github.com/mindspore-lab/mindone/blob/master/examples/qwen2_vl) 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
+| Any-to-Any | [janus](https://github.com/mindspore-lab/mindone/blob/master/examples/janus) 🔥🔥🔥 | ✅ | ✅ | ✅ | DeepSeek |
+| Any-to-Any | [emu3](https://github.com/mindspore-lab/mindone/blob/master/examples/emu3) 🔥🔥 | ✅ | ✅ | ✅ | BAAI |
+| Class-to-Image | [var](https://github.com/mindspore-lab/mindone/blob/master/examples/var) 🔥🔥 | ✅ | ✅ | ✅ | ByteDance |
+| Text/Image-to-Video | [hpcai open sora 1.2/2.0](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_hpcai) 🔥🔥 | ✅ | ✅ | ✅ | HPC-AI Tech |
+| Text/Image-to-Video | [cogvideox 1.5 5B~30B](https://github.com/mindspore-lab/mindone/blob/master/examples/diffusers/cogvideox_factory) 🔥🔥 | ✅ | ✅ | ✅ | Zhipu |
+| Text-to-Video | [open sora plan 1.3](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_pku) 🔥🔥 | ✅ | ✅ | ✅ | PKU |
+| Text-to-Video | [hunyuanvideo](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuanvideo) 🔥🔥 | ✅ | ✅ | ✅ | Tencent |
+| Text-to-Video | [movie gen 30B](https://github.com/mindspore-lab/mindone/blob/master/examples/moviegen) 🔥🔥 | ✅ | ✅ | ✅ | Meta |
+| Video-Encode-Decode | [magvit](https://github.com/mindspore-lab/mindone/blob/master/examples/magvit) | ✅ | ✅ | ✅ | Google |
+| Text-to-Image | [story_diffusion](https://github.com/mindspore-lab/mindone/blob/master/examples/story_diffusion) | ✅ | ✖️ | ✖️ | ByteDance |
+| Image-to-Video | [dynamicrafter](https://github.com/mindspore-lab/mindone/blob/master/examples/dynamicrafter) | ✅ | ✖️ | ✖️ | Tencent |
+| Video-to-Video | [venhancer](https://github.com/mindspore-lab/mindone/blob/master/examples/venhancer) | ✅ | ✖️ | ✖️ | Shanghai AI Lab |
+| Text-to-Video | [t2v_turbo](https://github.com/mindspore-lab/mindone/blob/master/examples/t2v_turbo) | ✅ | ✅ | ✅ | Google |
+| Image-to-Video | [svd](https://github.com/mindspore-lab/mindone/blob/master/examples/svd) | ✅ | ✅ | ✅ | Stability AI |
+| Text-to-Video | [animate diff](https://github.com/mindspore-lab/mindone/blob/master/examples/animatediff) | ✅ | ✅ | ✅ | CUHK |
+| Text/Image-to-Video | [video composer](https://github.com/mindspore-lab/mindone/tree/master/examples/videocomposer) | ✅ | ✅ | ✅ | Alibaba |
+| Text-to-Image | [flux](https://github.com/mindspore-lab/mindone/blob/master/examples/diffusers/dreambooth/README_flux.md) 🔥 | ✅ | ✅ | ✖️ | Black Forest Lab |
+| Text-to-Image | [stable diffusion 3](https://github.com/mindspore-lab/mindone/blob/master/examples/diffusers/dreambooth/README_sd3.md) 🔥 | ✅ | ✅ | ✖️ | Stability AI |
+| Text-to-Image | [kohya_sd_scripts](https://github.com/mindspore-lab/mindone/blob/master/examples/kohya_sd_scripts) | ✅ | ✅ | ✖️ | kohya |
+| Text-to-Image | [stable diffusion xl](https://github.com/mindspore-lab/mindone/blob/master/examples/diffusers/text_to_image/README_sdxl.md) | ✅ | ✅ | ✅ | Stability AI |
+| Text-to-Image | [stable diffusion](https://github.com/mindspore-lab/mindone/blob/master/examples/stable_diffusion_v2) | ✅ | ✅ | ✅ | Stability AI |
+| Text-to-Image | [hunyuan_dit](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuan_dit) | ✅ | ✅ | ✅ | Tencent |
+| Text-to-Image | [pixart_sigma](https://github.com/mindspore-lab/mindone/blob/master/examples/pixart_sigma) | ✅ | ✅ | ✅ | Huawei |
+| Text-to-Image | [fit](https://github.com/mindspore-lab/mindone/blob/master/examples/fit) | ✅ | ✅ | ✅ | Shanghai AI Lab |
+| Class-to-Video | [latte](https://github.com/mindspore-lab/mindone/blob/master/examples/latte) | ✅ | ✅ | ✅ | Shanghai AI Lab |
+| Class-to-Image | [dit](https://github.com/mindspore-lab/mindone/blob/master/examples/dit) | ✅ | ✅ | ✅ | Meta |
+| Text-to-Image | [t2i-adapter](https://github.com/mindspore-lab/mindone/blob/master/examples/t2i_adapter) | ✅ | ✅ | ✅ | Shanghai AI Lab |
+| Text-to-Image | [ip adapter](https://github.com/mindspore-lab/mindone/blob/master/examples/ip_adapter) | ✅ | ✅ | ✅ | Tencent |
+| Text-to-3D | [mvdream](https://github.com/mindspore-lab/mindone/blob/master/examples/mvdream) | ✅ | ✅ | ✅ | ByteDance |
+| Image-to-3D | [instantmesh](https://github.com/mindspore-lab/mindone/blob/master/examples/instantmesh) | ✅ | ✅ | ✅ | Tencent |
+| Image-to-3D | [sv3d](https://github.com/mindspore-lab/mindone/blob/master/examples/sv3d) | ✅ | ✅ | ✅ | Stability AI |
+| Text/Image-to-3D | [hunyuan3d-1.0](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuan3d_1) | ✅ | ✅ | ✅ | Tencent |
+
+### supported captioner
+| task | model | inference | finetune | pretrain | features |
+| :--- | :--- | :---: | :---: | :---: | :-- |
+| Image-Text-to-Text | [pllava](https://github.com/mindspore-lab/mindone/tree/master/tools/captioners/PLLaVA) 🔥 | ✅ | ✖️ | ✖️ | support video and image captioning |
````
README_re.md

Whitespace-only changes.

examples/README.md

Lines changed: 27 additions & 11 deletions
````diff
@@ -5,22 +5,38 @@
 
 | model | codebase style | original repo
 | :--- | :-- | :-
+| [cogview](https://github.com/mindspore-lab/mindone/blob/master/examples/cogview) | THUDM official | https://github.com/THUDM/CogView4 |
+| [wan2_1](https://github.com/mindspore-lab/mindone/blob/master/examples/wan2_1) | Alibaba Wan Group official | https://github.com/Wan-Video/Wan2.1 |
+| [step_video_t2v](https://github.com/mindspore-lab/mindone/blob/master/examples/step_video_t2v) | StepFun official | https://github.com/stepfun-ai/Step-Video-T2V |
+| [janus](https://github.com/mindspore-lab/mindone/blob/master/examples/janus) | DeepSeek AI official | https://github.com/deepseek-ai/Janus |
+| [emu3](https://github.com/mindspore-lab/mindone/blob/master/examples/emu3) | BAAIVision official | https://github.com/baaivision/Emu3 |
+| [var](https://github.com/mindspore-lab/mindone/blob/master/examples/var) | ByteDance FoundationVision official | https://github.com/FoundationVision/VAR |
+| [hpcai open sora](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_hpcai) | HPC-AI Tech official | https://github.com/hpcaitech/Open-Sora
+| [open sora plan](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_pku) | PKU-YuanGroup official | https://github.com/PKU-YuanGroup/Open-Sora-Plan
+| [flux](https://github.com/mindspore-lab/mindone/blob/master/examples/flux) | Black Forest Labs official | https://github.com/black-forest-labs/flux |
+| [movie gen](https://github.com/mindspore-lab/mindone/blob/master/examples/moviegen) | implemented by the MindONE team, based on the Movie Gen paper by Meta | https://arxiv.org/pdf/2310.05737 |
+| [hunyuan3d-1.0](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuan3d_1) | Tencent official | https://github.com/Tencent/Hunyuan3D-1 |
+| [kohya_sd_scripts](https://github.com/mindspore-lab/mindone/blob/master/examples/kohya_sd_scripts) | kohya | https://github.com/kohya-ss/sd-scripts |
+| [magvit](https://github.com/mindspore-lab/mindone/blob/master/examples/magvit) | implemented by the MindONE team, based on the MAGVIT-v2 paper by Google | https://arxiv.org/pdf/2310.05737 |
+| [instantmesh](https://github.com/mindspore-lab/mindone/blob/master/examples/instantmesh) | Tencent ARC Lab official | https://github.com/TencentARC/InstantMesh |
+| [hunyuanvideo](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuanvideo) | HunyuanVideo official | https://github.com/Tencent/HunyuanVideo |
+| [story_diffusion](https://github.com/mindspore-lab/mindone/blob/master/examples/story_diffusion) | HVision-NKU official | https://github.com/HVision-NKU/StoryDiffusion |
+| [dynamicrafter](https://github.com/mindspore-lab/mindone/blob/master/examples/dynamicrafter) | Tencent Research official | https://github.com/Doubiiu/DynamiCrafter
+| [hunyuan_dit](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuan_dit) | Tencent Research official | https://github.com/Tencent/HunyuanDiT
+| [pixart_sigma](https://github.com/mindspore-lab/mindone/blob/master/examples/pixart_sigma) | Noah Lab official | https://github.com/PixArt-alpha/PixArt-sigma
+| [svd](https://github.com/mindspore-lab/mindone/blob/master/examples/svd) | Stability AI official | https://github.com/Stability-AI/generative-models |
+| [mvdream](https://github.com/mindspore-lab/mindone/blob/master/examples/mvdream) | ByteDance official | https://github.com/bytedance/MVDream |
+| [sv3d](https://github.com/mindspore-lab/mindone/blob/master/examples/sv3d) | Stability AI official | https://github.com/Stability-AI/generative-models |
+| [hunyuanvideo-i2v](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuanvideo-i2v) | Tencent official | https://github.com/Tencent/HunyuanVideo-I2V |
+| [venhancer](https://github.com/mindspore-lab/mindone/blob/master/examples/venhancer) | Vchitect Shanghai AI Laboratory official | https://github.com/Vchitect/VEnhancer |
 | [stable diffusion](https://github.com/mindspore-lab/mindone/blob/master/examples/stable_diffusion_v2) | Stability AI official | https://github.com/Stability-AI/stablediffusion
 | [stable diffusion xl](https://github.com/mindspore-lab/mindone/blob/master/examples/stable_diffusion_xl) | Stability AI official | https://github.com/Stability-AI/generative-models |
 | [ip adaptor](https://github.com/vigo999/mindone/tree/master/examples/ip_adapter) | Tencent-ailab official | https://github.com/tencent-ailab/IP-Adapter
 | [t2i-adapter](https://github.com/vigo999/mindone/tree/master/examples/t2i_adapter) | ARC Lab, Tencent PCG official | https://github.com/TencentARC/T2I-Adapter
 | [dit](https://github.com/mindspore-lab/mindone/blob/master/examples/dit) | Facebook Research official | https://github.com/facebookresearch/DiT
+| [fit](https://github.com/mindspore-lab/mindone/blob/master/examples/fit) | Shanghai AI Lab official | https://github.com/whlzy/Fit
 | [latte](https://github.com/mindspore-lab/mindone/blob/master/examples/latte) | Vchitect Shanghai AI Laboratory official | https://github.com/Vchitect/Latte |
+| [t2v_turbo](https://github.com/mindspore-lab/mindone/tree/master/examples/t2v_turbo) | Google official | https://github.com/Ji4chenLi/t2v-turbo
 | [video composer](https://github.com/mindspore-lab/mindone/tree/master/examples/videocomposer) | ali vilab official | https://github.com/ali-vilab/videocomposer
 | [animatediff](https://github.com/mindspore-lab/mindone/tree/master/examples/animatediff) | Yuwei Guo official | https://github.com/guoyww/animatediff/
-| [hpcai open sora](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_hpcai) | HPC-AI Tech official | https://github.com/hpcaitech/Open-Sora
-| [open sora plan](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_pku) | PKU-YuanGroup official | https://github.com/PKU-YuanGroup/Open-Sora-Plan
-| [cambrian](https://github.com/mindspore-lab/mindone/blob/master/examples/cambrain) | offical github | https://github.com/cambrian-mllm/cambrian
-| [minicpm-v](https://github.com/mindspore-lab/mindone/blob/master/examples/minicpm_v) | OpenBMB official | https://github.com/OpenBMB/MiniCPM-V
-| [internvl](https://github.com/mindspore-lab/mindone/blob/master/examples/internvl) | Shanghai AI Lab official | https://github.com/OpenGVLab/InternVL
-| [llava](https://github.com/mindspore-lab/mindone/blob/master/examples/llava) | Haotian-Liu official | https://github.com/haotian-liu/LLaVA
-| [vila](https://github.com/mindspore-lab/mindone/blob/master/examples/vila) | Nvidia Lab official | https://github.com/NVlabs/VILA
-| [pllava](https://github.com/mindspore-lab/mindone/blob/master/examples/pllava) | Magic Research official | https://github.com/magic-research/PLLaVA
-| [dynamicrafter](https://github.com/mindspore-lab/mindone/blob/master/examples/dynamicrafter) | Tencent Research official | https://github.com/Doubiiu/DynamiCrafter
-| [hunyuan_dit](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuan_dit) | Tencent Research official | https://github.com/Tencent/HunyuanDiT
-| [pixart_sigma](https://github.com/mindspore-lab/mindone/blob/master/examples/pixart_sigma) | Noah Lab official | https://github.com/PixArt-alpha/PixArt-sigma
+| [qwen2-vl](https://github.com/mindspore-lab/mindone/tree/master/examples/qwen2_vl) | HF transformers official | https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct |
````

pyproject.toml

Lines changed: 1 addition & 0 deletions
```diff
@@ -33,6 +33,7 @@ dependencies = [
     "sentencepiece",
     "trampoline",
    "numpy<2.0",
+    "mindcv==0.3.0",
     "huggingface-hub>=0.20.2",
     "safetensors>=0.3.1",
     "transformers==4.46.3",
```
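The dependency strings in this hunk (the new `mindcv==0.3.0` pin alongside existing entries such as `numpy<2.0` and `transformers==4.46.3`) follow standard PEP 508 requirement syntax. A small sketch using the third-party `packaging` library shows how an exact pin (`==`) and an upper cap (`<`) are interpreted:

```python
from packaging.requirements import Requirement
from packaging.version import Version

# The pin added by this commit, plus two existing entries from the list.
reqs = [Requirement(r) for r in ("mindcv==0.3.0", "numpy<2.0", "transformers==4.46.3")]

# An exact pin (==) admits only that version.
assert reqs[0].specifier.contains(Version("0.3.0"))
assert not reqs[0].specifier.contains(Version("0.3.1"))

# An upper cap (<) admits anything strictly below the bound.
assert reqs[1].specifier.contains(Version("1.26.4"))
assert not reqs[1].specifier.contains(Version("2.0.0"))
```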
