Commit d043f7f

Author: fzilan (committed)
Merge remote-tracking branch 'upstream/master'
2 parents: 9e3d2cc + 4505764


4 files changed (+83, −41 lines)


README.md

Lines changed: 55 additions & 30 deletions
````diff
@@ -5,12 +5,13 @@ This repository contains SoTA algorithms, models, and interesting projects in th
 ONE is short for "ONE for all"
 
 ## News
-- [2025.02.21] We support DeepSeek [Janus-Pro](https://huggingface.co/deepseek-ai/Janus-Pro-7B), a SoTA multimodal understanding and generation model. See [here](examples/janus) 🔥
+- [2025.04.10] We release MindONE [v0.3.0](https://github.com/mindspore-lab/mindone/releases/tag/v0.3.0). More than 15 SoTA generative models are added, including Flux, CogView4, OpenSora 2.0, Movie Gen 30B, and CogVideoX 5B~30B. Have fun!
+- [2025.02.21] We support DeepSeek [Janus-Pro](https://huggingface.co/deepseek-ai/Janus-Pro-7B), a SoTA multimodal understanding and generation model. See [here](examples/janus)
 - [2024.11.06] MindONE [v0.2.0](https://github.com/mindspore-lab/mindone/releases/tag/v0.2.0) is released
 
 ## Quick tour
 
-To install MindONE v0.2.0, please install [MindSpore 2.3.1](https://www.mindspore.cn/install) and run `pip install mindone`
+To install MindONE v0.3.0, please install [MindSpore 2.5.0](https://www.mindspore.cn/install) and run `pip install mindone`
 
 Alternatively, to install the latest version from the `master` branch, please run:
 ```
@@ -39,35 +40,59 @@ prompt = "A cat holding a sign that says 'Hello MindSpore'"
 image = pipe(prompt)[0][0]
 image.save("sd3.png")
 ```
-
-### supported models under mindone/examples
-| model | features
-| :--- | :-- |
-| [cambrian](https://github.com/mindspore-lab/mindone/blob/master/examples/cambrain) | working on it |
-| [minicpm-v](https://github.com/mindspore-lab/mindone/blob/master/examples/minicpm_v) | working on v2.6 |
-| [internvl](https://github.com/mindspore-lab/mindone/blob/master/examples/internvl) | working on v1.0 v1.5 v2.0 |
-| [llava](https://github.com/mindspore-lab/mindone/blob/master/examples/llava) | working on llava 1.5 & 1.6 |
-| [vila](https://github.com/mindspore-lab/mindone/blob/master/examples/vila) | working on it |
-| [pllava](https://github.com/mindspore-lab/mindone/blob/master/examples/pllava) | working on it |
-| [hpcai open sora](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_hpcai) | support v1.0/1.1/1.2 large scale training with dp/sp/zero |
-| [open sora plan](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_pku) | support v1.0/1.1/1.2 large scale training with dp/sp/zero |
-| [stable diffusion](https://github.com/mindspore-lab/mindone/blob/master/examples/stable_diffusion_v2) | support sd 1.5/2.0/2.1, vanilla fine-tune, lora, dreambooth, text inversion |
-| [stable diffusion xl](https://github.com/mindspore-lab/mindone/blob/master/examples/stable_diffusion_xl) | support sai style (Stability AI) vanilla fine-tune, lora, dreambooth |
-| [dit](https://github.com/mindspore-lab/mindone/blob/master/examples/dit) | support text to image fine-tune |
-| [latte](https://github.com/mindspore-lab/mindone/blob/master/examples/latte) | support unconditional text to image fine-tune |
-| [animate diff](https://github.com/mindspore-lab/mindone/blob/master/examples/animatediff) | support motion module and lora training |
-| [video composer](https://github.com/mindspore-lab/mindone/tree/master/examples/videocomposer) | support conditional video generation with motion transfer etc. |
-| [ip adapter](https://github.com/mindspore-lab/mindone/blob/master/examples/ip_adapter) | refactoring |
-| [t2i-adapter](https://github.com/mindspore-lab/mindone/blob/master/examples/t2i_adapter) | refactoring |
-| [dynamicrafter](https://github.com/mindspore-lab/mindone/blob/master/examples/dynamicrafter) | support image to video generation |
-| [hunyuan_dit](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuan_dit) | support text to image fine-tune |
-| [pixart_sigma](https://github.com/mindspore-lab/mindone/blob/master/examples/pixart_sigma) | support text to image fine-tune at different aspect ratio |
 ### run hf diffusers on mindspore
-mindone diffusers is under active development, most tasks were tested with mindspore 2.3.1 and ascend 910 hardware.
+- mindone diffusers is under active development; most tasks were tested with mindspore 2.5.0 on Ascend Atlas 800T A2 machines.
+- compatible with hf diffusers 0.32.2
 
 | component | features
 | :--- | :--
-| [pipeline](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/pipelines) | support text2image, text2video, text2audio tasks 30+
-| [models](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/models) | support autoencoder & transformer base models same as hf diffusers
-| [schedulers](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/schedulers) | support ddpm & dpm solver 10+ schedulers same as hf diffusers
+| [pipeline](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/pipelines) | support text-to-image, text-to-video, and text-to-audio tasks, 160+
+| [models](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/models) | support autoencoder & transformer base models same as hf diffusers, 50+
+| [schedulers](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/schedulers) | support diffusion schedulers (e.g., ddpm and dpm solver) same as hf diffusers, 35+
+
+### supported models under mindone/examples
+
+| task | model | inference | finetune | pretrain | institute |
+| :--- | :--- | :---: | :---: | :---: | :-- |
+| Image-to-Video | [hunyuanvideo-i2v](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuanvideo-i2v) 🔥🔥 | ✅ | ✖️ | ✖️ | Tencent |
+| Text/Image-to-Video | [wan2.1](https://github.com/mindspore-lab/mindone/blob/master/examples/wan2_1) 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
+| Text-to-Image | [cogview4](https://github.com/mindspore-lab/mindone/blob/master/examples/cogview) 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Zhipuai |
+| Text-to-Video | [step_video_t2v](https://github.com/mindspore-lab/mindone/blob/master/examples/step_video_t2v) 🔥🔥 | ✅ | ✖️ | ✖️ | StepFun |
+| Image-Text-to-Text | [qwen2_vl](https://github.com/mindspore-lab/mindone/blob/master/examples/qwen2_vl) 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
+| Any-to-Any | [janus](https://github.com/mindspore-lab/mindone/blob/master/examples/janus) 🔥🔥🔥 | ✅ | ✅ | ✅ | DeepSeek |
+| Any-to-Any | [emu3](https://github.com/mindspore-lab/mindone/blob/master/examples/emu3) 🔥🔥 | ✅ | ✅ | ✅ | BAAI |
+| Class-to-Image | [var](https://github.com/mindspore-lab/mindone/blob/master/examples/var) 🔥🔥 | ✅ | ✅ | ✅ | ByteDance |
+| Text/Image-to-Video | [hpcai open sora 1.2/2.0](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_hpcai) 🔥🔥 | ✅ | ✅ | ✅ | HPC-AI Tech |
+| Text/Image-to-Video | [cogvideox 1.5 5B~30B](https://github.com/mindspore-lab/mindone/blob/master/examples/diffusers/cogvideox_factory) 🔥🔥 | ✅ | ✅ | ✅ | Zhipu |
+| Text-to-Video | [open sora plan 1.3](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_pku) 🔥🔥 | ✅ | ✅ | ✅ | PKU |
+| Text-to-Video | [hunyuanvideo](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuanvideo) 🔥🔥 | ✅ | ✅ | ✅ | Tencent |
+| Text-to-Video | [movie gen 30B](https://github.com/mindspore-lab/mindone/blob/master/examples/moviegen) 🔥🔥 | ✅ | ✅ | ✅ | Meta |
+| Video-Encode-Decode | [magvit](https://github.com/mindspore-lab/mindone/blob/master/examples/magvit) | ✅ | ✅ | ✅ | Google |
+| Text-to-Image | [story_diffusion](https://github.com/mindspore-lab/mindone/blob/master/examples/story_diffusion) | ✅ | ✖️ | ✖️ | ByteDance |
+| Image-to-Video | [dynamicrafter](https://github.com/mindspore-lab/mindone/blob/master/examples/dynamicrafter) | ✅ | ✖️ | ✖️ | Tencent |
+| Video-to-Video | [venhancer](https://github.com/mindspore-lab/mindone/blob/master/examples/venhancer) | ✅ | ✖️ | ✖️ | Shanghai AI Lab |
+| Text-to-Video | [t2v_turbo](https://github.com/mindspore-lab/mindone/blob/master/examples/t2v_turbo) | ✅ | ✅ | ✅ | Google |
+| Image-to-Video | [svd](https://github.com/mindspore-lab/mindone/blob/master/examples/svd) | ✅ | ✅ | ✅ | Stability AI |
+| Text-to-Video | [animate diff](https://github.com/mindspore-lab/mindone/blob/master/examples/animatediff) | ✅ | ✅ | ✅ | CUHK |
+| Text/Image-to-Video | [video composer](https://github.com/mindspore-lab/mindone/tree/master/examples/videocomposer) | ✅ | ✅ | ✅ | Alibaba |
+| Text-to-Image | [flux](https://github.com/mindspore-lab/mindone/blob/master/examples/diffusers/dreambooth/README_flux.md) 🔥 | ✅ | ✅ | ✖️ | Black Forest Lab |
+| Text-to-Image | [stable diffusion 3](https://github.com/mindspore-lab/mindone/blob/master/examples/diffusers/dreambooth/README_sd3.md) 🔥 | ✅ | ✅ | ✖️ | Stability AI |
+| Text-to-Image | [kohya_sd_scripts](https://github.com/mindspore-lab/mindone/blob/master/examples/kohya_sd_scripts) | ✅ | ✅ | ✖️ | kohya |
+| Text-to-Image | [stable diffusion xl](https://github.com/mindspore-lab/mindone/blob/master/examples/diffusers/text_to_image/README_sdxl.md) | ✅ | ✅ | ✅ | Stability AI |
+| Text-to-Image | [stable diffusion](https://github.com/mindspore-lab/mindone/blob/master/examples/stable_diffusion_v2) | ✅ | ✅ | ✅ | Stability AI |
+| Text-to-Image | [hunyuan_dit](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuan_dit) | ✅ | ✅ | ✅ | Tencent |
+| Text-to-Image | [pixart_sigma](https://github.com/mindspore-lab/mindone/blob/master/examples/pixart_sigma) | ✅ | ✅ | ✅ | Huawei |
+| Text-to-Image | [fit](https://github.com/mindspore-lab/mindone/blob/master/examples/fit) | ✅ | ✅ | ✅ | Shanghai AI Lab |
+| Class-to-Video | [latte](https://github.com/mindspore-lab/mindone/blob/master/examples/latte) | ✅ | ✅ | ✅ | Shanghai AI Lab |
+| Class-to-Image | [dit](https://github.com/mindspore-lab/mindone/blob/master/examples/dit) | ✅ | ✅ | ✅ | Meta |
+| Text-to-Image | [t2i-adapter](https://github.com/mindspore-lab/mindone/blob/master/examples/t2i_adapter) | ✅ | ✅ | ✅ | Shanghai AI Lab |
+| Text-to-Image | [ip adapter](https://github.com/mindspore-lab/mindone/blob/master/examples/ip_adapter) | ✅ | ✅ | ✅ | Tencent |
+| Text-to-3D | [mvdream](https://github.com/mindspore-lab/mindone/blob/master/examples/mvdream) | ✅ | ✅ | ✅ | ByteDance |
+| Image-to-3D | [instantmesh](https://github.com/mindspore-lab/mindone/blob/master/examples/instantmesh) | ✅ | ✅ | ✅ | Tencent |
+| Image-to-3D | [sv3d](https://github.com/mindspore-lab/mindone/blob/master/examples/sv3d) | ✅ | ✅ | ✅ | Stability AI |
+| Text/Image-to-3D | [hunyuan3d-1.0](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuan3d_1) | ✅ | ✅ | ✅ | Tencent |
+
+### supported captioner
+| task | model | inference | finetune | pretrain | features |
+| :--- | :--- | :---: | :---: | :---: | :-- |
+| Image-Text-to-Text | [pllava](https://github.com/mindspore-lab/mindone/tree/master/tools/captioners/PLLaVA) 🔥 | ✅ | ✖️ | ✖️ | support video and image captioning |
````
README_re.md

Whitespace-only changes.

examples/README.md

Lines changed: 27 additions & 11 deletions
````diff
@@ -5,22 +5,38 @@
 
 | model | codebase style | original repo
 | :--- | :-- | :-
+| [cogview](https://github.com/mindspore-lab/mindone/blob/master/examples/cogview) | THUDM official | https://github.com/THUDM/CogView4 |
+| [wan2_1](https://github.com/mindspore-lab/mindone/blob/master/examples/wan2_1) | Alibaba Wan Group official | https://github.com/Wan-Video/Wan2.1 |
+| [step_video_t2v](https://github.com/mindspore-lab/mindone/blob/master/examples/step_video_t2v) | StepFun official | https://github.com/stepfun-ai/Step-Video-T2V |
+| [janus](https://github.com/mindspore-lab/mindone/blob/master/examples/janus) | DeepSeek AI official | https://github.com/deepseek-ai/Janus |
+| [emu3](https://github.com/mindspore-lab/mindone/blob/master/examples/emu3) | BAAIVision official | https://github.com/baaivision/Emu3 |
+| [var](https://github.com/mindspore-lab/mindone/blob/master/examples/var) | ByteDance FoundationVision official | https://github.com/FoundationVision/VAR |
+| [hpcai open sora](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_hpcai) | HPC-AI Tech official | https://github.com/hpcaitech/Open-Sora
+| [open sora plan](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_pku) | PKU-YuanGroup official | https://github.com/PKU-YuanGroup/Open-Sora-Plan
+| [flux](https://github.com/mindspore-lab/mindone/blob/master/examples/flux) | Black Forest Labs official | https://github.com/black-forest-labs/flux |
+| [movie gen](https://github.com/mindspore-lab/mindone/blob/master/examples/moviegen) | implemented by the MindONE team, based on the Movie Gen paper by Meta | https://arxiv.org/pdf/2310.05737 |
+| [hunyuan3d-1.0](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuan3d_1) | Tencent official | https://github.com/Tencent/Hunyuan3D-1 |
+| [kohya_sd_scripts](https://github.com/mindspore-lab/mindone/blob/master/examples/kohya_sd_scripts) | kohya | https://github.com/kohya-ss/sd-scripts |
+| [magvit](https://github.com/mindspore-lab/mindone/blob/master/examples/magvit) | implemented by the MindONE team, based on the MAGVIT-v2 paper by Google | https://arxiv.org/pdf/2310.05737 |
+| [instantmesh](https://github.com/mindspore-lab/mindone/blob/master/examples/instantmesh) | Tencent ARC Lab official | https://github.com/TencentARC/InstantMesh |
+| [hunyuanvideo](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuanvideo) | HunyuanVideo official | https://github.com/Tencent/HunyuanVideo |
+| [story_diffusion](https://github.com/mindspore-lab/mindone/blob/master/examples/story_diffusion) | HVision-NKU official | https://github.com/HVision-NKU/StoryDiffusion |
+| [dynamicrafter](https://github.com/mindspore-lab/mindone/blob/master/examples/dynamicrafter) | Tencent Research official | https://github.com/Doubiiu/DynamiCrafter
+| [hunyuan_dit](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuan_dit) | Tencent Research official | https://github.com/Tencent/HunyuanDiT
+| [pixart_sigma](https://github.com/mindspore-lab/mindone/blob/master/examples/pixart_sigma) | Noah Lab official | https://github.com/PixArt-alpha/PixArt-sigma
+| [svd](https://github.com/mindspore-lab/mindone/blob/master/examples/svd) | Stability AI official | https://github.com/Stability-AI/generative-models |
+| [mvdream](https://github.com/mindspore-lab/mindone/blob/master/examples/mvdream) | ByteDance official | https://github.com/bytedance/MVDream |
+| [sv3d](https://github.com/mindspore-lab/mindone/blob/master/examples/sv3d) | Stability AI official | https://github.com/Stability-AI/generative-models |
+| [hunyuanvideo-i2v](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuanvideo-i2v) | Tencent official | https://github.com/Tencent/HunyuanVideo-I2V |
+| [venhancer](https://github.com/mindspore-lab/mindone/blob/master/examples/venhancer) | Vchitect Shanghai AI Laboratory official | https://github.com/Vchitect/VEnhancer |
 | [stable diffusion](https://github.com/mindspore-lab/mindone/blob/master/examples/stable_diffusion_v2) | Stability AI official | https://github.com/Stability-AI/stablediffusion
 | [stable diffusion xl](https://github.com/mindspore-lab/mindone/blob/master/examples/stable_diffusion_xl) | Stability AI official | https://github.com/Stability-AI/generative-models |
 | [ip adaptor](https://github.com/vigo999/mindone/tree/master/examples/ip_adapter) | Tencent-ailab official | https://github.com/tencent-ailab/IP-Adapter
 | [t2i-adapter](https://github.com/vigo999/mindone/tree/master/examples/t2i_adapter) | ARC Lab, Tencent PCG official | https://github.com/TencentARC/T2I-Adapter
 | [dit](https://github.com/mindspore-lab/mindone/blob/master/examples/dit) | Facebook Research official | https://github.com/facebookresearch/DiT
+| [fit](https://github.com/mindspore-lab/mindone/blob/master/examples/fit) | Shanghai AI Lab official | https://github.com/whlzy/Fit
 | [latte](https://github.com/mindspore-lab/mindone/blob/master/examples/latte) | Vchitect Shanghai AI Laboratory official | https://github.com/Vchitect/Latte |
+| [t2v_turbo](https://github.com/mindspore-lab/mindone/tree/master/examples/t2v_turbo) | Google official | https://github.com/Ji4chenLi/t2v-turbo
 | [video composer](https://github.com/mindspore-lab/mindone/tree/master/examples/videocomposer) | ali vilab official | https://github.com/ali-vilab/videocomposer
 | [animatediff](https://github.com/mindspore-lab/mindone/tree/master/examples/animatediff) | Yuwei Guo official | https://github.com/guoyww/animatediff/
-| [hpcai open sora](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_hpcai) | HPC-AI Tech official | https://github.com/hpcaitech/Open-Sora
-| [open sora plan](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_pku) | PKU-YuanGroup official | https://github.com/PKU-YuanGroup/Open-Sora-Plan
-| [cambrian](https://github.com/mindspore-lab/mindone/blob/master/examples/cambrain) | offical github | https://github.com/cambrian-mllm/cambrian
-| [minicpm-v](https://github.com/mindspore-lab/mindone/blob/master/examples/minicpm_v) | OpenBMB official | https://github.com/OpenBMB/MiniCPM-V
-| [internvl](https://github.com/mindspore-lab/mindone/blob/master/examples/internvl) | Shanghai AI Lab official | https://github.com/OpenGVLab/InternVL
-| [llava](https://github.com/mindspore-lab/mindone/blob/master/examples/llava) | Haotian-Liu official | https://github.com/haotian-liu/LLaVA
-| [vila](https://github.com/mindspore-lab/mindone/blob/master/examples/vila) | Nvidia Lab official | https://github.com/NVlabs/VILA
-| [pllava](https://github.com/mindspore-lab/mindone/blob/master/examples/pllava) | Magic Research official | https://github.com/magic-research/PLLaVA
-| [dynamicrafter](https://github.com/mindspore-lab/mindone/blob/master/examples/dynamicrafter) | Tencent Research official | https://github.com/Doubiiu/DynamiCrafter
-| [hunyuan_dit](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuan_dit) | Tencent Research official | https://github.com/Tencent/HunyuanDiT
-| [pixart_sigma](https://github.com/mindspore-lab/mindone/blob/master/examples/pixart_sigma) | Noah Lab official | https://github.com/PixArt-alpha/PixArt-sigma
+| [qwen2-vl](https://github.com/mindspore-lab/mindone/tree/master/examples/qwen2_vl) | HF transformers official | https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct |
````

pyproject.toml

Lines changed: 1 addition & 0 deletions
```diff
@@ -33,6 +33,7 @@ dependencies = [
     "sentencepiece",
     "trampoline",
    "numpy<2.0",
+    "mindcv==0.3.0",
     "huggingface-hub>=0.20.2",
     "safetensors>=0.3.1",
     "transformers==4.46.3",
```
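The dependency strings in this hunk (the new `mindcv==0.3.0` pin alongside existing entries such as `numpy<2.0` and `transformers==4.46.3`) follow standard PEP 508 requirement syntax. A small sketch using the third-party `packaging` library shows how an exact pin (`==`) and an upper cap (`<`) are interpreted:

```python
from packaging.requirements import Requirement
from packaging.version import Version

# The pin added by this commit, plus two existing entries from the list.
reqs = [Requirement(r) for r in ("mindcv==0.3.0", "numpy<2.0", "transformers==4.46.3")]

# An exact pin (==) admits only that version.
assert reqs[0].specifier.contains(Version("0.3.0"))
assert not reqs[0].specifier.contains(Version("0.3.1"))

# An upper cap (<) admits anything strictly below the bound.
assert reqs[1].specifier.contains(Version("1.26.4"))
assert not reqs[1].specifier.contains(Version("2.0.0"))
```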
