From 3d0369cdf973efd9e6434ab460f30f805f150f96 Mon Sep 17 00:00:00 2001 From: Fzilan <33061146+Fzilan@users.noreply.github.com> Date: Sun, 21 Dec 2025 15:31:57 +0800 Subject: [PATCH 1/2] docs: add v0.5.0 changelog --- CHANGELOG.md | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index e69de29bb2..cfef494ba9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -0,0 +1,28 @@ +# Changelog + +## [v0.5.0] - 2025-12-21 + +### Highlights +- Added dozens of new transformer checkpoints and processors (e.g., CSM, ModernBERT Decoder, SmolLM3, Hunyuan v1, GLM4 variants, Qwen3/Qwen3-Omni, Mlcd) and refreshed the transformers base to 4.57.1, expanding support for the latest model families. (#1391, #1396, #1397, #1399, #1401, #1403, #1409, #1411, #1462, #1464, #1472) +- Expanded generative pipelines and diffusion tooling with new QwenImage, HiDream, CannyEdit, and OmniGen2 fine-tuning utilities alongside ComfyUI root assets and text encoder resources. (#1288, #1346, #1360, #1410, #1480, #1481) +- Improved interoperability and performance through AutoencoderMixin, context parallelism enhancements, accelerated pipelines, and wide-ranging compatibility fixes for MindSpore 2.6/2.7 and downstream examples. (#1433, #1438, #1444, #1383, #1385, #1417) + +### Added +- Introduced many new transformer models and processors, including Exaone, Florence2, Parakeet, VaultGemma, T5Gemma, DOGE, Ernie 4.5 (dense & MoE), SAM-HQ, BLT/Apertus/Ministral, Mistral3 (4.57.1), EOMT, TimesFM, XCodec, Kyutai Speech-to-Text, and more. (#1392, #1393, #1395, #1396, #1397, #1399, #1401, #1403, #1407, #1420, #1451, #1452, #1453, #1454, #1457, #1462, #1464) +- Added new pipelines and demos such as QwenImage diffusers support, HiDream, CannyEdit, updated accelerated pipelines, and ComfyUI root/CLI assets; added text encoder files for inference setups. (#1288, #1346, #1360, #1433, #1480, #1481) +- Extended example coverage with OmniGen2 fine-tuning, MovieGen MindSpore 2.7.0 support, and new examples for MMS, Nougat, MyT5, Granite-Vision, MatCha, DePlot, fill-mask pipe usage, Jamba, and others. (#1329, #1333, #1334, #1335, #1336, #1337, #1341, #1362, #1410) +- Added AutoencoderMixin and updated CLI/processors for Phi4, Whisper, UltraVox, InternVL, Qwen2 Audio, MiniCPMV, LLaVA Next (image/video), plus supplemental GLM4V processors. (#1349, #1429, #1471, #1444) + +### Fixed +- Resolved multiple transformer and pipeline bugs, including Qwen3-VL text attention selection, GLM4.1V batch generation index handling, Accelerate import guards, FA missing_keys warnings, Reformer UT stability, and division-by-zero in compute_diffs. (#1351, #1343, #1354, #1368, #1371, #1431, #1437, #1455) +- Stabilized UT thresholds for BF16, Seamless M4T, Emu3, Zero helper, and RT-DETR fast UT cases; updated fast unit tests for various models across MindSpore versions. (#1187, #1345, #1358, #1359, #1366, #1373, #1383) +- Fixed from_pretrained loading in OpenSora/OmniGen pipelines, added lora scale coverage in diffusers tests, and sorted auto-mapping to reduce nondeterminism. (#1187, #1370, #1353) + +### Documentation & Examples +- Refreshed READMEs and tutorials for Llama2/Llama3, SparkTTS, SAM2/LanguageSAM, MmaDA, Wan2.1, Janus-Pro, Emu3, and general repository instructions; improved performance notes for multiple demos. (#1344, #1377, #1369, #1376, #1378, #1417) +- Updated OpenSora-HPC AI, HunyuanVideo I2V, and Emu3 examples for MindSpore 2.6.0/2.7.0 compatibility and performance. (#1385, #1417) +- Added new documentation for GPT-SW3/MADLAD-400, MMS, and various model usage guides. (#1329, #1335) + +## [v0.4.0] + +For details of the previous release, see the [v0.4.0 changelog](https://github.com/mindspore-lab/mindone/blob/refs/tags/v0.4.0/CHANGELOG.md). From 842cd98e9d8913d1c1cc32e4497aa692b3cad0cf Mon Sep 17 00:00:00 2001 From: Fzilan <33061146+Fzilan@users.noreply.github.com> Date: Sun, 21 Dec 2025 15:38:33 +0800 Subject: [PATCH 2/2] Add PR links to v0.5.0 changelog --- CHANGELOG.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index cfef494ba9..17d2122051 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,25 +3,25 @@ ## [v0.5.0] - 2025-12-21 ### Highlights -- Added dozens of new transformer checkpoints and processors (e.g., CSM, ModernBERT Decoder, SmolLM3, Hunyuan v1, GLM4 variants, Qwen3/Qwen3-Omni, Mlcd) and refreshed the transformers base to 4.57.1, expanding support for the latest model families. (#1391, #1396, #1397, #1399, #1401, #1403, #1409, #1411, #1462, #1464, #1472) -- Expanded generative pipelines and diffusion tooling with new QwenImage, HiDream, CannyEdit, and OmniGen2 fine-tuning utilities alongside ComfyUI root assets and text encoder resources. (#1288, #1346, #1360, #1410, #1480, #1481) -- Improved interoperability and performance through AutoencoderMixin, context parallelism enhancements, accelerated pipelines, and wide-ranging compatibility fixes for MindSpore 2.6/2.7 and downstream examples. (#1433, #1438, #1444, #1383, #1385, #1417) +- Added dozens of new transformer checkpoints and processors (e.g., CSM, ModernBERT Decoder, SmolLM3, Hunyuan v1, GLM4 variants, Qwen3/Qwen3-Omni, Mlcd) and refreshed the transformers base to 4.57.1, expanding support for the latest model families. ([#1391](https://github.com/mindspore-lab/mindone/pull/1391), [#1396](https://github.com/mindspore-lab/mindone/pull/1396), [#1397](https://github.com/mindspore-lab/mindone/pull/1397), [#1399](https://github.com/mindspore-lab/mindone/pull/1399), [#1401](https://github.com/mindspore-lab/mindone/pull/1401), [#1403](https://github.com/mindspore-lab/mindone/pull/1403), [#1409](https://github.com/mindspore-lab/mindone/pull/1409), [#1411](https://github.com/mindspore-lab/mindone/pull/1411), [#1462](https://github.com/mindspore-lab/mindone/pull/1462), [#1464](https://github.com/mindspore-lab/mindone/pull/1464), [#1472](https://github.com/mindspore-lab/mindone/pull/1472)) +- Expanded generative pipelines and diffusion tooling with new QwenImage, HiDream, CannyEdit, and OmniGen2 fine-tuning utilities alongside ComfyUI root assets and text encoder resources. ([#1288](https://github.com/mindspore-lab/mindone/pull/1288), [#1346](https://github.com/mindspore-lab/mindone/pull/1346), [#1360](https://github.com/mindspore-lab/mindone/pull/1360), [#1410](https://github.com/mindspore-lab/mindone/pull/1410), [#1480](https://github.com/mindspore-lab/mindone/pull/1480), [#1481](https://github.com/mindspore-lab/mindone/pull/1481)) +- Improved interoperability and performance through AutoencoderMixin, context parallelism enhancements, accelerated pipelines, and wide-ranging compatibility fixes for MindSpore 2.6/2.7 and downstream examples. ([#1433](https://github.com/mindspore-lab/mindone/pull/1433), [#1438](https://github.com/mindspore-lab/mindone/pull/1438), [#1444](https://github.com/mindspore-lab/mindone/pull/1444), [#1383](https://github.com/mindspore-lab/mindone/pull/1383), [#1385](https://github.com/mindspore-lab/mindone/pull/1385), [#1417](https://github.com/mindspore-lab/mindone/pull/1417)) ### Added -- Introduced many new transformer models and processors, including Exaone, Florence2, Parakeet, VaultGemma, T5Gemma, DOGE, Ernie 4.5 (dense & MoE), SAM-HQ, BLT/Apertus/Ministral, Mistral3 (4.57.1), EOMT, TimesFM, XCodec, Kyutai Speech-to-Text, and more. (#1392, #1393, #1395, #1396, #1397, #1399, #1401, #1403, #1407, #1420, #1451, #1452, #1453, #1454, #1457, #1462, #1464) -- Added new pipelines and demos such as QwenImage diffusers support, HiDream, CannyEdit, updated accelerated pipelines, and ComfyUI root/CLI assets; added text encoder files for inference setups. (#1288, #1346, #1360, #1433, #1480, #1481) -- Extended example coverage with OmniGen2 fine-tuning, MovieGen MindSpore 2.7.0 support, and new examples for MMS, Nougat, MyT5, Granite-Vision, MatCha, DePlot, fill-mask pipe usage, Jamba, and others. (#1329, #1333, #1334, #1335, #1336, #1337, #1341, #1362, #1410) -- Added AutoencoderMixin and updated CLI/processors for Phi4, Whisper, UltraVox, InternVL, Qwen2 Audio, MiniCPMV, LLaVA Next (image/video), plus supplemental GLM4V processors. (#1349, #1429, #1471, #1444) +- Introduced many new transformer models and processors, including Exaone, Florence2, Parakeet, VaultGemma, T5Gemma, DOGE, Ernie 4.5 (dense & MoE), SAM-HQ, BLT/Apertus/Ministral, Mistral3 (4.57.1), EOMT, TimesFM, XCodec, Kyutai Speech-to-Text, and more. ([#1392](https://github.com/mindspore-lab/mindone/pull/1392), [#1393](https://github.com/mindspore-lab/mindone/pull/1393), [#1395](https://github.com/mindspore-lab/mindone/pull/1395), [#1396](https://github.com/mindspore-lab/mindone/pull/1396), [#1397](https://github.com/mindspore-lab/mindone/pull/1397), [#1399](https://github.com/mindspore-lab/mindone/pull/1399), [#1401](https://github.com/mindspore-lab/mindone/pull/1401), [#1403](https://github.com/mindspore-lab/mindone/pull/1403), [#1407](https://github.com/mindspore-lab/mindone/pull/1407), [#1420](https://github.com/mindspore-lab/mindone/pull/1420), [#1451](https://github.com/mindspore-lab/mindone/pull/1451), [#1452](https://github.com/mindspore-lab/mindone/pull/1452), [#1453](https://github.com/mindspore-lab/mindone/pull/1453), [#1454](https://github.com/mindspore-lab/mindone/pull/1454), [#1457](https://github.com/mindspore-lab/mindone/pull/1457), [#1462](https://github.com/mindspore-lab/mindone/pull/1462), [#1464](https://github.com/mindspore-lab/mindone/pull/1464)) +- Added new pipelines and demos such as QwenImage diffusers support, HiDream, CannyEdit, updated accelerated pipelines, and ComfyUI root/CLI assets; added text encoder files for inference setups. ([#1288](https://github.com/mindspore-lab/mindone/pull/1288), [#1346](https://github.com/mindspore-lab/mindone/pull/1346), [#1360](https://github.com/mindspore-lab/mindone/pull/1360), [#1433](https://github.com/mindspore-lab/mindone/pull/1433), [#1480](https://github.com/mindspore-lab/mindone/pull/1480), [#1481](https://github.com/mindspore-lab/mindone/pull/1481)) +- Extended example coverage with OmniGen2 fine-tuning, MovieGen MindSpore 2.7.0 support, and new examples for MMS, Nougat, MyT5, Granite-Vision, MatCha, DePlot, fill-mask pipe usage, Jamba, and others. ([#1329](https://github.com/mindspore-lab/mindone/pull/1329), [#1333](https://github.com/mindspore-lab/mindone/pull/1333), [#1334](https://github.com/mindspore-lab/mindone/pull/1334), [#1335](https://github.com/mindspore-lab/mindone/pull/1335), [#1336](https://github.com/mindspore-lab/mindone/pull/1336), [#1337](https://github.com/mindspore-lab/mindone/pull/1337), [#1341](https://github.com/mindspore-lab/mindone/pull/1341), [#1362](https://github.com/mindspore-lab/mindone/pull/1362), [#1410](https://github.com/mindspore-lab/mindone/pull/1410)) +- Added AutoencoderMixin and updated CLI/processors for Phi4, Whisper, UltraVox, InternVL, Qwen2 Audio, MiniCPMV, LLaVA Next (image/video), plus supplemental GLM4V processors. ([#1349](https://github.com/mindspore-lab/mindone/pull/1349), [#1429](https://github.com/mindspore-lab/mindone/pull/1429), [#1471](https://github.com/mindspore-lab/mindone/pull/1471), [#1444](https://github.com/mindspore-lab/mindone/pull/1444)) ### Fixed -- Resolved multiple transformer and pipeline bugs, including Qwen3-VL text attention selection, GLM4.1V batch generation index handling, Accelerate import guards, FA missing_keys warnings, Reformer UT stability, and division-by-zero in compute_diffs. (#1351, #1343, #1354, #1368, #1371, #1431, #1437, #1455) -- Stabilized UT thresholds for BF16, Seamless M4T, Emu3, Zero helper, and RT-DETR fast UT cases; updated fast unit tests for various models across MindSpore versions. (#1187, #1345, #1358, #1359, #1366, #1373, #1383) -- Fixed from_pretrained loading in OpenSora/OmniGen pipelines, added lora scale coverage in diffusers tests, and sorted auto-mapping to reduce nondeterminism. (#1187, #1370, #1353) +- Resolved multiple transformer and pipeline bugs, including Qwen3-VL text attention selection, GLM4.1V batch generation index handling, Accelerate import guards, FA missing_keys warnings, Reformer UT stability, and division-by-zero in compute_diffs. ([#1351](https://github.com/mindspore-lab/mindone/pull/1351), [#1343](https://github.com/mindspore-lab/mindone/pull/1343), [#1354](https://github.com/mindspore-lab/mindone/pull/1354), [#1368](https://github.com/mindspore-lab/mindone/pull/1368), [#1371](https://github.com/mindspore-lab/mindone/pull/1371), [#1431](https://github.com/mindspore-lab/mindone/pull/1431), [#1437](https://github.com/mindspore-lab/mindone/pull/1437), [#1455](https://github.com/mindspore-lab/mindone/pull/1455)) +- Stabilized UT thresholds for BF16, Seamless M4T, Emu3, Zero helper, and RT-DETR fast UT cases; updated fast unit tests for various models across MindSpore versions. ([#1187](https://github.com/mindspore-lab/mindone/pull/1187), [#1345](https://github.com/mindspore-lab/mindone/pull/1345), [#1358](https://github.com/mindspore-lab/mindone/pull/1358), [#1359](https://github.com/mindspore-lab/mindone/pull/1359), [#1366](https://github.com/mindspore-lab/mindone/pull/1366), [#1373](https://github.com/mindspore-lab/mindone/pull/1373), [#1383](https://github.com/mindspore-lab/mindone/pull/1383)) +- Fixed from_pretrained loading in OpenSora/OmniGen pipelines, added lora scale coverage in diffusers tests, and sorted auto-mapping to reduce nondeterminism. ([#1187](https://github.com/mindspore-lab/mindone/pull/1187), [#1370](https://github.com/mindspore-lab/mindone/pull/1370), [#1353](https://github.com/mindspore-lab/mindone/pull/1353)) ### Documentation & Examples -- Refreshed READMEs and tutorials for Llama2/Llama3, SparkTTS, SAM2/LanguageSAM, MmaDA, Wan2.1, Janus-Pro, Emu3, and general repository instructions; improved performance notes for multiple demos. (#1344, #1377, #1369, #1376, #1378, #1417) -- Updated OpenSora-HPC AI, HunyuanVideo I2V, and Emu3 examples for MindSpore 2.6.0/2.7.0 compatibility and performance. (#1385, #1417) -- Added new documentation for GPT-SW3/MADLAD-400, MMS, and various model usage guides. (#1329, #1335) +- Refreshed READMEs and tutorials for Llama2/Llama3, SparkTTS, SAM2/LanguageSAM, MmaDA, Wan2.1, Janus-Pro, Emu3, and general repository instructions; improved performance notes for multiple demos. ([#1344](https://github.com/mindspore-lab/mindone/pull/1344), [#1377](https://github.com/mindspore-lab/mindone/pull/1377), [#1369](https://github.com/mindspore-lab/mindone/pull/1369), [#1376](https://github.com/mindspore-lab/mindone/pull/1376), [#1378](https://github.com/mindspore-lab/mindone/pull/1378), [#1417](https://github.com/mindspore-lab/mindone/pull/1417)) +- Updated OpenSora-HPC AI, HunyuanVideo I2V, and Emu3 examples for MindSpore 2.6.0/2.7.0 compatibility and performance. ([#1385](https://github.com/mindspore-lab/mindone/pull/1385), [#1417](https://github.com/mindspore-lab/mindone/pull/1417)) +- Added new documentation for GPT-SW3/MADLAD-400, MMS, and various model usage guides. ([#1329](https://github.com/mindspore-lab/mindone/pull/1329), [#1335](https://github.com/mindspore-lab/mindone/pull/1335)) ## [v0.4.0]