Releases: microsoft/DeepSpeed
v0.12.6: Patch release
What's Changed
- Update version.txt after 0.12.5 release by @mrwyattii in #4826
- Cache metadata for TP activations and grads by @BacharL in #4360
- Inference changes for incorporating meta loading checkpoint by @oelayan7 in #4692
- Update CODEOWNERS by @mrwyattii in #4838
- support baichuan model by @baodii in #4721
- inference engine: check if accelerator supports FP16 by @nelyahu in #4832
- Update zeropp.md by @goodship1 in #4835
- [NPU] load EXPORT_ENV based on different accelerators to support multi-node training on other devices by @minchao-sun in #4830
- Add cuda_accelerator.py to triggers for A6000 test by @mrwyattii in #4848
- Capture short kernel sequences to graph by @inkcherry in #4318
- Checkpointing: Avoid assigning tensor storage with different device by @deepcharm in #4836
- engine.py: remove unused _curr_save_path by @nelyahu in #4844
- Mixtral FastGen Support by @cmikeh2 in #4828
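Among the changes above, #4832 makes the inference engine verify that the accelerator supports half precision before enabling FP16. A minimal sketch of that kind of guard (the function and parameter names here are illustrative, not DeepSpeed's actual API):

```python
def resolve_inference_dtype(requested_dtype: str, fp16_supported: bool) -> str:
    # Illustrative guard: reject an fp16 request up front when the
    # accelerator reports no half-precision support, rather than
    # failing later inside a kernel launch.
    if requested_dtype == "fp16" and not fp16_supported:
        raise ValueError(
            "fp16 was requested but this accelerator does not support FP16"
        )
    return requested_dtype
```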
New Contributors
- @minchao-sun made their first contribution in #4830
Full Changelog: v0.12.5...v0.12.6
v0.12.5: Patch release
What's Changed
- Fix DS Stable Diffusion for latest diffusers version by @lekurile in #4770
- Resolve any '..' in the file paths using os.path.abspath() by @rraminen in #4709
- Update dockerfile with updated versions by @loadams in #4780
- Run workflows when they are edited by @loadams in #4779
- BF16_Optimizer: add support for bf16 grad acc by @nelyahu in #4713
- fix autoTP issue for mpt (trust_remote_code=True) by @sywangyi in #4787
- Fix Hybrid Engine metrics printing by @lekurile in #4789
- [BUG] partition_balanced returns wrong result by @zjjMaiMai in #4312
- improve the way to determine whether a variable is None by @RUAN-ZX in #4782
- [NPU] Add HcclBackend for 1-bit adam, 1-bit lamb, 0/1 adam by @RUAN-ZX in #4733
- Fix for stage3 when setting different communication data type by @BacharL in #4540
- Add support of Falcon models (7b, 40b, 180b) to DeepSpeed-FastGen by @arashb in #4790
- Switch paths-ignore to single quotes, update paths-ignore on nv-pre-compile-ops by @loadams in #4805
- fix for tests using torch<2.1 by @mrwyattii in #4818
- Universal Checkpoint for Sequence Parallelism by @samadejacobs in #4752
- Accelerate CI fix by @mrwyattii in #4819
- fix [BUG] 'DeepSpeedGPTInference' object has no attribute 'dtype' for… by @jxysoft in #4814
- Update broken link in docs by @mrwyattii in #4822
- Update imports from Transformers by @loadams in #4817
- Minor updates to CI workflows by @mrwyattii in #4823
- fix falcon model load from_config meta_data error by @baodii in #4783
- mv DeepSpeedEngine param_names dict init post _configure_distributed_model by @nelyahu in #4803
- Refactor launcher user arg parsing by @mrwyattii in #4824
- Fix 4649 by @Alienfeel in #4650
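One entry above, #4709, resolves any `..` segments in user-supplied file paths via `os.path.abspath()`. The effect can be shown with the standard library alone (the wrapper name here is illustrative):

```python
import os

def resolve_user_path(path: str) -> str:
    # os.path.abspath() collapses '.' and '..' segments and anchors
    # relative paths at the current working directory, so downstream
    # code never sees a traversal like 'a/b/../c'.
    return os.path.abspath(path)
```

For example, on POSIX systems `resolve_user_path("/tmp/x/../y")` yields `/tmp/y`.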
New Contributors
- @zjjMaiMai made their first contribution in #4312
- @jxysoft made their first contribution in #4814
- @baodii made their first contribution in #4783
- @Alienfeel made their first contribution in #4650
Full Changelog: v0.12.4...v0.12.5
v0.12.4: Patch release
What's Changed
- Update version.txt after 0.12.3 release by @mrwyattii in #4673
- [MII] catch error wrt HF version and Mistral by @jeffra in #4634
- [NPU] Add NPU support for unit test by @RUAN-ZX in #4569
- [op-builder] use unique exceptions for cuda issues by @jeffra in #4653
- Add stable diffusion unit test by @mrwyattii in #2496
- [CANN] Support cpu offload optimizer for Ascend NPU by @hipudding in #4568
- Inference Checkpoints in V2 by @cmikeh2 in #4664
- KV Cache Improved Flexibility by @cmikeh2 in #4668
- Fix for when prompt contains an odd num of apostrophes by @oelayan7 in #4660
- universal-ckp: support megatron-deepspeed llama model by @mosheisland in #4666
- Add new MII unit tests by @mrwyattii in #4693
- [Bug fix] WarmupCosineLR issues by @sbwww in #4688
- infV2 fix for OPT size variants by @mrwyattii in #4694
- Add get and set APIs for the ZeRO-3 partitioned parameters by @yiliu30 in #4681
- Remove unneeded dict reinit (fix for #4565) by @eisene in #4702
- Update flops profiler to recurse by @loadams in #4374
- Communication Optimization for Large-Scale Training by @RezaYazdaniAminabadi in #4695
- [docs] Intel inference blog by @jeffra in #4734
- use all_gather_into_tensor instead of all_gather by @taozhiwei in #4705
- Install `deepspeed-kernels` only on Linux by @aphedges in #4739
- Add nv-sd badge to README by @loadams in #4747
- Re-organize `.gitignore` file to be parsed properly by @aphedges in #4740
- fix mics run with offload++ by @GuanhuaWang in #4749
- Fix logger formatting for partitioning flags by @OAfzal in #4728
- fix: to solve #4726 by @RUAN-ZX in #4727
- Add safetensors support by @jihnenglin in #4659
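PR #4681 above adds get and set APIs for ZeRO-3 partitioned parameters. Under ZeRO-3 each rank holds only a slice of every parameter, so a "get" must gather the shards and a "set" must scatter values back into them. A framework-free sketch of that idea (DeepSpeed's real API operates on torch tensors across ranks; the lists here are stand-ins for per-rank shards):

```python
def gather_full_param(shards):
    # 'Get': the full parameter is the concatenation of the per-rank
    # slices, in rank order.
    full = []
    for shard in shards:
        full.extend(shard)
    return full

def set_full_param(shards, new_values):
    # 'Set': scatter a full value back into the per-rank slices,
    # preserving each rank's partition boundaries.
    offset = 0
    for shard in shards:
        n = len(shard)
        shard[:] = new_values[offset:offset + n]
        offset += n
```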
New Contributors
- @RUAN-ZX made their first contribution in #4569
- @oelayan7 made their first contribution in #4660
- @sbwww made their first contribution in #4688
- @yiliu30 made their first contribution in #4681
- @eisene made their first contribution in #4702
- @taozhiwei made their first contribution in #4705
- @OAfzal made their first contribution in #4728
- @jihnenglin made their first contribution in #4659
Full Changelog: v0.12.3...v0.12.4
v0.12.3: Patch release
New Bug Fixes
- Stable Diffusion now supported with latest Torch, diffusers, and Triton versions.
What's Changed
- Update version.txt after 0.12.2 release by @mrwyattii in #4617
- Fix figure in FlexGen blog by @tohtana in #4624
- Fix figure of llama2 13B in DS-FlexGen blog by @tohtana in #4625
- Fix config format by @xu-song in #4594
- Guanhua/partial offload rebase v2 (#590) by @GuanhuaWang in #4636
- offload++ blog (#623) by @GuanhuaWang in #4637
- Update README in offloadpp blog by @GuanhuaWang in #4641
- [docs] update news items by @jeffra in #4640
- DeepSpeed-FastGen Chinese Blog by @HeyangQin in #4642
- Fix issues with torch cpu builds by @loadams in #4639
- Isolate src code and testing for DeepSpeed-FastGen by @cmikeh2 in #4610
- Add Japanese blog for DeepSpeed-FastGen by @tohtana in #4651
- Fix for MII unit tests by @mrwyattii in #4652
- Enhance the robustness of `module_state_dict` by @LZHgrla in #4587
- Enable ZeRO3 allgather for multiple dtypes by @tohtana in #4647
- add option to disable pipeline partitioning by @nelyahu in #4322
- Added HIP_PLATFORM_AMD=1 for non JIT build by @rraminen in #4585
- Fix rope_theta arg for diffusers_attention by @lekurile in #4656
- tl.dot(a,b, trans_b=True) is not supported by triton2.0+ , updating this api by @bmedishe in #4541
- Update ds-chat workflow to work w/ deepspeed-chat install by @lekurile in #4598
- Diffusers attention script update triton2.1 by @bmedishe in #4573
- Fix the openfold training. by @cctry in #4657
- Universal ckp fixes by @mosheisland in #4588
- Update .gitignore [adding comments, improved documentation] by @Nadav23AnT in #4631
- Update lr_schedules.py by @CoinCheung in #4563
- Fix UNET and VAE implementations for new diffusers version by @lekurile in #4663
- fix num_kv_heads sharding in autoTP for the new in-repo Falcon-40B by @dc3671 in #4654
New Contributors
- @xu-song made their first contribution in #4594
- @LZHgrla made their first contribution in #4587
- @mosheisland made their first contribution in #4588
- @Nadav23AnT made their first contribution in #4631
- @CoinCheung made their first contribution in #4563
Full Changelog: v0.12.2...v0.12.3
v0.12.2
What's Changed
- Quick bug fix direct to `master` to ensure mismatched cuda environments are shown to the user (4f7dd72)
- Update version.txt after 0.12.1 release by @mrwyattii in #4615
Full Changelog: v0.12.1...v0.12.2
v0.12.1: Patch release
What's Changed
- Update version.txt after 0.12.0 release by @mrwyattii in #4611
- Add number for latency comparison by @tohtana in #4612
- Update minor CUDA version compatibility. by @cmikeh2 in #4613
Full Changelog: v0.12.0...v0.12.1
DeepSpeed v0.12.0
New features
- DeepSpeed-FastGen
What's Changed
- Update version.txt after 0.11.2 release by @mrwyattii in #4609
- Pin transformers in nv-inference by @loadams in #4606
- DeepSpeed-FastGen by @cmikeh2 in #4604
- DeepSpeed-FastGen blog by @jeffra in #4607
Full Changelog: v0.11.2...v0.12.0
v0.11.2: Patch release
What's Changed
- Update version.txt after 0.11.1 release by @mrwyattii in #4484
- Update DS_BUILD_* references. by @loadams in #4485
- Introduce pydantic_v1 compatibility module for pydantic>=2.0.0 support by @ringohoffman in #4407
- Enable control over timeout with environment variable by @BramVanroy in #4405
- Update ROCm version by @loadams in #4486
- adding 8bit dequantization kernel for asym fine-grained block quantization in zero-inference by @stephen-youn in #4450
- Fix scale factor on flops profiler by @loadams in #4500
- add DeepSpeed4Science white paper by @conglongli in #4502
- [CCLBackend] update API by @Liangliang-Ma in #4378
- Ulysses: add col-ai evaluation by @samadejacobs in #4517
- Ulysses: Update README.md by @samadejacobs in #4518
- add available memory check to accelerators by @jeffra in #4508
- clear redundant parameters in zero3 bwd hook by @inkcherry in #4520
- Add NPU FusedAdam support by @CurryRice233 in #4343
- fix error type issue in deepspeed/comm/ccl.py by @Liangliang-Ma in #4521
- Fixed deepspeed.comm.monitored_barrier call by @Quentin-Anthony in #4496
- [Bug fix] Add rope_theta for llama config by @cupertank in #4480
- [ROCm] Add rocblas header by @rraminen in #4538
- [docs] ZeRO infinity slides and blog by @jeffra in #4542
- Switch from HIP_PLATFORM_HCC to HIP_PLATFORM_AMD by @loadams in #4539
- Turn off I_MPI_PIN for impi launcher by @delock in #4531
- [docs] paper updates by @jeffra in #4543
- ROCm 6.0 prep changes by @loadams in #4537
- Fix RTD builds by @mrwyattii in #4558
- pipe engine _aggregate_total_loss: more efficient loss concatenation by @nelyahu in #4327
- Add missing rocblas include by @loadams in #4557
- Enable universal checkpoint for zero stage 1 by @tjruwase in #4516
- [AutoTP] Make AutoTP work when num_heads not divisible by number of workers by @delock in #4011
- Fix the sequence-parallelism for the dense model architecture by @RezaYazdaniAminabadi in #4530
- engine.py - save_checkpoint: only rank-0 should create the save dir by @nelyahu in #4536
- Remove PP Grad Tail Check by @Quentin-Anthony in #2538
- Added HIP_PLATFORM_AMD=1 by @rraminen in #4570
- fix multiple definition while building evoformer by @fecet in #4556
- Don't check overflow for bf16 data type by @hablb in #4512
- Public update by @yaozhewei in #4583
- [docs] paper updates by @jeffra in #4584
- Disable CPU inference on PRs by @loadams in #4590
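PR #4405 above lets users control the distributed-communication timeout through an environment variable, useful when slow first-time initialization would otherwise trip the default limit. A minimal sketch of the pattern (the variable name `DEEPSPEED_TIMEOUT` and the 30-minute default are assumptions based on that PR's description, not a verified API):

```python
import os
from datetime import timedelta

def get_comm_timeout(default_minutes: int = 30) -> timedelta:
    # Read the timeout (in minutes) from the environment so users can
    # raise it without code changes; fall back to the default when the
    # variable is unset.
    minutes = int(os.environ.get("DEEPSPEED_TIMEOUT", default_minutes))
    return timedelta(minutes=minutes)
```

The resulting `timedelta` is the shape expected by `torch.distributed.init_process_group(..., timeout=...)`.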
New Contributors
- @ringohoffman made their first contribution in #4407
- @BramVanroy made their first contribution in #4405
- @cupertank made their first contribution in #4480
Full Changelog: v0.11.1...v0.11.2
v0.11.1: Patch release
What's Changed
- Fix bug in bfloat16 optimizer related to checkpointing by @okoge-kaz in #4434
- Move tensors to device if mp is not enabled by @deepcharm in #4461
- Fix torch import causing release build failure by @mrwyattii in #4468
- add lm_head and embed_out tensor parallel by @Yejing-Lai in #3962
- Fix release workflow by @mrwyattii in #4483
New Contributors
- @okoge-kaz made their first contribution in #4434
- @deepcharm made their first contribution in #4461
Full Changelog: v0.11.0...v0.11.1
DeepSpeed v0.11.0
New features
- DeepSpeed-VisualChat: Improve Your Chat Experience with Multi-Round Multi-Image Inputs [English] [中文] [日本語]
- Announcing the DeepSpeed4Science Initiative: Enabling large-scale scientific discovery through sophisticated AI system technologies [DeepSpeed4Science website] [Tutorials] [Blog] [中文] [日本語]
What's Changed
- added a model check for use_triton in deepspeed by @stephen-youn in #4266
- Update release and bump patch versioning flow by @loadams in #4286
- README update by @tjruwase in #4303
- Update README.md by @NinoRisteski in #4316
- Handle empty parameter groups by @tjruwase in #4277
- Clean up modeling code by @loadams in #4320
- Fix Zero3 contiguous grads, reduce scatter false accuracy issue by @nelyahu in #4321
- Add release version checking by @loadams in #4328
- clear redundant timers by @starkhu in #4308
- DS-Chat BLOOM: Fix Attention mask by @lekurile in #4338
- Fix a bug in the implementation of dequantization for inference by @sakogan in #3433
- Suppress noise by @tjruwase in #4310
- Fix skipped inference tests by @mrwyattii in #4336
- Fix autotune to support Triton 2.1 by @stephen-youn in #4340
- Pass base_dir so model files can be loaded for auto-tp/meta-tensor by @awan-10 in #4348
- Support InternLM by @wangruohui in #4137
- DeepSpeed4Science by @conglongli in #4357
- fix deepspeed4science links by @conglongli in #4358
- Add the policy to run llama model from the official repo by @RezaYazdaniAminabadi in #4313
- Check inference input_id tokens length by @mrwyattii in #4349
- add deepspeed4science blog link by @conglongli in #4364
- Update conda env to have max pydantic version by @loadams in #4362
- Enable workflow dispatch on Torch 1.10 CI tests by @loadams in #4361
- deepspeed4science chinese blog by @conglongli in #4366
- deepspeed4science japanese blog by @conglongli in #4369
- Openfold fix by @cctry in #4368
- [BUG] add the missing method to MPS accelerator by @cli99 in #4363
- Fix multinode runner to properly append to PDSH_SSH_ARGS_APPEND by @loadams in #4373
- Fix min torch version by @tjruwase in #4375
- Fix llama meta tensor loading in AutoTP and kernel injected inference by @zeyugao in #3608
- adds triton flash attention2 kernel by @stephen-youn in #4337
- Allow multiple inference engines in single script by @mrwyattii in #4384
- Save/restore step in param groups with zero 1 or 2 by @tohtana in #4396
- Fix incorrect assignment of self.quantized_nontrainable_weights by @VeryLazyBoy in #4399
- update deepspeed4science blog by @conglongli in #4408
- Add torch no grad condition by @ajindal1 in #4391
- Update nv-transformers workflow to use cu11.6 by @loadams in #4412
- Add condition when dimension is greater than 2 by @ajindal1 in #4390
- [CPU] Add CPU AutoTP UT. by @Yejing-Lai in #4263
- fix cpu loading model partition OOM by @Yejing-Lai in #4353
- Update cpu_inference checkout action by @loadams in #4424
- Zero infinity xpu support by @Liangliang-Ma in #4130
- [CCLBackend] Using parallel memcpy for inference_all_reduce by @delock in #4404
- Change default `set_to_none=True` in `zero_grad` methods by @Jackmin801 in #4438
- Small docstring fix by @Jackmin801 in #4431
- fix: check-license by @Jackmin801 in #4432
- Fixup check release version script by @loadams in #4413
- Enable ad-hoc running of cpu_inference by @loadams in #4444
- Fix wrong documentation of `ignore_unused_parameters` by @UniverseFly in #4418
- DeepSpeed-VisualChat Blog by @xiaoxiawu-microsoft in #4446
- Fix a bug in DeepSpeedMLP by @sakogan in #4389
- documenting load_from_fp32_weights config parameter by @clumsy in #4449
- Add Japanese translation of DS-VisualChat blog by @tohtana in #4454
- fix blog format by @conglongli in #4456
- Update README-Japanese.md by @conglongli in #4457
- DeepSpeed-VisualChat Chinese blog by @conglongli in #4458
- CI fix for torch 2.1 release by @mrwyattii in #4452
- fix lm head overridden issue, move it from checkpoint in-loop loading … by @sywangyi in #4206
- feat: add Lion by @enneamer in #4331
- pipe engine eval_batch: add option to disable loss broadcast by @nelyahu in #4326
- Add release flow by @loadams in #4467
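PR #4438 above switches the default to `set_to_none=True` in `zero_grad` methods. The distinction matters for memory: `None` releases the gradient buffers entirely, while `False` keeps them allocated and overwrites them with zeros. A framework-free sketch of the two behaviors (the `Param` class is a stand-in for a real parameter, not DeepSpeed code):

```python
class Param:
    # Minimal stand-in for a framework parameter with a .grad slot.
    def __init__(self, grad=None):
        self.grad = grad

def zero_grad(params, set_to_none=True):
    # set_to_none=True (the new default) drops gradient storage so it
    # can be freed; set_to_none=False overwrites gradients with zeros,
    # keeping the buffers allocated between steps.
    for p in params:
        if p.grad is None:
            continue
        if set_to_none:
            p.grad = None
        else:
            p.grad = [0.0] * len(p.grad)
```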
New Contributors
- @nelyahu made their first contribution in #4321
- @starkhu made their first contribution in #4308
- @sakogan made their first contribution in #3433
- @cctry made their first contribution in #4368
- @zeyugao made their first contribution in #3608
- @VeryLazyBoy made their first contribution in #4399
- @ajindal1 made their first contribution in #4391
- @Liangliang-Ma made their first contribution in #4130
- @Jackmin801 made their first contribution in #4438
- @UniverseFly made their first contribution in #4418
- @enneamer made their first contribution in #4331
Full Changelog: v0.10.3...v0.11.0