Release Note

Hi all, torch_musa v2.9.0 is now available. Aligned with torch 2.9.0, this release improves the user experience and adds a number of new features, including Context Parallel in FSDP2, sparse-related operators, and the "reduce-overhead" mode for torch.compile. Starting with torch_musa 2.9.0, GEMM kernels compute in FP32 by default. To enable TF32 computation, either set the environment variable TORCH_ALLOW_TF32_MUBLAS_OVERRIDE=1 or use the Python global setting 'torch.backends.musa.matmul.allow_tf32 = True'.
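A minimal sketch of both ways to opt back into TF32. It assumes the environment variable is read when torch_musa initializes (so it is set before the import) and that the package is importable as `torch_musa`; the helper name `enable_musa_tf32` is hypothetical, not part of torch_musa.

```python
import importlib.util
import os


def enable_musa_tf32(use_env_var: bool = True) -> bool:
    """Hypothetical helper: opt into TF32 GEMMs on MUSA.

    Returns True if the Python-level flag was set, False if torch_musa
    is not installed (only the environment variable is set in that case).
    """
    if use_env_var:
        # Assumption: the override is read at initialization time,
        # so it is set before torch_musa is imported.
        os.environ["TORCH_ALLOW_TF32_MUBLAS_OVERRIDE"] = "1"

    if importlib.util.find_spec("torch_musa") is None:
        # torch_musa is not available in this environment.
        return False

    import torch
    import torch_musa  # noqa: F401  (registers the MUSA backend)

    # Python global setting named in the release note.
    torch.backends.musa.matmul.allow_tf32 = True
    return True
```

Either mechanism alone should be enough according to the note above; the sketch applies both so that the setting holds regardless of import order elsewhere in the program.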

We have also added kineto as a third_party repository of torch_musa. Note that this is not the official kineto but a MUSA-adapted ("musified") version.

Please build torch_musa v2.9.0 on the MUSA platform with MUSA SDK >= 4.3.2.

Enhancements

Operators

Support torch.arange with Double dtype
Fix NaN outputs from BatchNorm
Optimize performance of embedding_bag
Support complex dtypes for index_select and index_put
Support some Sparse Tensor operators
Support some special operators
Fix an error when creating an empty tensor with pin_memory=True
Add W8A8 matmul kernel
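The release note does not describe the W8A8 kernel's interface, so the following is only a pure-Python reference for what "W8A8" generally means: both weights and activations are quantized to int8, multiplied with integer accumulation, and the result is rescaled back to floating point. Per-tensor symmetric scaling and the function names `quantize_symmetric` and `w8a8_matmul` are assumptions for illustration; the actual torch_musa kernel may use per-channel scales, a different rounding scheme, or a different API.

```python
def quantize_symmetric(values, num_bits=8):
    """Per-tensor symmetric quantization (illustrative assumption)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for row in values for v in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp into [-qmax, qmax].
    q = [[max(-qmax, min(qmax, round(v / scale))) for v in row] for row in values]
    return q, scale


def w8a8_matmul(activations, weights):
    """Reference W8A8 matmul: int8 x int8 -> integer accumulate -> rescale."""
    qa, scale_a = quantize_symmetric(activations)
    qw, scale_w = quantize_symmetric(weights)
    inner, cols = len(qw), len(qw[0])
    if any(len(row) != inner for row in qa):
        raise ValueError("inner dimensions do not match")
    out = []
    for row in qa:
        out_row = []
        for j in range(cols):
            # Integer dot product (a real kernel would accumulate in int32).
            acc = sum(row[k] * qw[k][j] for k in range(inner))
            # Dequantize with the product of the two scales.
            out_row.append(acc * scale_a * scale_w)
        out.append(out_row)
    return out


# Usage: compare against the exact floating-point product.
a = [[1.0, -2.0], [0.5, 0.25]]
w = [[0.5, 1.0], [-1.0, 0.25]]
approx = w8a8_matmul(a, w)  # close to [[2.5, 0.5], [0.0, 0.5625]]
```

The accuracy loss comes only from rounding each input to one of 255 levels; the integer dot product itself is exact, which is what makes int8 GEMM attractive for inference.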

New Features

Support torch.compile with mode="reduce-overhead"
Support Context Parallel (Ulysses) in FSDP2
Support DLPack for torch.Tensor, enabling zero-copy data exchange with other libraries
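Ulysses-style context parallelism (as in DeepSpeed-Ulysses) shards the sequence across ranks and uses an all-to-all before attention so that each rank holds the full sequence for a subset of heads, then a second all-to-all to restore sequence sharding. The sketch below simulates only that layout change in pure Python: nested lists of `(token, head)` labels stand in for tensors, and plain loops stand in for the collective (e.g. an all-to-all over a process group). The function names are hypothetical; this is not torch_musa's FSDP2 API.

```python
def seq_to_head(shards, num_heads):
    """Simulated all-to-all: sequence-sharded -> head-sharded.

    shards[r] holds rank r's contiguous slice of tokens; each token is a
    list of num_heads per-head entries.
    """
    world = len(shards)
    if num_heads % world != 0:
        raise ValueError("num_heads must be divisible by world size")
    per_rank = num_heads // world
    out = []
    for dst in range(world):
        h0 = dst * per_rank
        # Gather every rank's tokens in rank order, keeping dst's heads only.
        out.append([tok[h0:h0 + per_rank] for src in range(world) for tok in shards[src]])
    return out


def head_to_seq(shards, world):
    """Simulated inverse all-to-all: head-sharded -> sequence-sharded."""
    seq_len = len(shards[0])
    if seq_len % world != 0:
        raise ValueError("sequence length must be divisible by world size")
    chunk = seq_len // world
    out = []
    for dst in range(world):
        tokens = []
        for t in range(dst * chunk, (dst + 1) * chunk):
            # Reassemble all heads of token t from every rank's head slice.
            token = []
            for src in range(world):
                token.extend(shards[src][t])
            tokens.append(token)
        out.append(tokens)
    return out


# Usage: 2 ranks, 4 tokens, 4 heads.
full = [[(t, h) for h in range(4)] for t in range(4)]
by_seq = [full[0:2], full[2:4]]
by_head = seq_to_head(by_seq, num_heads=4)  # rank 0 now has heads 0-1 for all 4 tokens
```

Attention can then run locally on each rank because every head it owns sees the whole sequence; the inverse exchange returns the output to the original sequence sharding.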

Known and blocked issues

Kernels generated by torch.compile perform worse than they did in torch_musa v2.7.0

Please feel free to contact us with any issues or questions.
