torch_musa Release v2.9.0

@fmo-mt fmo-mt released this 17 Mar 06:52
1dc7872

Release Note

Hi all, torch_musa v2.9.0 is now available. Alongside torch 2.9.0, we have enhanced the user experience and added a number of new features. This release supports Context Parallel in FSDP2, sparse-related operators, and the "reduce-overhead" mode for torch.compile. Starting with torch_musa 2.9.0, GEMM kernels compute in FP32 by default; to enable TF32 computation, set the environment variable TORCH_ALLOW_TF32_MUBLAS_OVERRIDE=1 or the Python global setting 'torch.backends.musa.matmul.allow_tf32 = True'.
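The two ways of enabling TF32 mentioned above could be applied roughly as follows. This is a minimal sketch: the environment variable and the `torch.backends.musa.matmul.allow_tf32` setting come from this release note, while the helper name `enable_tf32_matmul`, the choice to set the variable before importing torch_musa, and the fallback when torch_musa is not installed are assumptions made for illustration.

```python
import os

# Option 1: the environment variable named in the release note.
# Setting it before torch_musa is imported is an assumption about
# when the value is read.
os.environ["TORCH_ALLOW_TF32_MUBLAS_OVERRIDE"] = "1"


def enable_tf32_matmul() -> bool:
    """Try option 2, the Python global setting (hypothetical helper).

    Returns True if the flag was set, False if torch or torch_musa
    could not be imported on this machine.
    """
    try:
        import torch
        import torch_musa  # noqa: F401  (registers the MUSA backend)
    except ImportError:
        return False
    # Setting taken verbatim from the release note.
    torch.backends.musa.matmul.allow_tf32 = True
    return True


applied = enable_tf32_matmul()
print("TF32 flag set from Python:", applied)
```

Either option alone should be enough according to the note above; the sketch shows both only to keep them side by side.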

We also added kineto as a third_party repository of torch_musa. Note that this is a musified version, not the official one.

Please build torch_musa v2.9.0 on a MUSA platform with MUSA SDK >= 4.3.2.

Enhancement

Operators

  • Support torch.arange with Double dtype
  • Fix BatchNorm outputs NaN
  • Optimize performance of embedding_bag
  • Support complex dtypes for index_select, index_put
  • Support some Sparse Tensor operators
  • Support some special operators
  • Fix empty tensor creation error with pin_memory=True
  • Add W8A8 matmul kernel

New Features

  • Support torch.compile with mode="reduce-overhead"
  • Support Context Parallel (Ulysses) in FSDP2
  • Support DLPack for torch.tensor to enable zero-copy when interacting with other libraries
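The zero-copy behaviour of DLPack in the list above can be illustrated with the standard `torch.utils.dlpack` functions. This is a sketch under stated assumptions: the helper name `dlpack_roundtrip_shares_memory` is hypothetical, the default device is "cpu" so the example runs without MUSA hardware, and passing "musa" (after importing torch_musa) is assumed to behave the same way.

```python
def dlpack_roundtrip_shares_memory(device: str = "cpu"):
    """Export a tensor through DLPack and check that no copy was made.

    Returns None if torch is not installed, otherwise a bool.
    Passing device="musa" is an assumption and requires importing
    torch_musa first.
    """
    try:
        import torch
        from torch.utils.dlpack import from_dlpack, to_dlpack
    except ImportError:
        return None

    src = torch.arange(4, dtype=torch.float32, device=device)
    capsule = to_dlpack(src)      # wrap the tensor's buffer in a DLPack capsule
    view = from_dlpack(capsule)   # a consumer library would receive the capsule

    view[0] = 42.0                # write through the imported tensor
    same_buffer = view.data_ptr() == src.data_ptr()
    # Both conditions hold only if the two tensors share one buffer.
    return same_buffer and src[0].item() == 42.0


print("zero-copy round trip:", dlpack_roundtrip_shares_memory())
```

In practice the capsule would be handed to another DLPack-aware library rather than back to torch; the round trip is used here only because it needs no extra dependency.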

Known and blocked issues

  • Kernels generated by torch.compile perform worse than with torch_musa v2.7.0
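To see whether this issue affects a given workload, eager and compiled execution can be timed side by side. The sketch below is illustrative only: `mean_seconds` and `compare_eager_vs_compiled` are hypothetical helpers, the workload is arbitrary, and `torch.musa.synchronize` is assumed to mirror `torch.cuda.synchronize`. Only `torch.compile(..., mode="reduce-overhead")` is taken from this release note.

```python
import time


def mean_seconds(fn, *args, warmup=3, iters=10, sync=None):
    """Average wall-clock seconds per call of fn(*args).

    sync is an optional callable that waits for device work to finish.
    """
    for _ in range(warmup):       # warm-up also triggers compilation
        fn(*args)
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / iters


def compare_eager_vs_compiled(device: str = "musa"):
    """Return (eager_seconds, compiled_seconds) for a small test workload."""
    import torch
    import torch_musa  # noqa: F401  (registers the "musa" device)

    def workload(x):
        return torch.nn.functional.gelu(x @ x)

    x = torch.randn(1024, 1024, device=device)
    compiled = torch.compile(workload, mode="reduce-overhead")
    sync = torch.musa.synchronize  # assumed to mirror torch.cuda.synchronize
    return (mean_seconds(workload, x, sync=sync),
            mean_seconds(compiled, x, sync=sync))
```

Calling `compare_eager_vs_compiled()` on a MUSA machine under both v2.7.0 and v2.9.0 would give a rough comparison; the numbers depend heavily on the workload and hardware.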

Please feel free to contact us with any issues or questions.