Releases: intel/torch-ccl
oneCCL Bindings for PyTorch* v2.8.0+xpu release note
This release supports PyTorch* v2.8.0 and Intel® Extension for PyTorch* v2.8.10+xpu for distributed scenarios on the Intel® GPU platform.
Intel® Extension for PyTorch* v2.8.10+xpu has adopted the PyTorch* XCCL backend for distributed use cases. We observed that scaling performance with PyTorch* XCCL is on par with oneCCL Bindings for PyTorch* (torch-ccl) for the validated AI workloads. As a result, we will discontinue active development of torch-ccl immediately after the 2.8 release.
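As a rough illustration of what this migration looks like in practice, the sketch below initializes a distributed process group directly on the PyTorch* XCCL backend without importing oneccl_bindings_for_pytorch. It assumes a PyTorch* 2.8 XPU build with XCCL enabled and a launcher (mpirun, torchrun, or similar) that sets RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT; the launch setup is an assumption, not part of this release note.

```python
# Minimal sketch, assuming a PyTorch* 2.8 XPU build with the XCCL backend and a
# launcher that provides RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT.
import os
import torch
import torch.distributed as dist

rank = int(os.environ.get("RANK", "0"))
world_size = int(os.environ.get("WORLD_SIZE", "1"))

# No oneccl_bindings_for_pytorch import: "xccl" is registered by PyTorch itself.
dist.init_process_group(backend="xccl", rank=rank, world_size=world_size)
torch.xpu.set_device(rank % torch.xpu.device_count())

t = torch.ones(8, device="xpu")
dist.all_reduce(t)  # summed across ranks over XCCL
print(f"rank {rank}: {t[0].item()}")
dist.destroy_process_group()
```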
oneCCL Bindings for PyTorch* v2.7.0+xpu release note
Support PyTorch 2.7.0 and IPEX 2.7.10+xpu.
oneCCL Bindings for PyTorch* v2.6.0+xpu release note
Use oneCCL 2021.14.
Support PyTorch 2.6.0 and IPEX 2.6.10+xpu.
oneCCL Bindings for PyTorch* v2.5.0+xpu release note
- Update oneCCL to 2021.14
- Align to PyTorch 2.5.1
oneCCL Bindings for PyTorch* v2.3.100+xpu release note
Align to PyTorch 2.3.1 and IPEX 2.3.110+xpu.
oneCCL Bindings for PyTorch* v2.1.400+xpu release note
Uplift oneCCL to the 2021.13.1 release version in the v2.1.400+xpu release.
Intel® oneCCL Bindings for PyTorch* v2.1.300+xpu release note
Features include:
- Extend a prototype feature enabled by `TORCH_LLM_ALLREDUCE=1` to provide better scale-up performance by enabling optimized collectives such as the `allreduce`, `allgather`, and `reducescatter` algorithms in Intel® oneCCL. This feature requires XeLink enabled for cross-card communication; a usage sketch follows below.
- Enable a set of coalesced primitives in the CCL backend, including `allreduce_into_tensor_coalesced`, `allgather_into_tensor_coalesced`, `reduce_scatter_tensor_coalesced`, and `_broadcast_coalesced`.
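The sketch below shows how these features are typically exercised from torch-ccl. It assumes oneccl_bindings_for_pytorch v2.1.300+xpu and Intel® Extension for PyTorch* are installed, the cards are connected via XeLink, and a launcher sets RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT; everything beyond the `TORCH_LLM_ALLREDUCE=1` variable named above is an assumption.

```python
# Minimal sketch, assuming torch-ccl v2.1.300+xpu, IPEX, XeLink-connected GPUs, and a
# launcher that sets RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT.
# Enable the prototype scale-up path before launching:  export TORCH_LLM_ALLREDUCE=1
import os
import torch
import torch.distributed as dist
import intel_extension_for_pytorch  # noqa: F401 -- provides the XPU device
import oneccl_bindings_for_pytorch  # noqa: F401 -- registers the "ccl" backend

rank = int(os.environ.get("RANK", "0"))
world_size = int(os.environ.get("WORLD_SIZE", "1"))
dist.init_process_group(backend="ccl", rank=rank, world_size=world_size)

x = torch.ones(1024, device=f"xpu:{rank % torch.xpu.device_count()}")
dist.all_reduce(x)  # uses the optimized oneCCL allreduce when TORCH_LLM_ALLREDUCE=1
# The *_coalesced primitives listed above are entry points called by PyTorch internals
# (e.g. the coalescing manager); they are not usually invoked directly from user code.
dist.destroy_process_group()
```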
oneCCL Bindings for PyTorch* v2.1.200+xpu release note
Features include:
- Uplift oneCCL to the 2021.12 release version.
- LLM inference scaling optimization based on oneCCL (Prototype)
- Export CCL_SKIP_SCHEDULER=1 to enable the optimization
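A short sketch of switching on this prototype is below. It assumes torch-ccl v2.1.200+xpu; setting the variable from Python before the oneCCL bindings are imported is an assumption about when it is read, so exporting it from the launcher environment is the safer equivalent.

```python
# Minimal sketch, assuming torch-ccl v2.1.200+xpu. Setting the variable here, before the
# oneCCL bindings are imported, is an assumption; exporting CCL_SKIP_SCHEDULER=1 from the
# launcher environment before starting the ranks achieves the same thing.
import os
os.environ.setdefault("CCL_SKIP_SCHEDULER", "1")  # prototype LLM inference scaling path

import torch.distributed as dist
import intel_extension_for_pytorch  # noqa: F401
import oneccl_bindings_for_pytorch  # noqa: F401 -- registers the "ccl" backend

dist.init_process_group(backend="ccl")  # collectives issued after this use the optimization
```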
oneCCL Bindings for PyTorch* v2.2.0+cpu release note
- Update oneCCL to 2021.9
- Align to PyTorch 2.2.0
oneCCL Bindings for PyTorch* v2.1.100+xpu release note
Features include:
- Add an experimental allreduce implementation to provide better allreduce performance, especially in LLM inference. This experimental feature can be enabled by TORCH_LLM_ALLREDUCE=1 and can speed up single-node performance with up to 4 Intel® Data Center GPU Max 1550 cards (requires XeLink enabled for cross-card communication).
- Uplift oneCCL to the 2021.11.1 release version.
- Uplift the supported PyTorch version to 2.1.0.