Releases: intel/intel-optimization-for-horovod
v0.28.1.6
Intel® Optimization for Horovod* v0.28.1.6 Release Notes
Major Features and Improvements
Intel® Optimization for Horovod* is Intel's optimized distributed training framework, extending the official Horovod (based on v0.28.1) to run TensorFlow workloads on Intel GPU clusters. This release contains the following major features:
- Supports Intel® oneAPI Base Toolkit 2025.0.1.
- Supports TensorFlow 2.15.1 and Intel® Extension for TensorFlow* v2.15.0.2.
- Enables TensorFlow NextPluggableDevice mode by default for Intel devices.
- Supports both scale-up and scale-out on Intel® Data Center Max GPU clusters.
- Fixes potential overflow of displacement arrays for large numbers of ranks and message sizes.
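The displacement-overflow fix can be illustrated with a small sketch. In a variable-length collective (e.g. an allgather with per-rank sizes), each rank's displacement is the running sum of the preceding ranks' byte counts; with enough ranks and large enough messages, that sum no longer fits in a 32-bit integer. The sizes below are hypothetical and only show why 64-bit displacement arrays are needed:

```python
import numpy as np

# Hypothetical sizes: 1000 ranks, 4 MiB per rank. The total payload
# (1000 * 4 MiB ~= 4.19 GB) exceeds the 32-bit signed integer range.
num_ranks = 1000
msg_bytes = 4 * 1024 * 1024

counts = np.full(num_ranks, msg_bytes)

# 32-bit displacements silently wrap past 2**31 - 1 ...
displs32 = np.cumsum(counts.astype(np.int32), dtype=np.int32)

# ... while 64-bit displacements stay exact.
displs64 = np.cumsum(counts.astype(np.int64), dtype=np.int64)

print(int(displs32[-1]))  # negative: the running sum overflowed
print(int(displs64[-1]))  # 4194304000
```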
v0.28.1.5
Intel® Optimization for Horovod* v0.28.1.5 Release Notes
Major Features and Improvements
Intel® Optimization for Horovod* is Intel's optimized distributed training framework, extending the official Horovod (based on v0.28.1) to run TensorFlow workloads on Intel GPU clusters. This release contains the following major features:
- Supports Intel® oneAPI Base Toolkit 2024.2.1.
- Supports TensorFlow 2.15.1 and Intel® Extension for TensorFlow* v2.15.0.1.
- Enables TensorFlow NextPluggableDevice mode by default for Intel devices.
- Updates the usage of in-place `ccl::reduce_scatter` for better performance.
- Supports both scale-up and scale-out on Intel® Data Center Max GPU clusters.
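For reference, a reduce-scatter sums a buffer elementwise across all ranks and leaves each rank holding one shard of the summed result, which is why the in-place variant can reuse the input buffer for its output shard. A minimal single-process sketch of these semantics — not the oneCCL implementation — assuming the buffer length is divisible by the number of ranks:

```python
def reduce_scatter(buffers):
    """Simulate reduce-scatter: elementwise sum across ranks, then shard."""
    n = len(buffers)                                # number of ranks
    shard = len(buffers[0]) // n                    # elements each rank keeps
    total = [sum(vals) for vals in zip(*buffers)]   # elementwise sum
    return [total[r * shard:(r + 1) * shard] for r in range(n)]

# Two ranks, four elements each: rank 0 keeps the first half of the sum,
# rank 1 the second half.
print(reduce_scatter([[1, 2, 3, 4], [10, 20, 30, 40]]))
# [[11, 22], [33, 44]]
```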
v0.28.1.4
Intel® Optimization for Horovod* v0.28.1.4 Release Notes
Major Features and Improvements
Intel® Optimization for Horovod* is Intel's optimized distributed training framework, extending the official Horovod (based on v0.28.1) to run TensorFlow workloads on Intel GPU clusters. This release contains the following major features:
- Supports Intel® oneAPI Base Toolkit 2024.1.
- Supports TensorFlow 2.15 and Intel® Extension for TensorFlow* v2.15.0.0.
- Integrates TensorFlow NextPluggableDevice as a new device type and implements XLA Horovod ops on the Intel GPU backend for the OpenXLA ecosystem.
- Supports both scale-up and scale-out on the Intel® Data Center Max GPU clusters.
Known Issues
- Scale-out tasks may hang due to a oneCCL bug in Intel® oneAPI Base Toolkit 2024.1. Please use Intel® oneAPI Base Toolkit 2024.0 when running scale-out tasks.
Intel® Optimization for Horovod* 0.28.1.2
Major Features and Improvements
Intel® Optimization for Horovod* is Intel's optimized distributed training framework, extending the official Horovod (based on v0.28.1) to run TensorFlow workloads on Intel GPU clusters. This release contains the following major features:
- Supported the `TorusAllreduce` operation for cross-node `AllReduce`. This collective is built on the oneAPI Collective Communications Library (oneCCL), whose inter-GPU communication primitives are topology-aware and provide accelerated inter-GPU communication.
- Supported TensorFlow 2.14.0 and Intel® Extension for TensorFlow* v2.14.0.0.
- Supported scale-up and scale-out on Intel® Data Center Max GPU clusters.
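To give a feel for what a topology-aware allreduce does, here is a toy single-process simulation of the classic ring allreduce (a reduce-scatter phase followed by an allgather phase). This only sketches the general idea behind such collectives; oneCCL's actual `TorusAllreduce` is a different, hardware-topology-specific algorithm:

```python
def ring_allreduce(buffers):
    """Sum-allreduce equal-length buffers across simulated ring ranks."""
    n = len(buffers)
    data = [list(b) for b in buffers]
    m = len(data[0]) // n  # chunk length (buffer length divisible by n)

    def sl(c):
        i = (c % n) * m
        return slice(i, i + m)

    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) to
    # rank r+1, which accumulates it. After n-1 steps, rank r holds the
    # fully reduced chunk (r + 1).
    for s in range(n - 1):
        sends = [(r, r - s, data[r][sl(r - s)]) for r in range(n)]
        for r, c, payload in sends:
            dst = (r + 1) % n
            data[dst][sl(c)] = [a + b for a, b in zip(data[dst][sl(c)], payload)]

    # Phase 2: allgather. Each rank forwards its completed chunk around
    # the ring until every rank holds every reduced chunk.
    for s in range(n - 1):
        sends = [(r, r + 1 - s, data[r][sl(r + 1 - s)]) for r in range(n)]
        for r, c, payload in sends:
            data[(r + 1) % n][sl(c)] = payload
    return data

print(ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# [[12, 15, 18], [12, 15, 18], [12, 15, 18]]
```

Each rank sends and receives only fixed-size chunks to its ring neighbor, which is what makes this family of algorithms bandwidth-efficient at scale.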
Documentation to get started
Intel® Optimization for Horovod* 0.28.1.0
Major Features and Improvements
Intel® Optimization for Horovod* is Intel's optimized distributed training framework, extending the official Horovod (based on v0.28.1) to run TensorFlow workloads on Intel GPU clusters. This release contains the following major features:
- Rebased Intel® Optimization for Horovod* onto the latest stock Horovod v0.28.1. The main changes in this rebase include:
  - Fixed Horovod API compatibility issues with tf.keras 2.11; the HVD wrapper for the Keras optimizer now works correctly without the `legacy` limitation.
  - Supported the newly implemented `reducescatter` methodology and enabled batched memory copy for `allgather`/`reducescatter`.
- Refined the Intel® Optimization for Horovod* version to a four-digit format (v0.28.1.0): the first three digits come from the stock Horovod version (v0.28.1) and the last digit starts at 0 and increments with each Intel release. This makes the mapping between Intel® Optimization for Horovod* and stock Horovod versions easier to follow.
- Supported TensorFlow 2.13.0 and Intel® Extension for TensorFlow* v2.13.0.0 in Intel® Optimization for Horovod*.
- Supported scale-up and scale-out on Intel® Data Center Max GPU clusters.
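The version-mapping scheme above can be expressed mechanically; the helper below is purely illustrative and not part of the package:

```python
def stock_horovod_version(ioh_version: str) -> str:
    """Map a four-digit Intel® Optimization for Horovod* version to the
    stock Horovod release it is based on (the first three digits)."""
    parts = ioh_version.lstrip("v").split(".")
    if len(parts) != 4:
        raise ValueError("expected a four-digit version like v0.28.1.0")
    return ".".join(parts[:3])

print(stock_horovod_version("v0.28.1.0"))  # 0.28.1
```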
Documentation to get started
Intel® Optimization for Horovod* 0.5.0
Major Features and Improvements
Intel® Optimization for Horovod* is Intel's optimized distributed training framework, extending the official Horovod (based on v0.26.1) to run TensorFlow workloads on Intel GPU clusters. This release contains the following major features:
- Enabled `All2All`, (grouped) `AllGather`, (grouped) `ReduceScatter`, and `BroadcastInplace(Resource)` operations for TensorFlow on Intel® Data Center Max GPU clusters. These collectives are built on the oneAPI Collective Communications Library (oneCCL), whose inter-GPU communication primitives are topology-aware and provide accelerated inter-GPU communication.
- Switched the CXX compiler from `dpcpp` to `icpx`; see the new source code build command in how to build.
- Supported TensorFlow 2.12 and Intel® Extension for TensorFlow* v1.2.
- Supported scale-up and scale-out on Intel® Data Center Max GPU clusters.
Documentation to get started
Intel® Optimization for Horovod* 0.4.0
Major Features
Intel® Optimization for Horovod* is Intel's optimized distributed training framework, extending the official Horovod to run TensorFlow and PyTorch workloads on Intel GPU clusters. It is based on the latest public Horovod release, v0.26.1.
This release contains the following major features:
- Enabled `AllReduce`/`GroupedAllreduce`/`BroadCast` operations for TensorFlow and PyTorch on Intel® Data Center Max GPU Series clusters. These collectives are built on the Intel® oneAPI Collective Communications Library (oneCCL), whose inter-GPU communication primitives are topology-aware and provide accelerated inter-GPU communication.
- Supported scale-up and scale-out on Intel® Data Center Max GPU Series clusters.
- Works with Intel® Extension for TensorFlow* v1.1 and Intel® Extension for PyTorch* v1.13.10+xpu.