v0.14.4 Patch release
What's Changed
- Update version.txt after 0.14.3 release by @mrwyattii in #5651
- [CPU] SHM based allreduce improvement for small message size by @delock in #5571
- _exec_forward_pass: place zeros(1) on the same device as the param by @nelyahu in #5576
- [XPU] adapt lazy_call func to different versions by @YizhouZ in #5670
- Fix IDEX dependency in XPU accelerator by @Liangliang-Ma in #5666
- Remove compile wrapper to simplify access to model attributes by @tohtana in #5581
- Fix hpZ with zero element by @samadejacobs in #5652
- Fix the reshape bug in sequence parallel alltoall, which corrupted all QKV data by @YJHMITWEB in #5664
- Enable Yuan AutoTP and add conv TP by @Yejing-Lai in #5428
- Fix latest pytorch '_get_socket_with_port' import error by @Yejing-Lai in #5654
- Fix numpy upgrade to 2.0.0 BUFSIZE import error by @Yejing-Lai in #5680
- Update BUFSIZE to come from autotuner's constants.py, not numpy by @loadams in #5686
- [XPU] support op builder from intel_extension_for_pytorch kernel path by @YizhouZ in #5425
New Contributors
- @YJHMITWEB made their first contribution in #5664
Full Changelog: v0.14.3...v0.14.4