Releases: flagos-ai/TransformerEngine-FL
v0.1.0+te2.9.0
TransformerEngine-FL Release Notes
Highlights
This release establishes the multi-backend plugin architecture as the foundation of TE-FL and adds broad hardware vendor support, bringing FP8 training/inference to six new AI chip platforms beyond NVIDIA.
Core Architecture
- Multi-Backend Architecture Implementation (#4) — Introduced the plugin-based operator dispatch system (`OpRegistry`, `OpManager`, `SelectionPolicy`) with three backend tiers: FlagOS (default/Triton), Reference (pure PyTorch), and Vendor (hardware-specific). This is the foundational change enabling multi-vendor support.
- Add missing `__init__.py` files and policy test suite (#9) — Added comprehensive test coverage for the selection policy system (726-line test suite).
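To make the tiered dispatch concrete, here is a minimal sketch of how a registry-plus-policy design like this can work. The class names (`OpRegistry`, `OpManager`) and tier names come from the PR description above, but everything else — method names, the policy signature, the tier ordering — is an illustrative assumption, not TE-FL's actual API:

```python
# Hypothetical sketch of tiered operator dispatch; class and tier names follow
# the release notes, but the real TE-FL implementation may differ in detail.
from enum import IntEnum


class BackendTier(IntEnum):
    VENDOR = 0     # hardware-specific kernels, preferred when present
    FLAGOS = 1     # default Triton-based kernels
    REFERENCE = 2  # pure-PyTorch fallback


class OpRegistry:
    """Maps (op_name, tier) -> implementation."""

    def __init__(self):
        self._ops = {}

    def register(self, name, tier, fn):
        self._ops[(name, tier)] = fn

    def lookup(self, name, tier):
        return self._ops.get((name, tier))


class OpManager:
    """Resolves an op by walking tiers in the order the policy dictates."""

    def __init__(self, registry, policy):
        self.registry = registry
        self.policy = policy

    def dispatch(self, name, *args, **kwargs):
        for tier in self.policy(name):
            fn = self.registry.lookup(name, tier)
            if fn is not None:
                return fn(*args, **kwargs)
        raise NotImplementedError(f"no backend provides op {name!r}")


def default_policy(name):
    # Assumed ordering: vendor kernels first, then FlagOS, then reference.
    return [BackendTier.VENDOR, BackendTier.FLAGOS, BackendTier.REFERENCE]


registry = OpRegistry()
registry.register("identity", BackendTier.REFERENCE, lambda x: x)  # placeholder op
manager = OpManager(registry, default_policy)
```

With this shape, a vendor plugin only has to call `register` for the ops it accelerates; anything it omits silently falls through to the FlagOS or reference tier.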
New Vendor Backends
Five new hardware vendor backends were added to the plugin system:
| Vendor | PR | Description |
|---|---|---|
| Hygon | #15 | DCU accelerator support with full op registration |
| METAX | #21, #31 | GPU support with attention backend and flash attention |
| KunlunXin | #27, #29, #30 | Baidu Kunlun chip support with flash attention, plus registration and availability fixes |
| Iluvatar | #35 | Iluvatar Corex GPU support with full op set |
| MUSA | #42 | Moore Threads S-series GPU support (2,733 lines) |
FlagOS Backend Enhancements
- Scaled masked softmax forward/backward (#52) — Added softmax kernel implementations for the FlagOS backend.
- Refactor optimizer implementations and improve multi_tensor ops (#36) — Rewrote FlagOS fused Adam and multi-tensor operations for correctness and maintainability.
- FlagGems context cleanup (#18, #20, #22) — Unified and then simplified `flag_gems` invocation, removing the `use_gems` context manager in favor of direct calls.
Attention System
- Register `get_attention_backend` for all backends and fix FlashAttention fallback (#14) — Centralized attention backend selection with proper fallback chain.
- Fix flash-attention fallback failures (#7) — Resolved errors when falling back between attention implementations.
- Fix torch SDPA backend multi-batch support (#17) — Fixed `ValueError` in the reference SDPA backend when `batch_size > 1`.
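The fallback chain these fixes describe boils down to a simple pattern: try backends in preference order and take the first one that is actually usable. The sketch below illustrates that pattern only; the backend names and availability checks are placeholders, not TE-FL's real capability probes:

```python
# Illustrative attention-backend fallback chain. Backend names are
# placeholders; a real selector would probe kernel availability, dtype,
# head size, and mask support instead of reading a static dict.
def get_attention_backend(candidates, available):
    """Return the first usable backend from an ordered preference list."""
    for name in candidates:
        if available.get(name, False):
            return name
    raise RuntimeError("no usable attention backend")


# Prefer flash attention, then a fused kernel, then plain torch SDPA.
preference = ["flash_attention", "fused_attention", "torch_sdpa"]
backend = get_attention_backend(
    preference,
    {"flash_attention": False, "fused_attention": False, "torch_sdpa": True},
)
```

Centralizing this in one function (rather than per-backend ad hoc checks) is what makes the fallback behave consistently across vendors.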
Operator & Optimizer Improvements
- Add `multi_tensor_adam_param_remainder` and context parallel support (#23) — Extended optimizer ops and added context parallelism support for distributed attention.
- Fix enum mismatch in plugins (#25) — Aligned Python enum definitions with C++ header constants.
- Fix parameter mismatch between TE_FL and NVTE functions (#34) — Resolved signature inconsistencies in the PyTorch frontend ops.
Platform Compatibility
- Python-level patches for multi-platform support (#49) — 42 files patched to ensure PyTorch frontend works across different hardware platforms (device-agnostic APIs, conditional CUDA imports, etc.).
- Fix NV shared lib bug (#16) — Fixed shared library loading when CUDA is unavailable.
- Fix import bugs (#6) — Resolved import errors in plugin attention backends.
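The "conditional CUDA imports" and "device-agnostic APIs" mentioned above follow a common pattern: never import or touch CUDA machinery unconditionally, and derive the device at runtime. This is a generic sketch of that pattern, not the actual TE-FL patch:

```python
# Generic device-agnostic pattern: guard CUDA-dependent code paths so the
# frontend imports cleanly on platforms without CUDA. Illustrative only.
import importlib.util


def cuda_available() -> bool:
    # Probe for torch without importing it eagerly, then defer to its
    # own runtime check; returns False cleanly when CUDA is absent.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


def pick_device() -> str:
    return "cuda" if cuda_available() else "cpu"
```

Hard-coded `"cuda"` strings and module-level `torch.cuda` calls are exactly what break on non-NVIDIA platforms, which is why patches like #49 tend to touch many files at once.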
CI/CD & Quality
- Add workflows to validate TE QA test cases (#41) — Introduced CI workflows and expanded test coverage (85 files, 7,658 insertions), including softmax test improvements.
Documentation
- Polish README (#11) — Updated project README.
Initial Preview
🚀 Release Notes: v2.9.0+fl.0.1.0
⚠️ ALPHA RELEASE - UNSTABLE
This is the first public release of the project. It is currently in an unstable alpha state. Expect bugs, breaking changes, and incomplete features. Use in production environments at your own risk.
🌟 What's New
This release marks the initial foundation of the project. We've focused on establishing the core architecture and basic functionality.
🛠 Known Issues
Since this is a "first-light" build, please be aware of the following:
- Stability: Unexpected crashes may occur under heavy load.
- Incomplete Features: Some features are visible but not yet functional.
- Documentation: README and API docs are still a work in progress.
🧪 Feedback Wanted
Help us make the stable release better! If you encounter a bug or have a suggestion:
- Check the [Issues Tab] to see if it's already known.
- If not, please open a new issue with the label bug or feedback.