Releases: flagos-ai/TransformerEngine-FL

v0.1.0+te2.9.0

25 Mar 14:54

TransformerEngine-FL Release Notes

Highlights

This release establishes the multi-backend plugin architecture as the foundation of TE-FL and adds broad hardware vendor support, bringing FP8 training/inference to six new AI chip platforms beyond NVIDIA.


Core Architecture

  • Multi-Backend Architecture Implementation (#4) — Introduced the plugin-based operator dispatch system (OpRegistry, OpManager, SelectionPolicy) with three backend tiers: FlagOS (default/Triton), Reference (pure PyTorch), and Vendor (hardware-specific). This is the foundational change enabling multi-vendor support.
  • Add missing __init__.py files and policy test suite (#9) — Added comprehensive test coverage for the selection policy system (726-line test suite).
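The dispatch flow above can be sketched as follows. This is an illustrative stand-in, not TE-FL's actual implementation: the class names `OpRegistry` and `SelectionPolicy` come from the release notes, but all method names, the tier ordering, and the internals are assumptions.

```python
# Minimal sketch of plugin-based operator dispatch with tiered backends.
# Method names and tier ordering are hypothetical, not the real TE-FL API.

class OpRegistry:
    """Maps (op_name, backend) pairs to implementations."""
    def __init__(self):
        self._ops = {}  # {op_name: {backend_tier: callable}}

    def register(self, op_name, backend, fn):
        self._ops.setdefault(op_name, {})[backend] = fn

    def backends_for(self, op_name):
        return self._ops.get(op_name, {})


class SelectionPolicy:
    """Picks a backend by tier preference: vendor > flagos > reference."""
    TIER_ORDER = ("vendor", "flagos", "reference")

    def select(self, available):
        for tier in self.TIER_ORDER:
            if tier in available:
                return tier
        raise RuntimeError("no backend available for this op")


registry = OpRegistry()
registry.register("scaled_softmax", "reference", lambda x: x)  # pure-PyTorch stand-in
registry.register("scaled_softmax", "flagos", lambda x: x)     # Triton kernel stand-in

policy = SelectionPolicy()
backend = policy.select(registry.backends_for("scaled_softmax"))
op = registry.backends_for("scaled_softmax")[backend]
```

With no vendor backend registered, the policy falls through to the FlagOS (Triton) tier, and a vendor plugin registering the same op name would take precedence automatically.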

New Vendor Backends

Five new hardware vendor backends were added to the plugin system:

| Vendor | PR(s) | Description |
| --- | --- | --- |
| Hygon | #15 | DCU accelerator support with full op registration |
| METAX | #21, #31 | GPU support with attention backend and flash attention |
| KunlunXin | #27, #29, #30 | Baidu Kunlun chip support with flash attention, plus registration and availability fixes |
| Iluvatar | #35 | Iluvatar Corex GPU support with full op set |
| MUSA | #42 | Moore Threads S-series GPU support (2,733 lines) |

FlagOS Backend Enhancements

  • Scaled masked softmax forward/backward (#52) — Added softmax kernel implementations for the FlagOS backend.
  • Refactor optimizer implementations and improve multi_tensor ops (#36) — Rewrote FlagOS fused Adam and multi-tensor operations for correctness and maintainability.
  • FlagGems context cleanup (#18, #20, #22) — Unified and then simplified flag_gems invocation, removing the use_gems context manager in favor of direct calls.

Attention System

  • Register get_attention_backend for all backends and fix FlashAttention fallback (#14) — Centralized attention backend selection with proper fallback chain.
  • Fix flash-attention fallback failures (#7) — Resolved errors when falling back between attention implementations.
  • Fix torch SDPA backend multi-batch support (#17) — Fixed ValueError in the reference SDPA backend when batch_size > 1.
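A centralized backend selector with a fallback chain can be sketched like this. The preference order (flash attention, then fused, then the torch SDPA reference) and the probe functions are assumptions for illustration; only the `get_attention_backend` name appears in the notes above.

```python
# Sketch of centralized attention-backend selection with a fallback chain.
# Probe functions and ordering are hypothetical, not TE-FL's actual logic.

def _flash_available():
    """e.g. flash-attn is not installed or unsupported on this platform."""
    return False

def _fused_available():
    return False

def get_attention_backend():
    """Return the first usable backend in preference order."""
    if _flash_available():
        return "flash"
    if _fused_available():
        return "fused"
    return "sdpa"  # pure-PyTorch reference path, always available
```

Centralizing the probe-and-fall-back logic in one place is what prevents the per-call-site fallback failures that #7 and #14 addressed.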

Operator & Optimizer Improvements

  • Add multi_tensor_adam_param_remainder and context parallel support (#23) — Extended optimizer ops and added context parallelism support for distributed attention.
  • Fix enum mismatch in plugins (#25) — Aligned Python enum definitions with C++ header constants.
  • Fix parameter mismatch between TE_FL and NVTE functions (#34) — Resolved signature inconsistencies in the PyTorch frontend ops.
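The multi-tensor ops above follow the multi-tensor-apply pattern: one launch processes a whole list of parameter tensors instead of looping kernel-by-kernel. A pure-Python stand-in (the real op is a fused kernel, and `multi_tensor_axpy` is a hypothetical name):

```python
# Conceptual multi-tensor apply: one "launch" updates every tensor pair,
# rather than issuing one kernel per parameter. Pure-Python illustration.

def multi_tensor_axpy(alpha, xs, ys):
    """ys[i] += alpha * xs[i] for every tensor pair in the chunked lists."""
    for x, y in zip(xs, ys):
        for i in range(len(y)):
            y[i] += alpha * x[i]

xs = [[1.0, 2.0], [3.0]]   # two "parameter tensors" of different sizes
ys = [[0.0, 0.0], [0.0]]
multi_tensor_axpy(0.5, xs, ys)  # ys -> [[0.5, 1.0], [1.5]]
```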

Platform Compatibility

  • Python-level patches for multi-platform support (#49) — 42 files patched to ensure PyTorch frontend works across different hardware platforms (device-agnostic APIs, conditional CUDA imports, etc.).
  • Fix NV shared lib bug (#16) — Fixed shared library loading when CUDA is unavailable.
  • Fix import bugs (#6) — Resolved import errors in plugin attention backends.
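The conditional-import pattern behind these fixes can be shown in a few lines. `vendor_runtime` is a hypothetical accelerator module name used only for illustration:

```python
# Device-agnostic pattern: import an accelerator module only if present,
# instead of assuming CUDA at import time. Module names are hypothetical.
import importlib
import importlib.util

def optional_import(name):
    """Return the module if importable, else None (conditional import)."""
    if importlib.util.find_spec(name) is None:
        return None
    return importlib.import_module(name)

# Resolved once at import time; None on platforms without the runtime.
vendor_runtime = optional_import("vendor_runtime")

def current_device():
    """Device-agnostic query: use the accelerator only when present."""
    return "accelerator" if vendor_runtime is not None else "cpu"
```

Guarding imports this way is what lets the same Python frontend load on platforms where CUDA (or a vendor runtime) is simply absent, rather than crashing with an `ImportError`.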

CI/CD & Quality

  • Add workflows to validate TE QA test cases (#41) — Introduced CI workflows and expanded test coverage (85 files, 7,658 insertions), including softmax test improvements.

Documentation

  • Polish README (#11) — Updated project README.

Initial Preview

02 Feb 06:25
8690ab4

Initial Preview (Pre-release)

🚀 Release Notes: v2.9.0+fl.0.1.0

⚠️ ALPHA RELEASE - UNSTABLE

This is the first public release of the project. It is currently in an unstable alpha state. Expect bugs, breaking changes, and incomplete features. Use in production environments at your own risk.


🌟 What's New

This release marks the initial foundation of the project. We've focused on establishing the core architecture and basic functionality.

🛠 Known Issues

Since this is a "first-light" build, please be aware of the following:

  • Stability: Unexpected crashes may occur under heavy load.
  • Incomplete Features: Some features are visible but not yet functional.
  • Documentation: README and API docs are still a work in progress.

🧪 Feedback Wanted

Help us make the stable release better! If you encounter a bug or have a suggestion:

  1. Check the Issues tab to see if it's already known.
  2. If not, please open a new issue with the label bug or feedback.