Releases: flagos-ai/TransformerEngine-FL
v0.1.0+te2.9.0
TransformerEngine-FL Release Notes
Highlights
This release establishes the multi-backend plugin architecture as the foundation of TE-FL and adds broad hardware vendor support, bringing FP8 training/inference to six new AI chip platforms beyond NVIDIA.
Core Architecture
- Multi-Backend Architecture Implementation (#4) — Introduced the plugin-based operator dispatch system (`OpRegistry`, `OpManager`, `SelectionPolicy`) with three backend tiers: FlagOS (default/Triton), Reference (pure PyTorch), and Vendor (hardware-specific). This is the foundational change enabling multi-vendor support.
- Add missing `__init__.py` files and policy test suite (#9) — Added comprehensive test coverage for the selection policy system (726-line test suite).
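To make the tiered dispatch concrete, here is a minimal sketch of how a registry-plus-policy design like this can work. The class names (`OpRegistry`, `OpManager`) and tier names come from the PR description above, but everything else — method names, the policy signature, the tier ordering — is an illustrative assumption, not TE-FL's actual API:

```python
# Hypothetical sketch of tiered operator dispatch; class and tier names follow
# the release notes, but the real TE-FL implementation may differ in detail.
from enum import IntEnum


class BackendTier(IntEnum):
    VENDOR = 0     # hardware-specific kernels, preferred when present
    FLAGOS = 1     # default Triton-based kernels
    REFERENCE = 2  # pure-PyTorch fallback


class OpRegistry:
    """Maps (op_name, tier) -> implementation."""

    def __init__(self):
        self._ops = {}

    def register(self, name, tier, fn):
        self._ops[(name, tier)] = fn

    def lookup(self, name, tier):
        return self._ops.get((name, tier))


class OpManager:
    """Resolves an op by walking tiers in the order the policy dictates."""

    def __init__(self, registry, policy):
        self.registry = registry
        self.policy = policy

    def dispatch(self, name, *args, **kwargs):
        for tier in self.policy(name):
            fn = self.registry.lookup(name, tier)
            if fn is not None:
                return fn(*args, **kwargs)
        raise NotImplementedError(f"no backend provides op {name!r}")


def default_policy(name):
    # Assumed ordering: vendor kernels first, then FlagOS, then reference.
    return [BackendTier.VENDOR, BackendTier.FLAGOS, BackendTier.REFERENCE]


registry = OpRegistry()
registry.register("identity", BackendTier.REFERENCE, lambda x: x)  # placeholder op
manager = OpManager(registry, default_policy)
```

With this shape, a vendor plugin only has to call `register` for the ops it accelerates; anything it omits silently falls through to the FlagOS or reference tier.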
New Vendor Backends
Five new hardware vendor backends were added to the plugin system:
| Vendor | PR | Description |
|---|---|---|
| Hygon | #15 | DCU accelerator support with full op registration |
| METAX | #21, #31 | GPU support with attention backend and flash attention |
| KunlunXin | #27, #29, #30 | Baidu Kunlun chip support with flash attention, plus registration and availability fixes |
| Iluvatar | #35 | Iluvatar Corex GPU support with full op set |
| MUSA | #42 | Moore Threads S-series GPU support (2,733 lines) |
FlagOS Backend Enhancements
- Scaled masked softmax forward/backward (#52) — Added softmax kernel implementations for the FlagOS backend.
- Refactor optimizer implementations and improve multi_tensor ops (#36) — Rewrote FlagOS fused Adam and multi-tensor operations for correctness and maintainability.
- FlagGems context cleanup (#18, #20, #22) — Unified and then simplified `flag_gems` invocation, removing the `use_gems` context manager in favor of direct calls.
Attention System
- Register `get_attention_backend` for all backends and fix FlashAttention fallback (#14) — Centralized attention backend selection with proper fallback chain.
- Fix flash-attention fallback failures (#7) — Resolved errors when falling back between attention implementations.
- Fix torch SDPA backend multi-batch support (#17) — Fixed `ValueError` in the reference SDPA backend when `batch_size > 1`.
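The fallback chain these fixes describe boils down to a simple pattern: try backends in preference order and take the first one that is actually usable. The sketch below illustrates that pattern only; the backend names and availability checks are placeholders, not TE-FL's real capability probes:

```python
# Illustrative attention-backend fallback chain. Backend names are
# placeholders; a real selector would probe kernel availability, dtype,
# head size, and mask support instead of reading a static dict.
def get_attention_backend(candidates, available):
    """Return the first usable backend from an ordered preference list."""
    for name in candidates:
        if available.get(name, False):
            return name
    raise RuntimeError("no usable attention backend")


# Prefer flash attention, then a fused kernel, then plain torch SDPA.
preference = ["flash_attention", "fused_attention", "torch_sdpa"]
backend = get_attention_backend(
    preference,
    {"flash_attention": False, "fused_attention": False, "torch_sdpa": True},
)
```

Centralizing this in one function (rather than per-backend ad hoc checks) is what makes the fallback behave consistently across vendors.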
Operator & Optimizer Improvements
- Add `multi_tensor_adam_param_remainder` and context parallel support (#23) — Extended optimizer ops and added context parallelism support for distributed attention.
- Fix enum mismatch in plugins (#25) — Aligned Python enum definitions with C++ header constants.
- Fix parameter mismatch between TE_FL and NVTE functions (#34) — Resolved signature inconsistencies in the PyTorch frontend ops.
Platform Compatibility
- Python-level patches for multi-platform support (#49) — 42 files patched to ensure PyTorch frontend works across different hardware platforms (device-agnostic APIs, conditional CUDA imports, etc.).
- Fix NV shared lib bug (#16) — Fixed shared library loading when CUDA is unavailable.
- Fix import bugs (#6) — Resolved import errors in plugin attention backends.
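The "conditional CUDA imports" and "device-agnostic APIs" mentioned above follow a common pattern: never import or touch CUDA machinery unconditionally, and derive the device at runtime. This is a generic sketch of that pattern, not the actual TE-FL patch:

```python
# Generic device-agnostic pattern: guard CUDA-dependent code paths so the
# frontend imports cleanly on platforms without CUDA. Illustrative only.
import importlib.util


def cuda_available() -> bool:
    # Probe for torch without importing it eagerly, then defer to its
    # own runtime check; returns False cleanly when CUDA is absent.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


def pick_device() -> str:
    return "cuda" if cuda_available() else "cpu"
```

Hard-coded `"cuda"` strings and module-level `torch.cuda` calls are exactly what break on non-NVIDIA platforms, which is why patches like #49 tend to touch many files at once.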
CI/CD & Quality
- Add workflows to validate TE QA test cases (#41) — Introduced CI workflows and expanded test coverage (85 files, 7,658 insertions), including softmax test improvements.
Documentation
- Polish README (#11) — Updated project README.
Initial Preview
🚀 Release Notes: v2.9.0+fl.0.1.0
⚠️ ALPHA RELEASE - UNSTABLE
This is the first public release of the project. It is currently in an unstable alpha state. Expect bugs, breaking changes, and incomplete features. Use in production environments at your own risk.
🌟 What's New
This release marks the initial foundation of the project. We've focused on establishing the core architecture and basic functionality.
🛠 Known Issues
Since this is a "first-light" build, please be aware of the following:
- Stability: Unexpected crashes may occur under heavy load.
- Incomplete Features: Some features are visible but not yet functional.
- Documentation: README and API docs are still a work in progress.
🧪 Feedback Wanted
Help us make the stable release better! If you encounter a bug or have a suggestion:
- Check the [Issues Tab] to see if it's already known.
- If not, please open a new issue with the label bug or feedback.