Dispatch Mechanism

This directory implements the operator dispatch mechanism for vllm-plugin-FL. For each operator, it selects among the registered backend implementations (FlagGems, PyTorch, vendor-specific) based on availability and policy configuration.

Directory Structure

dispatch/
├── __init__.py              # Module entry point, exports public API
├── types.py                 # Core type definitions (OpImpl, BackendImplKind)
├── registry.py              # Thread-safe operator registry
├── policy.py                # Selection policy management
├── manager.py               # Core dispatch manager
├── builtin_ops.py           # Built-in operator registration
├── ops.py                   # Backend base interface
├── discovery.py             # Plugin discovery mechanism
├── logger_manager.py        # Centralized logging configuration
├── config/                  # Platform-specific configurations
│   ├── __init__.py          # Config loader module
│   ├── ascend.yaml          # Ascend NPU default configuration
│   └── cuda.yaml            # CUDA default configuration
└── backends/                # Backend implementations
    ├── base.py              # Backend abstract base class
    ├── flaggems/            # FlagGems backend (DEFAULT, priority 150)
    │   ├── flaggems.py      # Backend class
    │   ├── register_ops.py  # Registration function
    │   └── impl/            # Operator implementations
    │       ├── activation.py
    │       ├── normalization.py
    │       ├── rotary.py
    │       ├── attention.py       # AttentionFLBackend, AttentionFLImpl
    │       ├── mla.py             # MLAFLBackend, MLAFLImpl
    │       └── custom_attention.py # Attention backend registration
    ├── reference/           # Reference backend (PyTorch, priority 50)
    └── vendor/              # Vendor-specific backends (priority 100)
        ├── cuda/            # NVIDIA CUDA backend
        │   └── impl/
        │       ├── activation.py
        │       ├── normalization.py
        │       └── rotary.py
        └── ascend/          # Huawei Ascend NPU backend
            └── impl/
                ├── activation.py
                ├── normalization.py
                ├── rotary.py
                ├── attention.py       # AscendAttentionBackend
                └── attention_mask.py  # Attention mask utilities

Core Concepts

1. Backend Implementation Kind

DEFAULT: Default implementation (FlagGems), priority 150
VENDOR: Vendor-specific implementation, priority 100
REFERENCE: Reference implementation (PyTorch native), priority 50

2. Operator Implementation (OpImpl)

Each operator implementation contains:

op_name: Operator name (e.g., "silu_and_mul", "rms_norm")
impl_id: Unique implementation identifier (e.g., "default.flagos")
kind: Implementation type
fn: Actual implementation function
vendor: Vendor name (required for VENDOR type)
priority: Selection priority (higher value = preferred)
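
As a rough illustration of how these fields fit together, the sketch below models an implementation record as a frozen dataclass. The field names come from the list above. The enum string values and the validation rule are assumptions made for illustration and may differ from what types.py actually defines.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class BackendImplKind(str, Enum):
    # String values are assumed; this README only names the members.
    DEFAULT = "flagos"
    VENDOR = "vendor"
    REFERENCE = "reference"


@dataclass(frozen=True)
class OpImpl:
    op_name: str                 # e.g. "silu_and_mul"
    impl_id: str                 # e.g. "default.flagos"
    kind: BackendImplKind
    fn: Callable
    vendor: Optional[str] = None
    priority: int = 0            # higher value = preferred

    def __post_init__(self):
        # The field list above says vendor is required for VENDOR implementations.
        if self.kind is BackendImplKind.VENDOR and not self.vendor:
            raise ValueError(f"{self.impl_id}: VENDOR implementations need a vendor name")


impl = OpImpl(
    op_name="silu_and_mul",
    impl_id="vendor.cuda",
    kind=BackendImplKind.VENDOR,
    fn=lambda x: x,              # placeholder function for the example
    vendor="cuda",
    priority=100,
)
```

Freezing the record keeps registered implementations immutable, which makes them safe to share between threads and to use as cache values.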

3. Selection Policy

Policy controls operator implementation selection:

prefer: Preferred implementation type
strict: Strict mode. When enabled, a failure of the primary implementation raises an error instead of falling back
per_op_order: Custom selection order for each operator
deny_vendors: List of denied vendors
allow_vendors: Whitelist of allowed vendors

Architecture Overview

Dispatch Flow Diagram

┌─────────────────────────────────────────────────────────────────┐
│                         User Code                                │
│                 call_op("rms_norm", x, ...)                      │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                       OpManager                                  │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ 1. Check Cache                                            │  │
│  │ 2. Get Policy (from env or context)                      │  │
│  │ 3. Query Registry for all implementations                │  │
│  │ 4. Filter by vendor allow/deny list                      │  │
│  │ 5. Check availability (is_available())                   │  │
│  │ 6. Sort by priority & selection order                    │  │
│  │ 7. Cache & return selected implementation                │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                        OpRegistry                                │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐         │
│  │   FlagGems   │  │    Vendor    │  │  Reference   │         │
│  │ Priority: 150│  │ Priority: 100│  │ Priority: 50 │         │
│  └──────────────┘  └──────────────┘  └──────────────┘         │
└─────────────────────────────────────────────────────────────────┘

Priority Selection Flow

┌─────────────────────────────────────────────────────────────────┐
│                     VLLM_FL_PREFER=flagos                       │
│                    (Default Behavior)                            │
└─────────────────────────────────────────────────────────────────┘
                             │
                             ▼
        ┌────────────────────┴────────────────────┐
        │                                          │
        ▼                                          ▼
┌──────────────┐  Available?  ┌──────────────┐  Available?
│   FlagGems   │─────No──────▶│    Vendor    │─────No──────▶
│ Priority: 150│              │ Priority: 100│
└──────────────┘              └──────────────┘
        │                              │
       Yes                            Yes
        │                              │
        ▼                              ▼
    ✓ Selected                    ✓ Selected

                                                  ┌──────────────┐
                                                  │  Reference   │
                                                  │ Priority: 50 │
                                                  └──────────────┘
                                                         │
                                                        Yes
                                                         │
                                                         ▼
                                                    ✓ Selected

Plugin Integration Points

┌─────────────────────────────────────────────────────────────────┐
│                    Plugin Discovery                              │
│                                                                   │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐   │
│  │   Built-in     │  │  Entry Points  │  │  Environment   │   │
│  │   backends/    │  │  (setuptools)  │  │  PLUGIN_MODULES│   │
│  │   vendor/      │  │                │  │                │   │
│  └────────┬───────┘  └────────┬───────┘  └────────┬───────┘   │
│           │                   │                    │            │
│           └───────────────────┴────────────────────┘            │
│                               │                                  │
└───────────────────────────────┼──────────────────────────────────┘
                                │
                                ▼
                        ┌───────────────┐
                        │   Registry    │
                        │  register()   │
                        └───────────────┘

Quick Start

Basic Usage

from vllm_fl.dispatch import call_op, resolve_op

# Method 1: Call operator directly
result = call_op("silu_and_mul", x)

# Method 2: Resolve first, then call
fn = resolve_op("rms_norm")
result = fn(x, residual, weight, epsilon)

Using the Manager

from vllm_fl.dispatch import get_default_manager

manager = get_default_manager()

# Resolve operator
fn = manager.resolve("rotary_embedding")
result = fn(query, key, cos, sin, position_ids)

# Or call directly
result = manager.call("silu_and_mul", x)

Configuration

The dispatch system supports multiple ways to configure backend selection:

User-specified configuration file (YAML) - Complete override
Environment variables - Override specific items
Platform-specific configuration file - Auto-detected defaults
Built-in default values

Configuration Priority

┌─────────────────────────────────────────────────────────────────┐
│                    Configuration Priority                        │
│                  (Highest to Lowest)                             │
├─────────────────────────────────────────────────────────────────┤
│  1. VLLM_FL_CONFIG        │ User config file, complete override │
│  2. Environment Variables │ Override specific items              │
│  3. Platform Config File  │ ascend.yaml / cuda.yaml defaults     │
│  4. Built-in Defaults     │ Code-defined default values          │
└─────────────────────────────────────────────────────────────────┘

Key Points:

Environment variables can override specific items from platform config
If the user sets no environment variables, the platform config is used
Users can also modify platform config files directly

Platform-Specific Configuration

The system automatically detects the hardware and loads the corresponding configuration file from the config/ directory:

Platform	Config File	Auto-Detection
Ascend NPU	`config/ascend.yaml`	`torch.npu.is_available()`
NVIDIA GPU	`config/cuda.yaml`	`torch.cuda.is_available()`

You can force a specific platform with the VLLM_FL_PLATFORM environment variable:

export VLLM_FL_PLATFORM=ascend  # Force Ascend config
export VLLM_FL_PLATFORM=cuda    # Force CUDA config
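
The detection logic can be pictured roughly as follows. This is a minimal sketch, not the code in config/__init__.py: the function name `detect_platform`, the order in which the checks run, and the `None` return value for unknown hardware are all assumptions.

```python
import os


def detect_platform(environ=None):
    """Return "ascend", "cuda", or None, honoring VLLM_FL_PLATFORM first."""
    environ = os.environ if environ is None else environ
    forced = environ.get("VLLM_FL_PLATFORM", "").strip().lower()
    if forced:
        return forced                      # explicit override wins
    try:
        import torch
    except ImportError:
        return None                        # no torch, nothing to detect
    # torch.npu is normally provided by the torch_npu extension, so guard the lookup.
    npu = getattr(torch, "npu", None)
    if npu is not None and npu.is_available():
        return "ascend"
    if torch.cuda.is_available():
        return "cuda"
    return None
```

Passing `environ` explicitly makes the override easy to exercise in tests without touching the real process environment.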

User-Specified Configuration File (YAML)

Set the VLLM_FL_CONFIG environment variable to specify a YAML configuration file that completely overrides all other settings:

export VLLM_FL_CONFIG=/path/to/vllm_fl_dispatch.yaml

Example Configuration File

# vllm_fl_dispatch.yaml

# Preferred backend type: flagos, vendor, or reference
prefer: vendor

# Strict mode:
#   true  = fail immediately on error, no fallback
#   false = try next backend on failure (default)
strict: false

# Vendor whitelist (optional)
allow_vendors:
  - cuda

# Vendor blacklist (optional)
deny_vendors:
  - ascend

# Per-operator backend selection order (optional)
# Only the backends listed will be tried, in the specified order.
op_backends:
  rms_norm:
    - vendor        # Try any available vendor first
    - flagos        # Then try flagos
    # reference not listed, so it won't be used for rms_norm

  silu_and_mul:
    - vendor:cuda   # Only try CUDA, not other vendors
    - flagos
    - reference

# FlagGems operator blacklist (optional)
# These operators will NOT use FlagGems implementation
flagos_blacklist:
  - to_copy
  - zeros
  - mm

# OOT operator blacklist (optional)
# These operators will NOT be registered as OOT replacements
oot_blacklist:
  - fused_moe

Token Types Explained

Token	Description
`flagos`	FlagOS default implementation
`reference`	PyTorch reference implementation
`vendor`	Any available vendor backend (auto-detects hardware)
`vendor:cuda`	Only CUDA vendor backend
`vendor:ascend`	Only Ascend vendor backend

Note: When using vendor (without specifying a vendor name), the system automatically selects an available vendor backend based on hardware detection.
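
To make the token grammar concrete, here is a small parser sketch. The helper name `parse_backend_token` and its error handling are hypothetical; only the token spellings come from the table above.

```python
def parse_backend_token(token):
    """Split "vendor:cuda" into ("vendor", "cuda"); other tokens carry no vendor."""
    kind, sep, vendor = token.strip().partition(":")
    if kind not in {"flagos", "vendor", "reference"}:
        raise ValueError(f"unknown backend token: {token!r}")
    # Only the vendor kind may carry a ":<name>" suffix, and the name must be non-empty.
    if sep and (kind != "vendor" or not vendor):
        raise ValueError(f"malformed backend token: {token!r}")
    return kind, (vendor or None)
```

A bare `vendor` token yields `("vendor", None)`, which a selector can treat as "any available vendor".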

Environment Variables

Environment variables can override specific items from platform config. If not set, values from platform config file are used.

Core Configuration

Variable	Default	Description
`VLLM_FL_PREFER_ENABLED`	`true`	Global switch. Set `false` to disable all dispatch features
`VLLM_FL_CONFIG`	(none)	Path to YAML config file (complete override)
`VLLM_FL_PLATFORM`	(auto)	Force platform: `ascend`, `cuda`

Backend Selection

Variable	Default	Description
`VLLM_FL_PREFER`	`flagos`	Preferred backend: `flagos`, `vendor`, `reference`
`VLLM_FL_STRICT`	`0`	Strict mode: `1` = fail on error, `0` = try fallback
`VLLM_FL_PER_OP`	(none)	Per-operator order: `op1=a|b|c;op2=x|y`
`VLLM_FL_ALLOW_VENDORS`	(none)	Vendor whitelist, comma-separated
`VLLM_FL_DENY_VENDORS`	(none)	Vendor blacklist, comma-separated
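
The VLLM_FL_PER_OP value packs several operators into one string: entries are separated by `;`, and each entry lists backend tokens separated by `|`. A parser for that format might look like the sketch below. The function name `parse_per_op` and the choice to reject entries without `=` are assumptions.

```python
def parse_per_op(spec):
    """Parse "op1=a|b|c;op2=x|y" into {"op1": ["a", "b", "c"], "op2": ["x", "y"]}."""
    result = {}
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue                       # tolerate trailing or doubled semicolons
        op_name, sep, order = entry.partition("=")
        if not sep:
            raise ValueError(f"expected 'op=backend|backend', got {entry!r}")
        result[op_name.strip()] = [t.strip() for t in order.split("|") if t.strip()]
    return result
```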

FlagGems Control

Variable	Default	Description
`USE_FLAGGEMS`	`true`	Enable/disable FlagGems
`VLLM_FL_FLAGOS_WHITELIST`	(none)	FlagGems ops whitelist (mutually exclusive with blacklist)
`VLLM_FL_FLAGOS_BLACKLIST`	(none)	FlagGems ops blacklist (mutually exclusive with whitelist)

Priority: WHITELIST > BLACKLIST (env) > flagos_blacklist (config file)

OOT Operator Control

Variable	Default	Description
`VLLM_FL_OOT_ENABLED`	`1`	Enable OOT operator registration
`VLLM_FL_OOT_WHITELIST`	(none)	OOT ops whitelist
`VLLM_FL_OOT_BLACKLIST`	(none)	OOT ops blacklist

Priority: WHITELIST > BLACKLIST (env) > oot_blacklist (config file)

Debug & Logging

Variable	Default	Description
`VLLM_FL_LOG_LEVEL`	`INFO`	Log level: `DEBUG`, `INFO`, `WARNING`, `ERROR`
`VLLM_FL_DISPATCH_DEBUG`	`0`	Enable dispatch debug mode

Plugins

Variable	Default	Description
`VLLM_FL_PLUGIN_MODULES`	(none)	External plugin modules, comma-separated
`VLLM_FL_OP_CONFIG`	(none)	Operator config JSON file path

Other

Variable	Default	Description
`FLAGCX_PATH`	(none)	FlagCX library path (enables FlagCX communication backend)
`FLAGGEMS_ENABLE_OPLIST_PATH`	`/tmp/flaggems_enable_oplist.txt`	FlagGems enabled ops list file

Examples

# Use platform default config (auto-detected)
# Nothing to set - just run your application

# Override only the prefer setting (other items from platform config)
export VLLM_FL_PREFER=vendor

# Override FlagGems blacklist (overrides config file blacklist)
export VLLM_FL_FLAGOS_BLACKLIST="mm,to_copy,zeros"

# Use whitelist instead (completely ignores any blacklist)
export VLLM_FL_FLAGOS_WHITELIST="silu_and_mul,rms_norm"

# Specify per-operator order
export VLLM_FL_PER_OP="rms_norm=vendor|flagos|reference"

# Use completely custom config file
export VLLM_FL_CONFIG=/path/to/my_config.yaml

# Force specific platform
export VLLM_FL_PLATFORM=ascend

# Enable debug logging
export VLLM_FL_LOG_LEVEL=DEBUG

Op Backends Selection Example

op_backends:
  mul:
    - flagos
  silu_and_mul:
    - flagos
    - vendor
    - reference

Configuration Priority Details

The dispatch system applies configuration in the following order:

┌─────────────────────────────────────────────────────────────────────┐
│                     Configuration Resolution                          │
├─────────────────────────────────────────────────────────────────────┤
│                                                                       │
│  VLLM_FL_CONFIG set?                                                 │
│       │                                                               │
│       ├── Yes ──▶ Use user config file (complete override)           │
│       │                                                               │
│       └── No ──▶ For each setting:                                   │
│                       │                                               │
│                       ├── Env var set? ──▶ Use env var value         │
│                       │                                               │
│                       └── Not set ──▶ Use platform config value      │
│                                              │                        │
│                                              └── Not found ──▶ Default│
│                                                                       │
└─────────────────────────────────────────────────────────────────────┘
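
For the branch where VLLM_FL_CONFIG is not set, the per-setting lookup can be sketched as below. The helper `resolve_setting` and the `BUILTIN_DEFAULTS` table are hypothetical; the default values shown are taken from the environment variable tables in this README.

```python
import os

# Defaults as listed in the environment variable tables; the dict itself is illustrative.
BUILTIN_DEFAULTS = {"prefer": "flagos", "strict": "0"}


def resolve_setting(name, env_var, platform_config, environ=None):
    """Return (value, source) for one setting when no user config file is given."""
    environ = os.environ if environ is None else environ
    if env_var in environ:
        return environ[env_var], "env"             # env var overrides platform config
    if name in platform_config:
        return platform_config[name], "platform"   # platform YAML value
    return BUILTIN_DEFAULTS[name], "default"       # code-defined default


platform_cfg = {"prefer": "flagos"}
overridden = resolve_setting("prefer", "VLLM_FL_PREFER", platform_cfg, {"VLLM_FL_PREFER": "vendor"})
from_platform = resolve_setting("prefer", "VLLM_FL_PREFER", platform_cfg, {})
from_default = resolve_setting("strict", "VLLM_FL_STRICT", platform_cfg, {})
```

Returning the source alongside the value is a convenience for debug logging and is not something the README states the real loader does.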

Whitelist vs Blacklist Priority

For FlagGems and OOT operators:

WHITELIST (env) ──▶ Completely overrides blacklist
       │
       └── Not set ──▶ BLACKLIST (env) ──▶ Overrides config blacklist
                              │
                              └── Not set ──▶ Config file blacklist
                                                    │
                                                    └── Not set ──▶ Allow all

Important Notes:

Whitelist and blacklist environment variables are mutually exclusive (error if both set)
If whitelist is set, it completely ignores any blacklist (env or config)
Environment blacklist overrides config file blacklist (not merged)
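
These rules can be expressed as a single predicate builder. The sketch below is an illustration under assumptions: the name `build_op_filter` is hypothetical, and it follows the note that setting both environment lists is an error.

```python
def _split(value):
    """Split a comma-separated env value into clean names."""
    return [item.strip() for item in (value or "").split(",") if item.strip()]


def build_op_filter(env_whitelist=None, env_blacklist=None, config_blacklist=()):
    """Return a predicate telling whether an operator may use the backend."""
    allowed, denied = _split(env_whitelist), _split(env_blacklist)
    if allowed and denied:
        raise ValueError("whitelist and blacklist are mutually exclusive")
    if allowed:
        allowed_set = set(allowed)         # whitelist ignores the config blacklist
        return lambda op: op in allowed_set
    # The env blacklist replaces the config blacklist; the two are not merged.
    denied_set = set(denied) if denied else set(config_blacklist)
    return lambda op: op not in denied_set
```

With no list set anywhere, the returned predicate accepts every operator, matching the "Allow all" leaf of the diagram.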

Example: Combined Environment Variables

# Platform config (ascend.yaml) has:
#   prefer: flagos
#   flagos_blacklist: [to_copy, zeros, mm, ...]

# User overrides only prefer, blacklist still from config
export VLLM_FL_PREFER=vendor

# Result:
#   prefer: vendor (from env)
#   flagos_blacklist: [to_copy, zeros, mm, ...] (from config)

# User wants to override blacklist too
export VLLM_FL_PREFER=vendor
export VLLM_FL_FLAGOS_BLACKLIST="custom_op1,custom_op2"

# Result:
#   prefer: vendor (from env)
#   flagos_blacklist: [custom_op1, custom_op2] (from env, config ignored)

Important Notes

Environment variables override, not merge: Setting an env var replaces the config value entirely
VLLM_FL_PREFER sets preference, not exclusivity: It defines the selection order but will fall back to other backends if the preferred one is unavailable
To force a specific backend: Combine PREFER with DENY_VENDORS or use PER_OP to exclude unwanted backends
VLLM_FL_STRICT=1: Enables strict mode — fails immediately if the primary implementation fails, no fallback is attempted

Backend Priority Values

Priority values are spaced by 50 to allow future insertion of intermediate priorities:

BackendPriority.DEFAULT = 150 (FlagGems)
BackendPriority.VENDOR = 100 (Vendor-specific)
BackendPriority.REFERENCE = 50 (PyTorch)

Policy Context Management

The policy can be overridden temporarily in code:

from vllm_fl.dispatch import (
    call_op,
    with_strict_mode,
    with_preference,
    with_allowed_vendors,
    with_denied_vendors,
)

# Temporarily enable strict mode
with with_strict_mode():
    result = call_op("silu_and_mul", x)

# Temporarily switch preferred backend
with with_preference("reference"):
    result = call_op("rms_norm", x, residual, weight, epsilon)

# Temporarily restrict allowed vendors
with with_allowed_vendors("vendor_a"):
    result = call_op("rotary_embedding", query, key, cos, sin, position_ids)

Supported Operators

Currently supported operators:

Operator	Description	FlagGems	Reference	Vendor
`silu_and_mul`	SiLU activation + element-wise multiplication	✓	✓	✓
`rms_norm`	RMS normalization	✓	✓	✓
`rotary_embedding`	Rotary position embedding	✓	✓	✓
`attention_backend`	Attention backend class path	✓	-	✓

Selection Process

Cache Check: Check whether the dispatch cache already holds a selection for this operator
Get Implementations: Retrieve all registered implementations from registry
Vendor Filtering: Filter by policy's allow/deny lists
Availability Check: Call is_available() to check if implementation is available
Priority Sorting: Select best implementation based on per-op order or default order
Cache Result: Cache selection result to speed up subsequent calls
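
The steps above can be sketched as a single function. This is a simplified model, not the OpManager code: the `Impl` and `Policy` types are hypothetical stand-ins, availability is a plain boolean instead of an is_available() call, and the ranking rule (kind order first, then priority within a kind) is an assumption about how preference and priority combine.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Impl:
    impl_id: str
    kind: str                              # "flagos", "vendor", or "reference"
    priority: int
    vendor: Optional[str] = None
    available: bool = True                 # stands in for an is_available() call


@dataclass(frozen=True)
class Policy:
    order: Tuple[str, ...] = ("flagos", "vendor", "reference")
    allow_vendors: Optional[FrozenSet[str]] = None
    deny_vendors: FrozenSet[str] = frozenset()


_cache = {}


def vendor_permitted(impl, policy):
    if impl.vendor is None:
        return True
    if impl.vendor in policy.deny_vendors:
        return False
    return policy.allow_vendors is None or impl.vendor in policy.allow_vendors


def select(op_name, impls, policy):
    key = (op_name, policy)
    if key in _cache:                                  # 1. cache check
        return _cache[key]
    candidates = [
        i for i in impls                               # 2. registered implementations
        if i.kind in policy.order
        and vendor_permitted(i, policy)                # 3. vendor filtering
        and i.available                                # 4. availability check
    ]
    if not candidates:
        raise RuntimeError(f"no implementation available for op {op_name!r}")
    # 5. earlier kinds in policy.order win; priority breaks ties within a kind
    best = min(candidates, key=lambda i: (policy.order.index(i.kind), -i.priority))
    _cache[key] = best                                 # 6. cache the result
    return best
```

Keying the cache on the (frozen, hashable) policy means a temporary policy override gets its own cache entry instead of reusing a stale selection.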

Fallback Mechanism

When VLLM_FL_STRICT=0 (default), if the primary implementation fails, the system automatically tries other available implementations:

Op 'rms_norm' using 'default.flagos' (kind=flagos, vendor=None)
[WARNING] Implementation 'default.flagos' failed for op 'rms_norm': ...
Op 'rms_norm' fallback to 'reference.torch' (kind=reference, vendor=None)
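
A fallback loop of this kind can be sketched in a few lines. The function `call_with_fallback` and its argument shape are hypothetical, and the log wording only approximates the messages shown above.

```python
import logging

logger = logging.getLogger("dispatch_sketch")


def call_with_fallback(op_name, ordered_impls, *args, strict=False, **kwargs):
    """Try (impl_id, fn) pairs in order; in strict mode the first failure propagates."""
    last_error = None
    for impl_id, fn in ordered_impls:
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            if strict:
                raise                      # strict mode: no fallback is attempted
            logger.warning("Implementation %r failed for op %r: %s", impl_id, op_name, exc)
            last_error = exc
    raise RuntimeError(f"all implementations failed for op {op_name!r}") from last_error
```

Chaining the final error to the last failure keeps the original traceback available when every candidate fails.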

Extending the System

Adding New Operators

When adding a new operator, modify these files:

backends/flaggems/impl/*.py - Add FlagGems implementation
backends/flaggems/flaggems.py - Add method to backend class
backends/flaggems/register_ops.py - Register OpImpl
backends/reference/impl/*.py - Add PyTorch implementation (if applicable)
backends/reference/reference.py - Add method to backend class
backends/reference/register_ops.py - Register OpImpl
backends/vendor/<vendor>/impl/*.py - Add vendor-specific implementation (optional)
backends/vendor/<vendor>/<vendor>.py - Add method to vendor backend class
backends/vendor/<vendor>/register_ops.py - Register vendor OpImpl
ops.py - Add abstract method declaration

Note: Not all operators require a reference implementation. For example, attention_backend only has FlagGems and vendor implementations since it returns a backend class path rather than executing a computation.

Adding Vendor Backends

The dispatch system supports three ways to integrate vendor backends:

Built-in vendor backends - Located in backends/vendor/ (recommended for core vendors)
External plugin packages - Distributed as separate Python packages
Environment-based plugins - Loaded via VLLM_FL_PLUGIN_MODULES

Option 1: Built-in Vendor Backend

Directory structure:

backends/vendor/<vendor_name>/
├── __init__.py
├── <vendor_name>.py        # Backend class
├── register_ops.py         # Registration function
└── impl/                   # Operator implementations
    ├── __init__.py
    ├── activation.py
    ├── normalization.py
    ├── rotary.py
    └── attention.py        # (optional) Vendor-specific attention backend

Step 1: Create Backend Class (<vendor_name>.py):

from ...base import Backend

class <VendorName>Backend(Backend):
    _available = None

    @property
    def name(self) -> str:
        return "<vendor_name>"

    @property
    def vendor(self) -> str:
        return "<vendor_name>"  # Required for vendor backends

    def is_available(self) -> bool:
        if <VendorName>Backend._available is None:
            try:
                import <vendor_library>
                <VendorName>Backend._available = True
            except ImportError:
                <VendorName>Backend._available = False
        return <VendorName>Backend._available

    def silu_and_mul(self, x):
        from .impl.activation import silu_and_mul_<vendor>
        return silu_and_mul_<vendor>(x)

Step 2: Create Registration Module (register_ops.py):

from ....types import OpImpl, BackendImplKind, BackendPriority

def register_builtins(registry):
    from .<vendor_name> import <VendorName>Backend
    backend = <VendorName>Backend()

    impls = [
        OpImpl(
            op_name="silu_and_mul",
            impl_id="vendor.<vendor_name>",
            kind=BackendImplKind.VENDOR,
            fn=backend.silu_and_mul,
            vendor="<vendor_name>",
            priority=BackendPriority.VENDOR,  # 100
        ),
    ]
    registry.register_many(impls)

Step 3: Register in builtin_ops.py:

try:
    from .backends.vendor.<vendor_name>.register_ops import register_builtins as register_<vendor>
    register_<vendor>(registry)
except Exception as e:
    logger.debug(f"<Vendor> operators not available: {e}")

Option 2: External Plugin Package

Create a separate package with entry points:

# setup.py
setup(
    name="vllm-plugin-<vendor>",
    entry_points={
        "vllm_fl.plugin": [
            "<vendor> = vllm_fl_<vendor>.register_ops:register_builtins",
        ],
    },
)

Install and use:

pip install vllm-plugin-<vendor>
# Plugin auto-discovered via entry points
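
Entry-point discovery of this kind is usually built on the standard library's `importlib.metadata`. The sketch below shows the general technique for the `vllm_fl.plugin` group. The function names are hypothetical, and discovery.py may differ in error handling and logging.

```python
from importlib.metadata import entry_points

GROUP = "vllm_fl.plugin"


def iter_plugin_entry_points(group=GROUP):
    """Return the entry points in a group across Python versions."""
    eps = entry_points()
    if hasattr(eps, "select"):             # Python 3.10+ selection API
        return list(eps.select(group=group))
    return list(eps.get(group, []))        # older dict-style return value


def load_entry_point_plugins(registry, group=GROUP):
    """Load each entry point and call it with the registry; return the names loaded."""
    loaded = []
    for ep in iter_plugin_entry_points(group):
        register = ep.load()               # resolves "module:function" to the callable
        register(registry)
        loaded.append(ep.name)
    return loaded
```

When no package declares the group, the function simply returns an empty list.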

Option 3: Environment-based Plugin

export VLLM_FL_PLUGIN_MODULES=my_custom_backend.register_ops

The module should provide a register_builtins(registry) function.
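
Loading modules named in VLLM_FL_PLUGIN_MODULES can be pictured as follows. This is a sketch under assumptions: the helper name `load_plugin_modules` is hypothetical, and modules without a `register_builtins` callable are silently skipped here, which may not match the real behavior.

```python
import importlib
import os


def load_plugin_modules(registry, environ=None):
    """Import each comma-separated module and call its register_builtins(registry)."""
    environ = os.environ if environ is None else environ
    loaded = []
    for name in environ.get("VLLM_FL_PLUGIN_MODULES", "").split(","):
        name = name.strip()
        if not name:
            continue                       # ignore empty items and stray commas
        module = importlib.import_module(name)
        register = getattr(module, "register_builtins", None)
        if callable(register):
            register(registry)
            loaded.append(name)
    return loaded
```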

Priority Levels

Use constants from types.py:

BackendPriority.DEFAULT (150) - FlagGems
BackendPriority.VENDOR (100) - Vendor backends
BackendPriority.REFERENCE (50) - PyTorch

Testing Your Backend

from vllm_fl.dispatch import get_default_manager

manager = get_default_manager()
manager.ensure_initialized()

# Check registration
snap = manager.registry.snapshot()
for op_name, impls in snap.impls_by_op.items():
    for impl in impls:
        if impl.vendor == "<vendor_name>":
            print(f"{op_name}: {impl.impl_id}, available={impl.is_available()}")

Enable debug output:

export VLLM_FL_LOG_LEVEL=DEBUG

Vendor Backend Checklist

Backend class inherits from Backend
vendor property returns vendor name (not None)
is_available() checks hardware/library availability
register_ops.py uses BackendImplKind.VENDOR
impl_id follows format: vendor.<vendor_name>
Priority set to BackendPriority.VENDOR (100)
Error handling for missing dependencies
(Optional) attention_backend() returns vendor-specific attention backend class path

Current Vendor Backends

Vendor	Device	Library	Attention Backend
`cuda`	NVIDIA GPU	`vllm._custom_ops`	- (uses vLLM native)
`ascend`	Huawei NPU	`torch_npu`	`AscendAttentionBackend`

See backends/vendor/template/ for a template to create new vendor backends.

Multi-Process Safety

OpManager supports multi-process environments:

Uses os.register_at_fork() to automatically reset state after fork
PID detection ensures independent initialization per process
Thread-safe registry and cache operations
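
The combination of a fork hook and a PID check can be sketched as below. The class `ForkAwareState` is a hypothetical stand-in for the manager's internal state and only illustrates the technique.

```python
import os
import threading


class ForkAwareState:
    """Holds per-process state that is rebuilt after a fork."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pid = None
        self.cache = {}
        if hasattr(os, "register_at_fork"):            # not available on Windows
            # Note: the hook keeps a reference to this instance for the process lifetime.
            os.register_at_fork(after_in_child=self._reset)

    def _reset(self):
        # A lock copied across fork may be held by a thread that no longer exists.
        self._lock = threading.Lock()
        self._pid = None
        self.cache = {}

    def ensure_initialized(self):
        with self._lock:
            if self._pid != os.getpid():               # first use, or a new process
                self.cache = {}
                self._pid = os.getpid()
```

The PID check is a second line of defense: it also covers processes created by mechanisms that bypass the fork hook.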

API Reference

Convenience Functions

call_op(op_name, *args, **kwargs): Call an operator
resolve_op(op_name): Resolve operator implementation

Policy Management

get_policy(): Get current policy
set_global_policy(policy): Set global policy
reset_global_policy(): Reset to environment variable defaults
policy_context(policy): Temporary policy context
policy_from_config(config_path): Create policy from YAML config file

Manager

get_default_manager(): Get default manager instance
reset_default_manager(): Reset default manager

Plugin Discovery

discover_plugins(registry): Discover and load plugins
get_discovered_plugins(): Get list of discovered plugins
clear_discovered_plugins(): Clear discovered plugins list

Logging

get_logger(name): Get logger instance
set_log_level(level, name): Set log level
