FlagOS is a fully open-source AI system software stack for heterogeneous AI chips, allowing AI models to be developed once and seamlessly ported to a wide range of AI hardware with minimal effort. This repository collects reusable Skills for FlagOS — injecting domain knowledge, workflow standards, and best practices into AI coding agents.
Skills are folder-based capability packages: each skill uses documentation, scripts,
and resources to teach agents to reliably and reproducibly complete tasks in a specific domain.
Each skill folder contains a SKILL.md file with YAML frontmatter (name + description)
followed by detailed agent instructions.
Skills can also include reference docs, scripts, and assets.
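To make the SKILL.md layout concrete, here is a minimal sketch of splitting a skill file into its frontmatter fields and its instruction body. The function name `split_frontmatter` and the example skill are hypothetical, and the parser only handles flat `key: value` lines; it is not the parser any particular agent uses, and real YAML (lists, multi-line values) would need a proper YAML library.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md string into (frontmatter fields, body).

    Illustrative only: handles flat `key: value` lines, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")
    try:
        # Index of the closing fence, searching after the opening one.
        end = next(
            i for i, line in enumerate(lines[1:], start=1)
            if line.strip() == "---"
        )
    except StopIteration:
        raise ValueError("frontmatter is missing its closing '---'") from None

    fields: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and YAML comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        fields[key.strip()] = value.strip().strip("\"'")

    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


# Hypothetical skill file, used only to illustrate the layout.
EXAMPLE = """---
name: example-skill
description: Hypothetical skill used only to illustrate the layout.
---
# Example Skill

Agent instructions go here.
"""

fields, body = split_frontmatter(EXAMPLE)
print(fields["name"])
print(body.splitlines()[0])
```

The two frontmatter fields are what an agent reads to decide whether a skill applies; the body is only needed once the skill is actually invoked.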
This repository follows the Agent Skills open standard.
FlagOS Skills are compatible with Claude Code, Cursor, Codex, and any agent supporting the Agent Skills standard.
Use the skills CLI to install skills directly — no cloning needed:
# List available skills in this repository
npx skills add flagos-ai/skills --list
# Install a specific skill into your project
npx skills add flagos-ai/skills --skill model-migrate-flagos
# Install a specific skill globally (user-level)
npx skills add flagos-ai/skills --skill model-migrate-flagos --global
# Install all skills at once
npx skills add flagos-ai/skills --all
# Install for specific agents only
npx skills add flagos-ai/skills --agent claude-code cursor
Other useful commands:
npx skills list # List installed skills
npx skills find # Search for skills interactively
npx skills update # Update all skills to latest versions
npx skills remove # Interactive remove
Note: No prior installation needed; npx downloads the skills CLI automatically.
- Register the repository as a plugin marketplace (in Claude Code interactive mode):
  /plugin marketplace add flagos-ai/skills
  Or from the terminal:
  claude plugin marketplace add flagos-ai/skills
- Install skills:
  /plugin install flagos-skills@flagos-skills
  Or from the terminal:
  claude plugin install flagos-skills@flagos-skills
After installation, mention the skill in your prompt — Claude automatically
loads the corresponding SKILL.md instructions.
This repository includes Cursor plugin manifests (.cursor-plugin/plugin.json
and .cursor-plugin/marketplace.json).
Install from the repository URL or local checkout via the Cursor plugin flow.
Use the $skill-installer inside Codex:
$skill-installer install model-migrate-flagos from flagos-ai/skills
Or provide the GitHub directory URL:
$skill-installer install https://github.com/flagos-ai/skills/tree/main/skills/model-migrate-flagos
Alternatively, copy skill folders into Codex's standard .agents/skills location:
cp -r skills/model-migrate-flagos $REPO_ROOT/.agents/skills/
See the Codex Skills guide for more details.
gemini extensions install https://github.com/flagos-ai/skills.git --consent
This repo includes gemini-extension.json and agents/AGENTS.md for Gemini CLI integration.
See Gemini CLI extensions docs for more help.
For any agent that supports the Agent Skills standard,
point it at the skills/ directory in this repository.
Each skill is self-contained with a SKILL.md entry point.
The agents/AGENTS.md file can also be used as a fallback for agents that don't support skills natively.
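As a rough illustration of what "pointing an agent at the skills/ directory" involves, the sketch below lists every sub-directory that has a SKILL.md entry point. The function name `discover_skills` is hypothetical and this is not how any specific agent performs discovery; it only assumes the layout described above (one folder per skill, each with its own SKILL.md).

```python
from pathlib import Path


def discover_skills(skills_root: Path) -> list[str]:
    """Return names of sub-directories that contain a SKILL.md entry point."""
    if not skills_root.is_dir():
        return []
    return sorted(
        child.name
        for child in skills_root.iterdir()
        if child.is_dir() and (child / "SKILL.md").is_file()
    )


if __name__ == "__main__":
    # Assumes the script is run from the root of a local checkout.
    for name in discover_skills(Path("skills")):
        print(name)
```

Directories without a SKILL.md are skipped, so shared resources placed next to the skill folders would not be mistaken for skills.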
| Category | Sub-category | Skill | Description |
|---|---|---|---|
| Deployment & Release | Base Image Selection | gpu-container-setup-flagos | Automatically detect GPU vendor, find appropriate PyTorch container image, launch with correct mounts, and validate GPU functionality. Supports NVIDIA, Ascend, Metax, Iluvatar, and AMD/ROCm. Use when user says "setup container", "start pytorch container", or invokes /gpu-container-setup. |
| | Model Migration | model-migrate-flagos | Migrate a model from the latest vLLM upstream repository into the vllm-plugin-FL project (pinned at vLLM v0.13.0). Use this skill whenever someone wants to add support for a new model to vllm-plugin-FL, port model code from upstream vLLM, or backport a newly released model. Trigger when the user says things like "migrate X model", "add X model support", "port X from upstream vLLM", "make X work with the FL plugin", or simply "/model-migrate-flagos model_name". The model_name argument uses snake_case (e.g. qwen3_5, kimi_k25, deepseek_v4). Do NOT use for models already supported by vLLM 0.13.0 core, or for multimodal-only components that don't need backporting. |
| | Release Pipeline | flagrelease-entrance-flagos | Full FlagRelease pipeline orchestrator. Runs the complete LLM deployment, verification, and benchmarking pipeline for multi-chip GPU backends. Executes: install-stack → env-verify → model-verify → perf-test in sequence, passing state between steps and producing a final structured report. Assumes gpu-container-setup (Step 1) is already done: a running container with PyTorch + GPU access must exist. |
| | Stack Installation | install-stack-flagos | Install the 5-package multi-chip software stack (vLLM, FlagTree, FlagGems, FlagCX, vllm-plugin-FL) inside a GPU container. Handles network mirror detection, dependency ordering, wheel selection, and per-package validation. Use after gpu-container-setup has produced a running container with PyTorch + GPU access. |
| | Plugin Environment Setup | vllm-plugin-fl-setup-flagos | Install and configure vLLM-Plugin-FL for multiple hardware backends including NVIDIA, Ascend, MetaX, Iluvatar, Moore Threads, and more. Automates the full setup workflow: detect hardware → install vLLM-Plugin-FL → install FlagGems → (optionally) install FlagCX → backend-specific configuration → inference verification. Trigger when the user says "setup vllm-plugin-fl", "install vllm-plugin-fl", "configure FL plugin", "set up FlagGems", or "set up FlagCX". |
| Benchmarking & Eval | Accuracy & Performance Test | perf-test-flagos | Run accuracy benchmarks (FlagEval, when available) and performance benchmarks (vllm bench serve) against a served model. Covers 5 workload profiles: short/long prefill x short/long decode + high concurrency. Collects throughput, latency, TTFT, TPOT metrics. |
| | Deployment A/B Verification | model-verify-flagos | Verify the serving stack with a user-specified target model. Runs twice: first with FlagGems/FlagCX disabled (isolate model-specific errors), then with full multi-chip stack enabled. Diffs the two runs to pinpoint which layer caused any failure. |
| | FlagPerf Case Creation | Planned | Generate FlagPerf-compliant directory structures, config files, run scripts, and expected metric baselines for new model/chip benchmark cases. |
| | Post-Deploy Auto Eval | Planned | Automatically trigger evaluation after model deployment, track evaluation status, report errors on failure, and push notifications with results upon completion. |
| Kernel & Operator Development | Complex Operator Dev | Planned | Generate skeleton code for multi-step fused operators (fused attention, fused MoE, etc.), handling shared memory tiling strategies and multi-backend branching. |
| | Experimental Op Promotion | Planned | Scan FlagGems ~130 experimental ops, check test coverage, align signatures, complete _FULL_CONFIG registration, and generate migration PRs to promote them to main ops. |
| | Kernel Gen for FlagGems | kernelgen-flagos | FlagGems-specific kernel generation with @pointwise_dynamic wrapper rewriting, _FULL_CONFIG registration, and operator signature alignment. |
| | Kernel Gen for vLLM | kernelgen-flagos | vLLM-specific kernel generation with SPDX headers, @triton.autotune, custom op registration, and dispatch integration. |
| | Kernel Generation | kernelgen-flagos | Unified GPU kernel operator generation and optimization skill. Automatically detects the target repository type (FlagGems, vLLM, or general Python/Triton) and dispatches to the appropriate specialized sub-skill. Includes operator generation, MCP-based iterative optimization, and feedback submission sub-skills. Use this skill when the user wants to generate or optimize a GPU kernel operator, create a Triton kernel, or says things like "generate an operator", "create a kernel for X", "optimize triton kernel", or "/kernelgen-flagos". |
| | MCP Service Setup | kernelgen-flagos | Auto-detect and configure the kernelgen-mcp MCP service. Checks project-local config files for existing setup, guides the user through token acquisition if needed, and writes the configuration automatically. Runs before any generation/optimization/specialization sub-skill. |
| | Kernel Optimization | kernelgen-flagos | General-purpose Triton kernel optimization via MCP iterative loop. Analyzes existing kernels, identifies bottlenecks, and applies optimizations through multiple rounds until the target speedup is reached. |
| | Kernel Optimization for FlagGems | kernelgen-flagos | FlagGems-specific kernel optimization with 3 modes: optimize built-in operators in-place, optimize external operators and integrate into experimental_ops, or optimize existing experimental operators. Includes accuracy tests and performance benchmarks. |
| | Kernel Optimization for vLLM | kernelgen-flagos | vLLM-specific kernel optimization with CustomOp registration, accuracy tests, and performance benchmark integration. Optimizes Triton operators and automatically integrates them into the vLLM project. |
| | Kernel Platform Specialization | kernelgen-flagos | Platform specialization for Triton operators via MCP specialize_kernel tool. Migrates GPU Triton operators to target platforms (e.g., Huawei Ascend NPU), handling architecture differences, Grid configuration, and memory alignment. |
| | Kernel Specialization for FlagGems | kernelgen-flagos | Combines MCP platform specialization with FlagGems framework integration. Supports four integration modes: vendor-ops, vendor-fused, override-builtin, and experimental. Includes automated testing and performance benchmarking. |
| | Operator Diagnosis | Planned | Diagnose abnormal operators in the FlagOS stack: identify precision errors, performance regressions, and backend-specific failures across chips. |
| Multi-Chip Backend Onboarding | Dispatch Op Extension | Planned | Query dispatchable ops from base.py, generate impl template files, add OpImpl registration to register_ops.py, and create unit test skeletons. |
| | FlagCX Comm Backend | Planned | Parse 20+ device and 15+ CCL function pointer signatures from header files, generate all stub implementations with trivial function fills, plus CMake build configuration. |
| | FlagGems Chip Backend | Planned | Generate the full _vendor/ scaffold: __init__.py (VendorInfoBase config) + heuristics_config_utils.py + tune_configs.yaml + ops/ directory following the FlagGems backend contribution guide. |
| | Heterogeneous Training Config | Planned | Generate valid FlagScale heterogeneous training configs from hardware topology descriptions, auto-compute hetero_process_meshes / hetero_pipeline_layer_split, and validate constraints (TP×DP×PP = device count). |
| | vLLM Vendor Backend | Planned | Scaffold a new vllm-plugin-FL vendor backend from the template: generate vendor directory, Backend subclass, is_available detection, register_ops framework, and test skeleton. |
| Developer Tooling | Feedback Submission | kernelgen-flagos | Auto-collect environment info, construct structured GitHub issues, and submit to flagos-ai/skills with email fallback when GitHub CLI is unavailable. |
| | General | tle-developer-flagos | Self-contained orchestration skill for writing high-performance TLE kernels and shipping TLE feature changes with reproducible validation. Use when the user wants to write/optimize TLE kernels, implement TLE API/verifier/lowering features, or debug TLE correctness/performance issues. Trigger on phrases like "write a TLE kernel", "optimize TLE operator", and "debug TLE local_ptr". |
| | Local Dev Environment | Planned | Set up local development and debugging environments for FlagOS modules (FlagGems / FlagTree / FlagCX / etc.): configure dependencies, environment variables, and debug toolchains. |
| | Skill Development | skill-creator-flagos | Create new skills, modify existing skills, and validate skill quality for the FlagOS skills repository. Use this skill whenever someone wants to create a skill from scratch, improve or edit an existing skill, scaffold a new skill directory, validate skill structure, or run test cases against a skill. Trigger when the user says things like "create a skill", "make a new skill for X", "scaffold a skill", "improve this skill", "validate my skill", or simply "/skill-creator-flagos". Also trigger when users mention turning a workflow into a reusable skill, or want to package a repeated process as a skill. |
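The two-run logic described for model-verify-flagos in the table above can be read as a small decision rule. The sketch below is one interpretation of that description, not the skill's actual implementation; the function name `localize_failure` and the returned labels are hypothetical.

```python
def localize_failure(baseline_ok: bool, full_stack_ok: bool) -> str:
    """Attribute a failure to a layer from two verification runs.

    baseline_ok:   result of the run with FlagGems/FlagCX disabled.
    full_stack_ok: result of the run with the full multi-chip stack enabled.
    """
    if not baseline_ok:
        # The model fails even without the extra stack layers, so the
        # problem is attributed to the model or the base serving setup.
        return "model-specific"
    if not full_stack_ok:
        # The baseline passed, so the regression appeared only after
        # FlagGems/FlagCX were enabled.
        return "stack-specific"
    return "no-failure"


for runs in [(True, True), (True, False), (False, False)]:
    print(runs, "->", localize_failure(*runs))
```

Running the baseline first matters: without it, a failure in the full-stack run cannot be told apart from a model that never worked in the first place.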
Once a skill is installed, mention it directly in your prompt:
- "Use model-migrate-flagos to migrate the Qwen3-5 model from upstream vLLM"
- "/model-migrate-flagos qwen3_5"
- "Port the DeepSeek-V4 model to vllm-plugin-FL"
Your agent automatically loads the corresponding SKILL.md instructions and helper scripts.
├── .claude-plugin/ # Claude Code plugin manifest
│ └── marketplace.json
├── .cursor-plugin/ # Cursor plugin manifest
│ ├── marketplace.json
│ └── plugin.json
├── agents/ # Codex / Gemini CLI fallback
│ └── AGENTS.md
├── assets/ # Repository-level static resources
├── contributing.md # Contribution guidelines
├── gemini-extension.json # Gemini CLI extension manifest
├── scripts/ # Repository-level utility scripts
│ └── validate_skills.py # Batch validate all skills
├── skills/ # Skill directories
│ ├── model-migrate-flagos/ # Model migration workflow
│ └── ...
├── spec/ # Agent Skills standard & local conventions
│ ├── README.md
│ └── agent-skills-spec.md
└── template/ # Template for creating new skills
    └── SKILL.md
- Create directory & copy template
  mkdir skills/<skill-name>
  cp template/SKILL.md skills/<skill-name>/SKILL.md
- Edit frontmatter: name (lowercase + hyphens, must match directory name) and description (what it does + when to trigger)
- Write the body: Overview, Prerequisites, Execution steps, Examples (2-3), Troubleshooting
- Add supporting files (optional): references/, scripts/, assets/, LICENSE.txt
- Validate
  python scripts/validate_skills.py
See contributing.md for the full contribution guide.
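The frontmatter rules in the steps above (a lowercase, hyphenated name that matches the directory, plus a description) are easy to check before running the validator. The sketch below is a stand-alone illustration of those three checks and is not the contents of scripts/validate_skills.py; the function name `check_skill` is hypothetical, and allowing digits in names is an assumption.

```python
import re

# Assumed reading of "lowercase + hyphens": groups of lowercase letters
# or digits separated by single hyphens. Allowing digits is an assumption.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def check_skill(dir_name: str, name: str, description: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append(f"name {name!r} must be lowercase words joined by hyphens")
    if name != dir_name:
        problems.append(f"name {name!r} does not match directory {dir_name!r}")
    if not description.strip():
        problems.append("description must not be empty")
    return problems


print(check_skill("model-migrate-flagos", "model-migrate-flagos",
                  "Migrate a model into vllm-plugin-FL."))
print(check_skill("my-skill", "My_Skill", ""))
```

Returning all problems at once, rather than stopping at the first, lets a contributor fix the frontmatter in a single pass.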
This project is licensed under the Apache License, Version 2.0.