diff --git a/skills/contribute/SKILL.md b/skills/contribute/SKILL.md
index cd4b6db..71c9bfa 100644
--- a/skills/contribute/SKILL.md
+++ b/skills/contribute/SKILL.md
@@ -98,23 +98,23 @@ let value = parse(input).context("parsing model config")?;
 | Change config | `model_gateway/src/config/types.rs` |
 | Change worker creation | `model_gateway/src/core/steps/worker/local/` |
 | Change service discovery | `model_gateway/src/service_discovery.rs` |
-| Change API types | `protocols/src/` (careful — shared by all crates) |
+| Change API types | `crates/protocols/src/` (careful — shared by all crates) |
 | Add routing policy | `model_gateway/src/core/routing/` |
-| Add tool parser | `tool_parser/src/parsers/` |
-| Add reasoning parser | `reasoning_parser/src/parsers/` |
+| Add tool parser | `crates/tool_parser/src/parsers/` |
+| Add reasoning parser | `crates/reasoning_parser/src/parsers/` |
 | Update Python bindings | `bindings/python/src/lib.rs` |
 | Update Go SDK | `bindings/golang/` |
-| Add storage backend | `data_connector/src/` |
+| Add storage backend | `crates/data_connector/src/` |
 | Add E2E tests | `e2e_test/` |
-| Add WASM middleware | `wasm/examples/` |
-| Add MCP tool support | `mcp/src/` |
+| Add WASM middleware | `crates/wasm/examples/` |
+| Add MCP tool support | `crates/mcp/src/` |
 
 ## Rationalization Prevention
 
 | Excuse | Reality |
 |--------|---------|
 | "Clippy is clean enough with a few warnings" | `-D warnings` means zero. One warning = not clean. |
-| "I didn't change bindings, skip step 4" | If you touched `config/types.rs` or `protocols/`, the struct literal in `bindings/python/src/lib.rs` may need a default. Check. |
+| "I didn't change bindings, skip step 4" | If you touched `config/types.rs` or `crates/protocols/`, the struct literal in `bindings/python/src/lib.rs` may need a default. Check. |
 | "Only touched one file, don't need full gate" | The two-path config rule means a one-file change can silently break propagation. Run all five. |
 | "Tests are slow, I'll run them later" | "Later" means shipping untested code. Run them now. |
 | "It's just a docs change" | Even docs PRs need clean formatting and conventional commits. Steps 1 and 5 still apply. |
diff --git a/skills/implement/SKILL.md b/skills/implement/SKILL.md
index 92765b2..af641be 100644
--- a/skills/implement/SKILL.md
+++ b/skills/implement/SKILL.md
@@ -22,7 +22,7 @@ Do NOT write implementation code until you have:
 
 3. Created a task for each step in the recipe
 
-**Escape hatch:** Single-file changes under 20 lines that don't touch `config/types.rs`, `protocols/`, `main.rs` (CliArgs or conversion functions), or `bindings/` may skip the full recipe. You MUST still chain to `smg:contribute` before PR.
+**Escape hatch:** Single-file changes under 20 lines that don't touch `config/types.rs`, `crates/protocols/`, `main.rs` (CliArgs or conversion functions), or `bindings/` may skip the full recipe. You MUST still chain to `smg:contribute` before PR.
 
 ## Detection Table
 
diff --git a/skills/implement/auth-feature.md b/skills/implement/auth-feature.md
index 456cea9..4fa9eb0 100644
--- a/skills/implement/auth-feature.md
+++ b/skills/implement/auth-feature.md
@@ -6,7 +6,7 @@ Two-factor auth: API key (SHA-256) + JWT/OIDC. Roles: Admin (control plane) and
 
 ### Adding a New Role
 
-1. Extend `Role` enum in `auth/src/`
+1. Extend `Role` enum in `crates/auth/src/`
 2. Update permission checks in middleware
 3. Update role mapping for JWT claims
 4. Add audit logging for new role actions
@@ -26,7 +26,7 @@ Two-factor auth: API key (SHA-256) + JWT/OIDC. Roles: Admin (control plane) and
 
 ### Adding a Custom Auth Method
 
-1. Implement validation logic in `auth/src/`
+1. Implement validation logic in `crates/auth/src/`
 2. Extract `Principal` from request
 3. Integrate in middleware chain (`/admin/*` routes)
 4. Add audit event for the new method
diff --git a/skills/implement/grpc-backend.md b/skills/implement/grpc-backend.md
index af95bb2..d3f06d8 100644
--- a/skills/implement/grpc-backend.md
+++ b/skills/implement/grpc-backend.md
@@ -6,7 +6,7 @@ gRPC clients connect to LLM backends (SGLang, vLLM, TRT). Use shared macros for
 
 ### Step 1: Create client file
 
-**File:** `grpc_client/src/mybackend.rs`
+**File:** `crates/grpc_client/src/mybackend.rs`
 
 Implement connection, health check, and inference methods. Use shared macros:
 ```rust
diff --git a/skills/implement/kv-index-feature.md b/skills/implement/kv-index-feature.md
index c58b92c..0a18bf7 100644
--- a/skills/implement/kv-index-feature.md
+++ b/skills/implement/kv-index-feature.md
@@ -15,7 +15,7 @@ Both implement `RadixTree` trait: prefix insertion, longest-prefix-match, LRU ev
 
 ### Adding Index Features
 
-1. Implement in `kv_index/src/`
+1. Implement in `crates/kv_index/src/`
 2. Ensure `Send + Sync` (accessed from routing hot path)
 3. Support both String and Token variants if applicable
 4. Add eviction/cleanup mechanism (prevent unbounded memory)
diff --git a/skills/implement/mcp-feature.md b/skills/implement/mcp-feature.md
index e9cf688..00ddb5c 100644
--- a/skills/implement/mcp-feature.md
+++ b/skills/implement/mcp-feature.md
@@ -5,7 +5,7 @@ Model Context Protocol client for external tool servers. Manages discovery, exec
 ## Architecture
 
 ```
-mcp/src/
+crates/mcp/src/
   core/
     orchestrator.rs → Tool execution, routing, validation (101KB)
     session.rs → Server bindings, tool sessions
@@ -25,7 +25,7 @@ Implement `rmcp::Transport` trait for the new connection type.
 
 ### Adding a Response Format
 
-**Directory:** `mcp/src/transform/`
+**Directory:** `crates/mcp/src/transform/`
 
 Convert MCP tool results to API-compatible format (OpenAI function calling, Claude tool use, custom).
 
@@ -63,4 +63,4 @@ ToolAnnotations { read_only, destructive, idempotent, open_world }
 
 Use `#[serial_test]` for approval workflow tests (shared state).
 
-**Verify:** `cargo test -p mcp`
+**Verify:** `cargo test -p smg-mcp`
diff --git a/skills/implement/mesh-feature.md b/skills/implement/mesh-feature.md
index c452eb2..d837186 100644
--- a/skills/implement/mesh-feature.md
+++ b/skills/implement/mesh-feature.md
@@ -5,7 +5,7 @@ SWIM gossip protocol with CRDT stores. Optional — only active with multiple ga
 ## Architecture
 
 ```
-mesh/src/
+crates/mesh/src/
   service.rs → MeshServerBuilder, cluster state
   ping_server.rs → SWIM gossip (60KB), message batching
   sync.rs → MeshSyncManager, state reconciliation
@@ -20,10 +20,10 @@ mesh/src/
 
 ### Adding a New CRDT Store
 
-1. Define CRDT type in `mesh/src/crdt_kv/`
-2. Register in `StateStores` (`mesh/src/stores.rs`)
-3. Add sync logic in `MeshSyncManager` (`mesh/src/sync.rs`)
-4. Emit updates in gossip messages (`mesh/src/ping_server.rs`)
+1. Define CRDT type in `crates/mesh/src/crdt_kv/`
+2. Register in `StateStores` (`crates/mesh/src/stores.rs`)
+3. Add sync logic in `MeshSyncManager` (`crates/mesh/src/sync.rs`)
+4. Emit updates in gossip messages (`crates/mesh/src/ping_server.rs`)
 5. Version with `version: u64` for causality tracking
 
 ### Adding a Cluster Integration
diff --git a/skills/implement/multimodal-feature.md b/skills/implement/multimodal-feature.md
index c88e9fa..cd6aec8 100644
--- a/skills/implement/multimodal-feature.md
+++ b/skills/implement/multimodal-feature.md
@@ -17,19 +17,20 @@ User message with image URL/data
 
 ### Adding a New Modality
 
-1. Extend `Modality` enum and `ChatContentPart` in `multimodal/src/`
+1. Extend `Modality` enum and `ChatContentPart` in `crates/multimodal/src/`
 2. Add fetch method to media connector
 3. Implement processing pipeline
 4. Track with UUID for deduplication
 
 ### Adding a Vision Processor
 
-**Directory:** `multimodal/src/vision/`
+**Directory:** `crates/multimodal/src/vision/`
 
 1. Implement processor trait (image → model-specific tensor format)
 2. Handle resizing, normalization, placeholder insertion
-3. Register in `ImageProcessorRegistry`
-4. Add NPZ array comparison tests for output validation
+3. Add per-model spec module in `crates/multimodal/src/registry/` (e.g. `mymodel.rs`)
+4. Register in the registry's `mod.rs`
+5. Add NPZ array comparison tests for output validation
 
 ### Adding a Media Source
 
diff --git a/skills/implement/reasoning-parser.md b/skills/implement/reasoning-parser.md
index 84557ae..1973785 100644
--- a/skills/implement/reasoning-parser.md
+++ b/skills/implement/reasoning-parser.md
@@ -140,7 +140,7 @@ This flag controls whether the parser assumes the first token is reasoning or no
 
 ## Step 1: Create parser file
 
-**File:** `reasoning_parser/src/parsers/{MODEL_NAME}.rs`
+**File:** `crates/reasoning_parser/src/parsers/{MODEL_NAME}.rs`
 
 Generate this file, substituting the 4 inputs:
 
@@ -282,26 +282,26 @@ mod tests {
 }
 ```
 
-**Verify:** `cargo check -p reasoning_parser`
+**Verify:** `cargo check -p reasoning-parser`
 
 ## Step 2: Register in module exports
 
-**File:** `reasoning_parser/src/parsers/mod.rs` — add:
+**File:** `crates/reasoning_parser/src/parsers/mod.rs` — add:
 ```rust
 pub mod {MODEL_NAME};
 pub use {MODEL_NAME}::{ModelName}Parser;
 ```
 
-**File:** `reasoning_parser/src/lib.rs` — add to the `pub use parsers::{ ... }` block:
+**File:** `crates/reasoning_parser/src/lib.rs` — add to the `pub use parsers::{ ... }` block:
 ```rust
 {ModelName}Parser,
 ```
 
-**Verify:** `cargo check -p reasoning_parser`
+**Verify:** `cargo check -p reasoning-parser`
 
 ## Step 3: Register in factory
 
-**File:** `reasoning_parser/src/factory.rs` — in `ParserFactory::new()`, add:
+**File:** `crates/reasoning_parser/src/factory.rs` — in `ParserFactory::new()`, add:
 
 ```rust
 // Parser registration
@@ -314,12 +314,12 @@ registry.register_pattern("{pattern-2}", "{MODEL_NAME}");
 
 Pattern matching is **case-insensitive substring**: `model_id.to_lowercase().contains(pattern)`.
 
-**Verify:** `cargo check -p reasoning_parser`
+**Verify:** `cargo check -p reasoning-parser`
 
 ## Step 4: Run tests
 
 ```bash
-cargo test -p reasoning_parser
+cargo test -p reasoning-parser
 ```
 
 All 7 tests in the new file plus all existing tests must pass.
diff --git a/skills/implement/storage-backend.md b/skills/implement/storage-backend.md
index 7358e0f..5ab49fc 100644
--- a/skills/implement/storage-backend.md
+++ b/skills/implement/storage-backend.md
@@ -6,7 +6,7 @@
 
 ### Step 1: Create module
 
-**Directory:** `data_connector/src/mybackend/`
+**Directory:** `crates/data_connector/src/mybackend/`
 
 Implement all storage trait methods with consistent behavior across operations.
 
diff --git a/skills/implement/tool-parser.md b/skills/implement/tool-parser.md
index c3ecc66..141fa20 100644
--- a/skills/implement/tool-parser.md
+++ b/skills/implement/tool-parser.md
@@ -104,7 +104,7 @@ Observe the raw format before the API normalizes it.
 
 ## Step 1: Create parser file
 
-**File:** `tool_parser/src/parsers/{PARSER_NAME}.rs`
+**File:** `crates/tool_parser/src/parsers/{PARSER_NAME}.rs`
 
 For the most common case — **JSON with tags** — generate this template:
 
@@ -251,26 +251,26 @@ impl ToolParser for {ParserName}Parser {
 }
 ```
 
-**Verify:** `cargo check -p tool_parser`
+**Verify:** `cargo check -p tool-parser`
 
 ## Step 2: Register in module exports
 
-**File:** `tool_parser/src/parsers/mod.rs` — add:
+**File:** `crates/tool_parser/src/parsers/mod.rs` — add:
 ```rust
 pub mod {PARSER_NAME};
 pub use {PARSER_NAME}::{ParserName}Parser;
 ```
 
-**File:** `tool_parser/src/lib.rs` — add to the `pub use parsers::{ ... }` block:
+**File:** `crates/tool_parser/src/lib.rs` — add to the `pub use parsers::{ ... }` block:
 ```rust
 {ParserName}Parser,
 ```
 
-**Verify:** `cargo check -p tool_parser`
+**Verify:** `cargo check -p tool-parser`
 
 ## Step 3: Register in factory
 
-**File:** `tool_parser/src/factory.rs`
+**File:** `crates/tool_parser/src/factory.rs`
 
 In `ParserFactory::new()`:
 ```rust
@@ -285,11 +285,11 @@ registry.map_model("{model-pattern-2}*", "{PARSER_NAME}");
 
 Pattern matching uses **glob wildcards** (`*` matches any characters).
 
-**Verify:** `cargo check -p tool_parser`
+**Verify:** `cargo check -p tool-parser`
 
 ## Step 4: Write tests
 
-**File:** `tool_parser/tests/tool_parser_{PARSER_NAME}.rs`
+**File:** `crates/tool_parser/tests/tool_parser_{PARSER_NAME}.rs`
 
 ```rust
 mod common;
diff --git a/skills/implement/wasm-plugin.md b/skills/implement/wasm-plugin.md
index d454015..19d8f37 100644
--- a/skills/implement/wasm-plugin.md
+++ b/skills/implement/wasm-plugin.md
@@ -21,7 +21,7 @@ Wasmtime component model with WIT interface. Plugins intercept requests/response
 
 ### Step 1: Define types in WIT (if new interface needed)
 
-**File:** `wasm/src/interface/spec.wit`
+**File:** `crates/wasm/src/interface/spec.wit`
 
 ```wit
 interface middleware-types {
@@ -33,23 +33,23 @@ interface middleware-types {
 
 ### Step 2: Add attachment point (if new hook)
 
-**File:** `wasm/src/module.rs`
+**File:** `crates/wasm/src/module.rs`
 
 Add to `MiddlewareAttachPoint` enum.
 
 ### Step 3: Implement handler matching
 
-**File:** `wasm/src/runtime.rs`
+**File:** `crates/wasm/src/runtime.rs`
 
 Match the new attachment point and execute WASM module.
 
 ### Step 4: Update module validation
 
-**File:** `wasm/src/module_manager.rs`
+**File:** `crates/wasm/src/module_manager.rs`
 
 ### Step 5: Write example guest plugin
 
-**Directory:** `wasm/examples/`
+**Directory:** `crates/wasm/examples/`
 
 ```rust
 wit_bindgen::generate!({ world: "smg" });
diff --git a/skills/map/SKILL.md b/skills/map/SKILL.md
index 18b2efa..f8b0afe 100644
--- a/skills/map/SKILL.md
+++ b/skills/map/SKILL.md
@@ -25,23 +25,26 @@ High-performance Rust gateway for LLM inference backends. Routes requests to wor
 | `tool_parser` | 13+ tool call parsers (JSON, Mistral, Qwen, DeepSeek, Pythonic, etc.). Streaming with incremental JSON | `ToolParser` trait, `ParserFactory`, `StreamingParseResult` |
 | `reasoning_parser` | Reasoning extraction from 10+ model families (DeepSeek-R1, Qwen3, Kimi, Cohere). Streaming | `ReasoningParser` trait, `ParserFactory`, `ParserResult` |
 | `tokenizer` | LLM tokenization, chat templates | `Tokenizer` |
-| `multimodal` | Image/audio processing. Vision processors (LLaVA, LLaVA-Next), media fetching | `ImageFrame`, `MultiModalInputs`, `ChatContentPart` |
+| `multimodal` | Image/audio processing. Per-model vision specs (LLaVA, Qwen-VL, Llama4, Phi3-V), media fetching | `ImageFrame`, `ChatContentPart`, `MediaConnector` |
 | `workflow` | Step-based async workflow engine (wfaas) | `StepExecutor`, `WorkflowContext` |
 | `bindings/python` | PyO3 bindings. `Router` class with ~80 constructor params, enum mapping | `Router`, `PolicyType` |
 | `bindings/golang` | Go SDK via FFI (cgo). OpenAI-style API, streaming, tool calling | `Client`, `ChatCompletionRequest` |
 | `clients/rust` | Rust client library | |
+| `grpc_servicer` | Python gRPC servicer wrapping vLLM/SGLang backends | |
 
 ## Layering Rule
 
 ```
-protocols (shared types — ALL consumers)
+crates/protocols (shared types — ALL consumers)
 ↑
 model_gateway (implementation — ONE consumer writes each field)
 ↑
 bindings/* (language SDKs — wrap model_gateway + protocols)
 ```
 
-**Iron law**: If only one crate writes a field, it doesn't belong in `protocols/`. K8s-specific, runtime-specific, or gateway-specific fields stay in `model_gateway`.
+**Directory layout**: Library crates live under `crates/` (e.g. `crates/mcp/`, `crates/mesh/`). `model_gateway/` and `bindings/` remain at repo root.
+
+**Iron law**: If only one crate writes a field, it doesn't belong in `crates/protocols/`. K8s-specific, runtime-specific, or gateway-specific fields stay in `model_gateway`.
 
 ## Config Propagation (3-Stage)
 
@@ -61,6 +64,9 @@ ServiceDiscoveryConfig / ServerConfig — typed, runtime
 Client → HTTP/gRPC handler → Auth middleware → WASM OnRequest
 → Routing policy selects worker → Proxy to backend
 → Stream response → Tool/reasoning parsing → WASM OnResponse → Client
+
+Realtime (WebSocket):
+Client → WS upgrade → Realtime session registry → Proxy to backend WS
 ```
 
 ## Worker Lifecycle (5-Step Workflow)
diff --git a/skills/review-pr/SKILL.md b/skills/review-pr/SKILL.md
index 587822f..721f229 100644
--- a/skills/review-pr/SKILL.md
+++ b/skills/review-pr/SKILL.md
@@ -38,15 +38,15 @@ Do NOT write review comments, approve, or provide feedback until you have:
 
 | Files Changed | Review Sections |
 |---------------|-----------------|
-| `protocols/src/` | 1 (Layering), 3 (Worker Lifecycle) |
+| `crates/protocols/src/` | 1 (Layering), 3 (Worker Lifecycle) |
 | `model_gateway/src/config/` | 2 (Config Plumbing) |
 | `model_gateway/src/main.rs` | 2 (Config Plumbing) |
 | `model_gateway/src/service_discovery.rs` | 3 (Worker Lifecycle) |
 | `model_gateway/src/core/steps/worker/` | 3 (Worker Lifecycle) |
 | `model_gateway/src/core/routing/` | 4 (Routing Policy) |
-| `tool_parser/src/` | 5 (Parser Changes) |
-| `reasoning_parser/src/` | 5 (Parser Changes) |
-| `data_connector/src/` | 6 (Storage) |
+| `crates/tool_parser/src/` | 5 (Parser Changes) |
+| `crates/reasoning_parser/src/` | 5 (Parser Changes) |
+| `crates/data_connector/src/` | 6 (Storage) |
 | `bindings/` | 2 (Config Plumbing) |
 | Any file | 7 (Error Handling), 8 (Testing), 9 (Code Quality) |
 
@@ -56,7 +56,7 @@ Sections 7, 8, 9 always apply. Section 10 applies to PRs touching 3+ files or ad
 
 ### 1. Layering & Separation of Concerns
 
-- [ ] No new fields in `protocols/` types that only one crate sets
+- [ ] No new fields in `crates/protocols/` types that only one crate sets
 - [ ] Config types at correct layer: user-facing → `config/types.rs`, runtime → module-specific
 - [ ] No raw strings parsed at runtime — parse at boundary
 - [ ] WASM/MCP concerns stay in their crates, not leaking into core
@@ -111,8 +111,8 @@ Sections 7, 8, 9 always apply. Section 10 applies to PRs touching 3+ files or ad
 - [ ] Unit tests for new types/parsing including error cases
 - [ ] Integration test for full flow
 - [ ] Existing test struct literals updated with new fields
-- [ ] E2E tests if user-facing behavior changes
-- [ ] Thread-unsafe tests marked `@pytest.mark.thread_unsafe`
+- [ ] E2E tests if user-facing behavior changes (in `e2e_test/` — tests run sequentially with class-scoped backends)
+- [ ] E2E test markers set: `@pytest.mark.engine(...)`, `@pytest.mark.gpu(count)`, `@pytest.mark.model(...)` as needed
 
 ### 9. Code Quality
 
@@ -126,7 +126,7 @@ Sections 7, 8, 9 always apply. Section 10 applies to PRs touching 3+ files or ad
 
 ### 10. Architecture Smell Tests
 
-- "If I remove K8s, does this change still make sense?" → shouldn't be in `protocols/`
+- "If I remove K8s, does this change still make sense?" → shouldn't be in `crates/protocols/`
 - "Can existing config overrides or labels achieve this?" → may be unnecessary
 - "Does this compose with DP-aware mode, PD disagg, mesh HA?" → don't break existing
 - "Is this Send + Sync safe under concurrent load?" → all routing state thread-safe
diff --git a/skills/review-pr/anti-patterns.md b/skills/review-pr/anti-patterns.md
index d3488b7..68f02d3 100644
--- a/skills/review-pr/anti-patterns.md
+++ b/skills/review-pr/anti-patterns.md
@@ -14,9 +14,9 @@ Per-subsystem anti-patterns to check during PR review.
 
 | Anti-Pattern | Consequence | What to Look For |
 |-------------|-------------|------------------|
-| Adding `_override` field to WorkerSpec | Bypasses label pipeline, creates parallel data path | New fields on `WorkerSpec` in `protocols/src/worker.rs` |
+| Adding `_override` field to WorkerSpec | Bypasses label pipeline, creates parallel data path | New fields on `WorkerSpec` in `crates/protocols/src/worker.rs` |
 | Post-hoc ModelCard mutation | Race conditions, stale data in routing | `model_card.model_id = ...` after `build_model_card()` |
-| Injecting K8s-specific data into `protocols/` types | Tight coupling to K8s, breaks non-K8s deployments | New fields in `protocols/` that reference namespaces, pods, labels |
+| Injecting K8s-specific data into `crates/protocols/` types | Tight coupling to K8s, breaks non-K8s deployments | New fields in `crates/protocols/` that reference namespaces, pods, labels |
 
 ## Routing
 