feat: extend capability gate to all task types #76
Merged
#74 gated dispatch on a single SSH capability. Generalize it: each Executor declares the task types it serves, the worker advertises the union over the executors it initialized as WorkerCapabilities.supported_task_types, and the dispatcher routes a task only to workers advertising its type. A worker missing an executor (e.g. no Docker for SSH, or absent GPU/training deps) is no longer handed a task it would fail inside the runner. Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
4eeff4d to e73f90f
A vLLM-backed inference task could previously be accepted and then fail late in the runner. Classify the resolved backend once (InferenceBackend: TRANSFORMERS / VLLM / AUTO), shared by the worker's executor selection and the server parser, and reject at submit: a vLLM backend without a GPU, an enforce_cpu task that also configures vLLM (silently ignored otherwise), and an adapter that specifies neither path nor url. Templated enforce_cpu placeholders defer, and an unhinted model (AUTO) is left alone since the runner falls back to the transformers executor. Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
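The classify-once-then-reject flow described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: the InferenceSpec fields (enforce_cpu, vllm, adapters, requires_gpu), the classification order, and the error strings are hypothetical, and a templated enforce_cpu is assumed to arrive as an unresolved string.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Union


class InferenceBackend(Enum):
    TRANSFORMERS = "transformers"
    VLLM = "vllm"
    AUTO = "auto"


@dataclass
class Adapter:
    # Hypothetical shape: at this commit an adapter needs a path or a url.
    path: Optional[str] = None
    url: Optional[str] = None


@dataclass
class InferenceSpec:
    # Hypothetical fields; enforce_cpu may still be an unresolved template string.
    enforce_cpu: Union[bool, str] = False
    vllm: Optional[dict] = None
    adapters: List[Adapter] = field(default_factory=list)
    requires_gpu: bool = False


def classify_backend(spec: InferenceSpec) -> InferenceBackend:
    """Assumed classification rule; expects enforce_cpu resolved to a bool."""
    if spec.enforce_cpu:
        return InferenceBackend.TRANSFORMERS
    if spec.vllm is not None or spec.adapters:
        return InferenceBackend.VLLM
    return InferenceBackend.AUTO  # unhinted: the runner falls back to transformers


def submit_errors(spec: InferenceSpec) -> List[str]:
    """Collect submit-time rejections for an inference spec."""
    errors = []
    for i, adapter in enumerate(spec.adapters):
        if not (adapter.path or adapter.url):
            errors.append(f"adapter {i}: specify path or url")
    if isinstance(spec.enforce_cpu, str):
        return errors  # templated placeholder: backend unknown, so defer
    if spec.enforce_cpu and spec.vllm is not None:
        errors.append("enforce_cpu conflicts with a vLLM config")
    elif classify_backend(spec) is InferenceBackend.VLLM and not spec.requires_gpu:
        errors.append("vLLM backend requires a GPU")
    return errors
```

In this sketch the worker's executor selection would call the same classify_backend, which is what keeps the submit-time check and the runtime choice from drifting apart. An AUTO spec produces no error on either device.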
Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
timzsu requested changes on Jun 19, 2026
Generalize the inference checks into TaskSpec.validate_dispatchable() (a no-op on the base spec, overridden by inference), called both at submit (parser) and again after placeholder resolution at dispatch on the resolved strict spec. Submit-time validation defers when enforce_cpu is an unresolved template placeholder, so re-checking post-resolution catches a templated task that resolves into an invalid combination instead of failing late in the runner. Also accept an adapter's task_id (upstream-artifact reference) as a valid source, matching the vLLM LoRA executor. Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
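A sketch of the two-phase hook this commit describes: the base spec's validate_dispatchable() is a no-op, the inference override checks its own invariants (now accepting an adapter task_id), the parser calls it at submit, and the dispatcher calls it again after placeholder resolution. The field names, the SpecError type, the "{{name}}" placeholder syntax, and the resolve()/submit()/dispatch() helpers are hypothetical stand-ins, not FlowMesh's API.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional, Union


class SpecError(ValueError):
    """Hypothetical error for a spec that can never be dispatched."""


@dataclass
class TaskSpec:
    def validate_dispatchable(self) -> None:
        """Base hook: no spec-internal invariants, so nothing to check."""


@dataclass
class Adapter:
    path: Optional[str] = None
    url: Optional[str] = None
    task_id: Optional[str] = None  # reference to an upstream task's artifact


@dataclass
class InferenceTaskSpec(TaskSpec):
    # enforce_cpu is a bool or an unresolved "{{name}}" placeholder (assumed syntax).
    enforce_cpu: Union[bool, str] = False
    vllm: Optional[dict] = None
    adapters: List[Adapter] = field(default_factory=list)
    requires_gpu: bool = False

    def validate_dispatchable(self) -> None:
        for i, a in enumerate(self.adapters):
            if not (a.path or a.url or a.task_id):
                raise SpecError(f"adapter {i}: needs path, url or task_id")
        if isinstance(self.enforce_cpu, str):
            return  # backend undecidable until the placeholder resolves
        if self.enforce_cpu:
            if self.vllm is not None:
                raise SpecError("enforce_cpu conflicts with a vLLM config")
        elif (self.vllm is not None or self.adapters) and not self.requires_gpu:
            raise SpecError("vLLM backend requires a GPU")


def resolve(spec: InferenceTaskSpec, params: Dict[str, bool]) -> InferenceTaskSpec:
    """Hypothetical resolver: substitute a '{{name}}' enforce_cpu placeholder."""
    if isinstance(spec.enforce_cpu, str):
        name = spec.enforce_cpu.strip("{} ")
        return replace(spec, enforce_cpu=bool(params[name]))
    return spec


def submit(spec: TaskSpec) -> TaskSpec:
    spec.validate_dispatchable()  # parser: may defer on placeholders
    return spec


def dispatch(spec: TaskSpec, params: Dict[str, bool]) -> TaskSpec:
    strict = resolve(spec, params) if isinstance(spec, InferenceTaskSpec) else spec
    strict.validate_dispatchable()  # re-check the fully resolved spec
    return strict
```

Here a templated task passes submit(); dispatch() then either accepts it or raises once enforce_cpu resolves into a combination the runner could not serve.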
timzsu approved these changes on Jun 19, 2026
timzsu (Collaborator) left a comment:
LGTM. Please merge after the CI passes
Purpose
Implements the follow-up noted in #74. That PR added a capability gate for the single SSH case; every other task type was still selected on hardware fit alone, so a worker that lacks an executor (no Docker for SSH, absent GPU/training/omni dependencies, or an init failure) remained a candidate for that type and the task failed inside the runner — falling back to the default executor or hitting a spec-type mismatch — instead of being routed to a worker that can actually serve it. This generalizes the gate to all task types so the dispatcher never hands a worker a task its executors can't run.
It also closes a related gap raised in review: a single inference taskType maps to multiple backends (vLLM vs. HF transformers) chosen at runtime, which the capability gate alone can't tell apart. The server now validates inference specs at submit, and again at dispatch once template placeholders resolve, so a vLLM-only task can't slip onto a worker that would fail it.
Changes
- WorkerCapabilities carries supported_task_types (replacing the SSH-only ssh flag), surfaced on the SDK Worker model so flowmesh worker list shows each worker's serviceable types.
- Executor declares the task types it serves; the worker advertises the union over the executors that initialized.
- capability_satisfies gates on taskType ∈ supported_task_types, replacing the SSH isinstance special case.
- An InferenceBackend classifier (transformers/vllm/auto) drives the worker's executor selection and a TaskSpec.validate_dispatchable() hook that rejects a vLLM-backed inference task without a GPU, enforce_cpu combined with a vLLM config, and an adapter that specifies none of path/url/task_id. The hook runs at submit and again pre-dispatch on the resolved spec.
Design
Capability is derived from the worker's initialized executors rather than config: an executor whose runtime dependency is missing is never constructed, so the advertised set reflects what the worker can actually run. Each executor declares its task types, the worker advertises the union, and the dispatcher does a single membership test. A worker reporting an empty set is treated as incapable, so during a rolling upgrade a task waits for an upgraded worker rather than risking a wrong dispatch.
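The capability gate reduces to a set union on the worker and a membership test in the dispatcher. The sketch below assumes a task_types class attribute on each executor, an ImportError at construction when a runtime dependency is missing, and illustrative task-type strings and executor classes; none of these names are taken from the PR's code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List


class Executor:
    # Hypothetical: each executor declares the task types it serves.
    task_types: FrozenSet[str] = frozenset()


class EchoExecutor(Executor):
    task_types = frozenset({"echo"})


class TransformersExecutor(Executor):
    task_types = frozenset({"inference", "embedding"})


class TrainingExecutor(Executor):
    task_types = frozenset({"ppo"})


class DockerSSHExecutor(Executor):
    task_types = frozenset({"ssh"})

    def __init__(self) -> None:
        # Simulate a worker without Docker: construction fails.
        raise ImportError("docker is not available")


@dataclass(frozen=True)
class WorkerCapabilities:
    supported_task_types: FrozenSet[str] = frozenset()


def init_executors(factories: Iterable[Callable[[], Executor]]) -> List[Executor]:
    """Construct each executor, skipping any whose dependency is missing."""
    ready = []
    for make in factories:
        try:
            ready.append(make())
        except ImportError:
            continue
    return ready


def advertise(executors: Iterable[Executor]) -> WorkerCapabilities:
    """Advertise the union of task types over the initialized executors."""
    served = set()
    for executor in executors:
        served |= executor.task_types
    return WorkerCapabilities(frozenset(served))


def capability_satisfies(caps: WorkerCapabilities, task_type: str) -> bool:
    # An empty set (e.g. a worker that predates the field) matches nothing,
    # so the task waits for an upgraded worker instead of a wrong dispatch.
    return task_type in caps.supported_task_types


def eligible_workers(fleet: Dict[str, WorkerCapabilities], task_type: str) -> List[str]:
    return [name for name, caps in fleet.items() if capability_satisfies(caps, task_type)]
```

Because the gate is a pure membership test on what actually initialized, a worker whose SSH or training executor failed to construct simply never appears as a candidate for those types.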
For inference, the capability gate stays at taskType granularity; the GPU/CPU split is handled separately. The InferenceBackend classifier is the single source of truth for both the worker's executor selection and the validation, so the two can't drift. Only an explicit vLLM backend (a vllm config or adapters) is treated as GPU-required: the transformers executor runs on either device, and the unhinted auto case falls back to transformers, so neither is gated.
Spec validation lives on a validate_dispatchable() hook on the spec base: a no-op by default, overridden by inference for its own invariants. The parser runs it at submit (where a templated enforce_cpu defers, since the backend can't be determined until it resolves), and the dispatcher runs it again on the fully resolved strict spec, so a deferred case is caught once resolved rather than failing late in the runner. The hook is scoped to spec-internal invariants; config-dependent checks like SSH access-mode (which depend on server feature flags) stay server-side.
Test Plan
Two end-to-end suites on a freshly built stack (FLOWMESH_VERSION=cap-gate):
- … ppo task with no GPU hardware requirement fails fast (excluded on capability though its hardware fits) while a hardware-identical echo task runs; on a mixed CPU+GPU fleet the ppo task routes to the GPU worker.
- … enforce_cpu + vLLM task, and an adapter lacking a source are each rejected at submit; a vLLM task that declares a GPU is accepted.
Test Result
… ppo task fast (never dispatched, no executor error) and ran the echo control to DONE, and the mixed fleet routed the ppo task to the GPU worker, never the idle CPU one. The CPU worker advertised inference/embedding despite its default executor being subprocess-wrapped, confirming the wrapped-executor path is accounted for.
Pre-submission Checklist
- … pre-commit on the changed files and fixed any issues.
- … uv sync --all-packages --group ci --frozen).
- … [BREAKING] and described migration steps above.