feat(compose): optional gpu-coordinator-proxy + ollama services for VRAM contention #650

Open
nnnet wants to merge 1 commit into builderz-labs:main from nnnet:pr/gpu-coordinator-proxy

@nnnet nnnet commented May 5, 2026

## Summary

Adds two **optional** docker compose services for operators running both LMStudio and Ollama on a single GPU:

- **`gpu-coordinator-proxy`** — fronts LMStudio (`:1234`) and Ollama (`:11434`) on alternative ports (`:1235`, `:11435`) with a shared VRAM lock. Before forwarding to either backend it unloads everything from the *other* runtime; with `GPU_FREE_STRATEGY=wipe-all` it can also evict everything from the requested runtime itself for cold-start determinism.
- **`ollama`** — official `ollama/ollama:latest` image with a named volume for model storage, exposing `:11434/v1`.

The proxy lives in its own repo at https://github.com/nnnet/gpu-coordinator-proxy and is cloned next to MC as `./gpu-coordinator-proxy-src/` (gitignored — the sibling clone is not part of MC's tree). The `build.context` in the compose file points at that sibling clone.

## Why this is useful upstream

Operators running both runtimes on a single dev machine would otherwise have to manually unload models between provider switches. With the proxy, MC and the gateway just point `lmstudio.baseUrl=http://host.docker.internal:1235/v1` and `ollama.baseUrl=http://host.docker.internal:11435/v1` and the contention is invisible. The 5/6 swap matrix tested cleanly on a 24 GB RTX 5090.

The change is purely additive — omit the two service blocks in `docker-compose-openclaw.yml` to keep the existing direct-LMStudio / direct-Ollama paths unchanged.

## Test plan

1. Clone the proxy: `git clone https://github.com/nnnet/gpu-coordinator-proxy gpu-coordinator-proxy-src` (next to `mission-control/`).
2. `make up gpu-coordinator-proxy ollama` — both services healthy.
3. Pull a model: `docker exec mc-ollama ollama pull gpt-oss:20b`.
4. Point an MC agent at `http://127.0.0.1:1235/v1` (LMStudio) and another at `http://127.0.0.1:11435/v1` (Ollama).
5. Dispatch tasks alternating between the two; observe in proxy logs that the other runtime gets evicted on each switch.

## Dependencies

- Touches `docker-compose-openclaw.yml`, the same file as builderz-labs#649 (OpenClaw integration). The two PRs can land in either order; whichever lands second rebases.

## Provenance

Squashes our fork's commit:
- `3249ee7 feat(compose): add gpu-coordinator-proxy service from sibling repo clone`

@nnnet nnnet requested a review from 0xNyk as a code owner May 5, 2026 21:50
@0xbrainkid 0xbrainkid left a comment

Summary

  • Blocking: the optional GPU proxy is included as an always-on service with a required local build context, and the compose file also carries a machine-specific OpenClaw state path.

Strengths

  • The proxy/ports/eviction-strategy comments are clear and explain the intended LMStudio/Ollama topology well.
  • Keeping Ollama model data in a named volume is the right operational default.

Issues

  • [BLOCKING] gpu-coordinator-proxy is part of the default compose graph but builds from ./gpu-coordinator-proxy-src, which is intentionally gitignored and absent on a normal clone. docker compose up / make up will fail unless the operator manually clones the sibling repo or edits the compose file, even though the comments call the proxy optional. Please put this service behind a profile (for example profiles: ["gpu-proxy"]) or split it into an optional override compose file so the default OpenClaw stack starts without the external repo.
  • [BLOCKING] Same portability issue as the OpenClaw integration compose: OPENCLAW_STATE_DIR, OPENCLAW_PLUGIN_STAGE_DIR, OPENCLAW_CONFIG_PATH, and the state bind mount hard-code /mnt/9/gt/rig_PlatformsAI/mayor/rig/beads/discovered/mission-control/.openclaw-data. That path will be wrong for other clones/hosts. Please make it configurable/generated locally rather than committed.
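
Both blocking items could be addressed along the lines of the fragment below. This is a minimal sketch, not the PR's actual compose file: the `gpu-coordinator-proxy` service name, the `./gpu-coordinator-proxy-src` build context, and the `:1235` / `:11435` front ports come from the PR description, and the `gpu-proxy` profile name is the example given above. The `openclaw` service name, the `OPENCLAW_DATA_DIR` variable, the `/openclaw-data` container path, and the port mappings are assumptions made for illustration.

```yaml
# Sketch only. Names marked "assumed" or "hypothetical" are not from the PR.
services:
  gpu-coordinator-proxy:
    profiles: ["gpu-proxy"]   # not part of the default graph; started only when the profile is enabled
    build:
      context: ./gpu-coordinator-proxy-src   # sibling clone, needed only with the profile on
    ports:
      - "1235:1235"     # LMStudio front port (mapping assumed)
      - "11435:11435"   # Ollama front port (mapping assumed)

  openclaw:   # hypothetical service name
    # ...existing keys unchanged...
    environment:
      # Container-side path (assumed) instead of a committed absolute host path.
      OPENCLAW_STATE_DIR: /openclaw-data
    volumes:
      # Host side falls back to a repo-relative directory when
      # OPENCLAW_DATA_DIR (hypothetical variable) is unset.
      - "${OPENCLAW_DATA_DIR:-./.openclaw-data}:/openclaw-data"
```

With this shape, a plain `docker compose -f docker-compose-openclaw.yml up` leaves the proxy out, and `docker compose -f docker-compose-openclaw.yml --profile gpu-proxy up` includes it.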

Questions

  • Should Ollama also be profile-gated, or is it intended to be part of the default OpenClaw stack even for cloud-only users?

Verification

  • gh pr checks 650: no checks reported.
  • docker compose -f docker-compose-openclaw.yml config --quiet: passed syntax/config rendering, but does not validate that the optional build context exists or that the committed absolute path is portable.

@0xNyk 0xNyk (Member) commented May 7, 2026

Thanks — holding this one until #649 lands. They both modify docker-compose-openclaw.yml (this PR adds the gpu-coordinator-proxy and ollama services to the file that #649 introduces), so the merge order matters and the diff is currently hard to evaluate in isolation.

Once #649 lands, please rebase this onto the new main so the diff shows just the GPU-coordinator delta (the actual ~600 lines you're adding), not the full file. That'll make this trivially reviewable.

One question while you're rebasing:

The proxy service description mentions GPU_FREE_STRATEGY=wipe-all for cold-start determinism — that strategy evicts everything from the requested runtime as well, which on shared LMStudio/Ollama setups would kill any other concurrent workload. Could you add a note in the compose comment that operators on multi-user GPU hosts should leave it on the default wipe-other strategy? Just so the trade-off is visible at the spot where someone enables the flag.
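
A compose comment of roughly this shape would make the trade-off visible where the flag is set. It is a sketch under assumptions: the strategy names `wipe-other` and `wipe-all` and the `GPU_FREE_STRATEGY` setting come from this thread, while passing it through the service's `environment` block is assumed rather than confirmed against the proxy repo.

```yaml
# Sketch only: assumes the proxy reads GPU_FREE_STRATEGY from its environment.
services:
  gpu-coordinator-proxy:
    environment:
      # wipe-other (default): before forwarding, unload models only from the
      #   *other* runtime. Leave this value on multi-user or shared GPU hosts.
      # wipe-all: additionally evict everything from the requested runtime for
      #   cold-start determinism. This also unloads models that concurrent
      #   workloads on that runtime are using.
      GPU_FREE_STRATEGY: wipe-other
```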

Tagging as blocked on #649 for now. The technical content here looks sensible — VRAM contention is a real problem and a coordinator proxy is the right shape for it.
