feat/name alias#122

Open
jlapenna wants to merge 2 commits into spark-arena:develop from jlapenna:feat/name-alias

@jlapenna
Contributor

@jlapenna jlapenna commented Apr 8, 2026

  • feat: allow aliasing container instances via --name flag
  • test: add test suite for cluster_id name alias and executor argument extraction

@dbotwinick dbotwinick changed the base branch from main to develop April 9, 2026 03:47
@dbotwinick
Contributor

Is this a feature you use or need? I guess it could be convenient in some circumstances, but it could easily break other things if used inconsistently or carelessly (e.g., by creating conflicting IDs). That was why I noted it in the other PR. Do you use this, or want to use it, for convenience?

@jlapenna
Contributor Author

jlapenna commented Apr 9, 2026

For context on why this --name alias flag is architecturally necessary:

We cannot rely solely on Docker labels for service discovery. Docker's internal/embedded DNS server (127.0.0.11) maps traffic using the container's name (or its network aliases), not its labels.

If sparkrun falls back to its default behavior of generating a random cluster UUID (e.g., sparkrun-42adf-solo), our internal components face the following issues:

  1. Network Discovery: The vllm-gateway (LiteLLM proxy) is hard-configured to route requests to backend models via predictable hostnames like http://main_solo:8001 or http://embedding_solo:8002. Random container names completely break this Layer 7 routing.
  2. Health Checks / Tooling: The vllm-progress-manager and scripts.verify E2E suites explicitly rely on predictable container names to find the engines, ping health probes, and scrape initialization status. Custom --name aliasing ensures these dependencies can find their targets.

While we do use --label extensively for Layer 8 telemetry (telling Prometheus/cAdvisor to scrape metrics for sparkrun.role=main), the explicit --name alias is strictly required to hold the actual network routing together.
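
To make the name-versus-label distinction concrete, here is a minimal sketch of how a launcher might assemble `docker run` arguments. The helpers `container_name` and `docker_run_args` are hypothetical and are not sparkrun's actual code; the random-name format only imitates the example name quoted above.

```python
import uuid
from typing import List, Optional


def container_name(role: str, alias: Optional[str] = None) -> str:
    """Return the explicit alias if one is given, else a random cluster-style name."""
    if alias:
        return alias
    # Hypothetical format, modeled on the "sparkrun-42adf-solo" example above.
    return f"sparkrun-{uuid.uuid4().hex[:5]}-{role}"


def docker_run_args(image: str, role: str, alias: Optional[str] = None) -> List[str]:
    """Build a `docker run` argument list carrying both a name and a role label."""
    return [
        "docker", "run", "-d",
        "--name", container_name(role, alias),  # the name Docker's embedded DNS resolves
        "--label", f"sparkrun.role={role}",     # metadata only; not resolvable as a hostname
        image,
    ]
```

With `alias="main_solo"` the gateway's `http://main_solo:8001` target resolves; without an alias the name changes on every launch, while the label stays the same in both cases.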

@dbotwinick
Copy link
Copy Markdown
Contributor

Convinced. I'm also curious to know more specifically how you're using it!

Then let's do --container-name and make it hidden. I think I'll have to add docs for something like Advanced Automation and document a bunch of these hidden flags, but I think they're noise to the majority of the userbase, so I'd rather not expose every option by default.
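
As a rough illustration of a hidden flag, the sketch below uses Python's standard `argparse`, where `help=argparse.SUPPRESS` keeps an option out of the `--help` output while still accepting it. This is an assumption for illustration only: the thread does not say which CLI framework sparkrun uses, and `build_parser`, the `sparkrun-sketch` program name, and the `recipe` positional are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a --container-name flag that is accepted but not listed in help."""
    parser = argparse.ArgumentParser(prog="sparkrun-sketch")
    parser.add_argument("recipe", help="recipe to run (placeholder positional)")
    parser.add_argument(
        "--container-name",
        dest="container_name",
        default=None,
        help=argparse.SUPPRESS,  # hides the flag from usage and help text
    )
    return parser
```

Automation scripts can still pass `--container-name main_solo`, while users reading `--help` see only the everyday options.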
