Skip to content
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
81 changes: 81 additions & 0 deletions docs/proposals/runconfig-as-architectural-boundary.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
# `RunConfig` as Architectural Boundary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment worth noting: There are a bunch of fields in the run config right now which are only needed for backwards compatibility. We should clean those up as part of stabilization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: we have a VersionSchema field in the RunConfig struct, we should leverage that.
Action Item: list fields that are either no longer used or not desired anymore.


## Problem Statement

As of the time of this writing, ToolHive has a few interfaces and as released as part of three user facing
systems, namely

* **ToolHive CLI** `thv`
* **ToolHive Operator**, which is used in both Kubernetes and OpenShift
* **ToolHive UI**, which internally uses APIs exposed by `thv serve`

These interfaces differ in their execution environment and "quality of life" features that one might implement. For example, one obvious difference is the way configuration is accessed: CLI must both accept CLI options and read files on file system, while the Kubernetes Operator requires all parameters to be fully specified in the main CRD or into "linked" ones (e.g. `MCPToolConfig`). Another one is in config reload semantics, which must implemented in an ad-hoc fashion for CLI, while Kubernetes handles it as part of the life cycle of resources. Yet another example is the location of runtime configuration and state. The Kubernetes Operator relies on both being stored "inside the cluster", while the CLI and UI must both rely on file system, yet the configuration might be semantically equivalent. A final useful use case is exporting `RunConfig` so that the same workload can be "moved" to a different place or shipped as configuration. A similar use case is that implemented by the `thv restart` command, which fetches the serialized version of a run config to restart the workload when necessary.
Copy link
Member

@dmjb dmjb Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth noting that in the UI/API case - the preference has been to specify the payload ("literal" as this doc uses) of the configuration instead of a path. This allows the payload to be constructed by a UI form.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: there's general agreement on the fact that HTTP API must not contain references to local files.
Action Item: this is a bit outside of the RunConfig design doc, but we should document it regardless.


These differences warrant the introduction of an architectural boundary where workloads are executed, which is currently specified via what we call a RunConfig.

## Goals

* make RunConfig the entrypoint for workloads execution
* stabilize RunConfig so that new user or application interfaces can be built upon it
* clearly specify responsibilities of code above and below this new boundary

## Responsibilities of a RunConfig

A `RunConfig` struct contains either information necessary for a OCI-compatible runner to run a workload, or alternatively the remote URL at which an already running MCP is reachable. In both scenarios, ToolHive aims to "wrap" the workload in a proxy to handle auth and gather telemetry.

Finally, `RunConfig` are currently serialized as JSON and used by `thv restart` command.

## Current Interface

Conceptually, a `RunConfig` contains three things
* details on how to run or reach the Workload
* configuration for the Proxy itself
* Metadata like name pertaining both components that allow ToolHive to refer to them as one

**Metadata details** amount to
* name
* group
* schema version
* debug settings, common to both proxy and workload

**Workload details** for local Workloads amount to
* OCI-compatible container config (image, its command arguments, container name, etc...)
* desired workload name
* host and port to expose
* environment variables to set (literal or file-based)
* secrets to set
* volumes to mount
* container labels
* Kubernetes pod template patch (Kubernetes specific)
* network isolation flag

While when the Workload is remote, details are
* remote URL
* auth configuration

**Proxy details** amount to
* host and port to expose
* workload transport type
* permission profile (literal or file-based)
* OIDC configuration parameters
* authorization config (literal or file-based)
* audit config (literal or file-based)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these are deprecated.

* proxy headers trust flag
* proxy mode (i.e. transport to expose)
* CA bundle
* JWKS token file
* tools config
* IgnoreConfig (?)
* middleware configuration settings

## Responsibility Split

We propose the following split in responsibilities

**RunConfig** data structure and routines will be responsible for holding configuration parameters, basic validation, and serialization, but not storage. Simply put, the package should accept bytes and readers, and return bytes, similarly to how `encoding/json` works.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: currently, validation for middleware configs is done late in the process and should be done at least where it's currently done as well as when instantiating a RunConfig.
Action Item: document here where validation is implemented and where validation routines should be invoked.


**CLI** and **Operator** will be responsible for mapping their respective representation of configuration parameters to the representation allowed by `RunConfig`. Specifically, no file-based representation of configuration parameters is allowed in the `RunConfig` struct.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we have a simple diagram here of the types? Picture paints a thousand words in that sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action Item: list users of RunConfig structs and how they represent information that ends up in the run config.


Consequences of changes to configuration parameters must be managed outside the `RunConfig` code. For example, configuration reload for the CLI must be managed within CLI commands, and not within the `Runner` or `RunConfig`. That said, `Runner`s can implement behaviors specific to their execution environment, but they must not rely on references to the "outside world" being stored in the `RunConfig`.

Types exposed by `RunConfig` package should not be used for externally facing formats like HTTP API if not for trivial cases. In case diverging becomes necessary, said types won't be modified (to avoid breaking changes) and CLI/Operator must expose their own type that is then mapped to the `RunConfig` equivalent.
Copy link
Collaborator

@ChrisJBurns ChrisJBurns Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is it worth mentioning that a decorator can be used here? Not sure if we want to mention specific terms right now, but it may help somewhat keep custom types following a certain pattern

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action Item: I need to rephrase this mentioning "API contracts", this should clarify what I meant.

Loading