`RunConfig` as architectural boundary proposal #2096

blkt · 2025-10-06T16:30:39Z

This change contains a design doc describing an architectural boundary at RunConfig level. Aim of this design doc is

listing the current responsibilities of RunConfig and related code (e.g. Runner)
propose a new split of responsibilities and a clear golang API for it
clarify what responsibilities should be moved in the respective user interfaces, e.g. CLI or Operator

codecov · 2025-10-06T16:35:44Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.31%. Comparing base (d9498fe) to head (fb9c334).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2096      +/-   ##
==========================================
- Coverage   48.33%   48.31%   -0.02%     
==========================================
  Files         238      238              
  Lines       30034    30034              
==========================================
- Hits        14518    14512       -6     
- Misses      14418    14423       +5     
- Partials     1098     1099       +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

This change contains a design doc describing an architectural boundary at `RunConfig` level. Aim of this design doc is * listing the current responsibilities of `RunConfig` and related code (e.g. `Runner`) * propose a new split of responsibilities and a clear golang API for it * clarify what responsibilities should be moved in the respective user interfaces, e.g. CLI or Operator

JAORMX · 2025-10-07T08:13:52Z

@claude please review this architectural proposal

claude · 2025-10-07T08:14:06Z

Claude finished @JAORMX's task —— View job

Architectural Proposal Review

I've reviewed the RunConfig as architectural boundary proposal. Here's my comprehensive feedback:

Overall Assessment

This is a well-structured and necessary proposal that addresses real architectural concerns in ToolHive. The separation of concerns between RunConfig and the consumer interfaces (CLI, Operator, UI) is architecturally sound and will improve maintainability and extensibility.

Strengths

Clear Problem Statement: The proposal effectively articulates the current challenges with configuration handling across different interfaces (CLI, Operator, UI).
Practical Examples: The use cases (config reload semantics, storage location differences, export/import) clearly demonstrate why this boundary is needed.
Appropriate Scope: Making RunConfig analogous to encoding/json (accepting bytes/readers, returning bytes) is the right level of abstraction.
Current State Documentation: Lines 22-70 provide an excellent inventory of what RunConfig currently contains, split into metadata, workload details, and proxy details.

Areas for Improvement

1. Implementation Guidance Missing

The proposal describes what should change but lacks how to achieve it. Consider adding:

Migration Strategy: How do we move from the current state to the proposed state?
- Which fields need to be removed from RunConfig?
- What's the deprecation timeline for file-based parameters?
- How do we handle backward compatibility?

API Surface Definition: What should the new public API look like?

// Example:
// Instead of exposing RunConfig directly, consider:
type RunConfig struct { /* ... */ }
func NewRunConfig(opts ...Option) (*RunConfig, error)
func (rc *RunConfig) Validate() error
func (rc *RunConfig) MarshalJSON() ([]byte, error)
func UnmarshalRunConfig(data []byte) (*RunConfig, error)

2. Concrete Examples of Violations

The proposal would benefit from specific examples of current violations:

Current problematic fields in RunConfig (from pkg/runner/config.go:34-162):

EnvFileDir (line 88) - File system reference
PermissionProfileNameOrPath (line 79) - File path reference
AuthzConfigPath (line 107) - File path reference
AuditConfigPath (line 113) - File path reference
JWKSAuthTokenFile (line 145) - File path reference
Deployer (line 127) - Runtime-specific dependency

These should be called out explicitly as fields that violate the proposed boundary.

3. Validation Responsibility Unclear

Line 75 mentions "basic validation" as a RunConfig responsibility, but doesn't define what qualifies as "basic":

Suggested clarification:

Basic validation (RunConfig's responsibility): Structural validity, required fields present, type constraints
Contextual validation (CLI/Operator responsibility): File existence, network availability, secret references, registry lookups

4. Serialization Format Not Specified

Line 75 mentions serialization but doesn't specify:

What formats should be supported? (JSON only? YAML? Both?)
Should the format be versioned? (The struct has SchemaVersion but no migration strategy)
What happens when SchemaVersion changes?

5. Missing Discussion of `Runner` Responsibilities

The proposal mentions Runner in the title and problem statement but doesn't clearly define its responsibilities post-refactor. Consider adding:

Proposed Runner Responsibilities:

Accept a validated RunConfig
Execute workloads using the provided deployer
Manage workload lifecycle (start, stop, monitor)
Handle middleware chain construction
NOT responsible for: config file reading, secret fetching, registry lookups

6. Impact on Existing Code Not Assessed

The proposal doesn't mention:

How many call sites need updates?
What's the blast radius of this change?
Which packages are affected?

Affected areas (from my analysis):

cmd/thv/app/run.go - CLI configuration building
cmd/thv-operator/controllers/mcpserver_runconfig.go - Operator CRD mapping
pkg/runner/config_builder.go - Builder pattern needs updating
pkg/state/runconfig.go - Storage abstraction

7. "No File-Based Representation" Needs Clarification

Line 77 states: "no file-based representation of configuration parameters is allowed in the RunConfig struct"

Question: Does this mean:

❌ No file paths (like EnvFileDir, AuthzConfigPath)?
❌ No references to external files at all?
✅ File contents should be resolved by CLI/Operator before creating RunConfig?

This should be explicit with before/after examples:

// ❌ Current (bad)
type RunConfig struct {
    AuthzConfigPath string
}

// ✅ Proposed (good)  
type RunConfig struct {
    AuthzConfig *authz.Config // already resolved
}

8. External API Stability Concern

Line 81 mentions: "Types exposed by RunConfig package should not be used for externally facing formats like HTTP API"

Concern: The current RunConfig struct (pkg/runner/config.go:34-162) has a comment stating it's "part of ToolHive's API contract" (line 36-37). This proposal doesn't address:

Is RunConfig JSON format considered a public API?
How do we version it if we need breaking changes?
What's the compatibility guarantee?

Specific Technical Concerns

1. Storage Abstraction Already Exists

The proposal treats storage as CLI/Operator responsibility, but pkg/state/runconfig.go already provides a storage abstraction layer. How does this fit into the new design?

Recommendation: Clarify whether pkg/state is above or below the boundary.

2. Current Builder Pattern May Need Redesign

pkg/runner/config_builder.go currently handles file resolution during building. With the proposed changes, this logic moves to CLI/Operator, but the builder pattern might still be useful for constructing valid RunConfig instances.

Recommendation: Provide guidance on whether builders should exist and where.

3. Middleware Configuration Complexity

The MiddlewareConfigs field (pkg/runner/config.go:159-161) is complex and touches both proxy configuration and execution environment. The proposal doesn't clarify if/how middleware configuration should be handled.

Missing Sections

Consider adding:

Non-Goals: What are we explicitly NOT trying to solve?
- Are we keeping backward compatibility for existing serialized configs?
- Are we refactoring the Operator CRD structure?
Success Criteria: How do we know when this is complete?
- All file path references removed from RunConfig?
- CLI and Operator can construct RunConfig without accessing filesystem during Runner.Run()?
- Configuration reload is handled at CLI/Operator level?
Testing Strategy: How do we validate the boundary is maintained?
- Linter rules to prevent file paths in RunConfig?
- Interface contracts to enforce separation?
Timeline: When do we want this implemented?

Recommendations

Priority 1 (Critical for proposal acceptance):

Add concrete examples of fields that violate the current boundary
Define what "basic validation" means
Clarify the "no file-based representation" rule with examples
Add migration strategy and backward compatibility approach

Priority 2 (Important for implementation):

Define the new public API surface
Clarify Runner responsibilities post-refactor
Specify serialization format and versioning strategy
Assess impact on existing code

Priority 3 (Nice to have):

Add non-goals and success criteria
Include testing strategy
Define timeline

Conclusion

This proposal identifies a legitimate architectural issue and proposes a sensible solution. However, it needs more implementation detail to be actionable. The core idea of treating RunConfig as a pure data structure (like encoding/json) is sound, but the transition path from current state to desired state needs to be mapped out.

Recommendation: Expand the "Responsibility Split" section significantly before implementation begins, or create a follow-up implementation plan document that addresses the gaps identified above.

References

Current RunConfig implementation: pkg/runner/config.go:34-162
Current storage abstraction: pkg/state/runconfig.go
Operator usage: cmd/thv-operator/controllers/mcpserver_runconfig.go
CLI usage: cmd/thv/app/run.go

Job Run | Branch: docs/runconfig-proposal

JAORMX

An open question I'd like to discuss tomorrow is: Should we have information about the secret provider that's used and a key to retrieve the secret at runtime? This would make it easier to share configurations.

dmjb · 2025-10-08T12:41:41Z

docs/proposals/runconfig-as-architectural-boundary.md

+* **ToolHive Operator**, which is used in both Kubernetes and OpenShift
+* **ToolHive UI**, which internally uses APIs exposed by `thv serve`
+
+These interfaces differ in their execution environment and "quality of life" features that one might implement. For example, one obvious difference is the way configuration is accessed: CLI must both accept CLI options and read files on file system, while the Kubernetes Operator requires all parameters to be fully specified in the main CRD or into "linked" ones (e.g. `MCPToolConfig`). Another one is in config reload semantics, which must implemented in an ad-hoc fashion for CLI, while Kubernetes handles it as part of the life cycle of resources. Yet another example is the location of runtime configuration and state. The Kubernetes Operator relies on both being stored "inside the cluster", while the CLI and UI must both rely on file system, yet the configuration might be semantically equivalent. A final useful use case is exporting `RunConfig` so that the same workload can be "moved" to a different place or shipped as configuration. A similar use case is that implemented by the `thv restart` command, which fetches the serialized version of a run config to restart the workload when necessary.


It's worth noting that in the UI/API case - the preference has been to specify the payload ("literal" as this doc uses) of the configuration instead of a path. This allows the payload to be constructed by a UI form.

Note: there's general agreement on the fact that HTTP API must not contain references to local files.
Action Item: this is a bit outside of the RunConfig design doc, but we should document it regardless.

dmjb · 2025-10-08T12:43:41Z

docs/proposals/runconfig-as-architectural-boundary.md

@@ -0,0 +1,81 @@
+# `RunConfig` as Architectural Boundary


General comment worth noting: There are a bunch of fields in the run config right now which are only needed for backwards compatibility. We should clean those up as part of stabilization.

Note: we have a VersionSchema field in the RunConfig struct, we should leverage that.
Action Item: list fields that are either no longer used or not desired anymore.

ChrisJBurns · 2025-10-08T12:43:41Z

docs/proposals/runconfig-as-architectural-boundary.md

+
+Consequences of changes to configuration parameters must be managed outside the `RunConfig` code. For example, configuration reload for the CLI must be managed within CLI commands, and not within the `Runner` or `RunConfig`. That said, `Runner`s can implement behaviors specific to their execution environment, but they must not rely on references to the "outside world" being stored in the `RunConfig`.
+
+Types exposed by `RunConfig` package should not be used for externally facing formats like HTTP API if not for trivial cases. In case diverging becomes necessary, said types won't be modified (to avoid breaking changes) and CLI/Operator must expose their own type that is then mapped to the `RunConfig` equivalent.


nit: is it worth mentioning that a decorator can be used here? Not sure if we want to mention specific terms right now, but it may help somewhat keep custom types following a certain pattern

Action Item: I need to rephrase this mentioning "API contracts", this should clarify what I meant.

ChrisJBurns · 2025-10-08T12:45:51Z

docs/proposals/runconfig-as-architectural-boundary.md

+
+**RunConfig** data structure and routines will be responsible for holding configuration parameters, basic validation, and serialization, but not storage. Simply put, the package should accept bytes and readers, and return bytes, similarly to how `encoding/json` works.
+
+**CLI** and **Operator** will be responsible for mapping their respective representation of configuration parameters to the representation allowed by `RunConfig`. Specifically, no file-based representation of configuration parameters is allowed in the `RunConfig` struct.


nit: could we have a simple diagram here of the types? Picture paints a thousand words in that sense

Action Item: list users of RunConfig structs and how they represent information that ends up in the run config.

blkt · 2025-10-08T12:49:41Z

An open question I'd like to discuss tomorrow is: Should we have information about the secret provider that's used and a key to retrieve the secret at runtime? This would make it easier to share configurations.

@JAORMX currently, RunConfig contains "how to identify the secret", but not "where to get it from". Additionally, it is only possible to configure a single secrets provider at a time. We're fine keeping this as is for now.

dmjb · 2025-10-08T12:55:34Z

docs/proposals/runconfig-as-architectural-boundary.md

+* permission profile (literal or file-based)
+* OIDC configuration parameters
+* authorization config (literal or file-based)
+* audit config (literal or file-based)


Some of these are deprecated.

blkt · 2025-10-08T13:14:25Z

docs/proposals/runconfig-as-architectural-boundary.md

+
+We propose the following split in responsibilities
+
+**RunConfig** data structure and routines will be responsible for holding configuration parameters, basic validation, and serialization, but not storage. Simply put, the package should accept bytes and readers, and return bytes, similarly to how `encoding/json` works.


Note: currently, validation for middleware configs is done late in the process and should be done at least where it's currently done as well as when instantiating a RunConfig.
Action Item: document here where validation is implemented and where validation routines should be invoked.

blkt self-assigned this Oct 6, 2025

blkt force-pushed the docs/runconfig-proposal branch from b7dca4b to fb9c334 Compare October 7, 2025 07:34

JAORMX reviewed Oct 7, 2025

View reviewed changes

dmjb reviewed Oct 8, 2025

View reviewed changes

ChrisJBurns reviewed Oct 8, 2025

View reviewed changes

dmjb reviewed Oct 8, 2025

View reviewed changes

blkt commented Oct 8, 2025

View reviewed changes


		Consequences of changes to configuration parameters must be managed outside the `RunConfig` code. For example, configuration reload for the CLI must be managed within CLI commands, and not within the `Runner` or `RunConfig`. That said, `Runner`s can implement behaviors specific to their execution environment, but they must not rely on references to the "outside world" being stored in the `RunConfig`.

		Types exposed by `RunConfig` package should not be used for externally facing formats like HTTP API if not for trivial cases. In case diverging becomes necessary, said types won't be modified (to avoid breaking changes) and CLI/Operator must expose their own type that is then mapped to the `RunConfig` equivalent.


		RunConfig data structure and routines will be responsible for holding configuration parameters, basic validation, and serialization, but not storage. Simply put, the package should accept bytes and readers, and return bytes, similarly to how `encoding/json` works.

		CLI and Operator will be responsible for mapping their respective representation of configuration parameters to the representation allowed by `RunConfig`. Specifically, no file-based representation of configuration parameters is allowed in the `RunConfig` struct.


		We propose the following split in responsibilities

		RunConfig data structure and routines will be responsible for holding configuration parameters, basic validation, and serialization, but not storage. Simply put, the package should accept bytes and readers, and return bytes, similarly to how `encoding/json` works.

RunConfig as architectural boundary proposal #2096

Are you sure you want to change the base?

RunConfig as architectural boundary proposal #2096

Uh oh!

Conversation

blkt commented Oct 6, 2025

Uh oh!

codecov bot commented Oct 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

JAORMX commented Oct 7, 2025

Uh oh!

claude bot commented Oct 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Architectural Proposal Review

Overall Assessment

Strengths

Areas for Improvement

1. Implementation Guidance Missing

2. Concrete Examples of Violations

3. Validation Responsibility Unclear

4. Serialization Format Not Specified

5. Missing Discussion of Runner Responsibilities

6. Impact on Existing Code Not Assessed

7. "No File-Based Representation" Needs Clarification

8. External API Stability Concern

Specific Technical Concerns

1. Storage Abstraction Already Exists

2. Current Builder Pattern May Need Redesign

3. Middleware Configuration Complexity

Missing Sections

Recommendations

Priority 1 (Critical for proposal acceptance):

Priority 2 (Important for implementation):

Priority 3 (Nice to have):

Conclusion

References

Uh oh!

JAORMX left a comment

Choose a reason for hiding this comment

Uh oh!

dmjb Oct 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ChrisJBurns Oct 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

blkt commented Oct 8, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

`RunConfig` as architectural boundary proposal #2096

`RunConfig` as architectural boundary proposal #2096

codecov bot commented Oct 6, 2025 •

edited

Loading

claude bot commented Oct 7, 2025 •

edited

Loading

5. Missing Discussion of `Runner` Responsibilities

dmjb Oct 8, 2025 •

edited

Loading

ChrisJBurns Oct 8, 2025 •

edited

Loading