Skip to content

Conversation

blkt
Copy link
Contributor

@blkt blkt commented Oct 6, 2025

This change contains a design doc describing an architectural boundary at RunConfig level. Aim of this design doc is

  • listing the current responsibilities of RunConfig and related code (e.g. Runner)
  • propose a new split of responsibilities and a clear golang API for it
  • clarify what responsibilities should be moved in the respective user interfaces, e.g. CLI or Operator

@blkt blkt self-assigned this Oct 6, 2025
Copy link

codecov bot commented Oct 6, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.31%. Comparing base (d9498fe) to head (fb9c334).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2096      +/-   ##
==========================================
- Coverage   48.33%   48.31%   -0.02%     
==========================================
  Files         238      238              
  Lines       30034    30034              
==========================================
- Hits        14518    14512       -6     
- Misses      14418    14423       +5     
- Partials     1098     1099       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

This change contains a design doc describing an architectural boundary
at `RunConfig` level. Aim of this design doc is
* listing the current responsibilities of `RunConfig` and related code
  (e.g. `Runner`)
* propose a new split of responsibilities and a clear golang API for
  it
* clarify what responsibilities should be moved in the respective user
  interfaces, e.g. CLI or Operator
@blkt blkt force-pushed the docs/runconfig-proposal branch from b7dca4b to fb9c334 Compare October 7, 2025 07:34
@JAORMX
Copy link
Collaborator

JAORMX commented Oct 7, 2025

@claude please review this architectural proposal

Copy link
Contributor

claude bot commented Oct 7, 2025

Claude finished @JAORMX's task —— View job


Architectural Proposal Review

I've reviewed the RunConfig as architectural boundary proposal. Here's my comprehensive feedback:


Overall Assessment

This is a well-structured and necessary proposal that addresses real architectural concerns in ToolHive. The separation of concerns between RunConfig and the consumer interfaces (CLI, Operator, UI) is architecturally sound and will improve maintainability and extensibility.


Strengths

  1. Clear Problem Statement: The proposal effectively articulates the current challenges with configuration handling across different interfaces (CLI, Operator, UI).

  2. Practical Examples: The use cases (config reload semantics, storage location differences, export/import) clearly demonstrate why this boundary is needed.

  3. Appropriate Scope: Making RunConfig analogous to encoding/json (accepting bytes/readers, returning bytes) is the right level of abstraction.

  4. Current State Documentation: Lines 22-70 provide an excellent inventory of what RunConfig currently contains, split into metadata, workload details, and proxy details.


Areas for Improvement

1. Implementation Guidance Missing

The proposal describes what should change but lacks how to achieve it. Consider adding:

  • Migration Strategy: How do we move from the current state to the proposed state?

    • Which fields need to be removed from RunConfig?
    • What's the deprecation timeline for file-based parameters?
    • How do we handle backward compatibility?
  • API Surface Definition: What should the new public API look like?

    // Example:
    // Instead of exposing RunConfig directly, consider:
    type RunConfig struct { /* ... */ }
    func NewRunConfig(opts ...Option) (*RunConfig, error)
    func (rc *RunConfig) Validate() error
    func (rc *RunConfig) MarshalJSON() ([]byte, error)
    func UnmarshalRunConfig(data []byte) (*RunConfig, error)

2. Concrete Examples of Violations

The proposal would benefit from specific examples of current violations:

Current problematic fields in RunConfig (from pkg/runner/config.go:34-162):

  • EnvFileDir (line 88) - File system reference
  • PermissionProfileNameOrPath (line 79) - File path reference
  • AuthzConfigPath (line 107) - File path reference
  • AuditConfigPath (line 113) - File path reference
  • JWKSAuthTokenFile (line 145) - File path reference
  • Deployer (line 127) - Runtime-specific dependency

These should be called out explicitly as fields that violate the proposed boundary.

3. Validation Responsibility Unclear

Line 75 mentions "basic validation" as a RunConfig responsibility, but doesn't define what qualifies as "basic":

Suggested clarification:

  • Basic validation (RunConfig's responsibility): Structural validity, required fields present, type constraints
  • Contextual validation (CLI/Operator responsibility): File existence, network availability, secret references, registry lookups

4. Serialization Format Not Specified

Line 75 mentions serialization but doesn't specify:

  • What formats should be supported? (JSON only? YAML? Both?)
  • Should the format be versioned? (The struct has SchemaVersion but no migration strategy)
  • What happens when SchemaVersion changes?

5. Missing Discussion of Runner Responsibilities

The proposal mentions Runner in the title and problem statement but doesn't clearly define its responsibilities post-refactor. Consider adding:

Proposed Runner Responsibilities:

  • Accept a validated RunConfig
  • Execute workloads using the provided deployer
  • Manage workload lifecycle (start, stop, monitor)
  • Handle middleware chain construction
  • NOT responsible for: config file reading, secret fetching, registry lookups

6. Impact on Existing Code Not Assessed

The proposal doesn't mention:

  • How many call sites need updates?
  • What's the blast radius of this change?
  • Which packages are affected?

Affected areas (from my analysis):

  • cmd/thv/app/run.go - CLI configuration building
  • cmd/thv-operator/controllers/mcpserver_runconfig.go - Operator CRD mapping
  • pkg/runner/config_builder.go - Builder pattern needs updating
  • pkg/state/runconfig.go - Storage abstraction

7. "No File-Based Representation" Needs Clarification

Line 77 states: "no file-based representation of configuration parameters is allowed in the RunConfig struct"

Question: Does this mean:

  • ❌ No file paths (like EnvFileDir, AuthzConfigPath)?
  • ❌ No references to external files at all?
  • ✅ File contents should be resolved by CLI/Operator before creating RunConfig?

This should be explicit with before/after examples:

// ❌ Current (bad)
type RunConfig struct {
    AuthzConfigPath string
}

// ✅ Proposed (good)  
type RunConfig struct {
    AuthzConfig *authz.Config // already resolved
}

8. External API Stability Concern

Line 81 mentions: "Types exposed by RunConfig package should not be used for externally facing formats like HTTP API"

Concern: The current RunConfig struct (pkg/runner/config.go:34-162) has a comment stating it's "part of ToolHive's API contract" (line 36-37). This proposal doesn't address:

  • Is RunConfig JSON format considered a public API?
  • How do we version it if we need breaking changes?
  • What's the compatibility guarantee?

Specific Technical Concerns

1. Storage Abstraction Already Exists

The proposal treats storage as CLI/Operator responsibility, but pkg/state/runconfig.go already provides a storage abstraction layer. How does this fit into the new design?

Recommendation: Clarify whether pkg/state is above or below the boundary.

2. Current Builder Pattern May Need Redesign

pkg/runner/config_builder.go currently handles file resolution during building. With the proposed changes, this logic moves to CLI/Operator, but the builder pattern might still be useful for constructing valid RunConfig instances.

Recommendation: Provide guidance on whether builders should exist and where.

3. Middleware Configuration Complexity

The MiddlewareConfigs field (pkg/runner/config.go:159-161) is complex and touches both proxy configuration and execution environment. The proposal doesn't clarify if/how middleware configuration should be handled.


Missing Sections

Consider adding:

  1. Non-Goals: What are we explicitly NOT trying to solve?

    • Are we keeping backward compatibility for existing serialized configs?
    • Are we refactoring the Operator CRD structure?
  2. Success Criteria: How do we know when this is complete?

    • All file path references removed from RunConfig?
    • CLI and Operator can construct RunConfig without accessing filesystem during Runner.Run()?
    • Configuration reload is handled at CLI/Operator level?
  3. Testing Strategy: How do we validate the boundary is maintained?

    • Linter rules to prevent file paths in RunConfig?
    • Interface contracts to enforce separation?
  4. Timeline: When do we want this implemented?


Recommendations

Priority 1 (Critical for proposal acceptance):

  1. Add concrete examples of fields that violate the current boundary
  2. Define what "basic validation" means
  3. Clarify the "no file-based representation" rule with examples
  4. Add migration strategy and backward compatibility approach

Priority 2 (Important for implementation):

  1. Define the new public API surface
  2. Clarify Runner responsibilities post-refactor
  3. Specify serialization format and versioning strategy
  4. Assess impact on existing code

Priority 3 (Nice to have):

  1. Add non-goals and success criteria
  2. Include testing strategy
  3. Define timeline

Conclusion

This proposal identifies a legitimate architectural issue and proposes a sensible solution. However, it needs more implementation detail to be actionable. The core idea of treating RunConfig as a pure data structure (like encoding/json) is sound, but the transition path from current state to desired state needs to be mapped out.

Recommendation: Expand the "Responsibility Split" section significantly before implementation begins, or create a follow-up implementation plan document that addresses the gaps identified above.


References

  • Current RunConfig implementation: pkg/runner/config.go:34-162
  • Current storage abstraction: pkg/state/runconfig.go
  • Operator usage: cmd/thv-operator/controllers/mcpserver_runconfig.go
  • CLI usage: cmd/thv/app/run.go

Job Run | Branch: docs/runconfig-proposal

Copy link
Collaborator

@JAORMX JAORMX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An open question I'd like to discuss tomorrow is: Should we have information about the secret provider that's used and a key to retrieve the secret at runtime? This would make it easier to share configurations.

* **ToolHive Operator**, which is used in both Kubernetes and OpenShift
* **ToolHive UI**, which internally uses APIs exposed by `thv serve`

These interfaces differ in their execution environment and "quality of life" features that one might implement. For example, one obvious difference is the way configuration is accessed: CLI must both accept CLI options and read files on file system, while the Kubernetes Operator requires all parameters to be fully specified in the main CRD or into "linked" ones (e.g. `MCPToolConfig`). Another one is in config reload semantics, which must implemented in an ad-hoc fashion for CLI, while Kubernetes handles it as part of the life cycle of resources. Yet another example is the location of runtime configuration and state. The Kubernetes Operator relies on both being stored "inside the cluster", while the CLI and UI must both rely on file system, yet the configuration might be semantically equivalent. A final useful use case is exporting `RunConfig` so that the same workload can be "moved" to a different place or shipped as configuration. A similar use case is that implemented by the `thv restart` command, which fetches the serialized version of a run config to restart the workload when necessary.
Copy link
Member

@dmjb dmjb Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth noting that in the UI/API case - the preference has been to specify the payload ("literal" as this doc uses) of the configuration instead of a path. This allows the payload to be constructed by a UI form.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: there's general agreement on the fact that HTTP API must not contain references to local files.
Action Item: this is a bit outside of the RunConfig design doc, but we should document it regardless.

@@ -0,0 +1,81 @@
# `RunConfig` as Architectural Boundary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment worth noting: There are a bunch of fields in the run config right now which are only needed for backwards compatibility. We should clean those up as part of stabilization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: we have a VersionSchema field in the RunConfig struct, we should leverage that.
Action Item: list fields that are either no longer used or not desired anymore.


Consequences of changes to configuration parameters must be managed outside the `RunConfig` code. For example, configuration reload for the CLI must be managed within CLI commands, and not within the `Runner` or `RunConfig`. That said, `Runner`s can implement behaviors specific to their execution environment, but they must not rely on references to the "outside world" being stored in the `RunConfig`.

Types exposed by `RunConfig` package should not be used for externally facing formats like HTTP API if not for trivial cases. In case diverging becomes necessary, said types won't be modified (to avoid breaking changes) and CLI/Operator must expose their own type that is then mapped to the `RunConfig` equivalent.
Copy link
Collaborator

@ChrisJBurns ChrisJBurns Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is it worth mentioning that a decorator can be used here? Not sure if we want to mention specific terms right now, but it may help somewhat keep custom types following a certain pattern

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action Item: I need to rephrase this mentioning "API contracts", this should clarify what I meant.


**RunConfig** data structure and routines will be responsible for holding configuration parameters, basic validation, and serialization, but not storage. Simply put, the package should accept bytes and readers, and return bytes, similarly to how `encoding/json` works.

**CLI** and **Operator** will be responsible for mapping their respective representation of configuration parameters to the representation allowed by `RunConfig`. Specifically, no file-based representation of configuration parameters is allowed in the `RunConfig` struct.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we have a simple diagram here of the types? Picture paints a thousand words in that sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action Item: list users of RunConfig structs and how they represent information that ends up in the run config.

@blkt
Copy link
Contributor Author

blkt commented Oct 8, 2025

An open question I'd like to discuss tomorrow is: Should we have information about the secret provider that's used and a key to retrieve the secret at runtime? This would make it easier to share configurations.

@JAORMX currently, RunConfig contains "how to identify the secret", but not "where to get it from". Additionally, it is only possible to configure a single secrets provider at a time. We're fine keeping this as is for now.

* permission profile (literal or file-based)
* OIDC configuration parameters
* authorization config (literal or file-based)
* audit config (literal or file-based)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these are deprecated.


We propose the following split in responsibilities

**RunConfig** data structure and routines will be responsible for holding configuration parameters, basic validation, and serialization, but not storage. Simply put, the package should accept bytes and readers, and return bytes, similarly to how `encoding/json` works.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: currently, validation for middleware configs is done late in the process and should be done at least where it's currently done as well as when instantiating a RunConfig.
Action Item: document here where validation is implemented and where validation routines should be invoked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants