Project bootstrap #1

@cjimti

plexara-agents

A Go reference implementation for building MCP-driven AI agents.

Status: Draft specification, v0.1
Repository: github.com/plexara/plexara-agents
License: Apache 2.0
Language: Go 1.26+


1. Purpose

plexara-agents is an open source Go project that does two things at once:

  1. Ships a small, opinionated library (core) for building local-first AI agents that drive Model Context Protocol (MCP) servers.
  2. Ships ready-to-run binaries that use the library to demonstrate the Plexara MCP data platform (txn2/mcp-data-platform), specifically against the public ACME Corp demo deployment.

The project is designed to serve as a reference: anyone reading the source should come away with a clear, idiomatic answer to "how do I build a Go agent that talks to one or more MCP servers using a local model." The Plexara demo is the headline use case, but nothing in core assumes Plexara is the MCP being driven.

Why this exists

The MCP ecosystem has matured fast on the server side. Client tooling, especially in Go, has not. Most public agent code is Python and most public Python agent code optimizes for cloud APIs. A Go project that runs entirely on local inference, treats MCP as a first-class primitive, and is small enough to read end-to-end fills a real gap and makes the case for Plexara's architectural model along the way.


2. Goals and Non-Goals

Goals

  • Serve as a readable, idiomatic Go reference for MCP-driven agent construction.
  • Run entirely on local model inference for the v1 release. No cloud API support, no fallback paths.
  • Drive any MCP server. Plexara is the showcase, not a coupling.
  • Provide a clean, minimal core library that other agents (current and future) can depend on without inheriting CLI or transport concerns.
  • Support multiple binaries built on core: a single-shot CLI (ask), an interactive REPL (repl), and, later, a starter for hosted deployments.
  • Ship with example workflows that exercise the Plexara ACME Corp demo end to end, demonstrating how an agent benefits from Plexara's context enrichment.

Non-Goals (v1)

  • Cloud API providers (Anthropic, OpenAI, Bedrock, etc.). Local only.
  • Embedded inference. The agent talks HTTP to a local model server (Ollama or MLX), not via CGo or in-process llama.cpp.
  • Web UI. The hosted web variant is planned, but core and the v1 binaries do not assume a browser.
  • Plugin system. Composition happens at the Go module level, not via dynamic loading.
  • A bespoke prompt framework. Prompts are plain text or templated Go strings, not a DSL.

3. Audience

Two readers, with different reading paths:

The agent builder. Someone who needs to build an MCP-driven agent for their own product. They read core/ top to bottom, treat the binaries in cmd/ as worked examples, and import core into their own project.

The Plexara evaluator. Someone evaluating whether Plexara's MCP design holds up under real agent traffic. They run cmd/ask against the ACME demo, look at the example workflows, and learn the platform by interaction.

The spec, the layout, and the code should serve both.


4. Architecture Overview

+---------------------+        +-----------------------+
|   cmd/ask           |        |   cmd/repl            |
|   (single-shot CLI) |        |   (interactive TUI)   |
+----------+----------+        +-----------+-----------+
           |                               |
           +---------------+---------------+
                           |
                  +--------v--------+
                  |     core        |
                  |                 |
                  | loop  session   |
                  | provider  mcp   |
                  | event  router   |
                  +--------+--------+
                           |
            +--------------+--------------+
            |                             |
   +--------v--------+          +---------v---------+
   |  Local model    |          |   MCP servers     |
   |  (Ollama / MLX) |          |   (Plexara, etc.) |
   +-----------------+          +-------------------+

The agent loop is a function: given a Provider, one or more MCP clients, and a user message, it produces a stream of events until the model signals completion. Everything else is composition around that function.


5. Module Layout

plexara-agents/                    # github.com/plexara/plexara-agents
├── core/                          # importable library, the heart of the project
│   ├── event/                     # event types (TextDelta, ToolCallRequest, ToolCallResult, Finish, Error)
│   ├── provider/                  # Provider interface and adapters
│   │   ├── provider.go            # interface + shared types
│   │   ├── openai_compatible.go   # works for Ollama, llama.cpp server, MLX, vLLM
│   │   └── testing.go             # in-memory fake for tests
│   ├── mcp/                       # MCP client wrapper around go-sdk
│   │   ├── client.go              # connection lifecycle, multi-server aggregation
│   │   └── catalog.go             # tool catalog with toolkit grouping
│   ├── router/                    # tool router (toolkit classification, narrowing)
│   ├── session/                   # message history, persistence, replay
│   ├── loop/                      # the agent loop itself
│   ├── approval/                  # tool-call approval gates
│   └── log/                       # slog-based structured logging helpers
├── cmd/
│   ├── ask/                       # single-shot CLI: one question, one answer
│   └── repl/                      # interactive REPL with history and approval prompts
├── examples/
│   ├── acme-revenue/              # end-to-end Plexara ACME demo: revenue by region
│   ├── acme-lineage/              # lineage walk + downstream impact analysis
│   └── multi-mcp/                 # one agent, two MCP servers, demonstrating router
├── docs/
│   ├── architecture.md
│   ├── mcp-integration.md
│   ├── prompts/                   # canonical prompts kept under version control
│   └── adrs/                      # architecture decision records
├── go.mod
├── go.sum
├── Makefile
├── LICENSE
└── README.md

Rules:

  • core has no dependencies on cmd/ or examples/.
  • cmd/ binaries are intentionally small (target ~150 lines each). If a binary grows logic worth keeping, it moves into core.
  • Examples are runnable. If they break, CI breaks.

6. Core Abstractions

6.1 Events

Everything streamed out of the agent loop is an event. Events are a closed sum type, expressed as an interface with a sealing method.

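// core/event/event.go
package event

import "encoding/json"

// Event is a closed set: isEvent is unexported, so only types declared in
// this package can satisfy it. ToolContent, FinishReason, and Usage are
// not shown here.
type Event interface {
    isEvent()
}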

type TextDelta struct {
    Text string
}
func (TextDelta) isEvent() {}

type ToolCallRequest struct {
    ID        string
    Name      string
    Server    string          // which MCP server owns this tool
    Arguments json.RawMessage // never partial; always complete JSON
}
func (ToolCallRequest) isEvent() {}

type ToolCallResult struct {
    ID      string
    Content []ToolContent   // text, image, resource refs
    IsError bool
}
func (ToolCallResult) isEvent() {}

type Finish struct {
    Reason FinishReason
    Usage  Usage
}
func (Finish) isEvent() {}

type Error struct {
    Err error
}
func (Error) isEvent() {}

Consumers (CLI, REPL, future web server) switch on event type and render accordingly.

6.2 Provider

Defined from the consumer side, narrow on purpose.

// core/provider/provider.go
package provider

import (
    "context"

    "github.com/plexara/plexara-agents/core/event"
)

type Provider interface {
    // Stream returns a channel of events. The channel closes when the request
    // ends (success, error, or context cancellation). Implementations must
    // never emit a partial ToolCallRequest; they buffer until arguments
    // are complete JSON.
    Stream(ctx context.Context, req Request) (<-chan event.Event, error)

    // Name identifies the provider for logging and diagnostics.
    Name() string
}

type Request struct {
    Model    string
    Messages []Message
    Tools    []Tool
    // Sampling parameters; nil means use the provider's default.
    Temperature *float32
    TopP        *float32
    MaxTokens   *int
}

Why a channel and not an iterator: range-over-func iterators are stable and tempting, but channels compose better with select and with the context-cancellation idiom used throughout the loop. An iterator wrapper can be added later if there's demand.

For v1 there is one Provider implementation: OpenAICompatible. Pointing it at http://localhost:11434/v1 gives Ollama. Pointing it at an MLX server, llama.cpp's server, or a future vLLM deployment is a config change. This is deliberate: a single well-tested adapter beats four half-tested ones.

6.3 Session

// core/session/session.go
package session

type Session struct {
    ID       string
    Messages []provider.Message
    Created  time.Time
    Updated  time.Time
}

func (s *Session) Append(m provider.Message)
func (s *Session) Truncate(maxTokens int, tokenizer Tokenizer) // sliding window
func (s *Session) Save(w io.Writer) error
func Load(r io.Reader) (*Session, error)

Session is a plain data type: easy to serialize, easy to replay. Persistence is JSON Lines on disk by default. Replay (feeding a saved session back into the loop) is a first-class operation, useful for debugging and for evaluations.

6.4 MCP Client

Thin wrapper over modelcontextprotocol/go-sdk. Responsibilities:

  • Manage connections to one or more MCP servers concurrently.
  • Aggregate tool catalogs across servers, namespaced by server name.
  • Route tool calls to the right server.
  • Reconnect on transport failure with bounded backoff.
  • Expose resources and prompts published by each server to the loop.

// core/mcp/client.go
package mcp

type Client struct { /* unexported */ }

type ServerConfig struct {
    Name      string
    Transport Transport          // stdio, sse, http
    Endpoint  string
    Headers   map[string]string
}

func New(cfgs []ServerConfig, opts ...Option) (*Client, error)

func (c *Client) Connect(ctx context.Context) error
func (c *Client) Close() error

func (c *Client) Catalog() *Catalog
func (c *Client) Call(ctx context.Context, req ToolCall) (ToolResult, error)
func (c *Client) Resources(ctx context.Context, server string) ([]Resource, error)
func (c *Client) Prompts(ctx context.Context, server string) ([]Prompt, error)

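Tool names sent to the model are namespaced as server__tool (double underscore separator). This keeps them within the character set OpenAI-compatible APIs accept for function names (letters, digits, underscores, hyphens) while remaining trivially parseable, provided server names neither contain a double underscore nor end with an underscore.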
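Here is a sketch of the join and split. It assumes the commonly documented OpenAI function-name constraint: letters, digits, underscores, and hyphens, at most 64 characters. The function names Namespace and Split are illustrative.

The split cuts at the first separator, so tool names that contain "__" still round-trip. The constraints therefore fall on the server name:

- It must not contain "__".
- It must not end in "_", because "a_" plus "__b" would read back as server "a".

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const sep = "__"

// validName is the assumed provider constraint on function names.
var validName = regexp.MustCompile(`^[A-Za-z0-9_-]{1,64}$`)

// Namespace joins server and tool as server__tool. Server names that contain
// the separator or end in "_" are rejected because Split could not recover them.
func Namespace(server, tool string) (string, error) {
	if server == "" || tool == "" || strings.Contains(server, sep) || strings.HasSuffix(server, "_") {
		return "", fmt.Errorf("cannot namespace server %q, tool %q", server, tool)
	}
	name := server + sep + tool
	if !validName.MatchString(name) {
		return "", fmt.Errorf("namespaced tool name %q is not provider-legal", name)
	}
	return name, nil
}

// Split cuts at the first separator, so a tool name containing "__"
// still round-trips.
func Split(name string) (server, tool string, err error) {
	s, t, ok := strings.Cut(name, sep)
	if !ok || s == "" || t == "" {
		return "", "", fmt.Errorf("%q is not a namespaced tool name", name)
	}
	return s, t, nil
}

func main() {
	name, err := Namespace("plexara-acme", "datahub_search")
	if err != nil {
		panic(err)
	}
	server, tool, _ := Split(name)
	fmt.Println(name, server, tool) // plexara-acme__datahub_search plexara-acme datahub_search
}
```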

6.5 Agent Loop

The single most important file in the project. It is short on purpose.

// core/loop/loop.go
package loop

type Config struct {
    Provider provider.Provider
    MCP      *mcp.Client
    Router   router.Router          // nil means "all tools, no narrowing"
    Approver approval.Approver      // nil means auto-approve
    Logger   *slog.Logger
    MaxSteps int                    // safety cap on tool-call iterations
}

func Run(ctx context.Context, cfg Config, sess *session.Session, userMessage string) (<-chan event.Event, error)

The loop does this and only this:

  1. Append the user message to the session.
  2. Call Router.Narrow if configured, to pick a relevant subset of tools for this turn.
  3. Stream from the Provider with the narrowed tool set.
  4. For each event:
    • TextDelta: forward.
    • ToolCallRequest: ask the Approver, then dispatch to MCP, append the result to the session, and re-stream from the Provider.
    • Finish: forward and exit.
    • Error: forward and exit.
  5. Enforce MaxSteps so a misbehaving model cannot tool-loop indefinitely.

Cancellation, error wrapping, and structured logging happen here, not in providers.

6.6 Tool Router

The router is what makes a 30B local model tractable against a 30+ tool MCP surface. Without it, every turn carries the full catalog and small models drift.

// core/router/router.go
package router

type Router interface {
    Narrow(ctx context.Context, userMessage string, catalog *mcp.Catalog) ([]mcp.Tool, error)
}

Two implementations ship in v1:

  • PassThrough: returns the full catalog. Good for small MCP servers and large models.
  • ToolkitClassifier: a lightweight first pass against the same Provider that asks "which toolkits are likely needed?" then returns only those toolkits' tools. Plexara's datahub_*, trino_*, s3_*, and memory_* namespaces map cleanly to toolkits.

The classifier prompt and toolkit definitions live in docs/prompts/ so they are reviewable and version-controlled rather than buried in code.
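Here is a sketch of the grouping step. It assumes toolkits come from the tool-name prefix before the first underscore, so datahub_search belongs to datahub. In the real router, the selected toolkit names would come from the classifier call to the Provider; here they are passed in directly. Grouping runs on bare tool names, before server__tool namespacing.

```go
package main

import (
	"fmt"
	"maps"
	"slices"
	"strings"
)

type Tool struct {
	Name        string
	Description string
}

// Toolkit returns the prefix before the first underscore, so
// datahub_get_schema belongs to the datahub toolkit.
func Toolkit(toolName string) string {
	prefix, _, _ := strings.Cut(toolName, "_")
	return prefix
}

// Group buckets a catalog by toolkit; its keys are what a classifier prompt
// would offer the model to choose from.
func Group(tools []Tool) map[string][]Tool {
	groups := make(map[string][]Tool)
	for _, t := range tools {
		k := Toolkit(t.Name)
		groups[k] = append(groups[k], t)
	}
	return groups
}

// Narrow keeps only tools whose toolkit was selected, preserving catalog order.
func Narrow(tools []Tool, selected []string) []Tool {
	var out []Tool
	for _, t := range tools {
		if slices.Contains(selected, Toolkit(t.Name)) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	catalog := []Tool{
		{Name: "datahub_search"}, {Name: "datahub_get_schema"},
		{Name: "trino_query"}, {Name: "trino_execute"},
	}
	fmt.Println(slices.Sorted(maps.Keys(Group(catalog)))) // [datahub trino]
	// Pretend the classifier answered "datahub" for a discovery question.
	for _, t := range Narrow(catalog, []string{"datahub"}) {
		fmt.Println(t.Name)
	}
}
```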

6.7 Approval

Mutating tool calls (writes, deletes, expensive queries) require explicit human approval by default for interactive binaries. Single-shot binaries can be configured to auto-approve, deny, or prompt out-of-band.

// core/approval/approval.go
package approval

type Decision int
const (
    Allow Decision = iota
    Deny
    AllowAll          // for this session
)

type Approver interface {
    Approve(ctx context.Context, call event.ToolCallRequest) (Decision, error)
}

Standard implementations:

  • AutoAllow: trust the MCP server's declared permissions.
  • Interactive: TTY prompt, used by cmd/repl.
  • Policy: rule-based, e.g. allow all datahub_* reads, prompt on trino_execute, deny on datahub_delete.

7. Local Model Strategy

7.1 Target Model

Primary: Qwen3-30B-A3B at Q4_K_M quantization.

  • Mixture of experts: 30B total, 3B active per token.
  • Memory footprint around 17 to 18 GB at Q4, leaving room for context and OS on a 32 GB machine.
  • Reliable tool-call emission in OpenAI-compatible format via Ollama or MLX.
  • Strong enough at SQL generation to drive Trino through Plexara, especially with schema enrichment in context.

Validated alternatives (docs/models.md documents results):

  • Qwen3-32B (dense): better single-pass reasoning, slower decode, similar memory.
  • Mistral Small 3.2 (24B): solid generalist, slightly weaker at chained tool use.
  • gpt-oss-20b: smaller footprint baseline.

Models below 14B parameters are not recommended for this agent. Their tool-call reliability falls off on chains longer than two or three steps, which is exactly the workload Plexara generates.

7.2 Runtime

Default: Ollama. One install command, OpenAI-compatible at localhost:11434/v1, ships on macOS, Linux, and Windows.

Documented alternative: mlx-lm's OpenAI-compatible server on Apple Silicon for roughly 1.5x to 2x faster decode at equivalent quants.

Both are reached through the same OpenAICompatible provider. Switching is a config change.

7.3 What the agent does NOT do

  • Does not embed a tokenizer. Token counting for sliding-window truncation uses a length heuristic in v1 and a model-aware tokenizer (via a small HTTP /tokenize call to the runtime when available) in v1.1.
  • Does not manage model downloads. ollama pull qwen3:30b-a3b-q4_K_M is a documented prerequisite, not a runtime concern.
  • Does not implement speculative decoding, KV cache management, or other inference-layer concerns. Those belong in the runtime.

8. MCP Integration

8.1 Context Enrichment Is on the Server Side

Plexara MCP performs context enrichment as part of its protocol surface: tool descriptions carry domain context, datahub_get_schema and datahub_get_glossary_term exist as first-class tools, lineage is queryable, and resources expose curated views over the catalog. The agent in this project does not implement enrichment. It consumes whatever the MCP server presents.

This is the right division of responsibility. The MCP server has the catalog, the lineage graph, the access policies, and the domain knowledge. Putting enrichment logic in the agent would duplicate it and couple the agent to one MCP's data model.

What the agent does instead:

  • Surfaces the full enriched tool catalog to the model after toolkit narrowing, so descriptions written by Plexara reach the model verbatim.
  • Makes discovery tools available early. When the toolkit classifier marks them relevant, they are part of the narrowed set; for Plexara that means datahub_search and datahub_get_schema are likely the model's first calls when the user asks about data, before it attempts a trino_query.
  • Exposes MCP resources and prompts to the loop so the model can pull curated context (resource reads, prompt templates) when the server offers them.

The Plexara examples make this concrete. examples/acme-revenue/ shows a turn where the model issues datahub_search then datahub_get_schema before writing SQL, because the toolkit classifier surfaced both tools and Plexara's tool descriptions made their purpose obvious. No agent-side enrichment code is required for that to work.

8.2 Tool Routing

See section 6.6. Tool routing is the agent's responsibility because it depends on the user message and the conversation, not on the MCP server's catalog alone. Once narrowed, the agent passes the selected tools (with their server-supplied descriptions intact) to the model.

8.3 SQL Safety Pattern

For Plexara and any MCP exposing SQL execution, the agent ships with an optional SQLValidator middleware that:

  1. Intercepts *_query and *_execute tool calls.
  2. Calls the corresponding *_explain tool first.
  3. If EXPLAIN fails, returns the error to the model as a tool result so the model self-corrects.
  4. If EXPLAIN succeeds, lets the original call proceed.

This costs one extra round trip and dramatically reduces the rate at which local models produce broken queries. It is opt-in but enabled by default in the Plexara examples.

This is a client-side pattern, not enrichment. The MCP server already provides _explain; the agent just orchestrates the two-step.

8.4 Tool-Call Streaming Discipline

The agent never parses partial tool-call JSON. The OpenAICompatible provider buffers tool-call deltas internally and emits a ToolCallRequest event only when the runtime signals the call is complete (finish_reason: tool_calls or equivalent). This is non-negotiable. Half-parsed tool calls are the single most common source of agent flakiness in the wild.
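Here is a sketch of that buffering. It assumes OpenAI-style streaming chunks, where each choices[].delta.tool_calls entry carries an index. The first fragment for an index carries the id and function name, and later fragments append to function.arguments. Accumulator is a hypothetical name. json.Valid is the final guard before a ToolCallRequest is emitted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"maps"
	"slices"
)

// ToolCallDelta mirrors the fields used here from one entry of
// choices[].delta.tool_calls in an OpenAI-compatible streaming chunk.
type ToolCallDelta struct {
	Index    int    `json:"index"`
	ID       string `json:"id"`
	Function struct {
		Name      string `json:"name"`
		Arguments string `json:"arguments"`
	} `json:"function"`
}

type ToolCallRequest struct {
	ID        string
	Name      string
	Arguments json.RawMessage
}

type partial struct {
	id, name string
	args     []byte
}

// Accumulator buffers fragments per call index and emits nothing until Finish.
type Accumulator struct{ calls map[int]*partial }

func (a *Accumulator) Add(d ToolCallDelta) {
	if a.calls == nil {
		a.calls = make(map[int]*partial)
	}
	p := a.calls[d.Index]
	if p == nil {
		p = &partial{}
		a.calls[d.Index] = p
	}
	if d.ID != "" {
		p.id = d.ID
	}
	if d.Function.Name != "" {
		p.name = d.Function.Name // assumes the name arrives whole, not in fragments
	}
	p.args = append(p.args, d.Function.Arguments...)
}

// Finish runs once the runtime reports finish_reason "tool_calls". It returns
// calls in index order and rejects any whose arguments are not complete JSON.
func (a *Accumulator) Finish() ([]ToolCallRequest, error) {
	out := make([]ToolCallRequest, 0, len(a.calls))
	for _, i := range slices.Sorted(maps.Keys(a.calls)) {
		p := a.calls[i]
		args := p.args
		if len(args) == 0 {
			args = []byte("{}") // treat a missing argument string as "no arguments"
		}
		if !json.Valid(args) {
			return nil, fmt.Errorf("tool call %d (%s): arguments are not valid JSON", i, p.name)
		}
		out = append(out, ToolCallRequest{ID: p.id, Name: p.name, Arguments: json.RawMessage(args)})
	}
	a.calls = nil
	return out, nil
}

func main() {
	chunks := []string{
		`{"index":0,"id":"call_1","function":{"name":"trino_query","arguments":""}}`,
		`{"index":0,"function":{"arguments":"{\"sql\":\"SELECT "}}`,
		`{"index":0,"function":{"arguments":"1\"}"}}`,
	}
	var acc Accumulator
	for _, c := range chunks {
		var d ToolCallDelta
		if err := json.Unmarshal([]byte(c), &d); err != nil {
			panic(err)
		}
		acc.Add(d)
	}
	calls, err := acc.Finish()
	if err != nil {
		panic(err)
	}
	fmt.Println(calls[0].Name, string(calls[0].Arguments)) // trino_query {"sql":"SELECT 1"}
}
```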


9. CLI Surfaces

9.1 cmd/ask

Single-shot. One question in, streamed answer out, exit.

ask --model qwen3:30b-a3b --mcp plexara-acme \
    "Top five products by revenue in the West region last quarter"

Flags:

  • --model: model name passed to the runtime.
  • --mcp: named MCP config from ~/.config/plexara-agents/config.yaml, or a path.
  • --no-router: disable toolkit narrowing, send the full catalog every turn.
  • --auto-approve: bypass approval prompts (for scripted use).
  • --session FILE: append to or replay a session file.
  • --json: structured event output for piping into tooling.

9.2 cmd/repl

Interactive. Multi-turn session with approval prompts, history, and slash commands.

Slash commands (planned):

  • /tools: list narrowed tools for the next turn.
  • /catalog: show the full catalog from all connected MCPs.
  • /save FILE, /load FILE: session persistence.
  • /explain: show the last tool call's parameters and result.
  • /prompt: print the assembled system prompt for the next turn.

Implementation uses bubbletea for the TUI layer. This pulls in a real dependency, but bubbletea is the de facto Go TUI library and the alternative (raw terminal handling) is not where reference-quality code should be spent.


10. Configuration

YAML, located at ~/.config/plexara-agents/config.yaml by default, overridable with --config.

defaults:
  model: qwen3:30b-a3b
  provider: ollama-local
  router: toolkit-classifier
  approval: interactive

providers:
  ollama-local:
    type: openai-compatible
    base_url: http://localhost:11434/v1
    api_key_env: OLLAMA_API_KEY    # optional, ignored if unset

  mlx-local:
    type: openai-compatible
    base_url: http://localhost:8080/v1

mcp_servers:
  plexara-acme:
    transport: http
    endpoint: https://mcp-demo.plexara.io

Configuration values are also overridable via flags and environment variables. Precedence: flag > env > config file > built-in default. No silent overrides.


11. Observability

Structured logging via log/slog, JSON by default, text in TTYs.

Every tool call emits a log line with: server, tool name, argument digest (hashed, not raw), latency, and success or error class. Resource and prompt fetches do the same.

Optional OpenTelemetry tracing behind a build tag. Off by default to keep the dependency tree light. When on, the agent loop emits one span per turn with child spans for tool narrowing, provider streaming, and each tool call.

A --debug flag dumps the full assembled system prompt, the narrowed tool list (with the MCP server's descriptions), and the messages sent to the Provider to stderr before each turn. This is the single most useful debugging affordance.


12. Error Handling

Conventions:

  • All errors are wrapped with context via fmt.Errorf("...: %w", err), so the original is never lost.
  • Sentinel errors live next to the package that owns them (mcp.ErrServerUnavailable, provider.ErrModelNotFound, etc.).
  • The agent loop converts internal errors into event.Error for streaming consumers. Failures detected before streaming starts (for example, invalid configuration) are also returned directly from Run, for callers that want to handle them imperatively.
  • Network and tool errors are not fatal by default. The model sees the error as a tool result and is free to recover. Programmer errors (bad config, missing model) are fatal.

13. Testing

Three layers, plus fuzzing.

Unit tests. Every package has them, table-driven where it makes sense. The Provider interface has an in-memory provider.Fake implementation (core/provider/testing.go) that lets the loop, router, and approval be tested without any network or model. All tests run with -race and -shuffle=on in CI.

Integration tests. Run against a real MCP server and a real local model. Gated behind a build tag (//go:build integration) and an INTEGRATION=1 env var. Run on a self-hosted CI runner with Ollama and Qwen3 pre-pulled, and locally during development. Never block the standard PR pipeline.

Replay tests. Saved sessions in testdata/sessions/ are replayed against a recorded Provider transcript. This catches regressions in the loop, router, and tool dispatch without needing a live model or network. New examples must ship with at least one replay test, and CI enforces the rule.

Fuzz tests. Native Go fuzzing for parsers and serializers: tool-name namespacing, event JSON marshaling, MCP server response handling, session file decoding. Fuzz corpora committed under testdata/fuzz/. CI runs short fuzz cycles per PR; longer scheduled runs catch regressions over time.

Coverage is measured with -covermode=atomic and uploaded to Codecov. Quality gates are defined in section 14.


14. CI, Security, and Repository Standards

This is an OSS reference project. The CI surface, supply-chain posture, and repository hygiene are part of what's being demonstrated. Standards align with what mature projects in the same space ship (kubefwd's pipeline is the reference baseline).

14.1 Repository Hygiene

Files committed at the repo root or under .github/:

  • LICENSE (Apache 2.0).
  • NOTICE if any third-party attribution is required.
  • README.md with status badges (build, coverage, Go Report Card, Scorecard, license, latest release).
  • CONTRIBUTING.md describing local dev setup, commit conventions, and the PR process.
  • CODE_OF_CONDUCT.md (Contributor Covenant 2.1).
  • SECURITY.md with a vulnerability disclosure policy, supported-versions matrix, and a security contact (security@plexara.io or equivalent). GitHub Private Vulnerability Reporting enabled.
  • CODEOWNERS mapping directories to maintainers.
  • .github/ISSUE_TEMPLATE/ with bug, feature, and security-redirect templates.
  • .github/PULL_REQUEST_TEMPLATE.md.
  • .github/dependabot.yml configured for gomod, github-actions, and docker (if applicable).

Branch protection on main:

  • Required PR review (at least one approver, code owner review for owned paths).
  • Required status checks: build, test, lint, security, codeql, govulncheck, dependency-review.
  • Linear history required (squash or rebase, no merge commits).
  • Signed commits required.
  • Force-pushes blocked.
  • Auto-merge enabled for Dependabot PRs that pass all checks.

14.2 Continuous Integration

Workflows under .github/workflows/:

  • ci.yml: build, vet, lint, test (race + coverage), go mod verify, go mod tidy -diff. Runs on every PR and every push to main.
  • security.yml: gosec, govulncheck, Semgrep. Runs on every PR and on a weekly schedule.
  • codeql.yml: GitHub CodeQL with the security-extended and security-and-quality query packs. Runs on every PR, every push to main, and weekly.
  • scorecard.yml: OpenSSF Scorecard. Runs weekly and on main pushes; uploads SARIF to the security tab and publishes results.
  • dependency-review.yml: blocks PRs that introduce dependencies with known vulnerabilities or non-permissive licenses.
  • release.yml: triggered on v*.*.* tags; runs GoReleaser, generates SBOMs, signs artifacts, attaches SLSA provenance.
  • fuzz.yml: runs Go fuzz targets for an extended cycle (e.g., 5 minutes per target) on a nightly schedule. Failures open issues automatically.

Hosted jobs run on the latest stable Ubuntu runner image unless noted otherwise. macOS jobs cover the Apple Silicon developer path for the Ollama/MLX integration tests.

14.3 Test and Coverage Gate

Standard test invocation in CI:

go test -race -shuffle=on -count=1 \
        -covermode=atomic -coverprofile=coverage.out \
        ./...

  • Coverage uploaded to Codecov on every CI run. Codecov badge in README.
  • Coverage gate: >80% of statements for core/.... PRs that drop coverage below the threshold fail CI.
  • cmd/... and examples/... are exercised but excluded from the gate; they exist primarily as worked examples and integration scaffolding.
  • -race is mandatory in CI and recommended locally.
  • -shuffle=on to surface order-dependent tests.
  • Replay tests live under testdata/ and run as part of the standard suite.
  • Integration tests gated behind the integration build tag run only on the self-hosted runner.

14.4 Build

go build runs as its own step before the tests, so compile errors fail fast instead of waiting for the longer test suite. Build verification includes:

  • go build ./... for every supported platform via a matrix (darwin/arm64, darwin/amd64, linux/amd64, linux/arm64).
  • go vet ./....
  • go mod verify.
  • go mod tidy -diff to confirm go.mod and go.sum are clean (no hidden drift).
  • gofmt -l . to fail on unformatted files.
  • Release artifacts are built with -trimpath and -ldflags "-s -w"; debug builds keep their symbols.
  • CGO disabled (CGO_ENABLED=0) for portability.

14.5 Linting

golangci-lint with a comprehensive, opinionated configuration in .golangci.yml. Enabled linters (in addition to the default set):

errcheck, errorlint, errname, govet, ineffassign, staticcheck, unused, gosimple, gofmt, goimports, misspell, revive, gocritic, gocyclo, gocognit, gosec, prealloc, unconvert, unparam, copyloopvar, intrange, nilerr, nilnil, contextcheck, durationcheck, exhaustive, gomoddirectives, gomodguard, importas, predeclared, whitespace, godot, dupl, nolintlint.

Specific rules:

  • gocyclo cyclomatic complexity threshold 15.
  • gocognit cognitive complexity threshold 20.
  • dupl set to flag genuine duplication, with a high enough threshold that table-driven test rows don't trip it.
  • nolintlint enforces that every //nolint: directive includes a reason.
  • gosec runs in lint mode here; a separate gosec job (section 14.6) runs with stricter settings.

The lint job fails CI on any new finding. Existing legitimate exceptions are listed inline with //nolint:<linter> // <reason>.

14.6 Security Scanning

Multiple complementary scanners. They overlap intentionally; coverage gaps in one are filled by another.

  • gosec: dedicated job using securego/gosec GitHub Action with full ruleset. Findings as SARIF uploaded to the security tab.
  • govulncheck: official golang.org/x/vuln/cmd/govulncheck against ./... on every CI run. Failures block the merge.
  • Semgrep: returntocorp/semgrep-action with rulesets p/security-audit, p/secrets, p/golang, p/owasp-top-ten. Findings posted as PR comments and uploaded as SARIF.
  • CodeQL: github/codeql-action with go language, security-extended and security-and-quality query packs.
  • Trivy: aquasecurity/trivy-action filesystem scan for misconfigurations and secrets. Container scan added when the project starts publishing images.

All scanner outputs go through GitHub's SARIF interface so findings are visible in the security tab and surfaceable in PR review.

14.7 Supply Chain Security

  • OpenSSF Scorecard: ossf/scorecard-action weekly. Target score >=8.0. Score badge in README.
  • SLSA Level 3 provenance: slsa-framework/slsa-github-generator produces signed provenance for every release artifact.
  • SBOM: generated via anchore/sbom-action (Syft) in both CycloneDX and SPDX formats; attached to every GitHub release.
  • Cosign keyless signing: every release artifact, every container image, every SBOM signed via Sigstore OIDC. Verification commands documented in SECURITY.md.
  • Reproducible builds: -trimpath, fixed module cache, frozen BUILD_ID from the tag's commit. Documented procedure for third-party reproduction.
  • License scan: google/go-licenses blocks PRs that introduce dependencies with non-permissive licenses (anything not in an allowlist of MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause, ISC, MPL-2.0, etc.).

14.8 Releases

Releases are tag-driven. Cutting vX.Y.Z triggers release.yml, which runs GoReleaser.

  • Strict semantic versioning. Pre-1.0 tags signal API instability.
  • GoReleaser config at .goreleaser.yaml.
  • Cross-compiled binaries: darwin/arm64, darwin/amd64, linux/amd64, linux/arm64, windows/amd64.
  • Each binary signed with cosign and accompanied by a .sig and a .cert.
  • SBOMs (CycloneDX and SPDX) attached.
  • SLSA Level 3 provenance attached.
  • Conventional Commits enforced for changelog generation. PR titles validated by a CI check using commitlint or equivalent.
  • Container images (when introduced) published to ghcr.io/plexara/plexara-agents with cosign signature and SBOM.
  • Homebrew tap formula updated automatically by GoReleaser for the ask and repl binaries.

14.9 Dependency Management

  • Dependabot for gomod and github-actions, weekly schedule, grouped updates for non-major bumps.
  • Renovate considered as an alternative; for v1 stick with Dependabot to keep the toolchain native to GitHub.
  • govulncheck provides the runtime safety net for transitive vulnerabilities Dependabot cannot reach.
  • No vendoring. Modules are resolved from the proxy and verified via go.sum and go mod verify.

14.10 Action Pinning

Every third-party GitHub Action pinned to a full commit SHA, with the human-readable version as a trailing comment:

- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32   # v5.0.2
- uses: golangci/golangci-lint-action@aaa42aa0628b4ae2578232a66b541047968fac86 # v6.1.0

This is non-negotiable: SHA pinning is what closes the supply-chain attack surface that tag-pinning leaves open. Dependabot updates the SHAs on its weekly cadence; reviewers verify the new SHA points at the same release the comment claims.

First-party actions (actions/*, github/*) follow the same rule. Scorecard's Pinned-Dependencies check scores this, so missing pins lower the project's Scorecard score.
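Scorecard remains the authority, but a quick local check can catch an unpinned reference before push. The following hypothetical sketch scans a workflow for uses: references that are not pinned to a full 40-character SHA. It skips local ./ actions, and it also skips docker:// references, which pin by image digest instead.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

var (
	usesRef = regexp.MustCompile(`^\s*-?\s*uses:\s*([^\s#]+)`)
	shaPin  = regexp.MustCompile(`@[0-9a-f]{40}$`)
)

// unpinned lists uses: references in a workflow that are not pinned to a
// full 40-character commit SHA. Local ./ actions and docker:// references
// are skipped; the latter pin by image digest and need a separate check.
func unpinned(workflow string) []string {
	var bad []string
	sc := bufio.NewScanner(strings.NewReader(workflow))
	for sc.Scan() {
		m := usesRef.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		ref := strings.Trim(m[1], `"'`)
		if strings.HasPrefix(ref, "./") || strings.HasPrefix(ref, "docker://") {
			continue
		}
		if !shaPin.MatchString(ref) {
			bad = append(bad, ref)
		}
	}
	return bad
}

func main() {
	workflow := `steps:
  - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
  - uses: actions/setup-go@v5
  - uses: ./.github/actions/local`
	for _, ref := range unpinned(workflow) {
		fmt.Println("unpinned:", ref) // unpinned: actions/setup-go@v5
	}
}
```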

14.11 Pre-commit and Local Tooling

  • .pre-commit-config.yaml with hooks: trailing-whitespace, end-of-file-fixer, check-yaml, check-json, check-added-large-files, mixed-line-ending.
  • Local make targets that mirror CI: make build, make test, make lint, make sec, make cover, make tidy. Developers run these before pushing.
  • A tools.go file under internal/tools/ pins versions of the dev-only tools (golangci-lint, goimports, govulncheck) so that go install produces reproducible local toolchains.
  • pre-commit hooks are a developer convenience; CI is the source of truth.

14.12 Frontend Build (where applicable)

For v1, no frontend exists. The core library is consumed by terminal binaries.

When the hosted web variant lands (separate repository, see section 19), the standards inherited and enforced are:

  • TypeScript with "strict": true and noUncheckedIndexedAccess.
  • Vite or equivalent for build, with type-only imports enforced.
  • ESLint with @typescript-eslint/strict-type-checked and security plugins.
  • Prettier for formatting.
  • tsc --noEmit typecheck as a CI gate.
  • Dependency scanning via npm audit, osv-scanner, and Dependabot.
  • Same CodeQL, Semgrep, Scorecard, and Cosign posture as the Go side.

This section is documented now so the future repository is set up correctly from day one.

14.13 Badges

Badges in README.md, in this order:

  1. CI status
  2. Coverage (Codecov)
  3. Go Report Card
  4. OpenSSF Scorecard
  5. Latest release
  6. License
  7. Go reference (pkg.go.dev)
  8. Slack/Discord/Discussions (if community channels exist)

15. Coding Standards

  • Go 1.26+. We use range-over-func iterators where they earn their keep, generics where they remove duplication, and neither for show.
  • gofmt, goimports, and golangci-lint clean. Lint config in .golangci.yml, conservative rule set.
  • Public APIs documented with full sentences. Private functions documented when they are non-obvious, not as a rule.
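  • Interfaces defined at the consumer, not the producer. When several packages consume the same interface, it lives in a small shared package rather than inside any one of them: the Provider interface lives in core/provider because the loop, the router, and tests all consume it, and defining it inside core/loop would force the router to depend on the loop. When in doubt, keep interfaces small.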
  • No package-level mutable state. Constructors return values; lifetimes are explicit.
  • context.Context is the first parameter of any function that does I/O. No exceptions.
  • Errors are values; panics are bugs. The agent loop recovers from panics in tool handlers and surfaces them as event.Error, but does not recover from panics in the Provider or core. Those crash by design.
  • internal/ is used freely. If a package is not meant to be imported by downstream consumers, it goes there.

Idiom we are deliberately avoiding

Functional options for the Provider and MCP client are appealing but cost readability. We use plain config structs with sensible zero values instead. Functional options are reserved for places where a long tail of optional knobs really exists (the agent loop's Config).


16. Dependencies

16.1 Runtime

A small list, intentionally:

  • github.com/modelcontextprotocol/go-sdk: MCP client (official, jointly maintained by Anthropic and Google).
  • github.com/charmbracelet/bubbletea: REPL TUI (only pulled in by cmd/repl, not by core).
  • gopkg.in/yaml.v3: config parsing.
  • golang.org/x/sync/errgroup: concurrent MCP server connection management.

That's it for core. The OpenTelemetry path adds dependencies behind a build tag.

No HTTP framework. The standard library is fine for the OpenAI-compatible client and any future server.

No agent framework. We are the agent framework.

16.2 Developer and CI Tooling

Pinned via internal/tools/tools.go (the standard //go:build tools pattern) so go install produces reproducible local toolchains:

  • github.com/golangci/golangci-lint/cmd/golangci-lint
  • golang.org/x/tools/cmd/goimports
  • golang.org/x/vuln/cmd/govulncheck
  • github.com/securego/gosec/v2/cmd/gosec
  • github.com/google/go-licenses
  • github.com/anchore/syft/cmd/syft (SBOM, used by GoReleaser)
  • github.com/sigstore/cosign/v2/cmd/cosign (signing, used by release workflow)

CI installs these from the same pinned versions. Local make tools produces an identical toolchain.
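For reference, the tools.go pattern named above looks roughly like this. The build constraint keeps the file out of normal builds, while its blank imports keep each tool's module version recorded in go.mod and go.sum, so running, for example, `go install github.com/securego/gosec/v2/cmd/gosec` inside the module installs the pinned version. Because the project targets Go 1.26+, the `tool` directive added to go.mod in Go 1.24 (managed with `go get -tool` and run with `go tool`) is an alternative worth weighing; this sketch follows the pattern the spec names.

```go
//go:build tools

// Package tools records dev-only tool dependencies so their versions are
// pinned in go.mod and go.sum. It is never compiled into any binary.
package tools

import (
	_ "github.com/anchore/syft/cmd/syft"
	_ "github.com/golangci/golangci-lint/cmd/golangci-lint"
	_ "github.com/google/go-licenses"
	_ "github.com/securego/gosec/v2/cmd/gosec"
	_ "github.com/sigstore/cosign/v2/cmd/cosign"
	_ "golang.org/x/tools/cmd/goimports"
	_ "golang.org/x/vuln/cmd/govulncheck"
)
```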


17. Plexara Demo Workflows

examples/acme-revenue/ is the headline. It demonstrates:

  1. User question: natural language revenue query.
  2. Toolkit classifier narrows to datahub_* and trino_*.
  3. Model sees Plexara's enriched tool descriptions and chooses to call datahub_search and datahub_get_schema to ground itself before writing SQL.
  4. Model issues trino_explain (via the SQL safety wrapper), then trino_query.
  5. Result formatted as a small table in the response.
  6. Saved session replayable as a regression test.

The point of this example is to show that an unmodified, generic agent driving a richly enriched MCP produces good results. The MCP earns its keep; the agent stays small.
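Step 2 narrows the tool list before the model sees it. A minimal sketch, assuming the classifier's output is a list of glob patterns over tool names (the spec does not fix its output format), shows the narrowing filter using path.Match. The classifier call itself is not shown, and `s3_list_objects` is a made-up name standing in for an out-of-scope tool.

```go
package main

import (
	"fmt"
	"path"
)

// narrowTools keeps only the tools whose names match one of the glob
// patterns selected by the toolkit classifier (e.g. "datahub_*").
// In path.Match, "*" matches any run of non-'/' characters.
func narrowTools(all, patterns []string) []string {
	var kept []string
	for _, name := range all {
		for _, p := range patterns {
			if ok, err := path.Match(p, name); err == nil && ok {
				kept = append(kept, name)
				break
			}
		}
	}
	return kept
}

func main() {
	all := []string{"datahub_search", "datahub_get_schema", "s3_list_objects", "trino_explain", "trino_query"}
	fmt.Println(narrowTools(all, []string{"datahub_*", "trino_*"}))
}
```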

examples/acme-lineage/ demonstrates the lineage walk: a question about downstream impact triggers datahub_get_lineage calls, the model assembles a small graph in its response, and Plexara's glossary tools (datahub_get_glossary_term) come into play when terms need defining. Again, the agent does not coordinate this; the model drives it because Plexara's tool descriptions make the path obvious.

examples/multi-mcp/ connects to two MCP servers at once (Plexara ACME and a second small server, possibly the Filesystem reference server from the official MCP servers repository) and demonstrates that namespacing and routing work cleanly across servers.
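Namespacing can be as simple as prefixing each tool with its server name and splitting it back off when routing a call. The sketch below uses the double-underscore separator discussed under Open Questions, which is not yet a settled choice; the server names are hypothetical. One constraint worth knowing when choosing: OpenAI's function-calling API limits function names to letters, digits, underscores, and dashes, so a colon or dot separator may be rejected by some OpenAI-compatible endpoints.

```go
package main

import (
	"fmt"
	"strings"
)

// sep is the namespacing separator; "__" is the option under discussion,
// not a decision.
const sep = "__"

// qualify prefixes a tool name with its server name so that two servers
// exposing the same tool name do not collide in the model's tool list.
func qualify(server, tool string) string {
	return server + sep + tool
}

// route splits a qualified name back into server and tool. It splits on the
// first separator, so server names must not contain "__" (tool names may).
func route(qualified string) (server, tool string, ok bool) {
	return strings.Cut(qualified, sep)
}

func main() {
	name := qualify("acme", "trino_query")
	server, tool, ok := route(name)
	fmt.Println(name, server, tool, ok)
}
```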

Each example has a README, a main.go that's small enough to read in one sitting, and a recorded session for replay testing.


18. Roadmap

v0.1 (initial public release)

Code:

  • core/event, core/provider/openai_compatible, core/mcp, core/session, core/loop, core/router/{passthrough,toolkit_classifier}, core/approval.
  • cmd/ask.
  • examples/acme-revenue with replay test.

Documentation:

  • README with full badge set.
  • Architecture doc.
  • One ADR (provider model choice and the decision to ship only openai_compatible in v1).
  • CONTRIBUTING.md, CODE_OF_CONDUCT.md, SECURITY.md, CODEOWNERS, issue and PR templates.

CI and supply chain (the entire section 14 baseline):

  • ci.yml, security.yml, codeql.yml, scorecard.yml, dependency-review.yml, release.yml, fuzz.yml.
  • All third-party actions SHA-pinned with version comments.
  • Coverage gate at >80% for core/..., Codecov upload.
  • gosec, govulncheck, Semgrep, CodeQL, Trivy fs scan all green.
  • OpenSSF Scorecard >=8.0.
  • GoReleaser config with cosign signing, SBOM, SLSA Level 3 provenance.
  • Dependabot configured for gomod and github-actions.
  • Branch protection on main enforced.
  • Pre-commit config and Makefile.
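The spec does not say how the >80% coverage gate is enforced (Codecov can enforce it too). One hypothetical local implementation reads the `total:` line that `go tool cover -func=coverage.out` prints after `go test -coverprofile=coverage.out ./core/...` and compares it against the threshold. In CI the helper would exit non-zero on failure; the sketch only prints the result.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// totalCoverage extracts the percentage from the "total:" line printed by
// `go tool cover -func`, e.g. "total:  (statements)  81.2%".
func totalCoverage(report string) (float64, error) {
	sc := bufio.NewScanner(strings.NewReader(report))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) >= 3 && f[0] == "total:" {
			return strconv.ParseFloat(strings.TrimSuffix(f[len(f)-1], "%"), 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no total line in coverage report")
}

func main() {
	report := "example.com/core/loop/loop.go:12:\tRun\t\t90.0%\ntotal:\t\t\t(statements)\t82.5%\n"
	pct, err := totalCoverage(report)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The gate is written as ">80%", so the comparison is strictly greater.
	fmt.Printf("coverage %.1f%%, gate passed: %v\n", pct, pct > 80)
}
```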

v0.2

  • cmd/repl with TUI and slash commands.
  • SQL safety middleware.
  • examples/acme-lineage, examples/multi-mcp.
  • Documented MLX runtime path.
  • Session save/load.

v0.3

  • Policy approver.
  • Token-aware session truncation via runtime /tokenize.
  • OpenTelemetry tracing behind a build tag.

v1.0

  • Documented stability for the core API.
  • Comprehensive ADRs for every significant design decision.
  • Benchmarks comparing Qwen3-30B-A3B against alternatives on the ACME workflows.

Post-1.0

A separate repository (plexara-agents-server or similar) builds a multi-user web service on top of core. That is a different project with a different deployment story; this one stays single-user, local-first, terminal-native.


19. Future: Hosted Multi-User

Out of scope for this repository, but worth recording the shape so that core does not paint itself into a corner.

When the hosted variant is built, core will be imported unchanged. The differences are at the edges:

  • Runtime: vLLM or sglang behind an OpenAI-compatible endpoint, batching concurrent sessions for throughput.
  • Hardware: a single L40S (48 GB) or dual RTX 5090s is a reasonable starting point.
  • Concurrency: one loop.Run call per HTTP session, with sessions persisted to Postgres or similar.
  • Approval: policy-based (no human at the terminal), with sensitive operations rejected outright or escalated to an out-of-band review.
  • Auth: per-tenant MCP server selection; each user's mcp.Client is built from their authorized server list.

If the v1 core API supports all of that without modification, we did our job.


20. Open Questions

These are deliberately unresolved and should be settled before implementation begins.

  1. Toolkit classifier prompt. The first version will be hand-tuned for Plexara's namespaces. Should a generic toolkit-classification prompt ship in core, or should each MCP user supply their own? Leaning toward "shipped generic, override per MCP."
  2. Tool-name namespacing separator. Double underscore (server__tool) is safe but ugly. Other MCP clients have used colon and dot. Once released, the choice is effectively locked in. Should we align with an emerging community convention? Worth surveying before deciding.
  3. Resource handling. MCP resources are first-class but the model has no native way to consume "resource references" mid-stream. v1 will inline small resources into the system prompt and link larger ones; v2 may grow a richer resource handler. Worth an ADR.
  4. Persistence format. JSON Lines for sessions is simple and tooling-friendly but verbose. Some hybrid (JSONL for messages, sidecar for metadata) may end up cleaner. Decide before v0.2.

21. References
