# plexara-agents

**A Go reference implementation for building MCP-driven AI agents.**

Status: Draft specification, v0.1
Repository: `github.com/plexara/plexara-agents`
License: Apache 2.0
Language: Go 1.26+

---

## 1. Purpose

`plexara-agents` is an open-source Go project that does two things at once:

1. Ships a small, opinionated library (`core`) for building local-first AI agents that drive Model Context Protocol (MCP) servers.
2. Ships ready-to-run binaries that use the library to demonstrate the Plexara MCP data platform (`txn2/mcp-data-platform`), specifically against the public ACME Corp demo deployment.

The project is designed to serve as a reference: anyone reading the source should come away with a clear, idiomatic answer to "how do I build a Go agent that talks to one or more MCP servers using a local model." The Plexara demo is the headline use case, but nothing in `core` assumes Plexara is the MCP being driven.

### Why this exists

The MCP ecosystem has matured fast on the server side. Client tooling, especially in Go, has not. Most public agent code is Python and most public Python agent code optimizes for cloud APIs. A Go project that runs entirely on local inference, treats MCP as a first-class primitive, and is small enough to read end-to-end fills a real gap and makes the case for Plexara's architectural model along the way.

---

## 2. Goals and Non-Goals

### Goals

- Serve as a readable, idiomatic Go reference for MCP-driven agent construction.
- Run entirely on local model inference for the v1 release. No cloud API support, no fallback paths.
- Drive any MCP server. Plexara is the showcase, not a coupling.
- Provide a clean, minimal `core` library that other agents (current and future) can depend on without inheriting CLI or transport concerns.
- Support multiple binaries built on `core`: a single-shot CLI (`ask`), an interactive REPL (`repl`), and a starter for hosted deployments later.
- Ship with example workflows that exercise the Plexara ACME Corp demo end to end, demonstrating how an agent benefits from Plexara's context enrichment.

### Non-Goals (v1)

- Cloud API providers (Anthropic, OpenAI, Bedrock, etc.). Local only.
- Embedded inference. The agent talks HTTP to a local model server (Ollama or MLX), not via CGo or in-process llama.cpp.
- Web UI. The hosted web variant is planned, but `core` and the v1 binaries do not assume a browser.
- Plugin system. Composition happens at the Go module level, not via dynamic loading.
- A bespoke prompt framework. Prompts are plain text or templated Go strings, not a DSL.

---

## 3. Audience

Two readers, with different reading paths:

**The agent builder.** Someone who needs to build an MCP-driven agent for their own product. They read `core/` top to bottom, treat the binaries in `cmd/` as worked examples, and import `core` into their own project.

**The Plexara evaluator.** Someone evaluating whether Plexara's MCP design holds up under real agent traffic. They run `cmd/ask` against the ACME demo, look at the example workflows, and learn the platform by interaction.

The spec, the layout, and the code should serve both.

---

## 4. Architecture Overview

```
+---------------------+        +-----------------------+
|   cmd/ask           |        |   cmd/repl            |
|   (single-shot CLI) |        |   (interactive TUI)   |
+----------+----------+        +-----------+-----------+
           |                               |
           +---------------+---------------+
                           |
                  +--------v--------+
                  |     core        |
                  |                 |
                  | loop  session   |
                  | provider  mcp   |
                  | event  router   |
                  +--------+--------+
                           |
            +--------------+--------------+
            |                             |
   +--------v--------+          +---------v---------+
   |  Local model    |          |   MCP servers     |
   |  (Ollama / MLX) |          |   (Plexara, etc.) |
   +-----------------+          +-------------------+
```

The agent loop is a function: given a Provider, an MCP client connected to one or more servers, and a user message, it produces a stream of events until the model signals completion. Everything else is composition around that function.

---

## 5. Module Layout

```
plexara-agents/                    # github.com/plexara/plexara-agents
├── core/                          # importable library, the heart of the project
│   ├── event/                     # event types (TextDelta, ToolCallRequest, ToolCallResult, Finish, Error)
│   ├── provider/                  # Provider interface and adapters
│   │   ├── provider.go            # interface + shared types
│   │   ├── openai_compatible.go   # works for Ollama, llama.cpp server, MLX, vLLM
│   │   └── testing.go             # in-memory fake for tests
│   ├── mcp/                       # MCP client wrapper around go-sdk
│   │   ├── client.go              # connection lifecycle, multi-server aggregation
│   │   └── catalog.go             # tool catalog with toolkit grouping
│   ├── router/                    # tool router (toolkit classification, narrowing)
│   ├── session/                   # message history, persistence, replay
│   ├── loop/                      # the agent loop itself
│   ├── approval/                  # tool-call approval gates
│   └── log/                       # slog-based structured logging helpers
├── cmd/
│   ├── ask/                       # single-shot CLI: one question, one answer
│   └── repl/                      # interactive REPL with history and approval prompts
├── examples/
│   ├── acme-revenue/              # end-to-end Plexara ACME demo: revenue by region
│   ├── acme-lineage/              # lineage walk + downstream impact analysis
│   └── multi-mcp/                 # one agent, two MCP servers, demonstrating router
├── docs/
│   ├── architecture.md
│   ├── mcp-integration.md
│   ├── prompts/                   # canonical prompts kept under version control
│   └── adrs/                      # architecture decision records
├── go.mod
├── go.sum
├── Makefile
├── LICENSE
└── README.md
```

Rules:

- `core` has no dependencies on `cmd/` or `examples/`.
- `cmd/` binaries are intentionally small (target ~150 lines each). If a binary grows logic worth keeping, it moves into `core`.
- Examples are runnable. If they break, CI breaks.

---

## 6. Core Abstractions

### 6.1 Events

Everything streamed out of the agent loop is an event. Events are a closed sum type, expressed as an interface with a sealing method.

```go
// core/event/event.go
package event

import "encoding/json"

type Event interface {
    isEvent()
}

type TextDelta struct {
    Text string
}
func (TextDelta) isEvent() {}

type ToolCallRequest struct {
    ID        string
    Name      string
    Server    string          // which MCP server owns this tool
    Arguments json.RawMessage // never partial; always complete JSON
}
func (ToolCallRequest) isEvent() {}

type ToolCallResult struct {
    ID      string
    Content []ToolContent   // text, image, resource refs
    IsError bool
}
func (ToolCallResult) isEvent() {}

type Finish struct {
    Reason FinishReason
    Usage  Usage
}
func (Finish) isEvent() {}

type Error struct {
    Err error
}
func (Error) isEvent() {}
```

Consumers (CLI, REPL, future web server) switch on event type and render accordingly.
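As a rough illustration of that consumer side, the sketch below redeclares a cut-down copy of the event types so it compiles on its own. The `render` function and the bracketed output format are hypothetical, not part of `core`.

```go
package main

import (
    "errors"
    "fmt"
    "strings"
)

// Event mirrors the sealed interface from core/event in miniature.
type Event interface{ isEvent() }

type TextDelta struct{ Text string }
type ToolCallRequest struct{ Name, Server string }
type Finish struct{ Reason string }
type Error struct{ Err error }

func (TextDelta) isEvent()       {}
func (ToolCallRequest) isEvent() {}
func (Finish) isEvent()          {}
func (Error) isEvent()           {}

// render is a hypothetical consumer. It drains the channel, builds a
// plain-text transcript, and returns the first Error event it sees.
func render(events <-chan Event) (string, error) {
    var out strings.Builder
    var firstErr error
    for ev := range events {
        switch e := ev.(type) {
        case TextDelta:
            out.WriteString(e.Text)
        case ToolCallRequest:
            fmt.Fprintf(&out, "\n[tool %s on %s]\n", e.Name, e.Server)
        case Finish:
            fmt.Fprintf(&out, "\n[done: %s]\n", e.Reason)
        case Error:
            if firstErr == nil {
                firstErr = e.Err
            }
        }
    }
    return out.String(), firstErr
}

func main() {
    ch := make(chan Event, 4)
    ch <- TextDelta{Text: "Looking up revenue."}
    ch <- ToolCallRequest{Name: "trino_query", Server: "plexara"}
    ch <- Error{Err: errors.New("model server went away")}
    close(ch)

    text, err := render(ch)
    fmt.Print(text)
    if err != nil {
        fmt.Println("error:", err)
    }
}
```

Because the sealing method is unexported, packages outside `event` cannot add new variants, so a type switch like this one can be checked for exhaustiveness by a linter.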

### 6.2 Provider

Defined from the consumer side, narrow on purpose.

```go
// core/provider/provider.go
package provider

import (
    "context"

    "github.com/plexara/plexara-agents/core/event"
)

type Provider interface {
    // Stream returns a channel of events. The channel closes when the request
    // ends (success, error, or context cancellation). Implementations must
    // never emit a partial ToolCallRequest; they buffer until arguments
    // are complete JSON.
    Stream(ctx context.Context, req Request) (<-chan event.Event, error)

    // Name identifies the provider for logging and diagnostics.
    Name() string
}

type Request struct {
    Model    string
    Messages []Message
    Tools    []Tool
    // Sampling parameters; nil means provider default.
    Temperature *float32
    TopP        *float32
    MaxTokens   *int
}
```

Why a channel and not an iterator: range-over-func iterators are stable and tempting, but channels compose better with `select`, cancellation, and the existing context-cancellation idiom. An iterator wrapper can be added later if there's demand.

For v1 there is **one** Provider implementation: `OpenAICompatible`. Pointing it at `http://localhost:11434/v1` gives Ollama. Pointing it at an MLX server, llama.cpp's server, or a future vLLM deployment is a config change. This is deliberate: a single well-tested adapter beats four half-tested ones.

### 6.3 Session

```go
// core/session/session.go
package session

type Session struct {
    ID       string
    Messages []provider.Message
    Created  time.Time
    Updated  time.Time
}

func (s *Session) Append(m provider.Message)
func (s *Session) Truncate(maxTokens int, tokenizer Tokenizer) // sliding window
func (s *Session) Save(w io.Writer) error
func Load(r io.Reader) (*Session, error)
```

Session is a plain data struct: easy to serialize, easy to replay. Persistence is JSON Lines on disk by default. Replay (feeding a saved session back into the loop) is a first-class operation, useful for debugging and for evaluations.
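A minimal sketch of JSON Lines persistence, assuming one message per line. The `Message` shape and the `save`, `load`, and `roundTrip` helpers are illustrative assumptions, not the actual `session` API.

```go
package main

import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "io"
)

// Message is a stand-in for provider.Message; the fields are assumptions.
type Message struct {
    Role    string `json:"role"`
    Content string `json:"content"`
}

// save writes one JSON object per line.
func save(w io.Writer, msgs []Message) error {
    enc := json.NewEncoder(w) // Encode appends a newline after each value
    for _, m := range msgs {
        if err := enc.Encode(m); err != nil {
            return fmt.Errorf("encode message: %w", err)
        }
    }
    return nil
}

// load reads messages back, one per line, skipping blank lines.
func load(r io.Reader) ([]Message, error) {
    var msgs []Message
    sc := bufio.NewScanner(r)
    // Tool results can be large; raise the per-line limit above the default.
    sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)
    for sc.Scan() {
        line := bytes.TrimSpace(sc.Bytes())
        if len(line) == 0 {
            continue
        }
        var m Message
        if err := json.Unmarshal(line, &m); err != nil {
            return nil, fmt.Errorf("decode message: %w", err)
        }
        msgs = append(msgs, m)
    }
    if err := sc.Err(); err != nil {
        return nil, fmt.Errorf("scan session: %w", err)
    }
    return msgs, nil
}

// roundTrip saves to memory and loads again, as a replay would.
func roundTrip(msgs []Message) ([]Message, error) {
    var buf bytes.Buffer
    if err := save(&buf, msgs); err != nil {
        return nil, err
    }
    return load(&buf)
}

func main() {
    got, err := roundTrip([]Message{
        {Role: "user", Content: "Top products by revenue?"},
        {Role: "assistant", Content: "Checking the catalog first."},
    })
    fmt.Println(len(got), err)
}
```

One object per line keeps appends cheap and lets a partially written file be recovered up to the last complete line.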

### 6.4 MCP Client

Thin wrapper over `modelcontextprotocol/go-sdk`. Responsibilities:

- Manage connections to one or more MCP servers concurrently.
- Aggregate tool catalogs across servers, namespaced by server name.
- Route tool calls to the right server.
- Reconnect on transport failure with bounded backoff.
- Expose resources and prompts published by each server to the loop.

```go
// core/mcp/client.go
package mcp

type Client struct { /* unexported */ }

type ServerConfig struct {
    Name      string
    Transport Transport          // stdio, sse, http
    Endpoint  string
    Headers   map[string]string
}

func New(cfgs []ServerConfig, opts ...Option) (*Client, error)

func (c *Client) Connect(ctx context.Context) error
func (c *Client) Close() error

func (c *Client) Catalog() *Catalog
func (c *Client) Call(ctx context.Context, req ToolCall) (ToolResult, error)
func (c *Client) Resources(ctx context.Context, server string) ([]Resource, error)
func (c *Client) Prompts(ctx context.Context, server string) ([]Prompt, error)
```

Tool names sent to the model are namespaced as `server__tool` (double underscore separator) to keep them legal under all provider tool-name regexes while remaining trivially parseable.
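The namespacing rule can be sketched as a pair of helpers. `qualify` and `split` are hypothetical names. The sketch assumes server names never contain the separator, which the configuration loader would need to enforce.

```go
package main

import (
    "fmt"
    "strings"
)

const sep = "__"

// qualify builds the model-facing name for a tool on a given server.
func qualify(server, tool string) string {
    return server + sep + tool
}

// split cuts on the first separator, so tool names that contain single
// underscores (datahub_get_schema) come back intact.
func split(name string) (server, tool string, ok bool) {
    server, tool, ok = strings.Cut(name, sep)
    if !ok || server == "" || tool == "" {
        return "", "", false
    }
    return server, tool, true
}

func main() {
    name := qualify("plexara", "datahub_get_schema")
    fmt.Println(name)

    server, tool, ok := split(name)
    fmt.Println(server, tool, ok)
}
```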

### 6.5 Agent Loop

The single most important file in the project. It is short on purpose.

```go
// core/loop/loop.go
package loop

type Config struct {
    Provider provider.Provider
    MCP      *mcp.Client
    Router   router.Router          // nil means "all tools, no narrowing"
    Approver approval.Approver      // nil means auto-approve
    Logger   *slog.Logger
    MaxSteps int                    // safety cap on tool-call iterations
}

func Run(ctx context.Context, cfg Config, sess *session.Session, userMessage string) (<-chan event.Event, error)
```

The loop does this and only this:

1. Append the user message to the session.
2. Call `Router.Narrow` if configured, to pick a relevant subset of tools for this turn.
3. Stream from the Provider with the narrowed tool set.
4. For each event:
   - `TextDelta`: forward.
   - `ToolCallRequest`: ask the Approver, then dispatch to MCP, append the result to the session, and re-stream from the Provider.
   - `Finish`: forward and exit.
   - `Error`: forward and exit.
5. Enforce `MaxSteps` so a misbehaving model cannot tool-loop indefinitely.

Cancellation, error wrapping, and structured logging happen here, not in providers.
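The control flow above can be sketched without streaming, routing, or approval. This is a deliberately reduced model under stated assumptions: `Step`, `Model`, `ToolCaller`, `run`, `demo`, and `ErrMaxSteps` are all hypothetical, and history is a plain string slice rather than a `session.Session`.

```go
package main

import (
    "context"
    "errors"
    "fmt"
)

// Step is what a hypothetical non-streaming model call returns:
// either final text, or the name of one tool to call.
type Step struct {
    Text     string
    ToolName string // empty means the model is finished
}

type Model func(ctx context.Context, history []string) (Step, error)
type ToolCaller func(ctx context.Context, tool string) (string, error)

var ErrMaxSteps = errors.New("loop: max steps exceeded")

// run alternates model calls and tool calls until the model finishes
// or maxSteps model calls have been made.
func run(ctx context.Context, model Model, call ToolCaller, userMessage string, maxSteps int) (string, error) {
    history := []string{"user: " + userMessage}
    for step := 0; step < maxSteps; step++ {
        if err := ctx.Err(); err != nil {
            return "", fmt.Errorf("loop cancelled: %w", err)
        }
        out, err := model(ctx, history)
        if err != nil {
            return "", fmt.Errorf("provider: %w", err)
        }
        if out.ToolName == "" {
            return out.Text, nil
        }
        result, err := call(ctx, out.ToolName)
        if err != nil {
            // Tool errors go back to the model instead of ending the run.
            result = "error: " + err.Error()
        }
        history = append(history, "tool "+out.ToolName+": "+result)
    }
    return "", ErrMaxSteps
}

// demo runs the loop against a fake model that asks for toolCalls tool
// calls before answering. It returns the answer and the calls made.
func demo(maxSteps, toolCalls int) (string, int, error) {
    calls := 0
    model := func(ctx context.Context, history []string) (Step, error) {
        if calls < toolCalls {
            return Step{ToolName: "datahub_search"}, nil
        }
        return Step{Text: "done"}, nil
    }
    call := func(ctx context.Context, tool string) (string, error) {
        calls++
        return "ok", nil
    }
    answer, err := run(context.Background(), model, call, "hello", maxSteps)
    return answer, calls, err
}

func main() {
    fmt.Println(demo(5, 2))
    fmt.Println(demo(3, 10))
}
```

The second call in `main` shows the safety cap: a model that never stops asking for tools is cut off after `maxSteps` iterations.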

### 6.6 Tool Router

The router is what makes a 30B local model tractable against a 30+ tool MCP surface. Without it, every turn carries the full catalog and small models drift.

```go
// core/router/router.go
package router

type Router interface {
    Narrow(ctx context.Context, userMessage string, catalog *mcp.Catalog) ([]mcp.Tool, error)
}
```

Two implementations ship in v1:

- `PassThrough`: returns the full catalog. Good for small MCP servers and large models.
- `ToolkitClassifier`: a lightweight first pass against the same Provider that asks "which toolkits are likely needed?" then returns only those toolkits' tools. Plexara's `datahub_*`, `trino_*`, `s3_*`, and `memory_*` namespaces map cleanly to toolkits.

The classifier prompt and toolkit definitions live in `docs/prompts/` so they are reviewable and version-controlled rather than buried in code.
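The narrowing half of `ToolkitClassifier` (after the model has named the toolkits) might look like the sketch below. It assumes a toolkit is simply the text before the first underscore of a tool name. `toolkitOf` and `narrow` are hypothetical helpers, and `s3_list_objects` and `memory_store` are made-up tool names.

```go
package main

import (
    "fmt"
    "strings"
)

// toolkitOf returns the prefix before the first underscore, or "" if
// the tool name has no prefix.
func toolkitOf(tool string) string {
    prefix, _, ok := strings.Cut(tool, "_")
    if !ok {
        return ""
    }
    return prefix
}

// narrow keeps only the tools whose toolkit was selected, preserving
// catalog order.
func narrow(catalog []string, wanted map[string]bool) []string {
    var out []string
    for _, tool := range catalog {
        if wanted[toolkitOf(tool)] {
            out = append(out, tool)
        }
    }
    return out
}

func main() {
    catalog := []string{
        "datahub_search",
        "datahub_get_schema",
        "trino_query",
        "s3_list_objects", // hypothetical tool name
        "memory_store",    // hypothetical tool name
    }
    // Pretend the classifier pass answered "datahub, trino".
    fmt.Println(narrow(catalog, map[string]bool{"datahub": true, "trino": true}))
}
```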

### 6.7 Approval

Mutating tool calls (writes, deletes, expensive queries) require explicit human approval by default for interactive binaries. Single-shot binaries can be configured to auto-approve, deny, or prompt out-of-band.

```go
// core/approval/approval.go
package approval

import (
    "context"

    "github.com/plexara/plexara-agents/core/event"
)

type Decision int
const (
    Allow Decision = iota
    Deny
    AllowAll          // for this session
)

type Approver interface {
    Approve(ctx context.Context, call event.ToolCallRequest) (Decision, error)
}
```

Standard implementations:

- `AutoAllow`: trust the MCP server's declared permissions.
- `Interactive`: TTY prompt, used by `cmd/repl`.
- `Policy`: rule-based, e.g. allow all `datahub_*` reads, prompt on `trino_execute`, deny on `datahub_delete`.
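One way the `Policy` rules could be evaluated is first-match-wins over glob patterns, sketched below with the standard `path.Match`. The `Action` type, the `Rule` struct, and the default of prompting are assumptions made for illustration. A glob cannot tell reads from writes, so the deny rule is listed ahead of the broad allow.

```go
package main

import (
    "fmt"
    "path"
)

type Action int

const (
    ActionAllow Action = iota
    ActionDeny
    ActionPrompt
)

// Rule pairs a glob pattern over tool names with an action.
type Rule struct {
    Pattern string
    Action  Action
}

// exampleRules encodes the policy from the bullet above. Order matters.
var exampleRules = []Rule{
    {Pattern: "datahub_delete", Action: ActionDeny},
    {Pattern: "trino_execute", Action: ActionPrompt},
    {Pattern: "datahub_*", Action: ActionAllow},
}

// decide returns the action of the first matching rule. Tools that
// match nothing fall back to prompting; malformed patterns deny.
func decide(rules []Rule, tool string) (Action, error) {
    for _, r := range rules {
        ok, err := path.Match(r.Pattern, tool)
        if err != nil {
            return ActionDeny, fmt.Errorf("bad pattern %q: %w", r.Pattern, err)
        }
        if ok {
            return r.Action, nil
        }
    }
    return ActionPrompt, nil
}

func main() {
    for _, tool := range []string{"datahub_search", "datahub_delete", "trino_execute"} {
        action, err := decide(exampleRules, tool)
        fmt.Println(tool, action, err)
    }
}
```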

---

## 7. Local Model Strategy

### 7.1 Target Model

**Primary:** Qwen3-30B-A3B at Q4_K_M quantization.

- Mixture of experts: 30B total, 3B active per token.
- Memory footprint around 17 to 18 GB at Q4, leaving room for context and OS on a 32 GB machine.
- Reliable tool-call emission in OpenAI-compatible format via Ollama or MLX.
- Strong enough at SQL generation to drive Trino through Plexara, especially with schema enrichment in context.

**Validated alternatives** (`docs/models.md` documents results):

- Qwen3-32B (dense): better single-pass reasoning, slower decode, similar memory.
- Mistral Small 3.2 (24B): solid generalist, slightly weaker at chained tool use.
- gpt-oss-20b: smaller footprint baseline.

Models below 14B parameters are not recommended for this agent. Their tool-call reliability falls off on chains longer than two or three steps, which is exactly the workload Plexara generates.

### 7.2 Runtime

**Default:** Ollama. One install command, OpenAI-compatible at `localhost:11434/v1`, ships on macOS, Linux, and Windows.

**Documented alternative:** mlx-lm's OpenAI-compatible server on Apple Silicon for roughly 1.5x to 2x faster decode at equivalent quants.

Both are reached through the same `OpenAICompatible` provider. Switching is a config change.

### 7.3 What the agent does NOT do

- Does not embed a tokenizer. Token counting for sliding-window truncation uses a length heuristic in v1 and a model-aware tokenizer (via a small HTTP `/tokenize` call to the runtime when available) in v1.1.
- Does not manage model downloads. `ollama pull qwen3:30b-a3b-q4_K_M` is a documented prerequisite, not a runtime concern.
- Does not implement speculative decoding, KV cache management, or other inference-layer concerns. Those belong in the runtime.
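The v1 length heuristic is not pinned down here. As one possible reading, the sketch below assumes roughly four bytes per token and drops the oldest messages until the rest fit. Both `estimateTokens` and `truncate` are hypothetical, and the ratio is a guess that varies by model and language.

```go
package main

import "fmt"

// estimateTokens assumes about four bytes per token, rounded up.
func estimateTokens(s string) int {
    return (len(s) + 3) / 4
}

// truncate keeps the longest suffix of msgs whose estimated token
// count fits within maxTokens (a sliding window over recent turns).
func truncate(msgs []string, maxTokens int) []string {
    total := 0
    start := len(msgs)
    for i := len(msgs) - 1; i >= 0; i-- {
        total += estimateTokens(msgs[i])
        if total > maxTokens {
            break
        }
        start = i
    }
    return msgs[start:]
}

func main() {
    msgs := []string{"aaaaaaaa", "bbbb", "cccc"}
    fmt.Println(truncate(msgs, 2))
}
```

A real implementation would probably pin the system prompt and avoid splitting a tool call from its result; this sketch ignores both.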

---

## 8. MCP Integration

### 8.1 Context Enrichment Is on the Server Side

Plexara MCP performs **context enrichment** as part of its protocol surface: tool descriptions carry domain context, `datahub_get_schema` and `datahub_get_glossary_term` exist as first-class tools, lineage is queryable, and resources expose curated views over the catalog. The agent in this project does not implement enrichment. It consumes whatever the MCP server presents.

This is the right division of responsibility. The MCP server has the catalog, the lineage graph, the access policies, and the domain knowledge. Putting enrichment logic in the agent would duplicate it and couple the agent to one MCP's data model.

What the agent does instead:

- **Surfaces the enriched tool catalog** to the model after toolkit narrowing: every tool that survives narrowing keeps the description Plexara wrote, verbatim.
- **Calls discovery tools eagerly** when the toolkit classifier suggests they are relevant. For Plexara that means `datahub_search` and `datahub_get_schema` are likely first calls when the user asks about data, before the model attempts a `trino_query`.
- **Exposes MCP resources and prompts to the loop** so the model can pull curated context (resource reads, prompt templates) when the server offers them.

The Plexara examples make this concrete. `examples/acme-revenue/` shows a turn where the model issues `datahub_search` then `datahub_get_schema` before writing SQL, because the toolkit classifier surfaced both tools and Plexara's tool descriptions made their purpose obvious. No agent-side enrichment code is required for that to work.

### 8.2 Tool Routing

See section 6.6. Tool routing is the agent's responsibility because it depends on the user message and the conversation, not on the MCP server's catalog alone. Once narrowed, the agent passes the selected tools (with their server-supplied descriptions intact) to the model.

### 8.3 SQL Safety Pattern

For Plexara and any MCP exposing SQL execution, the agent ships with an optional `SQLValidator` middleware that:

1. Intercepts `*_query` and `*_execute` tool calls.
2. Calls the corresponding `*_explain` tool first.
3. If EXPLAIN fails, returns the error to the model as a tool result so the model self-corrects.
4. If EXPLAIN succeeds, lets the original call proceed.

This costs one extra round trip and dramatically reduces the rate at which local models produce broken queries. It is opt-in but enabled by default in the Plexara examples.

This is a client-side pattern, not enrichment. The MCP server already provides `_explain`; the agent just orchestrates the two-step.
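The two-step can be sketched as a wrapper around whatever function dispatches tool calls. The `CallFunc` signature here is a simplification (tool name and SQL text in, result text out) and not the real `mcp.Client.Call` shape. `withExplain`, `sqlToolBase`, and `demoValidate` are hypothetical.

```go
package main

import (
    "errors"
    "fmt"
    "strings"
)

// CallFunc is a simplified tool dispatcher.
type CallFunc func(tool, sql string) (string, error)

// sqlToolBase reports whether tool ends in _query or _execute and
// returns the part before that suffix.
func sqlToolBase(tool string) (string, bool) {
    for _, suffix := range []string{"_query", "_execute"} {
        if base, ok := strings.CutSuffix(tool, suffix); ok && base != "" {
            return base, true
        }
    }
    return "", false
}

// withExplain runs the matching _explain tool before any SQL call.
// An EXPLAIN failure is returned as a tool result, not as a Go error,
// so the model sees it and can correct the query.
func withExplain(next CallFunc) CallFunc {
    return func(tool, sql string) (string, error) {
        base, ok := sqlToolBase(tool)
        if !ok {
            return next(tool, sql)
        }
        if _, err := next(base+"_explain", sql); err != nil {
            return "EXPLAIN failed: " + err.Error(), nil
        }
        return next(tool, sql)
    }
}

// demoValidate runs one trino_query through the wrapper against a fake
// backend and reports the result plus the tools that were called.
func demoValidate(sql string) (result, calls string) {
    var seen []string
    backend := func(tool, q string) (string, error) {
        seen = append(seen, tool)
        if tool == "trino_explain" && strings.Contains(q, "FORM") {
            return "", errors.New("syntax error near FORM")
        }
        return "ok", nil
    }
    result, _ = withExplain(backend)("trino_query", sql)
    return result, strings.Join(seen, ",")
}

func main() {
    fmt.Println(demoValidate("SELECT 1"))
    fmt.Println(demoValidate("SELECT * FORM orders"))
}
```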

### 8.4 Tool-Call Streaming Discipline

The agent never parses partial tool-call JSON. The `OpenAICompatible` provider buffers tool-call deltas internally and emits a `ToolCallRequest` event only when the runtime signals the call is complete (`finish_reason: tool_calls` or equivalent). This is non-negotiable. Half-parsed tool calls are the single most common source of agent flakiness in the wild.
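A sketch of that buffering, assuming OpenAI-style streaming where each delta carries a call index plus optional ID, name, and an argument fragment. `ToolCallDelta`, `ToolCall`, and `assemble` are hypothetical names. Treating empty arguments as `{}` is also an assumption about how some runtimes report zero-argument calls.

```go
package main

import (
    "encoding/json"
    "fmt"
    "sort"
    "strings"
)

// ToolCallDelta is one streamed fragment of a tool call.
type ToolCallDelta struct {
    Index     int
    ID        string
    Name      string
    Arguments string // fragment of the JSON arguments
}

// ToolCall is a fully assembled call with complete JSON arguments.
type ToolCall struct {
    ID        string
    Name      string
    Arguments json.RawMessage
}

type pending struct {
    id, name string
    args     strings.Builder
}

// assemble is meant to run once the runtime signals completion. It
// joins fragments per index and rejects arguments that are not valid
// JSON rather than emitting a half-built call.
func assemble(deltas []ToolCallDelta) ([]ToolCall, error) {
    byIndex := map[int]*pending{}
    for _, d := range deltas {
        p, ok := byIndex[d.Index]
        if !ok {
            p = &pending{}
            byIndex[d.Index] = p
        }
        if d.ID != "" {
            p.id = d.ID
        }
        if d.Name != "" {
            p.name = d.Name
        }
        p.args.WriteString(d.Arguments)
    }

    indexes := make([]int, 0, len(byIndex))
    for i := range byIndex {
        indexes = append(indexes, i)
    }
    sort.Ints(indexes)

    calls := make([]ToolCall, 0, len(indexes))
    for _, i := range indexes {
        p := byIndex[i]
        raw := p.args.String()
        if raw == "" {
            raw = "{}"
        }
        if !json.Valid([]byte(raw)) {
            return nil, fmt.Errorf("tool call %q (%s): incomplete JSON arguments", p.id, p.name)
        }
        calls = append(calls, ToolCall{ID: p.id, Name: p.name, Arguments: json.RawMessage(raw)})
    }
    return calls, nil
}

func main() {
    calls, err := assemble([]ToolCallDelta{
        {Index: 0, ID: "call_1", Name: "trino_query", Arguments: `{"sql":`},
        {Index: 0, Arguments: `"SELECT 1"}`},
    })
    fmt.Println(len(calls), err)
}
```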

---

## 9. CLI Surfaces

### 9.1 `cmd/ask`

Single-shot. One question in, streamed answer out, exit.

```
ask --model qwen3:30b-a3b --mcp plexara-acme \
    "Top five products by revenue in the West region last quarter"
```

Flags:

- `--model`: model name passed to the runtime.
- `--mcp`: named MCP config from `~/.config/plexara-agents/config.yaml`, or a path.
- `--no-router`: disable toolkit narrowing, send the full catalog every turn.
- `--auto-approve`: bypass approval prompts (for scripted use).
- `--session FILE`: append to or replay a session file.
- `--json`: structured event output for piping into tooling.

### 9.2 `cmd/repl`

Interactive. Multi-turn session with approval prompts, history, and slash commands.

Slash commands (planned):

- `/tools` list narrowed tools for the next turn
- `/catalog` show full catalog from all connected MCPs
- `/save FILE`, `/load FILE` session persistence
- `/explain` show the last tool call's parameters and result
- `/prompt` print the assembled system prompt for the next turn

Implementation uses `bubbletea` for the TUI layer. This pulls in a real dependency, but `bubbletea` is the de facto Go TUI library and the alternative (raw terminal handling) is not where reference-quality code should be spent.

---

## 10. Configuration

YAML, located at `~/.config/plexara-agents/config.yaml` by default, overridable with `--config`.

```yaml
defaults:
  model: qwen3:30b-a3b
  provider: ollama-local
  router: toolkit-classifier
  approval: interactive

providers:
  ollama-local:
    type: openai-compatible
    base_url: http://localhost:11434/v1
    api_key_env: OLLAMA_API_KEY    # optional, ignored if unset

  mlx-local:
    type: openai-compatible
    base_url: http://localhost:8080/v1

mcp_servers:
  plexara-acme:
    transport: http
    endpoint: https://mcp-demo.plexara.io
```

Configuration values are also overridable via flags and environment variables. Precedence: flag > env > config file > built-in default. No silent overrides.
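The precedence rule fits in one function. In this sketch `resolve` is a hypothetical helper and an empty string stands for "not set"; a real implementation would need a way to tell an explicit empty value from an absent one. Returning the winning source makes it easy to log where a value came from, which is one way to avoid silent overrides.

```go
package main

import "fmt"

// resolve applies flag > env > config file > built-in default and
// reports which source supplied the value.
func resolve(flagVal, envVal, fileVal, builtin string) (value, source string) {
    switch {
    case flagVal != "":
        return flagVal, "flag"
    case envVal != "":
        return envVal, "env"
    case fileVal != "":
        return fileVal, "config"
    default:
        return builtin, "default"
    }
}

func main() {
    // No flag given, env var set, config file also has a value.
    value, source := resolve("", "qwen3:32b", "qwen3:30b-a3b", "qwen3:30b-a3b")
    fmt.Printf("model=%s (from %s)\n", value, source)
}
```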

---

## 11. Observability

Structured logging via `log/slog`, JSON by default, text in TTYs.

Every tool call emits a log line with: server, tool name, argument digest (hashed, not raw), latency, success or error class. Resource and prompt fetches do the same.
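A sketch of such a log line using `log/slog`. The attribute names, the whitespace normalization via `json.Compact`, and the truncation of the SHA-256 digest to 16 hex characters are all assumptions made for illustration.

```go
package main

import (
    "bytes"
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
    "log/slog"
    "os"
    "time"
)

// argDigest hashes tool arguments so logs can correlate calls without
// recording raw values. JSON is compacted first so that whitespace
// differences do not change the digest.
func argDigest(args []byte) string {
    var compact bytes.Buffer
    if err := json.Compact(&compact, args); err != nil {
        compact.Reset()
        compact.Write(args) // not valid JSON; hash the raw bytes instead
    }
    sum := sha256.Sum256(compact.Bytes())
    return hex.EncodeToString(sum[:8])
}

func main() {
    logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

    start := time.Now()
    args := []byte(`{"sql": "SELECT 1"}`)
    // ... the tool call would happen here ...

    logger.Info("tool_call",
        slog.String("server", "plexara"),
        slog.String("tool", "trino_query"),
        slog.String("args_digest", argDigest(args)),
        slog.Duration("latency", time.Since(start)),
        slog.String("outcome", "ok"),
    )
}
```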

Optional OpenTelemetry tracing behind a build tag. Off by default to keep the dependency tree light. When on, the agent loop emits one span per turn with child spans for tool routing, provider streaming, and each tool call.

A `--debug` flag dumps the full assembled system prompt, the narrowed tool list (with the MCP server's descriptions), and the messages sent to the Provider to stderr before each turn. This is the single most useful debugging affordance.

---

## 12. Error Handling

Conventions:

- All errors wrap with `fmt.Errorf("%w", ...)` and never lose the original.
- Sentinel errors live next to the package that owns them (`mcp.ErrServerUnavailable`, `provider.ErrModelNotFound`, etc.).
- The agent loop converts errors raised during a run into `event.Error` for streaming consumers. Errors that prevent a run from starting are returned directly from `Run`, for callers that want to handle them imperatively.
- Network and tool errors are not fatal by default. The model sees the error as a tool result and is free to recover. Programmer errors (bad config, missing model) are fatal.
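These conventions can be sketched with two sentinels and `errors.Is`. The sentinel names come from the list above. `callTool`, `checkModel`, and `fatal` are hypothetical helpers, and the real classification would likely cover more cases.

```go
package main

import (
    "errors"
    "fmt"
)

var (
    ErrServerUnavailable = errors.New("mcp: server unavailable")
    ErrModelNotFound     = errors.New("provider: model not found")
)

// callTool wraps the sentinel with context; %w keeps it matchable.
func callTool(server string, up bool) error {
    if !up {
        return fmt.Errorf("call tool on %q: %w", server, ErrServerUnavailable)
    }
    return nil
}

// checkModel reports a missing model as a wrapped sentinel.
func checkModel(name string, installed bool) error {
    if !installed {
        return fmt.Errorf("load %q: %w", name, ErrModelNotFound)
    }
    return nil
}

// fatal separates setup errors, which stop the run, from network and
// tool errors, which are handed back to the model as tool results.
func fatal(err error) bool {
    return errors.Is(err, ErrModelNotFound)
}

func main() {
    toolErr := callTool("plexara", false)
    fmt.Println(toolErr, "fatal:", fatal(toolErr))

    modelErr := checkModel("qwen3:30b-a3b", false)
    fmt.Println(modelErr, "fatal:", fatal(modelErr))
}
```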

---

## 13. Testing

Three layers, plus fuzzing.

**Unit tests.** Each package, table-driven where it makes sense. The Provider interface has a `provider/testing.Fake` implementation that lets the loop, router, and approval be tested without any network or model. All tests run with `-race` and `-shuffle=on` in CI.

**Integration tests.** Run against a real MCP server and a real local model. Gated behind a build tag (`//go:build integration`) and an `INTEGRATION=1` env var. Run on a self-hosted CI runner with Ollama and Qwen3 pre-pulled, and locally during development. Never block the standard PR pipeline.

**Replay tests.** Saved sessions in `testdata/sessions/` are replayed against a recorded Provider transcript. This catches regressions in the loop, router, and tool dispatch without needing a live model or network. New examples must ship with at least one replay test; CI enforces this requirement.

**Fuzz tests.** Native Go fuzzing for parsers and serializers: tool-name namespacing, event JSON marshaling, MCP server response handling, session file decoding. Fuzz corpora committed under `testdata/fuzz/`. CI runs short fuzz cycles per PR; longer scheduled runs catch regressions over time.

Coverage is measured with `-covermode=atomic` and uploaded to Codecov. Quality gates are defined in section 14.

---

## 14. CI, Security, and Repository Standards

This is an OSS reference project. The CI surface, supply-chain posture, and repository hygiene are part of what's being demonstrated. Standards align with what mature projects in the same space ship (kubefwd's pipeline is the reference baseline).

### 14.1 Repository Hygiene

Files committed at the repo root or under `.github/`:

- `LICENSE` (Apache 2.0).
- `NOTICE` if any third-party attribution is required.
- `README.md` with status badges (build, coverage, Go Report Card, Scorecard, license, latest release).
- `CONTRIBUTING.md` describing local dev setup, commit conventions, and the PR process.
- `CODE_OF_CONDUCT.md` (Contributor Covenant 2.1).
- `SECURITY.md` with a vulnerability disclosure policy, supported-versions matrix, and a security contact (`security@plexara.io` or equivalent). GitHub Private Vulnerability Reporting enabled.
- `CODEOWNERS` mapping directories to maintainers.
- `.github/ISSUE_TEMPLATE/` with bug, feature, and security-redirect templates.
- `.github/PULL_REQUEST_TEMPLATE.md`.
- `.github/dependabot.yml` configured for `gomod`, `github-actions`, and `docker` (if applicable).

Branch protection on `main`:

- Required PR review (at least one approver, code owner review for owned paths).
- Required status checks: build, test, lint, security, codeql, govulncheck, dependency-review.
- Linear history required (squash or rebase, no merge commits).
- Signed commits required.
- Force-pushes blocked.
- Auto-merge enabled for Dependabot PRs that pass all checks.

### 14.2 Continuous Integration

Workflows under `.github/workflows/`:

- `ci.yml`: build, vet, lint, test (race + coverage), `go mod verify`, `go mod tidy -diff`. Runs on every PR and every push to `main`.
- `security.yml`: gosec, govulncheck, Semgrep. Runs on every PR and on a weekly schedule.
- `codeql.yml`: GitHub CodeQL with the `security-extended` and `security-and-quality` query suites. Runs on every PR, every push to `main`, and weekly.
- `scorecard.yml`: OpenSSF Scorecard. Runs weekly and on `main` pushes; uploads SARIF to the security tab and publishes results.
- `dependency-review.yml`: blocks PRs that introduce dependencies with known vulnerabilities or non-permissive licenses.
- `release.yml`: triggered on `v*.*.*` tags; runs GoReleaser, generates SBOMs, signs artifacts, attaches SLSA provenance.
- `fuzz.yml`: runs Go fuzz targets for an extended cycle (e.g., 5 minutes per target) on a nightly schedule. Failures open issues automatically.

Jobs run on the latest stable Ubuntu runner image by default. Additional macOS jobs cover the Apple Silicon developer path for the Ollama/MLX integration tests.

### 14.3 Test and Coverage Gate

Standard test invocation in CI:

```
go test -race -shuffle=on -count=1 \
        -covermode=atomic -coverprofile=coverage.out \
        ./...
```

- Coverage uploaded to Codecov on every CI run. Codecov badge in README.
- Coverage gate: **>80%** of statements for `core/...`. PRs that drop coverage below the threshold fail CI.
- `cmd/...` and `examples/...` are exercised but excluded from the gate; they exist primarily as worked examples and integration scaffolding.
- `-race` is mandatory in CI and recommended locally.
- `-shuffle=on` to surface order-dependent tests.
- Replay tests live under `testdata/` and run as part of the standard suite.
- Integration tests gated behind the `integration` build tag run only on the self-hosted runner.

### 14.4 Build

`go build` runs as its own step before tests, so compile errors fail fast before the longer test suite starts. Build verification includes:

- `go build ./...` for every supported platform via a matrix (`darwin/arm64`, `darwin/amd64`, `linux/amd64`, `linux/arm64`).
- `go vet ./...`.
- `go mod verify`.
- `go mod tidy -diff` to confirm `go.mod` and `go.sum` are clean (no hidden drift).
- `gofmt -l .` to fail on unformatted files.
- Build flags for release artifacts: `-trimpath`, ldflags `-s -w` only on release binaries (debug builds keep symbols).
- CGO disabled (`CGO_ENABLED=0`) for portability.

### 14.5 Linting

`golangci-lint` with a comprehensive, opinionated configuration in `.golangci.yml`. Enabled linters (the default set plus the additions listed here):

`errcheck`, `errorlint`, `errname`, `govet`, `ineffassign`, `staticcheck`, `unused`, `gosimple`, `gofmt`, `goimports`, `misspell`, `revive`, `gocritic`, `gocyclo`, `gocognit`, `gosec`, `prealloc`, `unconvert`, `unparam`, `copyloopvar`, `intrange`, `nilerr`, `nilnil`, `contextcheck`, `durationcheck`, `exhaustive`, `gomoddirectives`, `gomodguard`, `importas`, `predeclared`, `whitespace`, `godot`, `dupl`, `nolintlint`.

Specific rules:

- `gocyclo` cyclomatic complexity threshold 15.
- `gocognit` cognitive complexity threshold 20.
- `dupl` set to flag genuine duplication, with a high enough threshold that table-driven test rows don't trip it.
- `nolintlint` enforces that every `//nolint:` directive includes a reason.
- `gosec` runs in lint mode here; a separate `gosec` job (section 14.6) runs with stricter settings.

The lint job fails CI on any new finding. Existing legitimate exceptions are listed inline with `//nolint:<linter> // <reason>`.

### 14.6 Security Scanning

Multiple complementary scanners. They overlap intentionally; coverage gaps in one are filled by another.

- **`gosec`**: dedicated job using `securego/gosec` GitHub Action with full ruleset. Findings as SARIF uploaded to the security tab.
- **`govulncheck`**: official `golang.org/x/vuln/cmd/govulncheck` against `./...` on every CI run. Failures block the merge.
- **Semgrep**: `returntocorp/semgrep-action` with rulesets `p/security-audit`, `p/secrets`, `p/golang`, `p/owasp-top-ten`. Findings posted as PR comments and uploaded as SARIF.
- **CodeQL**: `github/codeql-action` with `go` language, `security-extended` and `security-and-quality` query suites.
- **Trivy**: `aquasecurity/trivy-action` filesystem scan for misconfigurations and secrets. Container scan added when the project starts publishing images.

All scanner outputs go through GitHub's SARIF interface so findings are visible in the security tab and surfaceable in PR review.

### 14.7 Supply Chain Security

- **OpenSSF Scorecard**: `ossf/scorecard-action` weekly. Target score **>=8.0**. Score badge in README.
- **SLSA Level 3 provenance**: `slsa-framework/slsa-github-generator` produces signed provenance for every release artifact.
- **SBOM**: generated via `anchore/sbom-action` (Syft) in both CycloneDX and SPDX formats; attached to every GitHub release.
- **Cosign keyless signing**: every release artifact, every container image, every SBOM signed via Sigstore OIDC. Verification commands documented in `SECURITY.md`.
- **Reproducible builds**: `-trimpath`, fixed module cache, frozen `BUILD_ID` from the tag's commit. Documented procedure for third-party reproduction.
- **License scan**: `google/go-licenses` blocks PRs that introduce dependencies with non-permissive licenses (anything not in an allowlist of MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause, ISC, MPL-2.0, etc.).

### 14.8 Releases

Releases are tag-driven. Cutting `vX.Y.Z` triggers `release.yml`, which runs GoReleaser.

- Strict semantic versioning. Pre-1.0 tags signal API instability.
- GoReleaser config at `.goreleaser.yaml`.
- Cross-compiled binaries: `darwin/arm64`, `darwin/amd64`, `linux/amd64`, `linux/arm64`, `windows/amd64`.
- Each binary signed with cosign and accompanied by a `.sig` and a `.cert`.
- SBOMs (CycloneDX and SPDX) attached.
- SLSA Level 3 provenance attached.
- Conventional Commits enforced for changelog generation. PR titles validated by a CI check using `commitlint` or equivalent.
- Container images (when introduced) published to `ghcr.io/plexara/plexara-agents` with cosign signature and SBOM.
- Homebrew tap formula updated automatically by GoReleaser for the `ask` and `repl` binaries.

### 14.9 Dependency Management

- **Dependabot** for `gomod` and `github-actions`, weekly schedule, grouped updates for non-major bumps.
- **Renovate** considered as an alternative; for v1 stick with Dependabot to keep the toolchain native to GitHub.
- `govulncheck` provides the runtime safety net for transitive vulnerabilities Dependabot cannot reach.
- No vendoring. Modules are resolved from the proxy and verified via `go.sum` and `go mod verify`.

### 14.10 Action Pinning

Every third-party GitHub Action is pinned to a full commit SHA, with the human-readable version as a trailing comment:

```yaml
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32   # v5.0.2
- uses: golangci/golangci-lint-action@aaa42aa0628b4ae2578232a66b541047968fac86 # v6.1.0
```

This is non-negotiable: SHA pinning is what closes the supply-chain attack surface that tag-pinning leaves open. Dependabot updates the SHAs on its weekly cadence; reviewers verify the new SHA points at the same release the comment claims.

First-party actions (`actions/*`, `github/*`) follow the same rule. The Scorecard `Pinned-Dependencies` check enforces this; missing pins fail Scorecard.

### 14.11 Pre-commit and Local Tooling

- `.pre-commit-config.yaml` with hooks: `trailing-whitespace`, `end-of-file-fixer`, `check-yaml`, `check-json`, `check-added-large-files`, `mixed-line-ending`.
- Local `make` targets that mirror CI: `make build`, `make test`, `make lint`, `make sec`, `make cover`, `make tidy`. Developers run these before pushing.
- A `tools.go` file under `internal/tools/` pins versions of the dev-only tools (`golangci-lint`, `goimports`, `govulncheck`) so that `go install` produces reproducible local toolchains.
- Pre-commit hooks are a developer convenience; CI is the source of truth.

### 14.12 Frontend Build (where applicable)

For v1, no frontend exists. The `core` library is consumed by terminal binaries.

When the hosted web variant lands (separate repository, see section 19), the standards inherited and enforced are:

- TypeScript with `"strict": true` and `noUncheckedIndexedAccess`.
- Vite or equivalent for build, with type-only imports enforced.
- ESLint with `@typescript-eslint/strict-type-checked` and security plugins.
- Prettier for formatting.
- `tsc --noEmit` typecheck as a CI gate.
- Dependency scanning via `npm audit`, `osv-scanner`, and Dependabot.
- Same CodeQL, Semgrep, Scorecard, and Cosign posture as the Go side.

This section is documented now so the future repository is set up correctly from day one.

### 14.13 Badges

Badges in `README.md`, in this order:

1. CI status
2. Coverage (Codecov)
3. Go Report Card
4. OpenSSF Scorecard
5. Latest release
6. License
7. Go reference (pkg.go.dev)
8. Slack/Discord/Discussions (if community channels exist)

---

## 15. Coding Standards

- Go 1.26+. We use range-over-func iterators where they earn their keep, generics where they remove duplication, and neither for show.
- `gofmt`, `goimports`, and `golangci-lint` clean. Lint config in `.golangci.yml`, conservative rule set.
- Public APIs documented with full sentences. Private functions documented when they are non-obvious, not as a rule.
- Interfaces defined at the consumer, not the producer. The `Provider` interface lives in `core/provider` because the loop, the router, and tests all consume it; defining it inside `core/loop` would be wrong. When in doubt, define small.
- No package-level mutable state. Constructors return values; lifetimes are explicit.
- `context.Context` is the first parameter of any function that does I/O. No exceptions.
- Errors are values; panics are bugs. The agent loop recovers from panics in tool handlers and surfaces them as `event.Error`, but does not recover from panics in the Provider or core. Those crash by design.
- `internal/` is used freely. If a package is not meant to be imported by downstream consumers, it goes there.

### Idiom we are deliberately avoiding

Functional options for the Provider and MCP client are appealing but cost readability. We use plain config structs with sensible zero values instead. Functional options are reserved for places where a long tail of optional knobs really exists (the agent loop's `Config`).
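
A sketch of what "plain config struct with sensible zero values" could look like for the Provider. The field names, the `withDefaults` helper, and the default values are assumptions for illustration (the base URL shown is Ollama's usual local OpenAI-compatible endpoint), not the actual `core/provider` API.

```go
package main

import (
	"fmt"
	"time"
)

// ProviderConfig is a hypothetical config struct. A zero value in any
// optional field means "use the default".
type ProviderConfig struct {
	BaseURL string        // empty: local default endpoint
	Model   string        // required, no default
	Timeout time.Duration // zero: default request timeout
}

// withDefaults returns a copy with zero-valued optional fields filled in.
// The defaults here are illustrative assumptions.
func (c ProviderConfig) withDefaults() ProviderConfig {
	if c.BaseURL == "" {
		c.BaseURL = "http://localhost:11434/v1"
	}
	if c.Timeout == 0 {
		c.Timeout = 60 * time.Second
	}
	return c
}

func main() {
	// The caller sets only what differs from the defaults.
	cfg := ProviderConfig{Model: "qwen3:30b-a3b"}.withDefaults()
	fmt.Printf("%+v\n", cfg)
}
```

The call site reads as a struct literal, every knob is visible in one type definition, and there is no option-function indirection to chase.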

---

## 16. Dependencies

### 16.1 Runtime

A small list, intentionally:

- `github.com/modelcontextprotocol/go-sdk`: MCP client (official, jointly maintained by Anthropic and Google).
- `github.com/charmbracelet/bubbletea`: REPL TUI (only pulled in by `cmd/repl`, not by `core`).
- `gopkg.in/yaml.v3`: config parsing.
- `golang.org/x/sync/errgroup`: concurrent MCP server connection management.

That's it for `core`. The OpenTelemetry path adds dependencies behind a build tag.

No HTTP framework. The standard library is fine for the OpenAI-compatible client and any future server.

No agent framework. We are the agent framework.

### 16.2 Developer and CI Tooling

Pinned via `internal/tools/tools.go` (the standard `//go:build tools` pattern) so `go install` produces reproducible local toolchains:

- `github.com/golangci/golangci-lint/cmd/golangci-lint`
- `golang.org/x/tools/cmd/goimports`
- `golang.org/x/vuln/cmd/govulncheck`
- `github.com/securego/gosec/v2/cmd/gosec`
- `github.com/google/go-licenses`
- `github.com/anchore/syft/cmd/syft` (SBOM, used by GoReleaser)
- `github.com/sigstore/cosign/v2/cmd/cosign` (signing, used by release workflow)

CI installs these from the same pinned versions. Local `make tools` produces an identical toolchain.

---

## 17. Plexara Demo Workflows

`examples/acme-revenue/` is the headline. It demonstrates:

1. User question: natural language revenue query.
2. Toolkit classifier narrows to `datahub_*` and `trino_*`.
3. Model sees Plexara's enriched tool descriptions and chooses to call `datahub_search` and `datahub_get_schema` to ground itself before writing SQL.
4. Model issues `trino_explain` (via the SQL safety wrapper), then `trino_query`.
5. Result formatted as a small table in the response.
6. Saved session replayable as a regression test.

The point of this example is to show that an unmodified, generic agent driving a richly enriched MCP produces good results. The MCP earns its keep; the agent stays small.
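
Step 2 above, in which the classifier narrows the tool set, amounts to a prefix filter once the classifier has picked its toolkits. A minimal sketch, assuming the classifier outputs name prefixes; `NarrowTools` and the `s3_list_objects` tool name are illustrative, not taken from `core/router`.

```go
package main

import (
	"fmt"
	"strings"
)

// NarrowTools keeps only the tools whose names start with one of the
// selected toolkit prefixes, preserving the original order.
func NarrowTools(tools, prefixes []string) []string {
	var kept []string
	for _, tool := range tools {
		for _, prefix := range prefixes {
			if strings.HasPrefix(tool, prefix) {
				kept = append(kept, tool)
				break
			}
		}
	}
	return kept
}

func main() {
	all := []string{
		"datahub_search",
		"datahub_get_schema",
		"trino_explain",
		"trino_query",
		"s3_list_objects", // hypothetical tool outside the selected toolkits
	}
	fmt.Println(NarrowTools(all, []string{"datahub_", "trino_"}))
	// prints: [datahub_search datahub_get_schema trino_explain trino_query]
}
```

The model then sees four tool descriptions instead of the server's full catalog, which is where the prompt-size saving comes from.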

`examples/acme-lineage/` demonstrates the lineage walk: a question about downstream impact triggers `datahub_get_lineage` calls, the model assembles a small graph in its response, and Plexara's glossary tools (`datahub_get_glossary_term`) come into play when terms need defining. Again, the agent does not coordinate this; the model drives it because Plexara's tool descriptions make the path obvious.

`examples/multi-mcp/` connects to two MCP servers at once (Plexara ACME and a second small server, possibly Filesystem from the official examples) and demonstrates that namespacing and routing work cleanly across servers.

Each example has a README, a `main.go` that's small enough to read in one sitting, and a recorded session for replay testing.

---

## 18. Roadmap

### v0.1 (initial public release)

Code:
- `core/event`, `core/provider/openai_compatible`, `core/mcp`, `core/session`, `core/loop`, `core/router/{passthrough,toolkit_classifier}`, `core/approval`.
- `cmd/ask`.
- `examples/acme-revenue` with replay test.

Documentation:
- README with full badge set.
- Architecture doc.
- One ADR (provider model choice and the decision to ship only `openai_compatible` in v1).
- `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, `SECURITY.md`, `CODEOWNERS`, issue and PR templates.

CI and supply chain (the entire section 14 baseline):
- `ci.yml`, `security.yml`, `codeql.yml`, `scorecard.yml`, `dependency-review.yml`, `release.yml`, `fuzz.yml`.
- All third-party actions SHA-pinned with version comments.
- Coverage gate at >80% for `core/...`, Codecov upload.
- gosec, govulncheck, Semgrep, CodeQL, Trivy fs scan all green.
- OpenSSF Scorecard >=8.0.
- GoReleaser config with cosign signing, SBOM, SLSA Level 3 provenance.
- Dependabot configured for `gomod` and `github-actions`.
- Branch protection on `main` enforced.
- Pre-commit config and Makefile.

### v0.2

- `cmd/repl` with TUI and slash commands.
- SQL safety middleware.
- `examples/acme-lineage`, `examples/multi-mcp`.
- Documented MLX runtime path.
- Session save/load.

### v0.3

- Policy approver.
- Token-aware session truncation via runtime `/tokenize`.
- OpenTelemetry tracing behind a build tag.

### v1.0

- Documented stability for the `core` API.
- Comprehensive ADRs for every significant design decision.
- Benchmarks comparing Qwen3-30B-A3B against alternatives on the ACME workflows.

### Post-1.0

A separate repository (`plexara-agents-server` or similar) builds a multi-user web service on top of `core`. That is a different project with a different deployment story; this one stays single-user, local-first, terminal-native.

---

## 19. Future: Hosted Multi-User

Out of scope for this repository, but worth recording the shape so `core` does not paint itself into a corner.

When the hosted variant is built, `core` will be imported unchanged. The differences are at the edges:

- **Runtime:** vLLM or sglang behind an OpenAI-compatible endpoint, batching concurrent sessions for throughput.
- **Hardware:** a single L40S (48 GB) or dual RTX 5090 is a reasonable starting point.
- **Concurrency:** one `loop.Run` call per HTTP session, with sessions persisted to Postgres or similar.
- **Approval:** policy-based (no human at the terminal), with sensitive operations rejected outright or escalated to an out-of-band review.
- **Auth:** per-tenant MCP server selection; each user's `mcp.Client` is built from their authorized server list.
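
The policy-based approval described above could take roughly the following shape. This is a sketch under assumptions: the `Decision` and `Policy` types, the prefix-matching rule, the default-allow fallback, and the `admin_` prefix are all hypothetical and not part of `core/approval`.

```go
package main

import (
	"fmt"
	"strings"
)

// Decision is the outcome of a policy check for one tool call.
type Decision int

const (
	Allow Decision = iota
	Reject
	Escalate
)

func (d Decision) String() string {
	switch d {
	case Allow:
		return "allow"
	case Reject:
		return "reject"
	case Escalate:
		return "escalate"
	}
	return "unknown"
}

// Policy is a hypothetical approver with no human in the loop. Rejection
// takes precedence over escalation; anything unmatched is allowed.
type Policy struct {
	RejectPrefixes   []string
	EscalatePrefixes []string
}

// Decide classifies a tool call by its name.
func (p Policy) Decide(tool string) Decision {
	if hasAnyPrefix(tool, p.RejectPrefixes) {
		return Reject
	}
	if hasAnyPrefix(tool, p.EscalatePrefixes) {
		return Escalate
	}
	return Allow
}

func hasAnyPrefix(s string, prefixes []string) bool {
	for _, prefix := range prefixes {
		if strings.HasPrefix(s, prefix) {
			return true
		}
	}
	return false
}

func main() {
	p := Policy{
		RejectPrefixes:   []string{"admin_"}, // hypothetical sensitive toolkit
		EscalatePrefixes: []string{"trino_query"},
	}
	for _, tool := range []string{"datahub_search", "trino_query", "admin_drop_table"} {
		fmt.Println(tool, "->", p.Decide(tool))
	}
}
```

A hosted deployment may well prefer default-deny; the fallback is a single line either way.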

If the v1 `core` API supports all of that without modification, we did our job.

---

## 20. Open Questions

These are deliberately unresolved and should be settled before implementation begins.

1. **Toolkit classifier prompt.** The first version will be hand-tuned for Plexara's namespaces. Should a generic toolkit-classification prompt ship in `core`, or should each MCP user supply their own? Leaning toward "shipped generic, override per MCP."
2. **Tool-name namespacing separator.** Double underscore (`server__tool`) is safe but ugly. Other MCP clients have used colon and dot. Once chosen, the separator is hard to change, so it is worth surveying whether a community convention is emerging before locking it in.
3. **Resource handling.** MCP resources are first-class but the model has no native way to consume "resource references" mid-stream. v1 will inline small resources into the system prompt and link larger ones; v2 may grow a richer resource handler. Worth an ADR.
4. **Persistence format.** JSON Lines for sessions is simple and tooling-friendly but verbose. Some hybrid (JSONL for messages, sidecar for metadata) may end up cleaner. Decide before v0.2.
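
For question 2, the double-underscore option can be sketched as below. `Qualify` and `Split` are hypothetical helpers. Splitting on the first separator means server names must not contain `__`, while tool names with single underscores (such as `datahub_search`) round-trip cleanly.

```go
package main

import (
	"fmt"
	"strings"
)

// sep is the candidate namespacing separator under discussion.
const sep = "__"

// Qualify builds the namespaced name exposed to the model.
func Qualify(server, tool string) string {
	return server + sep + tool
}

// Split recovers the server and tool from a namespaced name. It cuts at the
// first separator, so ok is false only when no separator is present.
func Split(qualified string) (server, tool string, ok bool) {
	return strings.Cut(qualified, sep)
}

func main() {
	q := Qualify("plexara", "datahub_search")
	server, tool, ok := Split(q)
	fmt.Println(q, server, tool, ok)
	// prints: plexara__datahub_search plexara datahub_search true
}
```

Swapping `sep` for `:` or `.` leaves the helpers unchanged, so the survey outcome only touches one constant, provided the chosen character is legal in tool names for the target model runtime.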

---

## 21. References

- Model Context Protocol specification: https://modelcontextprotocol.io
- `modelcontextprotocol/go-sdk`: https://github.com/modelcontextprotocol/go-sdk
- `txn2/mcp-data-platform`: the Plexara MCP server reference implementation.
- Plexara ACME Corp demo MCP: https://mcp-demo.plexara.io
- Qwen3 model family announcement and weights: published by Alibaba's Qwen team, Apache 2.0 licensed.
- Ollama: https://ollama.com
- mlx-lm OpenAI-compatible server: originally in the `ml-explore/mlx-examples` repository, now maintained in `ml-explore/mlx-lm`.

