Skip to content
This repository was archived by the owner on Jun 2, 2026. It is now read-only.
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ infra-controller-rest/
├── api/ # Main REST API server (Echo-based)
├── auth/ # Authentication (Keycloak, JWT, service accounts)
├── cert-manager/ # Native PKI certificate management (credsmgr)
├── cli/ # CLI client (nicocli) with TUI
├── cli/ # CLI client (nicocli) with TUI and MCP server mode
├── common/ # Shared utilities and configuration
├── db/ # Database layer (Bun ORM, pgx, migrations)
├── deploy/ # Kubernetes deployment (Kind, Kustomize, Helm)
Expand Down
62 changes: 62 additions & 0 deletions cli/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -333,6 +333,68 @@ To skip the config selector and connect to a specific environment directly:
nicocli --config ~/.nico/config.prod.yaml tui
```

## MCP Server Mode

`nicocli mcp serve` exposes the NICo REST read surface (every `GET` operation in the embedded OpenAPI spec) as Model Context Protocol tools over streamable-HTTP. It is intended to run behind an MCP-aware gateway such as the [Latinum Agent Gateway](https://agentgateway.dev), but works standalone for local development.

```bash
# --base-url/--org/--api-name/--token/--token-command are root-level flags
# and must be passed BEFORE the `mcp serve` subcommand.
nicocli \
--base-url https://nico.example.com \
--org tester \
mcp serve \
--listen :8080 \
--path /mcp
```

### Properties

- **Read-only.** Only `GET` operations are exposed. Mutating routes (`POST`, `PATCH`, `PUT`, `DELETE`) are intentionally excluded.
- **Tool naming.** Tools are named `nico_<snake_case(operationId)>` (e.g. `nico_get_all_site`, `nico_validate_rack`).
- **Stateless and request/response only.** The server sets `Stateless: true` and `JSONResponse: true` on the MCP streamable-HTTP handler -- responses are always `Content-Type: application/json`, never `text/event-stream`, and the server retains no per-session state. This is compatible with deployment behind a NATS-bridged shard proxy.
- **JWT passthrough.** The `Authorization: Bearer <jwt>` header on the inbound MCP request is forwarded unchanged to NICo REST. NICo REST validates the JWT, resolves the caller org, and enforces role-based authorisation. The MCP layer never makes the authz decision itself.

### Flags

| Flag | Env Var | Description |
|------|---------|-------------|
| `--listen` | `NICO_MCP_LISTEN` | Listen address (default `:8080`) |
| `--path` | `NICO_MCP_PATH` | HTTP path the MCP handler is mounted at (default `/mcp`) |
| `--shutdown-timeout` | `NICO_MCP_SHUTDOWN_TIMEOUT` | Graceful shutdown timeout (default `10s`) |

`--base-url`, `--org`, `--api-name`, `--token`, and `--token-command` are inherited from the root command and provide optional server-side defaults; each also reads its `NICO_*` environment variable. Because they are root-level flags, pass them **before** the `mcp serve` subcommand (e.g. `nicocli --org tester mcp serve --listen :8080`). Unlike the other `nicocli` commands, `mcp serve` does **not** read `~/.nico/config.yaml` -- the server is stateless and entirely parameter-driven, so `nicocli mcp serve` starts cleanly with no config file present and every connection detail is supplied per tool call (see below), falling back to these flags only when an argument is omitted.

### Per-call config overrides

Every typical config value can also be passed as an argument on each MCP tool call, layered on top of the server defaults:

| Tool arg | Equivalent flag | Config field |
|----------|-----------------|--------------|
| `org` | `--org` | `api.org` |
| `base_url` | `--base-url` | `api.base` |
| `api_name` | `--api-name` | `api.name` |
| `token` | `--token` | `auth.token` |

Precedence per tool call (first non-empty wins): tool argument → inbound `Authorization` header (token only) → server startup flag/env → `token_command` refresh on a 401. The MCP server does not read the on-disk config file. `token_command`, OIDC credentials, and NGC api_key settings are NOT exposed as tool arguments -- they are login-flow inputs configured server-side via flags/env.

### Probing the server

```bash
# List the tool catalogue
curl -sS http://localhost:8080/mcp \
-H 'Content-Type: application/json' \
-H 'Accept: application/json, text/event-stream' \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | jq

# Call a specific tool
curl -sS http://localhost:8080/mcp \
-H 'Content-Type: application/json' \
-H 'Accept: application/json, text/event-stream' \
-H "Authorization: Bearer $TOKEN" \
-d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"nico_get_all_site","arguments":{}}}' | jq
```

## Troubleshooting

If `nicocli` is not found after install, make sure `$(go env GOPATH)/bin` is in your PATH:
Expand Down
2 changes: 2 additions & 0 deletions cli/cmd/cli/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ import (
"fmt"
"os"

"github.com/NVIDIA/infra-controller-rest/cli/mcp"
appcli "github.com/NVIDIA/infra-controller-rest/cli/pkg"
"github.com/NVIDIA/infra-controller-rest/cli/tui"
"github.com/NVIDIA/infra-controller-rest/openapi"
Expand All @@ -41,6 +42,7 @@ func main() {
return tui.RunTUI(c.String("config"))
},
})
app.Commands = append(app.Commands, mcp.Command(openapi.Spec))
if err := app.Run(os.Args); err != nil {
fmt.Fprintf(os.Stderr, "Error: %v\n", err)
os.Exit(1)
Expand Down
175 changes: 175 additions & 0 deletions cli/mcp/cmd.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,175 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: Apache-2.0
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package mcp

import (
"context"
"errors"
"fmt"
"net/http"
"os"
"os/signal"
"syscall"
"time"

"github.com/sirupsen/logrus"
urfave "github.com/urfave/cli/v2"
)

// Command returns the "mcp" urfave/cli command tree for nicocli. Wire
// it into the binary's command list from cmd/cli/main.go alongside the
// dynamically-generated commands the rest of the CLI ships with.
//
// specData is the OpenAPI YAML the rest of the CLI is built from; the
// command's "serve" action passes it to BuildServer so the MCP tool
// catalogue stays in lockstep with every nicocli build.
func Command(specData []byte) *urfave.Command {
return &urfave.Command{
Name: "mcp",
Usage: "Run an MCP server that exposes the NICo REST read surface over streamable-HTTP",
Subcommands: []*urfave.Command{
serveCommand(specData),
},
}
}

func serveCommand(specData []byte) *urfave.Command {
return &urfave.Command{
Name: "serve",
Usage: "Start the streamable-HTTP MCP server",
Description: "Serves the NICo REST read surface as MCP tools at /mcp on the\n" +
"configured listen address. The server is stateless and never emits\n" +
"text/event-stream responses; every tool/call returns a single JSON\n" +
"body. In production, place an MCP-aware gateway in front and rely on\n" +
"the inbound Authorization header for per-call authentication.",
Flags: []urfave.Flag{
&urfave.StringFlag{
Name: "listen",
Usage: "address:port to listen on",
EnvVars: []string{"NICO_MCP_LISTEN"},
Value: ":8080",
},
&urfave.StringFlag{
Name: "path",
Usage: "HTTP path the MCP handler is mounted at",
EnvVars: []string{"NICO_MCP_PATH"},
Value: "/mcp",
},
&urfave.DurationFlag{
Name: "shutdown-timeout",
Usage: "graceful shutdown timeout when SIGINT/SIGTERM arrives",
EnvVars: []string{"NICO_MCP_SHUTDOWN_TIMEOUT"},
Value: 10 * time.Second,
},
},
Action: func(c *urfave.Context) error {
return runServe(c, specData)
},
}
}

// runServe wires the urfave context into Options, builds the MCP server,
// and runs an http.Server until SIGINT/SIGTERM. It is split out from the
// urfave Action closure so tests can drive it directly.
func runServe(c *urfave.Context, specData []byte) error {
opts := buildServeOptions(c)

server, err := BuildServer(specData, opts)
if err != nil {
return fmt.Errorf("building MCP server: %w", err)
}

listen := c.String("listen")
path := c.String("path")
if path == "" || path[0] != '/' {
return fmt.Errorf("invalid --path %q: must be non-empty and start with '/'", path)
}
shutdownTimeout := c.Duration("shutdown-timeout")

mux := http.NewServeMux()
if err := registerHandler(mux, path, NewHandler(server)); err != nil {
return err
}

httpServer := &http.Server{
Addr: listen,
Handler: mux,
ReadHeaderTimeout: 10 * time.Second,
}

opts.Log.Infof("nico-mcp: listening on %s, MCP at %s (stateless, JSONResponse)", listen, path)

errCh := make(chan error, 1)
go func() {
errCh <- httpServer.ListenAndServe()
}()

sigCh := make(chan os.Signal, 1)
signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
defer signal.Stop(sigCh)

Comment on lines +122 to +125

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate and display the relevant section in cli/mcp/cmd.go
FILE="cli/mcp/cmd.go"
if [ ! -f "$FILE" ]; then
  echo "Missing file: $FILE"
  exit 1
fi

echo "== Context around requested lines =="
nl -ba "$FILE" | sed -n '90,170p'

Repository: NVIDIA/infra-controller-rest

Length of output: 171


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for other signal handlers / cleanup patterns across the codebase
rg -n --hidden --no-ignore-vcs "signal\.Notify\(|signal\.Stop\(|signal\.Reset\(" cli -S || true

Repository: NVIDIA/infra-controller-rest

Length of output: 146


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check for any tests that might repeatedly invoke this code path
rg -n --hidden --no-ignore-vcs "mcp.*cmd|cli/mcp|New.*MCP|cmd\.go" . -S --glob '*_test.go' || true

Repository: NVIDIA/infra-controller-rest

Length of output: 54


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="cli/mcp/cmd.go"
# Show around the target region (avoid nl)
sed -n '100,150p' "$FILE" | cat -n

Repository: NVIDIA/infra-controller-rest

Length of output: 1846


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find all signal.Notify / signal.Stop usage in this repo
rg -n --hidden --no-ignore-vcs "signal\.Notify\(|signal\.Stop\(" . -S || true

Repository: NVIDIA/infra-controller-rest

Length of output: 671


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show the whole function / block likely containing the snippet
# (heuristic: print from the nearest preceding 'func' to ~30 lines after the snippet region)
FILE="cli/mcp/cmd.go"
start=$(awk 'NR>=1 && /func / {last=$1} NR>=80 && NR<=150 { } END {print last}' < "$FILE")
echo "Heuristic start NR: $start"
sed -n "${start},${start+120}p" "$FILE" | cat -n

Repository: NVIDIA/infra-controller-rest

Length of output: 167


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Check whether the repo uses signal.Stop anywhere
rg -n --hidden --no-ignore-vcs "signal\.Stop\(" . || true

# 2) Inspect a few other call sites around their signal.Notify to learn prevailing patterns
for f in \
  "site-agent/cmd/site-agent/main.go" \
  "site-agent/pkg/components/managers/context.go" \
  "powershelf-manager/cmd/serve.go" \
  "nvswitch-manager/cmd/serve.go" \
  "flow/cmd/serve.go" \
  "cert-manager/pkg/core/context.go"; do
  echo "---- $f ----"
  # Print ~60 lines around the first signal.Notify occurrence in each file
  ln=$(rg -n "signal\.Notify\(" "$f" | head -n1 | cut -d: -f1 || true)
  if [ -n "${ln:-}" ]; then
    start=$((ln-20))
    end=$((ln+25))
    sed -n "${start},${end}p" "$f" | cat -n
  else
    echo "No signal.Notify found in $f (unexpected)."
  fi
done

# 3) Inspect a bit more around cli/mcp/cmd.go signal.Notify block
echo "---- cli/mcp/cmd.go ----"
ln=$(rg -n "signal\.Notify\(sigCh" cli/mcp/cmd.go | head -n1 | cut -d: -f1 || true)
if [ -n "${ln:-}" ]; then
  start=$((ln-30))
  end=$((ln+40))
  sed -n "${start},${end}p" cli/mcp/cmd.go | cat -n
fi

Repository: NVIDIA/infra-controller-rest

Length of output: 9983


Stop signal notifications on exit to avoid lingering signal registrations.

signal.Notify(sigCh, ...) registers sigCh with the runtime, but this code never calls signal.Stop(sigCh), so repeated in-process invocations (e.g., tests) can accumulate stale registrations even after the server goroutine exits.

Proposed fix
 	sigCh := make(chan os.Signal, 1)
 	signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
+	defer signal.Stop(sigCh)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sigCh := make(chan os.Signal, 1)
signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
sigCh := make(chan os.Signal, 1)
signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
defer signal.Stop(sigCh)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/mcp/cmd.go` around lines 122 - 124, The code registers sigCh with
signal.Notify but never unregisters it; add a call to signal.Stop(sigCh) to
unregister the channel when the server/goroutine exits (either via a defer
immediately after signal.Notify(sigCh, ...) or in the existing shutdown/cleanup
path) so that repeated in-process runs don't accumulate stale signal
registrations; reference the sigCh variable and the existing signal.Notify usage
and place signal.Stop(sigCh) in the same scope where the server shutdown is
handled.

select {
case err := <-errCh:
if err != nil && !errors.Is(err, http.ErrServerClosed) {
return fmt.Errorf("http server: %w", err)
}
return nil
case sig := <-sigCh:
opts.Log.Infof("nico-mcp: received %s, shutting down", sig)
ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout)
defer cancel()
if err := httpServer.Shutdown(ctx); err != nil {
return fmt.Errorf("graceful shutdown: %w", err)
}
return nil
}
}

func registerHandler(mux *http.ServeMux, path string, handler http.Handler) (err error) {
defer func() {
if r := recover(); r != nil {
err = fmt.Errorf("invalid --path %q: %v", path, r)
}
}()
mux.Handle(path, handler)
return nil
}

// buildServeOptions resolves the MCP server's start-up defaults from the
// process flags (each of which also reads its NICO_* environment
// variable). Unlike the dynamically-generated commands, mcp serve does
// NOT read ~/.nico/config.yaml: the server is stateless and entirely
// parameter-driven, so every connection detail is supplied per tool call
// via resolveCallConfig, with these flag values as the only fallback.
// This lets "nicocli mcp serve" start cleanly with no config file present.
func buildServeOptions(c *urfave.Context) Options {
log := logrus.NewEntry(logrus.StandardLogger())
if c.Bool("debug") {
log.Logger.SetLevel(logrus.DebugLevel)
}

return Options{
BaseURL: c.String("base-url"),
Org: c.String("org"),
APIName: c.String("api-name"),
Token: c.String("token"),
TokenCommand: c.String("token-command"),
Debug: c.Bool("debug"),
Log: log,
}
}
38 changes: 38 additions & 0 deletions cli/mcp/cmd_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: Apache-2.0
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package mcp

import (
"net/http"
"testing"

"github.com/stretchr/testify/require"
)

func TestRegisterHandler_ValidPath(t *testing.T) {
mux := http.NewServeMux()
err := registerHandler(mux, "/mcp", http.HandlerFunc(func(http.ResponseWriter, *http.Request) {}))
require.NoError(t, err)
}

func TestRegisterHandler_InvalidPatternReturnsError(t *testing.T) {
mux := http.NewServeMux()
err := registerHandler(mux, "/{", http.HandlerFunc(func(http.ResponseWriter, *http.Request) {}))
require.Error(t, err)
require.Contains(t, err.Error(), "invalid --path")
}
Loading