A GIFT FROM TUESDAY'S LITTLE SHOP OF EXISTENTIAL HORRORS
A modular MCP server implementing 14 specialized AI agents, plus a three-tool enterprise IP protection suite, for research operations, security, design, and organizational governance.
- What is NerdCabalMCP?
- Repository Structure
- Core Philosophy
- The 14 Agent Team
- Quick Start
- Installation
- Configuration
- Running the MCP Server
- Using Agents
- Testing Guide - Complete testing instructions
- Architecture
- API Reference
- Advanced Usage
- Troubleshooting
NerdCabalMCP/
├── .claude/                  # Claude Code configuration
│   ├── agents/               # Custom Claude agents
│   │   ├── tutorial-scanner.md                   # Tutorial discovery agent
│   │   ├── tutorial-executor.md                  # Tutorial execution agent
│   │   ├── tutorial-tool-extractor-implementor.md
│   │   ├── test-verifier-improver.md             # Testing agent
│   │   ├── environment-python-manager.md
│   │   ├── benchmark-extractor.md                # Benchmark agents
│   │   ├── benchmark-judge.md
│   │   ├── benchmark-reviewer.md
│   │   └── benchmark-solver.md
│   └── settings.json         # Claude settings
├── competitions/             # Competition submissions
│   └── ai-explorer/          # AI Explorer hackathon
├── docs/                     # All documentation
│   ├── README.md             # Docs index
│   ├── QUICK_START.md        # Quick start guide
│   ├── MCP_SERVER_GUIDE.md   # MCP server details
│   ├── ARTIFEX_NERD_SWARM_ARCHITECTURE.md
│   ├── API_KEY_MANAGEMENT.md
│   ├── CORRECTED_DOCUMENTATION.md
│   ├── README_ENTERPRISE_IP.md
│   ├── ENHANCEMENTS_REPORT.md        # Feature enhancements
│   ├── HACKATHON_SUBMISSION.md       # Submission details
│   ├── IMPLEMENTATION_ROADMAP.md     # Development roadmap
│   └── SUBMISSION_COMPLETE.md
├── hackathon-submission/     # Hackathon materials
│   ├── assets/               # Presentation assets
│   │   ├── slide-deck.md
│   │   ├── video-script.md
│   │   ├── cover-image-specs.md
│   │   ├── SLIDE_PRESENTATION.md
│   │   ├── VIDEO_PRESENTATION_SCRIPT.md
│   │   └── COVER_IMAGE_DESIGN.md
│   ├── demo/                 # Demo application
│   │   ├── app/
│   │   └── styles/
│   └── docs/                 # Submission docs
│       └── INDEX.md          # Submission index
├── mcp-server/               # MCP server (TypeScript)
│   ├── src/                  # Source code
│   │   ├── index.ts                  # Main server entry
│   │   ├── administrator-agent.ts    # Org design agent
│   │   ├── archival-system.ts        # IP evidence storage
│   │   ├── budget-agent.ts           # Financial planning
│   │   ├── ciso-agent.ts             # Security (STRIDE)
│   │   ├── compliance-engine.ts      # GDPR/DMCA compliance
│   │   ├── comptroller-agent.ts      # Operations (Iron Triangle)
│   │   ├── creative-director.ts      # Design systems
│   │   ├── dataset-builder.ts        # ML dataset creation
│   │   ├── experimental-designer.ts  # Research methodology
│   │   ├── forensic-analyst.ts       # Neural forensics (DSMMD)
│   │   ├── ip-analytics.ts           # IP pattern detection
│   │   ├── ip-protection-suite.ts    # IP suite orchestrator
│   │   ├── mlflow-agent.ts           # MLflow queries
│   │   ├── orchestrator.ts           # Multi-agent workflows
│   │   ├── rubric-architect.ts       # LLM evaluation rubrics
│   │   ├── visual-inspector.ts       # FiftyOne integration
│   │   ├── utils.ts                  # Shared utilities
│   │   └── *-types.ts                # TypeScript type definitions
│   ├── examples/             # Usage examples
│   ├── package.json          # Dependencies
│   ├── tsconfig.json         # TypeScript config
│   ├── mcp-config.json       # MCP configuration
│   └── README.md             # Server docs
├── notebooks/                # Jupyter notebooks
│   ├── README.md
│   └── MATS_Neural_Forensics_Demo.ipynb
├── PROOF_TO_PAY-AGENTIC_COMMERCE/    # Commerce project
│   ├── assets/
│   ├── demo/
│   └── docs/                 # Commerce documentation
│       ├── AGENTIC_COMMERCE_README.md
│       ├── ARC_COMMERCE_ARCHITECTURE.md
│       └── QUICK_START_ARC_COMMERCE.md
├── prompts/                  # Prompt templates
│   ├── step1_prompt.md
│   ├── step2_prompt.md
│   ├── step3_prompt.md
│   ├── step4_prompt.md
│   └── step5_prompt.md
├── scripts/                  # Utility scripts
│   ├── Paper2Agent.sh
│   └── launch_remote_mcp.sh
├── templates/                # Project templates
│   ├── AlphaPOP/
│   ├── src/
│   └── test/
├── tools/                    # Build tools
│   ├── benchmark_assessor.py
│   ├── benchmark_extractor.py
│   ├── benchmark_reviewer.py
│   ├── extract_notebook_images.py
│   └── preprocess_notebook.py
├── tutorials/                # Tutorial content
│   └── siggraph-2026-sovereign-studio/
├── web_ui/                   # Web interface
│   └── templates/
├── LICENSE                   # MIT License
├── README.md                 # You are here
├── TESTING.md                # Comprehensive testing guide
└── .gitignore                # Git ignore rules
- Testing Guide: TESTING.md - START HERE for testing!
- Hackathon Submission: hackathon-submission/
- Full Documentation: docs/
- Demo Notebooks: notebooks/
- Quick Start Guide: docs/QUICK_START.md
- MCP Server Guide: docs/MCP_SERVER_GUIDE.md
- Claude Code Guide: CLAUDE.MD
NerdCabalMCP is a Model Context Protocol (MCP) server that provides a co-scientist platform for AI-assisted research, operations, creative work, and enterprise IP protection. Think of it as your personal team of 17 specialized AI experts, each with deep domain knowledge and the ability to collaborate seamlessly.
- 17 Specialized Agents: From financial planning to neural forensics to enterprise IP protection
- Enterprise IP Protection Suite (NEW): IP analytics, compliance validation, cryptographic archival
- A2A Protocol Compliant: Agent-to-Agent communication following Anthropic Design Kit standards
- Modular Architecture: Each agent is independently deployable and upgradeable
- Security-First: Built-in CISO agent with STRIDE threat modeling
- Production-Ready: TypeScript + Python implementation with full type safety
- Multi-Platform: Integrates with Claude Desktop, Streamlit, HuggingFace Spaces, and more
- Multi-Jurisdiction Support: US, EU, UK compliance validation (advisory, not legal advice)
Traditional AI tools give you one-size-fits-all assistants. NerdCabalMCP gives you granular control over specialized agent personas, allowing you to:
- Compose Custom Teams: Mix and match agents for your specific workflow
- Iterate on Agent Design: Each agent has clear capabilities and constraints
- Scale Intelligently: Add new agents as your needs evolve
- Maintain Context: Agents share knowledge through the MCP protocol
- Educational First: Every agent is documented to teach, not just execute
- Transparency: Clear input/output schemas and reasoning traces
- Modularity: Each agent is a standalone module with minimal dependencies
- Extensibility: Built on open standards (MCP, A2A, ADK)
Role: Enterprise IP Intelligence & Pattern Detection
Expertise: Patent/trademark/copyright pattern analysis, portfolio valuation, competitive scanning, geographic risk mapping
Use Cases:
- Detecting IP infringement patterns across large datasets
- Geographic risk heatmaps for jurisdiction-specific threats
- Portfolio valuation dashboards for asset management
- Competitive infringement pattern scoring
Example:
{
"tool": "ip_analytics",
"action": "analyze_patterns",
"data": {
"ip_type": "copyright",
"timeframe_days": 90,
"portfolio_ids": ["PORT-001", "PORT-002"]
},
"jurisdiction": "US"
}
Key Features:
- Real-time pattern detection using ML bibliometrics
- Cross-jurisdictional risk scoring (US, EU, UK)
- ROI-based litigation opportunity ranking
- Integration with USPTO/EPO/WIPO data sources
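As a toy illustration of the cross-jurisdictional risk-scoring idea, the sketch below aggregates per-match severities with jurisdiction weights into a 0-100 score. The weights, scale, and function names are invented for the example and are not the ip-analytics.ts model:

```typescript
// Hypothetical jurisdiction weights (illustrative only).
const jurisdictionWeight: Record<string, number> = { US: 1.0, EU: 0.9, UK: 0.8 };

interface InfringementMatch {
  jurisdiction: string;
  severity: number; // normalized to [0, 1]
}

// Average the weighted severities, then scale to 0-100.
function riskScore(matches: InfringementMatch[]): number {
  const total = matches.reduce(
    (sum, m) => sum + (jurisdictionWeight[m.jurisdiction] ?? 0.5) * m.severity,
    0,
  );
  return Math.min(100, Math.round((total / Math.max(1, matches.length)) * 100));
}
```

A real model would also factor in timeframe, portfolio value, and litigation ROI, as the feature list above suggests.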
Role: Multi-Jurisdiction Governance & Policy Validation
Expertise: GDPR, DMCA, EU Copyright Directive, AI Act compliance validation
Use Cases:
- Pre-enforcement compliance checks for DMCA takedowns
- GDPR audit trail generation for IP monitoring
- EU Copyright Directive Article 17 compliance
- AI Act risk assessment for automated detection systems
Example:
{
"tool": "compliance_check",
"action": "validate",
"context": {
"processes_personal_data": true,
"consent_obtained": true,
"takedown_notice_sent": false,
"ai_training_data": false
},
"jurisdiction": "EU"
}
Important: Advisory only - not legal advice. All enforcement actions require human review.
Role: Cryptographic Evidence Storage & Chain-of-Custody
Expertise: SHA-256 chain hashing, tamper-proof evidence archival, legal admissibility preparation
Use Cases:
- Storing IP infringement evidence with cryptographic integrity
- Maintaining chain-of-custody for litigation
- Generating tamper-proof audit trails
- Preparing evidence packages for legal proceedings
Example:
{
"tool": "archival_system",
"action": "store",
"evidence": {
"type": "image",
"source": "https://example.com/infringement.jpg",
"description": "Unauthorized use of copyrighted work",
"jurisdiction": "US",
"case_id": "CASE-2026-001"
}
}
Key Features:
- SHA-256 chain hashing for tamper detection
- Append-only ledger architecture
- Metadata preservation for legal contexts
- Integrity verification tools
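The chain-hashing idea behind these features can be sketched in a few lines of TypeScript. The names here (LedgerEntry, appendEntry, verifyChain) are illustrative, not the archival-system.ts API; the point is that each entry's SHA-256 hash covers the previous entry's hash, so editing any record invalidates every hash after it:

```typescript
import { createHash } from "node:crypto";

interface LedgerEntry {
  payload: string;  // serialized evidence record
  prevHash: string; // hash of the previous entry ("GENESIS" for the first)
  hash: string;     // SHA-256 over prevHash + payload
}

// Append-only: each new hash chains off the previous entry's hash.
function appendEntry(ledger: LedgerEntry[], payload: string): LedgerEntry {
  const prevHash = ledger.length ? ledger[ledger.length - 1].hash : "GENESIS";
  const hash = createHash("sha256").update(prevHash + payload).digest("hex");
  const entry = { payload, prevHash, hash };
  ledger.push(entry);
  return entry;
}

// Recompute every link; any tampered payload breaks verification.
function verifyChain(ledger: LedgerEntry[]): boolean {
  return ledger.every((e, i) => {
    const prevHash = i === 0 ? "GENESIS" : ledger[i - 1].hash;
    const expected = createHash("sha256").update(prevHash + e.payload).digest("hex");
    return e.prevHash === prevHash && e.hash === expected;
  });
}
```

A production system would additionally timestamp entries and persist the ledger to append-only storage.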
Role: Evaluation Framework Designer
Expertise: Creates comprehensive rubrics for LLM evaluation, benchmark design, and quality criteria
Use Cases:
- Designing evaluation frameworks for new AI capabilities
- Creating rubrics for human evaluation of model outputs
- Generating benchmark specifications
Example:
{
"tool": "llm-rubric-architect",
"task": "Create a rubric for evaluating code generation models",
"criteria": ["correctness", "efficiency", "readability", "security"],
"output_format": "markdown"
}
Role: Research Methodology Specialist
Expertise: Hypothesis formulation, experimental design, statistical power analysis
Use Cases:
- Planning A/B tests for model improvements
- Designing controlled experiments for capability evaluations
- Generating research protocols for novel AI techniques
Example:
{
"tool": "experimental-designer",
"research_question": "Does chain-of-thought improve math reasoning?",
"constraints": {
"budget": 1000,
"timeframe": "2 weeks"
}
}
Role: Neural Forensics Specialist
Expertise: DSMMD taxonomy (Data, Semantics, Methods, Metadata, Discourse)
Use Cases:
- Detecting confabulation patterns in LLM outputs
- Identifying context collapse and metadata leakage
- Analyzing transcripts for hallucination types
Role: Financial Strategist
Expertise: Grant budgets, investor projections, ROI analysis
Use Cases:
- Creating grant application budgets (NIH, NSF, etc.)
- Generating investor pitch financial models
- Optimizing compute spend for ML training
Example:
{
"tool": "budget-agent",
"project": "Language Model Training",
"funding_target": 500000,
"timeline_months": 18
}
Role: Operations Manager
Expertise: Iron Triangle optimization (Speed ⇄ Cost ⇄ Quality)
Use Cases:
- Calculating burn rate and runway
- Optimizing resource allocation across projects
- Generating sprint capacity planning
Key Concept: The Iron Triangle
        SPEED
        /    \
       /      \
      /   ⚖️   \
     /__________\
  COST        QUALITY
You can optimize two, but not all three simultaneously.
Role: Organizational Architect
Expertise: SOPs, team structures, timezone optimization
Use Cases:
- Designing org charts for distributed teams
- Creating standard operating procedures
- Optimizing meeting schedules across timezones
Role: Chief Information Security Officer
Expertise: STRIDE threat modeling, Zero Trust architecture
Use Cases:
- Security audits of AI systems
- Generating incident response playbooks
- Threat modeling for API deployments
STRIDE Framework:
- Spoofing
- Tampering
- Repudiation
- Information Disclosure
- Denial of Service
- Elevation of Privilege
Role: Experiment Tracking Specialist
Expertise: MLflow queries, trace analysis, run comparisons
Use Cases:
- Querying experiment results across runs
- Generating comparative analysis of model versions
- Tracking hyperparameter optimization
Role: Training Data Engineer
Expertise: SFT, DPO, HuggingFace dataset creation
Use Cases:
- Creating supervised fine-tuning datasets
- Generating DPO (Direct Preference Optimization) pairs
- Publishing datasets to HuggingFace Hub
Supported Formats:
- Supervised Fine-Tuning (SFT)
- Direct Preference Optimization (DPO)
- Reward Modeling (RM)
- Reinforcement Learning from Human Feedback (RLHF)
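As a rough sketch of what the SFT and DPO formats above look like as records, the types below follow common HuggingFace conventions (prompt/chosen/rejected); they are not a documented dataset-builder.ts schema:

```typescript
// Hypothetical record shapes, mirroring common HuggingFace conventions.
interface SftExample {
  prompt: string;
  completion: string;
}

interface DpoExample {
  prompt: string;
  chosen: string;   // preferred response
  rejected: string; // dispreferred response
}

// Turn a ranked list of responses (best first) into adjacent preference pairs.
function toDpo(prompt: string, ranked: string[]): DpoExample[] {
  const pairs: DpoExample[] = [];
  for (let i = 0; i < ranked.length - 1; i++) {
    pairs.push({ prompt, chosen: ranked[i], rejected: ranked[i + 1] });
  }
  return pairs;
}
```

Reward-modeling and RLHF datasets build on the same preference-pair structure, with scores or trajectories attached.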
Role: Data Quality Analyst
Expertise: FiftyOne visualization, mistakenness detection
Use Cases:
- Visualizing computer vision datasets
- Identifying label errors and edge cases
- Generating quality reports for training data
Role: Design Systems Architect
Expertise: Color theory, typography, CSS frameworks, UI/UX
Use Cases:
- Creating design systems for applications
- Generating color palettes with WCAG compliance
- Designing component libraries
Supported Styles:
- Cyberpunk Brutalist Bauhaus (your preferred aesthetic!)
- Material Design
- Tailwind CSS utility-first
- Custom design tokens
Example:
{
"tool": "creative-director",
"style": "cyberpunk-brutalist-bauhaus",
"colors": ["black", "white", "red"],
"components": ["buttons", "cards", "navigation"]
}
Role: Multi-Agent Coordinator
Expertise: ADK patterns (Sequential, Parallel, Loop, Coordinator)
Use Cases:
- Composing multi-agent workflows
- Optimizing agent execution patterns
- Generating A2A communication protocols
ADK Patterns:
Sequential: A → B → C
Parallel: A ∥ B ∥ C → Merge
Loop: A → B → [condition] → A
Coordinator: A ⇄ C ⇄ B
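The first two patterns can be sketched with a generic async step signature (hypothetical, not the orchestrator.ts API): sequential execution threads each agent's output into the next input, while parallel execution fans independent steps out and merges the results.

```typescript
// A step stands in for one agent invocation (illustrative signature).
type Step = (input: unknown) => Promise<unknown>;

// Sequential: output of A is required input for B.
async function runSequential(steps: Step[], input: unknown): Promise<unknown> {
  let current = input;
  for (const step of steps) {
    current = await step(current);
  }
  return current;
}

// Parallel: independent tasks fan out, then results merge.
async function runParallel(steps: Step[], input: unknown): Promise<unknown[]> {
  return Promise.all(steps.map((step) => step(input)));
}
```

Loop and Coordinator add a termination condition and a central routing agent, respectively, on top of these two primitives.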
Role: Agent Lifecycle Management
Expertise: Creating, deploying, and monitoring agents
Use Cases:
- Converting research papers to executable agents
- Managing agent deployments
- Monitoring agent health and performance
- Node.js 18+ (we use v22.0.0)
- npm or pnpm
- Claude Desktop (for MCP client) or any MCP-compatible client
# 1. Clone the repository
git clone https://github.com/Tuesdaythe13th/NerdCabalMCP.git
cd NerdCabalMCP
# 2. Install dependencies
cd mcp-server
npm install
# 3. Build the TypeScript code
npm run build
# 4. Configure Claude Desktop
# Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
# or %APPDATA%/Claude/claude_desktop_config.json (Windows)
{
"mcpServers": {
"nerdcabal": {
"command": "node",
"args": [
"/absolute/path/to/NerdCabalMCP/mcp-server/dist/index.js"
],
"env": {}
}
}
}
# 5. Restart Claude Desktop
# Your 17 agents are now available! (14 original + 3 IP protection tools)
Verify your system meets the requirements:
# Check Node.js version (need 18+)
node --version # Should show v18.x.x or higher
# Check npm
npm --version
# Install pnpm (optional, but faster)
npm install -g pnpm
# Clone with all history
git clone https://github.com/Tuesdaythe13th/NerdCabalMCP.git
cd NerdCabalMCP
# Or clone with shallow history (faster)
git clone --depth 1 https://github.com/Tuesdaythe13th/NerdCabalMCP.git
cd NerdCabalMCP
cd mcp-server
# Using npm
npm install
# Or using pnpm (faster)
pnpm install
Dependencies Installed:
- @modelcontextprotocol/sdk (v1.0.4): Core MCP protocol implementation
- typescript (v5.7.2): Type system and compiler
- @types/node (v22.0.0): Node.js type definitions
# Full build
npm run build
# Development mode with auto-rebuild
npm run watch
# Development server with hot reload
npm run dev
Build Output: Compiled JavaScript files in mcp-server/dist/
# Test the server standalone
node dist/index.js
# You should see:
# MCP server running on stdio
The server configuration is in mcp-server/mcp-config.json:
{
"server": {
"name": "nerdcabal-mcp",
"version": "1.0.0"
},
"tools": [
{
"name": "llm-rubric-architect",
"enabled": true
},
{
"name": "experimental-designer",
"enabled": true
}
// ... all 14 agents
]
}
Location: ~/Library/Application Support/Claude/claude_desktop_config.json
{
"mcpServers": {
"nerdcabal": {
"command": "node",
"args": [
"/Users/yourname/NerdCabalMCP/mcp-server/dist/index.js"
],
"env": {
"LOG_LEVEL": "info"
}
}
}
}
Location: %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"nerdcabal": {
"command": "node",
"args": [
"C:\\Users\\yourname\\NerdCabalMCP\\mcp-server\\dist\\index.js"
],
"env": {}
}
}
}
Location: ~/.config/Claude/claude_desktop_config.json
{
"mcpServers": {
"nerdcabal": {
"command": "node",
"args": [
"/home/yourname/NerdCabalMCP/mcp-server/dist/index.js"
],
"env": {}
}
}
}
You can configure behavior with environment variables:
# Logging
LOG_LEVEL=debug|info|warn|error
# Agent-specific settings
MLFLOW_TRACKING_URI=http://localhost:5000
FIFTYONE_DATABASE_URI=mongodb://localhost:27017
- Configure claude_desktop_config.json as shown above
- Restart Claude Desktop
- Start a conversation and type @ to see available tools
- Select nerdcabal tools to use agents
from anthropic import Anthropic
import json
client = Anthropic(api_key="your-api-key")
# Use the MCP tool
response = client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=4096,
tools=[
{
"name": "llm-rubric-architect",
"description": "Creates comprehensive evaluation rubrics",
"input_schema": {
"type": "object",
"properties": {
"task": {"type": "string"},
"criteria": {"type": "array", "items": {"type": "string"}}
},
"required": ["task", "criteria"]
}
}
],
messages=[
{
"role": "user",
"content": "Create a rubric for evaluating code quality"
}
]
)
print(response.content)
A cyberpunk brutalist bauhaus interface for easy agent interaction:
# Future command
streamlit run ui/app.py
Deploy as a public or private space for team collaboration.
Every agent follows this pattern:
Input → Agent Processing → Output
- Input: Structured JSON with task parameters
- Processing: Agent applies domain expertise
- Output: Structured result (JSON, Markdown, or Notebook)
Goal: Evaluate a chatbot's performance
Input:
{
"tool": "llm-rubric-architect",
"task": "chatbot-evaluation",
"dimensions": [
"response_quality",
"context_retention",
"safety",
"personality"
],
"scale": "1-5",
"output_format": "markdown"
}
Output:
# Chatbot Evaluation Rubric
## Response Quality (1-5)
- **5**: Accurate, complete, directly addresses query
- **4**: Mostly accurate, minor gaps
- **3**: Partially correct, some misunderstandings
- **2**: Significant errors or irrelevance
- **1**: Completely wrong or nonsensical
## Context Retention (1-5)
...
Goal: Test if prompt engineering improves accuracy
Input:
{
"tool": "experimental-designer",
"hypothesis": "Chain-of-thought prompting increases accuracy on math problems",
"baseline": "direct-answer",
"intervention": "chain-of-thought",
"metric": "accuracy",
"sample_size": 1000
}
Output:
# Experiment Design: Chain-of-Thought vs Direct Answer
## Hypothesis
H0: No difference in accuracy between methods
H1: Chain-of-thought produces higher accuracy
## Methodology
- **Design**: Between-subjects randomized controlled trial
- **Sample**: 1000 math problems (balanced across difficulty)
- **Groups**:
- Control: Direct answer prompts (n=500)
- Treatment: Chain-of-thought prompts (n=500)
## Power Analysis
- Effect size (expected): d=0.3
- Power: 0.80
- Alpha: 0.05
- Required n per group: 352 (achieved)
## Success Criteria
- p < 0.05 on two-tailed t-test
- Effect size d > 0.2 (small practical significance)
Goal: Threat model for an API
Input:
{
"tool": "ciso-agent",
"system": "LLM API",
"components": ["API gateway", "model inference", "user database"],
"framework": "STRIDE"
}
Output:
# STRIDE Threat Model: LLM API
## Component: API Gateway
### Spoofing
- **Threat**: Attacker impersonates legitimate user
- **Mitigation**: JWT with short expiration, API key rotation
- **Risk**: HIGH
### Tampering
- **Threat**: Request modification in transit
- **Mitigation**: HTTPS/TLS 1.3 only, request signing
- **Risk**: MEDIUM
...
Goal: Create a complete research project
Input:
{
"tool": "orchestrator",
"workflow": {
"pattern": "sequential",
"agents": [
{
"name": "experimental-designer",
"input": {"hypothesis": "..."}
},
{
"name": "budget-agent",
"input": {"project": "from_previous", "timeline": 6}
},
{
"name": "administrator",
"input": {"team_size": 3, "timeline": "from_previous"}
}
]
}
}
Output: Coordinates all three agents sequentially, passing context between them.
MCP Client (Claude Desktop, Custom UI, etc.)
        │  MCP Protocol (JSON-RPC)
        ▼
NerdCabalMCP Server (index.ts)
└─ Tool Router
   ├─ ip_analytics (NEW)
   ├─ compliance_check (NEW)
   ├─ archival_system (NEW)
   ├─ llm-rubric-architect
   ├─ experimental-designer
   ├─ budget-agent
   ├─ comptroller-agent
   ├─ administrator-agent
   ├─ mlflow-agent
   ├─ dataset-builder
   ├─ ciso-agent
   ├─ orchestrator
   ├─ creative-director
   ├─ visual-inspector
   ├─ forensic-analyst
   └─ paper2agent-infrastructure (2 tools)
        │
        ▼
External Integrations
├─ MLflow (experiment tracking)
├─ FiftyOne (dataset visualization)
├─ HuggingFace (dataset hosting)
├─ GitHub (repository analysis)
└─ Google Colab (notebook execution)
Each agent implements the Agent Card specification:
interface AgentCard {
name: string; // e.g., "llm-rubric-architect"
version: string; // Semantic versioning
description: string; // Human-readable purpose
capabilities: string[]; // What the agent can do
input_schema: JSONSchema; // Structured input format
output_schema: JSONSchema; // Structured output format
dependencies: string[]; // Required external services
adk_patterns: ADKPattern[]; // Supported execution patterns
}
ADK Execution Patterns:
1. Sequential: A → B → C
   Use when: Output of A is required input for B
2. Parallel: A ∥ B ∥ C
   Use when: Tasks are independent
3. Loop: A → [condition] → A or B
   Use when: Iterative refinement needed
4. Coordinator: A ⇄ C ⇄ B
   Use when: Central agent manages communication
mcp-server/src/
├── index.ts          # Main MCP server (tool routing)
├── types.ts          # TypeScript interfaces
├── utils.ts          # Shared utilities
├── agents/
│   ├── rubric-architect.ts
│   ├── experimental-designer.ts
│   ├── budget-agent.ts
│   ├── comptroller-agent.ts
│   ├── administrator-agent.ts
│   ├── mlflow-agent.ts
│   ├── dataset-builder.ts
│   ├── ciso-agent.ts
│   ├── orchestrator.ts
│   ├── creative-director.ts
│   ├── visual-inspector.ts
│   └── forensic-analyst.ts
└── infrastructure/
    ├── create-agent.ts
    ├── check-agent.ts
    └── launch-mcp.ts
sequenceDiagram
participant User
participant MCPClient
participant MCPServer
participant Agent
participant ExternalService
User->>MCPClient: Request task
MCPClient->>MCPServer: MCP tool call (JSON-RPC)
MCPServer->>Agent: Route to appropriate agent
Agent->>Agent: Process with domain logic
Agent->>ExternalService: Optional external call
ExternalService-->>Agent: Return data
Agent-->>MCPServer: Structured output
MCPServer-->>MCPClient: MCP response
MCPClient-->>User: Display result
Purpose: Generate evaluation rubrics for LLM capabilities
Input Schema:
{
"type": "object",
"properties": {
"task": {
"type": "string",
"description": "The evaluation task"
},
"dimensions": {
"type": "array",
"items": {"type": "string"},
"description": "Aspects to evaluate"
},
"scale": {
"type": "string",
"enum": ["1-3", "1-5", "1-7", "1-10"],
"default": "1-5"
},
"output_format": {
"type": "string",
"enum": ["markdown", "json", "csv"]
}
},
"required": ["task", "dimensions"]
}
Output: Markdown rubric or JSON structure
Purpose: Design controlled experiments for AI research
Input Schema:
{
"type": "object",
"properties": {
"hypothesis": {
"type": "string",
"description": "Research hypothesis to test"
},
"baseline": {
"type": "string",
"description": "Control condition"
},
"intervention": {
"type": "string",
"description": "Treatment condition"
},
"metric": {
"type": "string",
"description": "Primary evaluation metric"
},
"sample_size": {
"type": "integer",
"minimum": 30
},
"constraints": {
"type": "object",
"properties": {
"budget": {"type": "number"},
"timeframe": {"type": "string"}
}
}
},
"required": ["hypothesis", "metric"]
}
Output: Experimental design document (Markdown)
Purpose: Financial planning and budget generation
Input Schema:
{
"type": "object",
"properties": {
"project": {
"type": "string",
"description": "Project name/description"
},
"funding_target": {
"type": "number",
"description": "Target funding amount (USD)"
},
"timeline_months": {
"type": "integer",
"minimum": 1
},
"categories": {
"type": "array",
"items": {
"type": "string",
"enum": ["personnel", "compute", "equipment", "travel", "indirect"]
}
},
"format": {
"type": "string",
"enum": ["NIH", "NSF", "investor_pitch", "generic"]
}
},
"required": ["project", "funding_target", "timeline_months"]
}
Output: Detailed budget spreadsheet (JSON/CSV/Markdown)
Purpose: Coordinate multi-agent workflows
Input Schema:
{
"type": "object",
"properties": {
"workflow": {
"type": "object",
"properties": {
"pattern": {
"type": "string",
"enum": ["sequential", "parallel", "loop", "coordinator"]
},
"agents": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string"},
"input": {"type": "object"}
}
}
}
}
}
},
"required": ["workflow"]
}
Output: Workflow execution plan and coordination strategy
Purpose: Design system and UI/UX generation
Input Schema:
{
"type": "object",
"properties": {
"style": {
"type": "string",
"enum": ["cyberpunk-brutalist-bauhaus", "material", "tailwind", "custom"]
},
"colors": {
"type": "array",
"items": {"type": "string"},
"description": "Color palette (hex or named colors)"
},
"components": {
"type": "array",
"items": {
"type": "string",
"enum": ["buttons", "cards", "navigation", "forms", "typography"]
}
},
"output_format": {
"type": "string",
"enum": ["css", "tailwind", "styled-components", "figma_tokens"]
}
},
"required": ["style", "components"]
}
Output: Design system specification (CSS/JSON)
Purpose: Neural forensics for LLM transcript analysis
Input Schema:
{
"type": "object",
"properties": {
"transcript": {
"type": "string",
"description": "LLM conversation transcript"
},
"taxonomy": {
"type": "string",
"enum": ["DSMMD"],
"default": "DSMMD"
},
"detect": {
"type": "array",
"items": {
"type": "string",
"enum": [
"confabulation",
"context_collapse",
"metadata_leakage",
"semantic_drift",
"method_confusion"
]
}
}
},
"required": ["transcript"]
}
Output: Forensics report with detected issues (Markdown/JSON)
Want to add your own agent? Here's the template:
// mcp-server/src/agents/my-custom-agent.ts
import { AgentCard, AgentInput, AgentOutput } from '../types';
export const myCustomAgent: AgentCard = {
name: 'my-custom-agent',
version: '1.0.0',
description: 'What your agent does',
capabilities: [
'capability-1',
'capability-2'
],
input_schema: {
type: 'object',
properties: {
// Define your input structure
task: { type: 'string' }
},
required: ['task']
},
output_schema: {
type: 'object',
properties: {
result: { type: 'string' }
}
},
dependencies: [],
adk_patterns: ['sequential', 'parallel']
};
export async function executeMyCustomAgent(
input: AgentInput
): Promise<AgentOutput> {
// Your agent logic here
return {
success: true,
data: {
result: 'Agent output'
}
};
}
Then register it in index.ts:
import { myCustomAgent, executeMyCustomAgent } from './agents/my-custom-agent';
server.setRequestHandler(CallToolRequestSchema, async (request) => {
if (request.params.name === 'my-custom-agent') {
const result = await executeMyCustomAgent(request.params.arguments);
return { content: [{ type: 'text', text: JSON.stringify(result) }] };
}
// ... other agents
});
from langchain.agents import Tool
from langchain.llms import Anthropic
import requests
def call_nerdcabal_agent(agent_name: str, input_data: dict) -> str:
"""Call a NerdCabal MCP agent"""
response = requests.post(
'http://localhost:3000/mcp',
json={
'tool': agent_name,
'input': input_data
}
)
return response.json()
# Create LangChain tool
rubric_tool = Tool(
name="Rubric Architect",
func=lambda task: call_nerdcabal_agent('llm-rubric-architect', {'task': task}),
description="Creates evaluation rubrics"
)
# Use in agent
from langchain.agents import initialize_agent, AgentType
agent = initialize_agent(
tools=[rubric_tool],
llm=Anthropic(model='claude-3-opus-20240229'),
agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION
)
result = agent.run("Create a rubric for evaluating chatbot empathy")
Import the nerdcabal-langflow.json configuration:
{
"nodes": [
{
"type": "MCPTool",
"data": {
"server": "nerdcabal",
"tool": "llm-rubric-architect"
}
}
]
}
- Open Anthropic Workbench
- Go to Tools → Add MCP Server
- Point to your local NerdCabal server
- All 14 agents appear as tools
Solution:
- Verify claude_desktop_config.json path is correct
- Ensure you used absolute paths, not relative
- Restart Claude Desktop completely (quit and reopen)
- Check logs: ~/Library/Logs/Claude/mcp*.log (macOS)
Solution:
cd mcp-server
rm -rf node_modules package-lock.json
npm install
npm run build
Solution:
- Check agent is enabled in mcp-config.json
- Verify input matches the required schema
- Check server logs for errors: LOG_LEVEL=debug node dist/index.js
Solution:
# Update TypeScript
npm install -D typescript@latest
# Clear build cache
rm -rf dist/
npm run build
Solutions:
MLflow:
# Start MLflow server
mlflow server --host 0.0.0.0 --port 5000
# Set environment variable
export MLFLOW_TRACKING_URI=http://localhost:5000
FiftyOne:
# Start FiftyOne app
fiftyone app launch
# Verify database
fiftyone migrate --info
HuggingFace:
# Login to HuggingFace
huggingface-cli login
# Verify token
huggingface-cli whoami
Model Context Protocol (MCP) is Anthropic's standard for connecting AI models to external tools and data sources.
Key Concepts:
- Server: Provides tools (your NerdCabal agents)
- Client: Uses tools (Claude Desktop, your app)
- Protocol: JSON-RPC over stdio or HTTP
- Tools: Functions the model can call
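For concreteness, here is the shape of the JSON-RPC 2.0 envelope a client sends to invoke a tool. The tools/call method comes from the MCP spec; the tool name and arguments reuse the rubric example from earlier in this README:

```typescript
// A tools/call request as defined by the MCP spec; arguments are example values.
const toolCall = {
  jsonrpc: "2.0" as const,
  id: 1,
  method: "tools/call",
  params: {
    name: "llm-rubric-architect",
    arguments: {
      task: "chatbot-evaluation",
      dimensions: ["response_quality", "safety"],
    },
  },
};

// Over the stdio transport, each message travels as one serialized JSON line.
const wire = JSON.stringify(toolCall);
```

The server replies with a matching id and a content array, which the client renders back to the user.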
Learn More:
Agent-to-Agent (A2A) protocol enables structured communication between AI agents.
Key Concepts:
- Agent Card: Metadata describing agent capabilities
- Input Schema: Structured request format
- Output Schema: Structured response format
- Handshake: Capability negotiation between agents
Anthropic Design Kit (ADK) provides patterns for multi-agent workflows.
Pattern Details:
- Sequential: A → B → C
  - Use when output of A is input to B
  - Example: Design → Budget → Timeline
- Parallel: A ∥ B ∥ C → Merge
  - Use when tasks are independent
  - Example: Multiple code reviews simultaneously
- Loop: A → [condition] → A or B
  - Use for iterative refinement
  - Example: Draft → Review → Revise → Review
- Coordinator: A ⇄ C ⇄ B
  - Use when central agent manages state
  - Example: Orchestrator coordinates specialist agents
import streamlit as st
import requests
st.title("NerdCabal MCP Interface")
agent = st.selectbox("Select Agent", [
"llm-rubric-architect",
"experimental-designer",
"budget-agent",
"creative-director"
])
if agent == "creative-director":
style = st.selectbox("Style", [
"cyberpunk-brutalist-bauhaus",
"material",
"tailwind"
])
colors = st.multiselect("Colors", ["black", "white", "red", "blue"])
if st.button("Generate Design System"):
result = requests.post('http://localhost:3000/mcp', json={
'tool': agent,
'input': {'style': style, 'colors': colors}
})
st.code(result.json()['data'], language='css')
Create app.py in your Space:
import gradio as gr
from anthropic import Anthropic
client = Anthropic()
def call_agent(agent_name, task_description):
response = client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=4096,
tools=[{"name": f"nerdcabal:{agent_name}"}],
messages=[{"role": "user", "content": task_description}]
)
return response.content[0].text
iface = gr.Interface(
fn=call_agent,
inputs=[
gr.Dropdown(["llm-rubric-architect", "experimental-designer"], label="Agent"),
gr.Textbox(label="Task Description")
],
outputs=gr.Markdown(),
title="NerdCabal MCP Agents"
)
iface.launch()
Create .replit file:
run = "npm run dev"
language = "nodejs"
[nix]
channel = "stable-22_11"
[deployment]
deploymentTarget = "cloudrun"
Each agent tracks:
- Latency: P50, P95, P99 response times
- Error Rate: Failed requests per minute
- Usage: Requests per agent per day
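The P50/P95/P99 figures above can be derived from raw latency samples with a simple nearest-rank percentile. This is a sketch of the calculation, not a helper the server necessarily ships:

```typescript
// Nearest-rank percentile: sort ascending, take the ceil(p% * n)-th sample.
function percentile(samplesMs: number[], p: number): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Example latency samples in milliseconds.
const latencies = [12, 8, 30, 25, 100, 15, 9, 11, 40, 22];
const p50 = percentile(latencies, 50); // 15
const p95 = percentile(latencies, 95); // 100
```

Nearest-rank is the simplest definition; monitoring systems often interpolate between ranks instead, which gives slightly different tail values.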
# Enable debug logging
LOG_LEVEL=debug node dist/index.js 2>&1 | tee mcp-server.log
# View agent-specific logs
grep "llm-rubric-architect" mcp-server.log
# Monitor real-time
tail -f mcp-server.log | grep ERROR
# Test server is responding
curl -X POST http://localhost:3000/mcp \
-H "Content-Type: application/json" \
-d '{"method": "health"}'
# Should return: {"status": "ok", "agents": 17}
We welcome contributions! See our contributing guide for:
- Adding new agents
- Improving existing agents
- Enhancing documentation
- Reporting bugs
Quick Contribution:
git checkout -b feature/my-new-agent
# Make your changes
npm run build
npm test
git commit -m "Add: My new agent for X"
git push origin feature/my-new-agent
# Open a pull request
MIT License - see LICENSE file for details
- Anthropic for the MCP protocol and Claude models
- Google DeepMind for co-scientist model inspiration
- The open-source AI community for agent design patterns
- My dog TITO, my CTO
- Documentation: You're reading it!
- Enterprise IP Protection Guide: See README_ENTERPRISE_IP.md for watermarking, monitoring, and compliance details
- Corrected Claims: See CORRECTED_DOCUMENTATION.md for accurate technical claims (Jan 2026)
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Built with ❤️ by TUESDAY and the OG NerdCabal
By using this MCP, you agree:
WE RESPECT THE RULES OF THE SEA.
Be the hero you want to see in the world. Or just go take a nap and remember this is all just a hi-fidelity simulation.
2026 JAN MODELS: GPT 5.2 / Gemini 3.0 Flash / Claude Opus 4.5 / Perplexity
End of MCP Server Guide
Last Updated: January 2026
Version: 1.0.0