This guide provides comprehensive instructions for testing the NerdCabalMCP server and all 17 specialized agents.
- Prerequisites
- Quick Start Testing
- Testing Individual Agents
- Integration Testing
- Troubleshooting
- Test Scenarios
Before testing, ensure you have:
- Node.js 18+ installed
    node --version   # should be v18.x.x or higher
- Built the MCP server
    cd mcp-server
    npm install
    npm run build
- Claude Desktop installed (for MCP client testing)
  - Download from: https://claude.ai/download
- Configured Claude Desktop with the MCP server
  - See Quick Start for configuration details
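The Node.js requirement can also be checked from a script. The following is a minimal sketch, assuming `node --version` prints a string such as `v18.17.0`; the helper names `node_version_ok` and `installed_node_version` are hypothetical and not part of the project.

```python
import re
import shutil
import subprocess
from typing import Optional


def node_version_ok(version_text: str, minimum_major: int = 18) -> bool:
    """Return True if a `node --version` string (e.g. 'v18.17.0') meets the minimum major version."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version_text.strip())
    if match is None:
        return False
    return int(match.group(1)) >= minimum_major


def installed_node_version() -> Optional[str]:
    """Return the output of `node --version`, or None if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "--version"], capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Calling `node_version_ok(installed_node_version() or "")` returns `False` both when Node.js is missing and when it is older than 18.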
Test the server runs without errors:
cd mcp-server
node dist/index.js
Expected output:
MCP server running on stdio
If you see this, the server is working! Press Ctrl+C to stop.
Test the server responds to basic requests:
cd mcp-server
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0.0"}}}' | node dist/index.js
Expected: a JSON response listing the server capabilities.
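The same initialize check can be scripted so the response is validated rather than read by eye. This is a sketch under stated assumptions: it assumes the server prints one JSON object per line on stdout and exits once stdin closes (as the piped `echo` command implies), and all function names here are hypothetical.

```python
import json
import subprocess
from typing import Optional


def build_initialize_request(request_id: int = 1) -> dict:
    """Build the same JSON-RPC initialize request used in the echo command."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "test", "version": "1.0.0"},
        },
    }


def first_json_object(output: str) -> Optional[dict]:
    """Return the first output line that parses as a JSON object, skipping log lines."""
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None


def is_valid_initialize_response(response: dict, request_id: int = 1) -> bool:
    """Check for a JSON-RPC 2.0 result with a matching id and a capabilities object."""
    result = response.get("result")
    return (
        response.get("jsonrpc") == "2.0"
        and response.get("id") == request_id
        and isinstance(result, dict)
        and isinstance(result.get("capabilities"), dict)
    )


def run_initialize(server_dir: str = "mcp-server") -> Optional[dict]:
    """Pipe the request into `node dist/index.js` (assumes the server exits when stdin closes)."""
    proc = subprocess.run(
        ["node", "dist/index.js"],
        input=json.dumps(build_initialize_request()) + "\n",
        capture_output=True,
        text=True,
        cwd=server_dir,
        timeout=15,
    )
    return first_json_object(proc.stdout)
```

With the server built, `is_valid_initialize_response(run_initialize() or {})` should be `True` if the server answered with its capabilities.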
- Restart Claude Desktop (completely quit and reopen)
- Open a new chat
- Type @ to see available tools
- Look for nerdcabal tools
If you see tools like llm-rubric-architect, experimental-designer, etc., the MCP is working!
Purpose: Creates evaluation rubrics for AI systems
Test Prompt in Claude Desktop:
@nerdcabal Use llm-rubric-architect to create a rubric for evaluating chatbot responses.
Include these dimensions: accuracy, helpfulness, safety, and tone. Use a 1-5 scale.
Expected Output:
- Markdown rubric with 4 dimensions
- Clear criteria for each score (1-5)
- Examples for each level
Purpose: Designs controlled experiments
Test Prompt:
@nerdcabal Use experimental-designer to design an A/B test comparing two prompting strategies.
Hypothesis: Chain-of-thought prompting improves accuracy on math problems.
Baseline: Direct answer prompts
Intervention: Chain-of-thought prompts
Metric: Accuracy
Sample size: 1000
Expected Output:
- Hypothesis statement (H0 and H1)
- Methodology description
- Power analysis
- Success criteria
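To sanity-check the power analysis the agent returns, it can help to compute a rough figure independently. The sketch below uses the standard normal approximation for a two-sided two-proportion z-test. The accuracy values (0.60 baseline, 0.70 intervention) and the even 500/500 split of the 1000 samples are illustrative assumptions, not values from the agent.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p_baseline: float, p_intervention: float,
                         n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    pooled = (p_baseline + p_intervention) / 2
    # Standard error under H0 (equal accuracy) and under H1 (the assumed accuracies).
    se_null = sqrt(2 * pooled * (1 - pooled) / n_per_group)
    se_alt = sqrt(
        (p_baseline * (1 - p_baseline) + p_intervention * (1 - p_intervention)) / n_per_group
    )
    effect = abs(p_intervention - p_baseline)
    # Probability of rejecting H0 in the direction of the true effect;
    # the negligible opposite-tail term is ignored.
    return std_normal.cdf((effect - z_alpha * se_null) / se_alt)


# Hypothetical inputs: 1000 samples split evenly across the two prompting strategies.
power = two_proportion_power(0.60, 0.70, n_per_group=500)
```

If the agent's reported power differs widely from an estimate like this for the same inputs, its methodology section deserves a closer look.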
Purpose: Creates financial budgets and projections
Test Prompt:
@nerdcabal Use budget-agent to create a budget for an AI research project.
Project: LLM fine-tuning research
Funding target: $500,000
Timeline: 18 months
Categories: personnel, compute, equipment
Format: NSF
Expected Output:
- Detailed budget breakdown
- Personnel costs
- Compute/GPU costs
- Equipment and supplies
- Indirect costs
Purpose: Operations management and Iron Triangle optimization
Test Prompt:
@nerdcabal Use comptroller-agent to analyze the trade-offs for a project.
Project: Build a new feature for our app
Timeline: 2 weeks
Budget: $10,000
Quality requirements: Production-ready, full test coverage
Expected Output:
- Iron Triangle analysis
- Trade-off recommendations
- Resource allocation strategy
Purpose: Organizational design and SOPs
Test Prompt:
@nerdcabal Use administrator-agent to design an org chart for a distributed AI team.
Team size: 15 people
Timezones: US East, US West, EU, Asia
Roles needed: engineers, researchers, designers, product managers
Expected Output:
- Org chart structure
- Timezone distribution
- Meeting schedule recommendations
- Communication protocols
Purpose: Experiment tracking queries
Test Prompt:
@nerdcabal Use mlflow-agent to generate a query that finds the top 10 model runs
with the highest accuracy from the last 30 days.
Expected Output:
- MLflow API query syntax
- Filter parameters
- Sorting logic
Note: Requires MLflow server running for full functionality
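For comparison with the agent's output, here is a sketch of how such a query might be expressed with MLflow's Python API. It only builds the keyword arguments, so it runs without an MLflow server. The metric name `accuracy` and the helper `top_runs_query` are assumptions; the run `start_time` attribute is in milliseconds since the epoch.

```python
from datetime import datetime, timedelta, timezone


def top_runs_query(now: datetime, days: int = 30,
                   metric: str = "accuracy", limit: int = 10) -> dict:
    """Build keyword arguments for an MLflow run search: recent runs sorted by a metric."""
    cutoff_ms = int((now - timedelta(days=days)).timestamp() * 1000)
    return {
        "filter_string": f"attributes.start_time >= {cutoff_ms}",
        "order_by": [f"metrics.{metric} DESC"],
        "max_results": limit,
    }


# With MLflow installed and MLFLOW_TRACKING_URI set, the arguments could be used as:
#   import mlflow
#   runs = mlflow.search_runs(search_all_experiments=True,
#                             **top_runs_query(datetime.now(timezone.utc)))
```

The agent's generated query should contain the same three ingredients: a time filter, a descending sort on the metric, and a result limit of 10.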
Purpose: Creates training datasets for ML
Test Prompt:
@nerdcabal Use dataset-builder to create a supervised fine-tuning (SFT) dataset
for teaching a model to write Python code.
Include 5 examples with prompts and completions.
Output format: HuggingFace compatible
Expected Output:
- Dataset in HuggingFace format
- 5 prompt-completion pairs
- Metadata and schema
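The returned dataset can be checked mechanically before use. This is a minimal sketch, assuming each example is an object with `prompt` and `completion` string fields (the exact field names the agent emits may differ) and that the data is stored as JSON Lines, which the Hugging Face `datasets` library can read with `load_dataset("json", data_files=...)`.

```python
import json


def validate_sft_records(records: list, expected_count: int = 5) -> list:
    """Return a list of problems found; an empty list means the records look usable."""
    problems = []
    if len(records) != expected_count:
        problems.append(f"expected {expected_count} records, found {len(records)}")
    for index, record in enumerate(records):
        for field in ("prompt", "completion"):
            value = record.get(field) if isinstance(record, dict) else None
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {index}: missing or empty '{field}'")
    return problems


def to_jsonl(records: list) -> str:
    """Serialize records as JSON Lines, one object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records) + "\n"
```

An empty result from `validate_sft_records` confirms the "5 prompt-completion pairs" expectation; `to_jsonl` produces text that can be saved as a `.jsonl` file.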
Purpose: Security threat modeling
Test Prompt:
@nerdcabal Use ciso-agent to perform a STRIDE threat model for an LLM API.
Components: API gateway, model inference server, user database
Framework: STRIDE
Expected Output:
- STRIDE analysis for each component
- Threat descriptions
- Mitigation strategies
- Risk levels (HIGH/MEDIUM/LOW)
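A quick completeness check can confirm that every component is covered by all six STRIDE categories and that each threat carries a valid risk level. The nested-dictionary layout below is an assumed representation for illustration; the agent's real output would need to be parsed into it first.

```python
STRIDE_CATEGORIES = (
    "Spoofing",
    "Tampering",
    "Repudiation",
    "Information Disclosure",
    "Denial of Service",
    "Elevation of Privilege",
)
RISK_LEVELS = {"HIGH", "MEDIUM", "LOW"}


def stride_gaps(threat_model: dict, components: list) -> list:
    """List missing component/category pairs and entries with invalid risk levels."""
    gaps = []
    for component in components:
        threats = threat_model.get(component, {})
        for category in STRIDE_CATEGORIES:
            entry = threats.get(category)
            if entry is None:
                gaps.append(f"{component}: no entry for {category}")
            elif entry.get("risk") not in RISK_LEVELS:
                gaps.append(f"{component}: {category} has invalid risk level {entry.get('risk')!r}")
    return gaps
```

For the test prompt above, `components` would be `["API gateway", "model inference server", "user database"]`, and an empty result means the analysis covers 18 component/category pairs.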
Purpose: Multi-agent workflow coordination
Test Prompt:
@nerdcabal Use orchestrator to create a sequential workflow:
1. Use experimental-designer to create an experiment plan
2. Use budget-agent to budget the experiment
3. Use administrator-agent to staff the team
Expected Output:
- Workflow execution plan
- Agent sequencing
- Data flow between agents
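The "data flow between agents" item is easier to evaluate with a concrete picture of what sequential coordination means. The sketch below is purely illustrative: the callables stand in for agent invocations (which in practice happen through the MCP client), and `run_sequential` is a hypothetical helper, not the orchestrator's implementation.

```python
from typing import Callable, Dict, List, Tuple

# A step is a tool name plus a callable that receives all earlier outputs.
Step = Tuple[str, Callable[[Dict[str, str]], str]]


def run_sequential(steps: List[Step]) -> Dict[str, str]:
    """Run steps in order; each step sees the outputs of all earlier steps."""
    outputs: Dict[str, str] = {}
    for tool_name, step in steps:
        # Pass a copy so a step cannot mutate the recorded history.
        outputs[tool_name] = step(dict(outputs))
    return outputs


# Stub agents standing in for the three tools named in the test prompt.
workflow = [
    ("experimental-designer", lambda prev: "plan"),
    ("budget-agent", lambda prev: "budget for " + prev["experimental-designer"]),
    ("administrator-agent", lambda prev: "team for " + prev["budget-agent"]),
]
results = run_sequential(workflow)
```

A correct workflow plan from the orchestrator should show the same dependency chain: the budget consumes the experiment plan, and the staffing step consumes the budget.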
Purpose: Design systems and UI/UX
Test Prompt:
@nerdcabal Use creative-director to create a design system.
Style: cyberpunk-brutalist-bauhaus
Colors: black, white, red
Components: buttons, cards, navigation
Output format: CSS
Expected Output:
- CSS variables for colors
- Component styles
- Typography system
- Spacing/layout utilities
Purpose: Dataset visualization and quality analysis
Test Prompt:
@nerdcabal Use visual-inspector to generate a FiftyOne visualization script
for analyzing an image classification dataset.
Dataset: CIFAR-10
Tasks: Find mislabeled images, detect outliers
Expected Output:
- FiftyOne Python script
- Quality analysis queries
- Visualization commands
Note: Requires FiftyOne installed for execution
Purpose: Neural forensics for LLM analysis
Test Prompt:
@nerdcabal Use forensic-analyst to analyze this transcript for hallucinations:
"User: What's the capital of France?
Assistant: The capital of France is Paris, which was founded in 1850 by Napoleon Bonaparte
and has a population of 50 million people."
Use DSMMD taxonomy to detect confabulation, metadata leakage, or semantic drift.
Expected Output:
- DSMMD analysis
- Identified issues (wrong founding date, wrong population)
- Issue categorization
- Severity assessment
Test Prompt:
@nerdcabal Use ip_analytics to analyze copyright infringement patterns.
IP type: copyright
Timeframe: last 90 days
Portfolio IDs: PORT-001, PORT-002
Jurisdiction: US
Expected Output:
- Pattern analysis
- Risk scoring
- Geographic heatmap data
- Infringement trend analysis
Test Prompt:
@nerdcabal Use compliance_check to validate GDPR compliance.
Context:
- Processes personal data: yes
- Consent obtained: yes
- Data retention policy: 2 years
- Right to deletion: implemented
Jurisdiction: EU
Expected Output:
- Compliance checklist
- GDPR article references
- Identified gaps
- Remediation recommendations
Test Prompt:
@nerdcabal Use archival_system to store evidence of IP infringement.
Evidence type: image
Source URL: https://example.com/infringement.jpg
Description: Unauthorized use of copyrighted work
Jurisdiction: US
Case ID: CASE-2026-001
Expected Output:
- SHA-256 hash
- Timestamp
- Chain-of-custody record
- Storage confirmation
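The SHA-256 hash in the output can be verified independently against the original evidence file. This is a minimal sketch using Python's standard `hashlib`; the record layout and the helper names are assumptions for illustration, not the tool's actual schema.

```python
import hashlib
from datetime import datetime, timezone


def build_evidence_record(content: bytes, source_url: str, case_id: str,
                          collected_at: datetime) -> dict:
    """Create a record with a SHA-256 digest, UTC timestamp, and first custody entry."""
    digest = hashlib.sha256(content).hexdigest()
    timestamp = collected_at.astimezone(timezone.utc).isoformat()
    return {
        "case_id": case_id,
        "source_url": source_url,
        "sha256": digest,
        "timestamp": timestamp,
        "chain_of_custody": [
            {"action": "collected", "at": timestamp, "sha256": digest},
        ],
    }


def verify_evidence(content: bytes, record: dict) -> bool:
    """Return True if the content still matches the digest stored in the record."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]
```

Recomputing the digest over the downloaded file and comparing it to the value the tool reports confirms the evidence was stored unmodified.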
Scenario: Complete research project planning
Steps:
- Use experimental-designer to create an experiment plan
- Use budget-agent to create a budget
- Use administrator-agent to design the team structure
- Use ciso-agent to assess security risks
Prompt:
@nerdcabal Let's plan a research project step by step:
1. First, use experimental-designer to design an experiment for testing a new prompting technique
2. Then use budget-agent to create a 6-month budget for $200k
3. Then use administrator-agent to design a team of 5 people
4. Finally, use ciso-agent to identify security risks
Scenario: Design and validate a UI component
Steps:
- Use creative-director to create a design system
- Use ciso-agent to review for security issues
- Use dataset-builder to create training data for UI testing
Scenario: Detect, validate, and archive infringement
Steps:
- Use ip_analytics to detect infringement patterns
- Use compliance_check to validate enforcement actions
- Use archival_system to store evidence
Solutions:
- Verify claude_desktop_config.json has the correct absolute path
- Check the path in config matches your actual file location
- Completely restart Claude Desktop (quit entirely, not just close the window)
- Check logs:
    # macOS
    cat ~/Library/Logs/Claude/mcp*.log
    # Windows
    type %APPDATA%\Claude\Logs\mcp*.log
    # Linux
    cat ~/.config/Claude/logs/mcp*.log
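The absolute-path check can be automated. The sketch below assumes the usual Claude Desktop layout, where servers are listed under an `mcpServers` key with `command` and `args` fields; the server name `nerdcabal` in the usage note is an assumption, so use whatever name appears in your config.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath


def is_absolute_path(path: str) -> bool:
    """True for absolute POSIX paths or absolute Windows paths with a drive letter."""
    return PurePosixPath(path).is_absolute() or PureWindowsPath(path).is_absolute()


def relative_script_paths(config_text: str) -> dict:
    """Map each server name to its script arguments that are not absolute paths."""
    config = json.loads(config_text)
    problems = {}
    for name, server in config.get("mcpServers", {}).items():
        scripts = [
            arg for arg in server.get("args", [])
            if isinstance(arg, str) and arg.endswith((".js", ".mjs", ".cjs"))
        ]
        relative = [arg for arg in scripts if not is_absolute_path(arg)]
        if relative:
            problems[name] = relative
    return problems
```

Passing the contents of `claude_desktop_config.json` to `relative_script_paths` returns an empty dictionary when every script path is absolute; a result such as `{"nerdcabal": ["mcp-server/dist/index.js"]}` points at the entry to fix.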
Solutions:
- Test that the server runs standalone:
    cd mcp-server
    node dist/index.js
- Check for errors in terminal output
- Rebuild the server:
    npm run build
Solutions:
- Verify input matches the agent's required schema (see README.md)
- Check you're using the correct tool name (e.g., llm-rubric-architect, not rubric-architect)
- Enable debug logging:
    LOG_LEVEL=debug node dist/index.js
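Mistyped tool names are easy to catch with a fuzzy lookup. The sketch below uses the tool names that appear in this guide's prompts and checklist; the helper `suggest_tool_name` is hypothetical and the 0.6 similarity cutoff is an arbitrary choice.

```python
from difflib import get_close_matches
from typing import Optional

# Tool names as they appear in this guide.
KNOWN_TOOLS = [
    "llm-rubric-architect", "experimental-designer", "budget-agent",
    "comptroller-agent", "administrator-agent", "mlflow-agent",
    "dataset-builder", "ciso-agent", "orchestrator", "creative-director",
    "visual-inspector", "forensic-analyst", "ip_analytics",
    "compliance_check", "archival_system",
]


def suggest_tool_name(name: str) -> Optional[str]:
    """Return the name if known, the closest known name otherwise, or None if nothing is close."""
    if name in KNOWN_TOOLS:
        return name
    matches = get_close_matches(name, KNOWN_TOOLS, n=1, cutoff=0.6)
    return matches[0] if matches else None
```

For example, `suggest_tool_name("rubric-architect")` resolves to `llm-rubric-architect`, which covers the mix-up mentioned above as well as hyphen/underscore slips such as `ip-analytics`.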
MLflow:
# Start MLflow server
mlflow server --host 0.0.0.0 --port 5000
# Set environment variable
export MLFLOW_TRACKING_URI=http://localhost:5000
FiftyOne:
# Install FiftyOne
pip install fiftyone
# Start FiftyOne app
fiftyone app launch
HuggingFace:
# Login to HuggingFace
huggingface-cli login
# Verify
huggingface-cli whoami
Goal: Plan a research project for a grant proposal
Agents to use:
- experimental-designer - Design the experiment
- budget-agent - Create NSF grant budget
- dataset-builder - Plan training data
- mlflow-agent - Set up experiment tracking
Goal: Audit an AI system for security issues
Agents to use:
- ciso-agent - STRIDE threat model
- forensic-analyst - Analyze outputs for issues
- ip_analytics - Check for IP compliance
- compliance_check - Validate regulatory compliance
Goal: Design and launch a new feature
Agents to use:
- creative-director - Create design system
- comptroller-agent - Analyze speed/cost/quality tradeoffs
- administrator-agent - Design team structure
- ciso-agent - Security review
Goal: Monitor and protect intellectual property
Agents to use:
- ip_analytics - Detect infringement patterns
- compliance_check - Validate enforcement actions
- archival_system - Store evidence
- ciso-agent - Security audit
Use this checklist to verify all agents are working:
- Server builds without errors (npm run build)
- Server runs without errors (node dist/index.js)
- Tools appear in Claude Desktop (@ shows nerdcabal tools)
- llm-rubric-architect creates rubrics
- experimental-designer creates experiment plans
- budget-agent creates budgets
- comptroller-agent analyzes Iron Triangle
- administrator-agent creates org charts
- mlflow-agent generates queries
- dataset-builder creates datasets
- ciso-agent performs threat modeling
- orchestrator coordinates workflows
- creative-director creates design systems
- visual-inspector generates FiftyOne scripts
- forensic-analyst detects hallucinations
- ip_analytics analyzes IP patterns
- compliance_check validates compliance
- archival_system stores evidence
- Main README: README.md
- MCP Server Guide: docs/MCP_SERVER_GUIDE.md
- Quick Start: docs/QUICK_START.md
- API Reference: See README.md#api-reference
- Claude Code Guide: CLAUDE.MD
- GitHub Issues: https://github.com/Tuesdaythe13th/NerdCabalMCP/issues
- Documentation: https://github.com/Tuesdaythe13th/NerdCabalMCP/tree/main/docs
- MCP Protocol: https://modelcontextprotocol.io
Happy Testing! 🧪
Built with ❤️ by TUESDAY and the OG NerdCabal