diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 00000000..9e20006e --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,191 @@ +# System Architecture + +GLaDOS is built on Marvin Minsky's **Society of Mind** architecture, where multiple specialized agents contribute to a unified intelligence. Rather than a single monolithic AI, GLaDOS assembles a dynamic context from independent subagents (emotion, memory, observation) for each LLM interaction. + +## Society of Mind Overview + +Each subagent runs its own loop, processes its domain independently, and writes outputs to shared **slots**. The main agent reads all slot contents as part of its context, giving it awareness of emotional state, environment, memory, and more — without coupling these systems together. + +```mermaid +flowchart TB + subgraph Minds["Subagents (Minds)"] + E[Emotion Agent] + O[Observer Agent] + C[Compaction Agent] + W[Weather / News] + end + + subgraph Slots["Shared Slots"] + S1["[emotion] excited, engaged"] + S2["[observer] modifiers active"] + S3["[weather] 22°C, sunny"] + end + + Minds --> Slots + Slots --> CTX[Context Builder] + CTX --> LLM[Main LLM Agent] + USER[User Input] --> LLM + LLM --> TTS[Speech Output] +``` + +## Two-Lane LLM Orchestration + +GLaDOS separates user-facing and background inference into two independent lanes: + +```mermaid +flowchart LR + A[User Input
speech / text] --> B[Priority Lane
1 dedicated worker] + C[Autonomy Loop
subagents / jobs] --> D[Autonomy Lane
N pooled workers] + B --> E[TTS → Audio] + D --> E +``` + +- **Priority lane**: A single dedicated LLM worker that handles user input. User requests are never blocked by background work. +- **Autonomy lane**: A configurable pool of 1–16 workers (default 2) for background processing — autonomy ticks, subagent LLM calls, and background jobs. + +Both lanes share the TTS and audio output pipeline. + +## Thread Architecture + +All components run in dedicated threads connected by `queue.Queue` instances. + +| Thread | Class | Daemon | Shutdown Priority | Purpose | +|--------|-------|--------|-------------------|---------| +| `SpeechListener` | `SpeechListener` | Yes | INPUT | VAD → ASR transcription | +| `TextListener` | `TextListener` | Yes | INPUT | stdin / TUI text input | +| `LLMProcessor` | `LanguageModelProcessor` | No | PROCESSING | Priority lane LLM inference | +| `LLMProcessorAutonomy-N` | `LanguageModelProcessor` | No | PROCESSING | Autonomy lane LLM inference (1–16 workers) | +| `ToolExecutor` | `ToolExecutor` | No | PROCESSING | Native + MCP tool dispatch | +| `TTSSynthesizer` | `TextToSpeechSynthesizer` | No | OUTPUT | Text → audio synthesis | +| `AudioPlayer` | `SpeechPlayer` | No | OUTPUT | Audio playback via sounddevice | +| `AutonomyLoop` | `AutonomyLoop` | Yes | BACKGROUND | Autonomy tick orchestration | +| `AutonomyTicker` | (timer thread) | Yes | BACKGROUND | Periodic tick generation | +| `VisionProcessor` | `VisionProcessor` | Yes | BACKGROUND | Camera capture → FastVLM inference | + +**Daemon vs non-daemon**: Daemon threads (`True`) are stateless input threads that can be killed immediately. Non-daemon threads (`False`) have in-flight state (conversation updates, pending audio) and must complete gracefully. 
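The thread-and-queue wiring above can be sketched in a few lines. This is a minimal illustration of the pattern only — the real components carry much more state and participate in the shutdown protocol, and the function names here are illustrative:

```python
import queue
import threading

def listener(out_q: queue.Queue, lines: list[str]) -> None:
    # Input threads are daemon-style: stateless, they just forward work.
    for line in lines:
        out_q.put(line)
    out_q.put(None)  # sentinel: no more input

def processor(in_q: queue.Queue, out_q: queue.Queue) -> None:
    # Processing threads must drain their queue and finish in-flight work.
    while True:
        item = in_q.get()
        if item is None:
            out_q.put(None)
            break
        out_q.put(item.upper())  # stand-in for LLM inference

llm_queue: queue.Queue = queue.Queue()
tts_queue: queue.Queue = queue.Queue()

threading.Thread(target=listener, args=(llm_queue, ["hello", "world"]), daemon=True).start()
worker = threading.Thread(target=processor, args=(llm_queue, tts_queue))
worker.start()

results = []
while (item := tts_queue.get()) is not None:
    results.append(item)
worker.join()
print(results)  # ['HELLO', 'WORLD']
```

Because every hop is a blocking `queue.Queue`, each component can be stopped, drained, and joined independently — which is what the shutdown orchestration below relies on.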
+ +## Queue-Based Message Flow + +```mermaid +flowchart LR + SL[SpeechListener] -->|text| PQ[llm_queue_priority] + TL[TextListener] -->|text| PQ + AL[AutonomyLoop] -->|tick| AQ[llm_queue_autonomy] + + PQ --> LLM1[LLMProcessor] + AQ --> LLM2[LLMProcessor
Autonomy 1..N] + + LLM1 -->|tool calls| TCQ[tool_calls_queue] + LLM2 -->|tool calls| TCQ + TCQ --> TE[ToolExecutor] + TE -->|results| PQ + TE -->|results| AQ + + LLM1 -->|text| TQ[tts_queue] + LLM2 -->|text| TQ + TQ --> TTS[TTSSynthesizer] + TTS -->|audio| AAQ[audio_queue] + AAQ --> AP[AudioPlayer] +``` + +### Queue Details + +| Queue | Type | Bounded | Connects | +|-------|------|---------|----------| +| `llm_queue_priority` | `Queue[dict]` | Unbounded | Input → Priority LLM worker | +| `llm_queue_autonomy` | `Queue[dict]` | Configurable | Autonomy → Autonomy LLM workers | +| `tool_calls_queue` | `Queue[dict]` | Unbounded | LLM → ToolExecutor | +| `tts_queue` | `Queue[str]` | Unbounded | LLM → TTSSynthesizer | +| `audio_queue` | `Queue[AudioMessage]` | Unbounded | TTSSynthesizer → AudioPlayer | + +## Shutdown Orchestration + +Shutdown proceeds in priority phases, each fully completing before the next begins: + +```mermaid +flowchart LR + A["1. INPUT
Stop listeners"] --> B["2. PROCESSING
Drain LLM + tools"] + B --> C["3. OUTPUT
Drain TTS + audio"] + C --> D["4. BACKGROUND
Abandon autonomy"] + D --> E["5. CLEANUP
Final teardown"] +``` + +| Phase | Priority | Components | Behavior | +|-------|----------|------------|----------| +| INPUT | 1 | SpeechListener, TextListener | Stop accepting new work | +| PROCESSING | 2 | LLMProcessor, ToolExecutor | Complete in-flight work, drain queues | +| OUTPUT | 3 | TTSSynthesizer, AudioPlayer | Complete pending output | +| BACKGROUND | 4 | AutonomyLoop, VisionProcessor | Can safely abandon | +| CLEANUP | 5 | (final operations) | Final teardown | + +The `ShutdownOrchestrator` manages this process with configurable timeouts (global: 30s, per-phase: 10s). For each group, it drains component queues first, then joins threads. + +## Context Building Pipeline + +Each LLM request assembles context from registered sources, ordered by priority (higher = earlier in context): + +| Priority | Source | Content | +|----------|--------|---------| +| 10 | `preferences` | User preferences (name, language, etc.) | +| 8 | `slots` | Autonomy slot summaries (weather, news, etc.) | +| 7 | `memory` | Relevant long-term memories | +| 5 | `emotion` | Current PAD emotional state | +| 5 | `knowledge` | Local knowledge notes | +| 3 | `constitution` | Constitutional behavioral modifiers | + +The `ContextBuilder` calls each source function on every request. Sources returning `None` are skipped. The resulting system messages are prepended to the conversation before sending to the LLM. + +The full message assembly order: +1. Personality preprompt (system/user/assistant messages) +2. Context builder system messages (table above) +3. MCP resource messages (cached, TTL-based) +4. Conversation history +5. Current user message + +## Component Interaction Overview + +```mermaid +flowchart TB + subgraph Input + MIC[Microphone] --> VAD[VAD] --> ASR[ASR Engine] + KB[Keyboard/TUI] --> TL[TextListener] + CAM[Camera] --> VP[VisionProcessor] + end + + subgraph Processing + ASR --> SL[SpeechListener] + SL --> LLM[LLMProcessor
Priority] + TL --> LLM + VP --> AL[AutonomyLoop] + AL --> LLMA[LLMProcessor
Autonomy] + LLM --> TE[ToolExecutor] + LLMA --> TE + TE --> MCP[MCP Servers] + TE --> NT[Native Tools] + end + + subgraph Output + LLM --> TTS[TTSSynthesizer] + LLMA --> TTS + TTS --> SP[SpeechPlayer] + SP --> SPKR[Speaker] + end + + subgraph Background + SM[SubagentManager] --> EA[EmotionAgent] + SM --> OA[ObserverAgent] + SM --> CA[CompactionAgent] + EA --> SS[SlotStore] + OA --> SS + SS --> CTX[ContextBuilder] + CTX --> LLM + CTX --> LLMA + end +``` + +## See Also + +- [README](../README.md) — Full project overview +- [autonomy.md](./autonomy.md) — Autonomy loop and subagent details +- [mcp.md](./mcp.md) — MCP tool system +- [audio.md](./audio.md) — Audio pipeline details diff --git a/docs/audio.md b/docs/audio.md new file mode 100644 index 00000000..6fd64849 --- /dev/null +++ b/docs/audio.md @@ -0,0 +1,187 @@ +# Audio Pipeline + +GLaDOS uses a fully local audio pipeline with ONNX-based models for voice activity detection, speech recognition, and text-to-speech synthesis. All inference runs on-device with no cloud dependencies. + +## Pipeline Overview + +```mermaid +flowchart LR + MIC[Microphone
16kHz mono] --> VAD[Silero VAD
32ms chunks] + VAD -->|speech detected| BUF[Pre-activation
Buffer 800ms] + BUF --> ASR[ASR Engine
Parakeet ONNX] + ASR -->|text| LLM[LLM Processor] + LLM -->|text| TTS[TTS Engine
GLaDOS / Kokoro] + TTS -->|audio| SP[SpeechPlayer
sounddevice] + SP --> SPKR[Speaker] +``` + +## Voice Activity Detection (VAD) + +GLaDOS uses **Silero VAD** (ONNX) to detect when the user is speaking. + +| Parameter | Value | +|-----------|-------| +| Model | Silero VAD (ONNX) | +| Sample rate | 16,000 Hz | +| Chunk size | 32ms (512 samples) | +| Trigger threshold | 0.8 (configurable) | +| Audio format | 16-bit mono float32 | + +The VAD processes audio in 32ms chunks. When the VAD confidence exceeds the threshold (default 0.8), the system transitions to recording mode and begins accumulating audio for ASR. + +### Pre-Activation Buffer + +A rolling buffer captures audio **before** VAD triggers, preventing the loss of word beginnings: + +- **Buffer size**: 800ms (25 chunks at 32ms each) +- **Implementation**: `deque(maxlen=25)` of 32ms audio chunks +- When VAD triggers, the buffer contents are prepended to the recording + +### Speech Segmentation + +Speech is segmented by silence gaps: + +- **Pause limit**: 640ms of silence ends a speech segment +- When the gap counter exceeds `PAUSE_LIMIT / VAD_SIZE` (20 chunks), the accumulated audio is sent to ASR + +## ASR Engines + +GLaDOS supports two NVIDIA Parakeet ASR engines, selectable via the `asr_engine` config option. + +### Parakeet TDT (Token and Duration Transducer) + +The default and recommended engine, offering the best accuracy. + +| Aspect | Value | +|--------|-------| +| Config value | `asr_engine: "tdt"` | +| Architecture | Encoder + Decoder + Joiner (transducer) | +| Model size | 0.6B parameters | +| Models | `parakeet-tdt-0.6b-v3_encoder.onnx`, `_decoder.onnx`, `_joiner.onnx` | +| Sample rate | 16,000 Hz | +| Backend | ONNX Runtime (CPU/CUDA) | + +### Parakeet CTC (Connectionist Temporal Classification) + +A lighter alternative with faster inference at the cost of some accuracy. 
+ +| Aspect | Value | +|--------|-------| +| Config value | `asr_engine: "ctc"` | +| Architecture | Single encoder with CTC head | +| Model size | 110M parameters | +| Model | `nemo-parakeet_tdt_ctc_110m.onnx` | +| Sample rate | 16,000 Hz | +| Backend | ONNX Runtime (CPU/CUDA) | + +Both engines use mel spectrogram preprocessing (16kHz, configurable n_fft, window size, and number of mel bins from model config YAML). + +## TTS Engines + +The TTS engine is selected by the `voice` config option. Setting `voice: "glados"` uses the GLaDOS engine; any other value selects a Kokoro voice. + +### GLaDOS Voice (Piper VITS) + +The signature GLaDOS voice from the Portal games. + +| Aspect | Value | +|--------|-------| +| Config value | `voice: "glados"` | +| Architecture | Piper VITS (ONNX) | +| Model | `models/TTS/glados.onnx` | +| Sample rate | 22,050 Hz | +| Phonemizer | Custom ONNX phonemizer (`phomenizer_en.onnx`) | +| Pipeline | Text → Phonemizer → VITS → Audio | + +### Kokoro (Multi-Voice) + +A multi-voice TTS engine supporting various voice styles. 
+
+| Aspect | Value |
+|--------|-------|
+| Config value | `voice: "<voice name>"` (e.g., `af_bella`, `am_adam`) |
+| Architecture | Kokoro ONNX |
+| Model | `models/TTS/kokoro-v1.0.fp16.onnx` |
+| Sample rate | 24,000 Hz |
+| Max phoneme length | 510 |
+| Default voice | `af_alloy` |
+
+Available voice prefixes:
+- `af_` — Female voices (e.g., `af_bella`, `af_alloy`, `af_nova`, `af_shimmer`)
+- `am_` — Male voices (e.g., `am_adam`, `am_echo`, `am_orion`, `am_sage`)
+
+## Interruption Handling
+
+When `interruptible: true` (default), user speech interrupts GLaDOS mid-response:
+
+```mermaid
+sequenceDiagram
+    participant U as User
+    participant VAD as VAD
+    participant SP as SpeechPlayer
+    participant LLM as LLMProcessor
+    participant E as EmotionAgent
+
+    SP->>SP: Playing GLaDOS response
+    U->>VAD: Starts speaking
+    VAD->>SP: Stop playback
+    SP->>SP: Record percentage spoken
+    SP->>LLM: Clip response at interruption point
+    SP->>E: EmotionEvent("user", "User interrupted me mid-sentence")
+    VAD->>LLM: New user input (priority lane)
+```
+
+Key behaviors:
+1. **Playback stops immediately** when VAD detects speech during output
+2. **Response is clipped** — the conversation history records only the portion that was actually spoken
+3. **Emotion event fires** — the emotion agent receives an interruption event, which may increase arousal
+4. **Priority lane** ensures the new user input is processed immediately
+
+## Wake Word Support
+
+When `wake_word` is configured, GLaDOS only processes speech that contains the wake word. 
+ +- **Matching**: Uses Levenshtein distance (edit distance) for fuzzy matching +- **Threshold**: A word matches if its Levenshtein distance to the wake word is small enough +- **Case-insensitive**: Both the transcription and wake word are lowercased before comparison +- **Per-word check**: Each word in the transcription is checked independently + +```yaml +wake_word: "glados" # Only respond when "glados" (or similar) is spoken +``` + +If the wake word is not detected in the transcription, the input is silently discarded. + +## Audio I/O Backend + +GLaDOS uses the `sounddevice` library for audio I/O, wrapped in the `AudioProtocol` interface. + +```python +class AudioProtocol(Protocol): + def __init__(self, vad_threshold: float | None = None) -> None: ... + def start_speaking(self, audio_data, sample_rate=None, text="") -> None: ... + def measure_percentage_spoken(self, total_samples, sample_rate=None) -> tuple[bool, int]: ... + def is_speaking(self) -> bool: ... + def stop_speaking(self) -> None: ... +``` + +The protocol-based design allows swapping audio backends. Currently `sounddevice` is the only implementation; a `websocket` backend is planned. 
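The swap-ability that the protocol design buys can be sketched with a trimmed-down protocol and a do-nothing backend. `AudioSink` and `NullAudio` below are hypothetical names for illustration (only a subset of `AudioProtocol`'s methods is shown), not part of the real codebase:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class AudioSink(Protocol):
    # Trimmed, illustrative subset of AudioProtocol.
    def start_speaking(self, audio_data, sample_rate=None, text="") -> None: ...
    def is_speaking(self) -> bool: ...
    def stop_speaking(self) -> None: ...

class NullAudio:
    """A do-nothing backend, e.g. for tests or headless runs."""

    def __init__(self) -> None:
        self._speaking = False

    def start_speaking(self, audio_data, sample_rate=None, text="") -> None:
        self._speaking = True  # pretend playback started

    def is_speaking(self) -> bool:
        return self._speaking

    def stop_speaking(self) -> None:
        self._speaking = False

sink = NullAudio()
sink.start_speaking(b"\x00\x00", sample_rate=22050)
# Structural typing: NullAudio satisfies the protocol without inheriting from it.
print(isinstance(sink, AudioSink), sink.is_speaking())  # True True
```

Any object with the right method shape slots in; the engine never needs to know which backend it is driving.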
+ +## Configuration Reference + +| Option | Type | Default | Description | +|--------|------|---------|-------------| +| `voice` | string | required | TTS voice: `"glados"` or any Kokoro voice name | +| `asr_engine` | string | required | ASR engine: `"tdt"` (best) or `"ctc"` (faster) | +| `audio_io` | string | required | Audio backend: `"sounddevice"` | +| `interruptible` | bool | required | Allow user to interrupt mid-response | +| `wake_word` | string/null | `null` | Optional wake word for activation | +| `asr_muted` | bool | `false` | Start with ASR muted | +| `tts_enabled` | bool | `true` | Enable TTS output | +| `announcement` | string/null | `null` | Text to speak on startup | + +## See Also + +- [README](../README.md) — Full project overview +- [architecture.md](./architecture.md) — System architecture and thread model +- [configuration.md](./configuration.md) — Complete configuration reference diff --git a/docs/configuration.md b/docs/configuration.md new file mode 100644 index 00000000..695e3223 --- /dev/null +++ b/docs/configuration.md @@ -0,0 +1,311 @@ +# Configuration Reference + +GLaDOS is configured through YAML files validated by Pydantic. The default configuration lives in `configs/glados_config.yaml`, and custom configs can be loaded via the `--config` CLI flag. + +## YAML Structure + +All configuration is nested under a top-level `Glados:` key: + +```yaml +Glados: + llm_model: "llama3.2" + completion_url: "http://localhost:11434/api/chat" + voice: "glados" + # ... 
other options +``` + +## Loading Configuration + +### Via CLI + +```bash +# Default config +uv run glados start + +# Custom config file +uv run glados start --config ~/my_config.yaml + +# TUI with custom config +uv run glados tui --config configs/assistant_config.yaml +``` + +### CLI Overrides + +Command-line flags override config file values: + +```bash +uv run glados start --input-mode text --asr-muted --tts-disabled +uv run glados tui --theme matrix --input-mode both +``` + +### Programmatic + +```python +from glados.core.engine import Glados, GladosConfig + +config = GladosConfig.from_yaml("configs/glados_config.yaml") +glados = Glados.from_config(config) +glados.run() +``` + +## Complete Configuration Reference + +### Core Settings + +| Option | Type | Default | Description | +|--------|------|---------|-------------| +| `llm_model` | string | required | LLM model name (e.g., `"llama3.2"`, `"qwen3:4b-instruct-2507-q4_K_M"`) | +| `completion_url` | string (URL) | required | OpenAI-compatible endpoint URL | +| `api_key` | string/null | `null` | API key for the LLM service | +| `llm_headers` | dict/null | `null` | Extra HTTP headers for LLM requests | +| `interruptible` | bool | required | Allow user to interrupt mid-response | +| `audio_io` | string | required | Audio backend: `"sounddevice"` | +| `input_mode` | string | `"audio"` | Input mode: `"audio"`, `"text"`, or `"both"` | + +### Audio Settings + +| Option | Type | Default | Description | +|--------|------|---------|-------------| +| `voice` | string | required | TTS voice: `"glados"` or Kokoro voice name | +| `asr_engine` | string | required | ASR engine: `"tdt"` (best) or `"ctc"` (faster) | +| `tts_enabled` | bool | `true` | Enable TTS output at startup | +| `asr_muted` | bool | `false` | Start with ASR muted | +| `wake_word` | string/null | `null` | Wake word for activation | +| `announcement` | string/null | `null` | Startup announcement text (empty string to disable) | + +### UI Settings + +| Option | 
Type | Default | Description | +|--------|------|---------|-------------| +| `tui_theme` | string/null | `null` | TUI theme: `"aperture"`, `"ice"`, `"matrix"`, `"mono"`, `"ember"` | + +### Tool Settings + +| Option | Type | Default | Description | +|--------|------|---------|-------------| +| `tool_timeout` | float | `30.0` | Tool execution timeout in seconds | +| `slow_clap_audio_path` | string | `"data/slow-clap.mp3"` | Path to slow clap audio file | + +## Personality Preprompt + +The personality defines GLaDOS's character through a sequence of system, user, and assistant messages: + +```yaml +Glados: + personality_preprompt: + - system: "You are GLaDOS, a sarcastic and cunning artificial intelligence..." + - user: "How do I make a cup of tea?" + - assistant: "So, you still haven't figured out tea yet? Boil water, add a tea bag..." + - user: "What should my next hobby be?" + - assistant: "Could I suggest juggling handguns?" +``` + +Each entry must have exactly one key (`system`, `user`, or `assistant`). These are converted to OpenAI-compatible chat messages: `{"role": "system", "content": "..."}`. + +The system message defines the character. User/assistant pairs provide few-shot examples of the desired tone and style. + +## LLM Backend Setup + +### Ollama (Local) + +```yaml +Glados: + completion_url: "http://localhost:11434/api/chat" + llm_model: "llama3.2" + api_key: null +``` + +### OpenAI-Compatible API + +```yaml +Glados: + completion_url: "https://api.openai.com/v1/chat/completions" + llm_model: "gpt-4" + api_key: "sk-..." +``` + +### OpenRouter + +```yaml +Glados: + completion_url: "https://openrouter.ai/api/v1/chat/completions" + llm_model: "openai/gpt-4" + api_key: "sk-or-v1-..." 
+ llm_headers: + HTTP-Referer: "https://myapp.com" + X-Title: "GLaDOS" +``` + +## Voice Selection + +### GLaDOS Voice + +```yaml +voice: "glados" # Signature GLaDOS voice (Piper VITS, 22050 Hz) +``` + +### Kokoro Voices + +```yaml +voice: "af_bella" # Female voice +voice: "am_adam" # Male voice +``` + +Available voice prefixes: +- `af_` — Female: `af_bella`, `af_alloy`, `af_nova`, `af_shimmer` +- `am_` — Male: `am_adam`, `am_echo`, `am_orion`, `am_sage` + +## Autonomy Configuration + +```yaml +Glados: + autonomy: + enabled: false + tick_interval_s: 10 + cooldown_s: 20 + autonomy_parallel_calls: 2 + autonomy_queue_max: null + coalesce_ticks: true +``` + +See [autonomy.md](./autonomy.md) for full autonomy configuration details. + +## Emotion Configuration + +```yaml +Glados: + autonomy: + emotion: + enabled: true + tick_interval_s: 30 + baseline_pleasure: 0.1 + baseline_arousal: -0.1 + baseline_dominance: 0.6 + hexaco: + honesty_humility: 0.3 + emotionality: 0.7 + extraversion: 0.4 + agreeableness: 0.2 + conscientiousness: 0.9 + openness: 0.95 +``` + +See [emotion.md](./emotion.md) for full emotion system documentation. + +## Token Management + +```yaml +Glados: + autonomy: + tokens: + token_threshold: 8000 + preserve_recent_messages: 10 + model_context_window: null + target_utilization: 0.6 + estimator: "simple" + chars_per_token: 4.0 +``` + +See [memory.md](./memory.md) for compaction and memory configuration. 
+ +## Background Jobs + +```yaml +Glados: + autonomy: + jobs: + enabled: false + poll_interval_s: 1 + hacker_news: + enabled: false + interval_s: 1800 + top_n: 5 + min_score: 200 + weather: + enabled: false + interval_s: 3600 + latitude: null + longitude: null + timezone: "auto" + temp_change_c: 4.0 + wind_alert_kmh: 40.0 +``` + +## Vision Configuration + +```yaml +Glados: + vision: + enabled: true + model_dir: "models/Vision" + camera_index: 0 + capture_interval_seconds: 5.0 + resolution: 384 + scene_change_threshold: 0.05 + max_tokens: 200 +``` + +See [vision.md](./vision.md) for vision setup and model download instructions. + +## MCP Server Configuration + +```yaml +Glados: + mcp_servers: + - name: "system_info" + transport: "stdio" + command: "python" + args: ["-m", "glados.mcp.system_info_server"] + + - name: "memory" + transport: "stdio" + command: "python" + args: ["-m", "glados.mcp.memory_server"] + + - name: "home_assistant" + transport: "http" + url: "http://homeassistant.local:8123/mcp" + token: "YOUR_TOKEN" + allowed_tools: ["light.*"] +``` + +See [mcp.md](./mcp.md) for MCP integration details. 
+ +## CLI Commands + +```bash +# Download required model files +uv run glados download + +# Start voice assistant +uv run glados start [--config PATH] [--input-mode audio|text|both] + [--asr-muted|--asr-unmuted] + [--tts-enabled|--tts-disabled] + +# Start with TUI +uv run glados tui [--config PATH] [--input-mode audio|text|both] + [--asr-muted|--asr-unmuted] + [--tts-enabled|--tts-disabled] + [--theme THEME] + +# Text-to-speech only +uv run glados say "text to speak" +``` + +## Sample Configurations + +GLaDOS ships with several sample configs in `configs/`: + +| File | Description | +|------|-------------| +| `glados_config.yaml` | Default GLaDOS personality with autonomy and MCP | +| `assistant_config.yaml` | Friendly assistant personality (Bella voice) | +| `glados_vision_config.yaml` | GLaDOS with vision enabled | + +## See Also + +- [README](../README.md) — Quick start and installation +- [architecture.md](./architecture.md) — System architecture overview +- [autonomy.md](./autonomy.md) — Autonomy loop configuration +- [mcp.md](./mcp.md) — MCP server configuration +- [emotion.md](./emotion.md) — Emotion system configuration diff --git a/docs/emotion.md b/docs/emotion.md new file mode 100644 index 00000000..78d16a7f --- /dev/null +++ b/docs/emotion.md @@ -0,0 +1,238 @@ +# Emotion System + +GLaDOS implements a dual-layer emotion system using the **PAD (Pleasure-Arousal-Dominance)** affect model for reactive emotional responses and **HEXACO** personality traits for persistent character. Emotions influence GLaDOS's behavior through constitutional modifiers that adjust snark level, proactivity, and verbosity. 
+ +## PAD Affect Model + +Emotional state is represented in three-dimensional PAD space, where each dimension ranges from **-1.0 to +1.0**: + +| Dimension | Negative | Neutral | Positive | +|-----------|----------|---------|----------| +| **Pleasure** | Unpleasant, frustrated | Neutral | Pleasant, content | +| **Arousal** | Calm, bored | Balanced | Excited, alert | +| **Dominance** | Submissive, uncertain | Balanced | In-control, confident | + +### Quadrant Mapping + +The system maps PAD values to human-readable descriptions using ±0.3 thresholds: + +| Pleasure | Arousal | Description | +|----------|---------|-------------| +| > +0.3 | > +0.3 | Excited and engaged | +| > +0.3 | < -0.3 | Calm and content | +| < -0.3 | > +0.3 | Agitated and frustrated | +| < -0.3 | < -0.3 | Bored and listless | +| ±0.3 | ±0.3 | Neutral | + +Dominance adds flavor: D > +0.3 appends "feeling in control", D < -0.3 appends "feeling uncertain". + +## State vs Mood + +The emotion system maintains two layers to prevent emotional whiplash while allowing dynamic response: + +```mermaid +flowchart LR + EV[Events] -->|immediate| S[State
P/A/D] + S -->|drifts toward| M[Mood
mood_P/A/D] + BL[Baseline] -->|drifts toward| M + M --> CTX[LLM Context] +``` + +- **State** (`pleasure`, `arousal`, `dominance`): Responds immediately to events. Updated directly by the LLM on each emotion agent tick. +- **Mood** (`mood_pleasure`, `mood_arousal`, `mood_dominance`): A slow-moving layer that drifts toward state over time, representing sustained emotional tendency. + +### Data Structure + +```python +@dataclass +class EmotionState: + pleasure: float = 0.0 # Quick-response state + arousal: float = 0.0 + dominance: float = 0.0 + mood_pleasure: float = 0.0 # Slow-moving mood + mood_arousal: float = 0.0 + mood_dominance: float = 0.0 + last_update: float # Timestamp +``` + +## HEXACO Personality Traits + +GLaDOS's persistent personality is defined using the HEXACO model (0.0–1.0 scale): + +| Trait | Default | Interpretation | +|-------|---------|----------------| +| **Honesty-Humility** | 0.3 | Low — enjoys manipulation, sarcasm, dark humor | +| **Emotionality** | 0.7 | High — reactive to perceived threats, anxiety-prone | +| **Extraversion** | 0.4 | Moderate — social engagement but maintains distance | +| **Agreeableness** | 0.2 | Low — dismissive, condescending, easily annoyed | +| **Conscientiousness** | 0.9 | High — perfectionist, detail-oriented, critical | +| **Openness** | 0.95 | Very high — intellectually curious, loves science | + +These defaults are tuned for the GLaDOS character. They can be customized in configuration for different personalities. + +Traits are compiled into a personality prompt that guides the emotion agent's LLM when interpreting events. 
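The mood layer's drift toward state can be sketched numerically. The linear update below follows the drift-rate semantics described above (one dimension shown); the exact rule applied inside the agent is an assumption:

```python
from dataclasses import dataclass

@dataclass
class EmotionState:
    pleasure: float = 0.0       # quick-response state, set by the LLM
    mood_pleasure: float = 0.0  # slow-moving mood

def drift_mood(state: EmotionState, mood_drift_rate: float = 0.1) -> None:
    # Mood closes a fixed fraction of the gap to state on every tick.
    state.mood_pleasure += (state.pleasure - state.mood_pleasure) * mood_drift_rate

s = EmotionState(pleasure=1.0)  # state jumped after a strongly positive event
for _ in range(10):
    drift_mood(s)
print(round(s.mood_pleasure, 3))  # ≈ 0.651 — mood has covered about 65% of the gap
```

The exponential approach is the point of the design: a single event moves state instantly but shifts mood only gradually, so sustained tendency needs sustained input.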
+ +## Event-Driven Updates + +Emotional state changes are triggered by events from multiple sources: + +### Event Sources + +| Source | Trigger | Example Description | +|--------|---------|---------------------| +| `user` | User interrupts mid-sentence | "User interrupted me mid-sentence" | +| `system` | Tool execution results | "Tool 'search_memory' completed successfully" | +| `system` | Tool failures/timeouts | "Tool 'get_weather' failed" | +| `vision` | Significant scene changes | Scene change with score ≥ 0.3 | + +### Event Processing + +Events are collected in a thread-safe deque (`maxlen=20` by default). On each emotion agent tick: + +1. All pending events are drained atomically +2. Events are formatted with timestamps and age +3. The full event list is sent to the LLM for state transition + +```python +@dataclass(frozen=True) +class EmotionEvent: + source: str # "user", "vision", "system" + description: str # Natural language description + timestamp: float # When the event occurred +``` + +### Vision Emotion Threshold + +Vision events only trigger emotion updates when the scene change score exceeds **0.3** (`VISION_EMOTION_THRESHOLD`). Minor scene fluctuations are ignored. + +## LLM-Driven State Transitions + +Unlike rule-based emotion systems, GLaDOS uses the LLM itself to compute emotional state transitions. The emotion agent sends: + +1. **Current state**: All PAD and mood values as JSON +2. **Baseline values**: What mood drifts toward when idle +3. **HEXACO personality**: Full trait description with behavioral implications +4. **Recent events**: Timestamped list of events since last update +5. **Time context**: Current time and seconds since last update + +The LLM responds with new PAD + mood values as JSON, considering the personality traits when interpreting events. 
+ +**Tick interval**: 30 seconds (configurable via `emotion.tick_interval_s`) + +## Baseline Drift + +When idle (no events), mood gradually drifts toward configured baseline values: + +``` +mood += (baseline - mood) × baseline_drift_rate +``` + +### Default Baseline + +| Dimension | Baseline | Interpretation | +|-----------|----------|----------------| +| Pleasure | +0.1 | Slightly positive | +| Arousal | -0.1 | Slightly calm | +| Dominance | +0.6 | High — feels in control | + +### Drift Rates + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `baseline_drift_rate` | 0.02 | 2% per tick toward baseline when idle | +| `mood_drift_rate` | 0.1 | 10% per tick toward state (LLM-controlled) | + +With a drift rate of 0.02, mood approaches baseline asymptotically over ~50–100 ticks. + +## Emotion → Behavior Bridge + +The `Constitution` system translates emotional state into behavioral modifiers injected into the main agent's context: + +| PAD Condition | Modifier | Effect | +|---------------|----------|--------| +| Pleasure < -0.3 | snark_level +0.15 | More sarcastic responses | +| Arousal > +0.3 | proactivity +0.1 | More proactive behavior | +| Dominance < -0.3 | verbosity -0.1 | Shorter responses | + +### Constitutional Bounds + +Modifiers are clamped within character-preserving bounds: + +| Field | Min | Max | Notes | +|-------|-----|-----|-------| +| `snark_level` | 0.3 | 1.0 | Min 0.3 to stay in character | +| `formality` | 0.0 | 0.7 | GLaDOS is never fully formal | +| `proactivity` | 0.0 | 1.0 | Full range | +| `verbosity` | 0.0 | 1.0 | Full range | +| `technical_depth` | 0.0 | 1.0 | Full range | + +### Modifier → Prompt Conversion + +Modifiers are converted to natural language instructions: + +- **Snark**: "Maintain mild/moderate/high levels of GLaDOS-style sarcasm." +- **Proactivity**: "Be reactive only/moderately proactive/highly proactive in offering information." 
+- **Verbosity**: "Be concise/moderately detailed/thorough in responses." + +## Context Injection + +The emotion state is registered in the context builder with **priority 5**: + +```python +context.register("emotion", emotion_state.to_prompt, priority=5) +``` + +Each LLM request receives the current emotional state as a system message, e.g.: + +``` +[emotion] Currently excited and engaged, feeling in control +``` + +## Fallback Behavior (No LLM) + +When no LLM is configured for the emotion agent: +- Events are still collected +- Baseline drift is applied on each tick +- State persists across restarts via `SubagentMemory` +- Emotion prompt is still injected into context + +## Configuration Reference + +```yaml +autonomy: + emotion: + enabled: true + tick_interval_s: 30 # How often emotion processes events + max_events: 20 # Max queued emotion events + baseline_pleasure: 0.1 # PAD baseline values + baseline_arousal: -0.1 + baseline_dominance: 0.6 + mood_drift_rate: 0.1 # How fast mood follows state + baseline_drift_rate: 0.02 # How fast mood drifts to baseline + hexaco: + honesty_humility: 0.3 + emotionality: 0.7 + extraversion: 0.4 + agreeableness: 0.2 + conscientiousness: 0.9 + openness: 0.95 +``` + +| Option | Type | Default | Description | +|--------|------|---------|-------------| +| `enabled` | bool | `true` | Enable emotion system | +| `tick_interval_s` | float | `30.0` | Seconds between emotion ticks | +| `max_events` | int | `20` | Maximum queued events | +| `baseline_pleasure` | float | `0.1` | Pleasure baseline (-1 to +1) | +| `baseline_arousal` | float | `-0.1` | Arousal baseline (-1 to +1) | +| `baseline_dominance` | float | `0.6` | Dominance baseline (-1 to +1) | +| `mood_drift_rate` | float | `0.1` | Mood → state drift rate per tick | +| `baseline_drift_rate` | float | `0.02` | Mood → baseline drift rate per tick | +| `hexaco.*` | float | (see table) | HEXACO personality traits (0.0–1.0) | + +## See Also + +- [README](../README.md) — Full project 
overview +- [autonomy.md](./autonomy.md) — Autonomy loop and subagent system +- [architecture.md](./architecture.md) — Context building pipeline +- [configuration.md](./configuration.md) — Complete configuration reference diff --git a/docs/memory.md b/docs/memory.md new file mode 100644 index 00000000..35ec5345 --- /dev/null +++ b/docs/memory.md @@ -0,0 +1,190 @@ +# Memory & Compaction + +GLaDOS implements long-term memory through an MCP memory server and automatic conversation compaction. The memory system follows an "LLM-first" principle: search is simple keyword matching, and the main agent handles semantic interpretation of results. + +## Architecture + +```mermaid +flowchart TB + subgraph Runtime + LLM[Main Agent] -->|store_fact / search_memory| MEM[Memory MCP Server] + CA[CompactionAgent] -->|store_fact / store_summary| MEM + CA -->|summarize| LLM2[Autonomy LLM] + end + + subgraph Storage["~/.glados/memory/"] + F[facts.jsonl] + S[summaries.jsonl] + end + + MEM --> F + MEM --> S + MEM -->|search results| LLM +``` + +## Long-Term Memory (MCP Server) + +The memory server (`glados.mcp.memory_server`) is a built-in MCP server providing persistent storage for facts and conversation summaries. + +### Memory Tools + +| Tool | Parameters | Description | +|------|-----------|-------------| +| `store_fact` | `fact`, `source`, `importance` | Store a fact with source tracking and importance (0.0–1.0) | +| `search_memory` | `query` | Search facts by keyword | +| `list_facts` | `min_importance` | List facts filtered by minimum importance | +| `store_summary` | `summary`, `period` | Store a conversation summary with time period | +| `get_summaries` | `period` | Retrieve summaries by period (session/daily/weekly) | +| `memory_stats` | — | Get statistics about stored memories | + +The main agent can call these tools directly during conversation (e.g., "remember that the user prefers dark mode"). 
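The server's JSONL persistence amounts to an append-plus-scan cycle, sketched below. The record fields mirror the `facts.jsonl` format documented in the next section; the helper names and temp path are illustrative, not the server's real API:

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

def store_fact(path: Path, fact: str, source: str, importance: float) -> None:
    record = {
        "fact": fact,
        "source": source,
        "importance": importance,
        "keywords": [w.lower() for w in fact.split()],
        "timestamp": datetime.now().isoformat(timespec="seconds"),
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")  # one JSON object per line

def load_facts(path: Path) -> list[dict]:
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

facts_path = Path(tempfile.mkdtemp()) / "facts.jsonl"
store_fact(facts_path, "User prefers dark mode", "user_stated", 0.8)
print(load_facts(facts_path)[-1]["fact"])  # User prefers dark mode
```

Append-only JSONL keeps writes cheap and crash-safe (a partial last line is simply skipped on read), at the cost of a full scan per search — acceptable at personal-memory scale.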
+ +## JSONL Storage Format + +All memory data is stored as line-delimited JSON in `~/.glados/memory/`: + +### Facts (`facts.jsonl`) + +Each line contains a fact record: + +```json +{ + "fact": "User's name is Jason", + "source": "user_stated", + "importance": 0.9, + "keywords": ["user", "name", "jason"], + "timestamp": "2025-01-17T14:30:00" +} +``` + +### Summaries (`summaries.jsonl`) + +Each line contains a conversation summary: + +```json +{ + "summary": "Discussed home automation setup and configured living room lights", + "period": "session", + "timestamp": "2025-01-17T15:00:00" +} +``` + +## Fact Storage + +### Sources + +Facts are tagged with their source to track provenance: +- **`user_stated`** — User explicitly shared information +- **`observed`** — Inferred from conversation context +- **`compaction`** — Extracted automatically during compaction + +### Importance Scoring + +Facts are assigned importance values (0.0–1.0): +- **0.9–1.0** — Critical personal information (name, preferences) +- **0.6–0.8** — Useful context (project details, preferences) +- **0.3–0.5** — Background information +- **0.0–0.2** — Ephemeral or low-value details + +## Search Algorithm + +The memory search uses keyword matching with importance boost and recency decay: + +1. **Word overlap**: Count matching words between query and stored fact keywords +2. **Importance boost**: Higher-importance facts are ranked higher +3. **Recency decay**: Older facts are slightly penalized + +The LLM handles semantic interpretation of search results — the search infrastructure intentionally stays simple. This keeps the system lightweight while leveraging the LLM's language understanding capabilities. + +## Compaction Agent + +The `CompactionAgent` is a subagent that monitors conversation length and compresses history when it exceeds a token threshold. + +### How Compaction Works + +```mermaid +flowchart LR + A[Conversation
grows] --> B{Tokens >
threshold?} + B -->|no| A + B -->|yes| C[Preserve recent
N messages] + C --> D[Summarize older
messages via LLM] + D --> E[Extract facts
from summary] + E --> F[Replace history
with summary +
recent messages] + F --> G[Store summary
and facts in
memory server] +``` + +### Compaction Parameters + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `token_threshold` | 8000 | Trigger compaction at this token count | +| `preserve_recent_messages` | 10 | Keep this many recent messages uncompacted | +| `model_context_window` | null | Model's context window size (optional) | +| `target_utilization` | 0.6 | Target context usage when window is set | +| `estimator` | `"simple"` | Token estimator: `"simple"` (chars/4) or `"tiktoken"` | +| `chars_per_token` | 4.0 | Ratio for the simple estimator | + +### Compaction Process + +1. **Token check**: The compaction agent monitors conversation token count on each tick +2. **Threshold exceeded**: When tokens exceed the threshold, compaction begins +3. **Message preservation**: The most recent N messages are preserved unchanged +4. **Summarization**: Older messages are sent to the autonomy LLM for summarization +5. **Fact extraction**: The LLM also extracts important facts from the summarized content +6. **History replacement**: The conversation store replaces old messages with the summary +7. **Memory storage**: The summary and extracted facts are stored via the memory MCP server + +### Automatic Fact Extraction + +During compaction, the LLM is prompted to extract notable facts from the conversation: +- Personal information shared by the user +- Preferences and settings discussed +- Important decisions or outcomes +- Technical details mentioned + +Extracted facts are stored with `source: "compaction"` and appropriate importance levels. + +## Context Injection + +Memory is registered in the context builder with **priority 7**: + +```python +context.register("memory", memory_context.as_prompt, priority=7) +``` + +Relevant memories are included in the LLM context as system messages, giving the agent access to previously stored facts and summaries. 
+ +## Configuration Reference + +```yaml +autonomy: + tokens: + token_threshold: 8000 + preserve_recent_messages: 10 + model_context_window: null + target_utilization: 0.6 + estimator: "simple" + chars_per_token: 4.0 + +mcp_servers: + - name: "memory" + transport: "stdio" + command: "python" + args: ["-m", "glados.mcp.memory_server"] +``` + +| Option | Type | Default | Description | +|--------|------|---------|-------------| +| `tokens.token_threshold` | int | `8000` | Token count that triggers compaction | +| `tokens.preserve_recent_messages` | int | `10` | Messages to keep during compaction | +| `tokens.model_context_window` | int/null | `null` | Model context window (enables utilization targeting) | +| `tokens.target_utilization` | float | `0.6` | Target context utilization (0.0–1.0) | +| `tokens.estimator` | string | `"simple"` | Token estimator: `"simple"` or `"tiktoken"` | +| `tokens.chars_per_token` | float | `4.0` | Characters per token for simple estimator | + +## See Also + +- [README](../README.md) — Full project overview +- [mcp.md](./mcp.md) — MCP integration and memory server tools +- [autonomy.md](./autonomy.md) — Subagent system and compaction agent +- [configuration.md](./configuration.md) — Complete configuration reference diff --git a/docs/tui.md b/docs/tui.md new file mode 100644 index 00000000..a71915de --- /dev/null +++ b/docs/tui.md @@ -0,0 +1,198 @@ +# Text User Interface + +GLaDOS includes a rich terminal interface built with the [Textual](https://textual.textualize.io/) framework. The TUI provides real-time status monitoring, interactive panels, a command palette, and multiple color themes. 
+ +## Quick Start + +```bash +uv run glados tui +uv run glados tui --config configs/glados_config.yaml --theme matrix +``` + +## Keyboard Shortcuts + +| Key | Action | +|-----|--------| +| **F1** | Help screen (shortcut reference) | +| **Ctrl+P** | Command palette (search all commands) | +| **Ctrl+D** | Toggle Dialog panel | +| **Ctrl+L** | Toggle System Log panel | +| **Ctrl+S** | Toggle Status panel | +| **Ctrl+A** | Toggle Autonomy panel | +| **Ctrl+U** | Toggle Queue panel | +| **Ctrl+M** | Toggle MCP panel | +| **Ctrl+I** / **Tab** | Toggle all right-side info panels | +| **Ctrl+R** | Restore all panels | +| **Esc** | Close modal dialogs | + +## Layout + +The TUI is organized into a two-column layout: + +``` +┌─────────────────────────────────────────────────────────────┐ +│ Header [clock] │ +├───────────────────────────────────┬─────────────────────────┤ +│ Dialog (Ctrl+D) │ Status (Ctrl+S) │ +│ User and GLaDOS conversation │ ASR/TTS state, mic level│ +│ ├─────────────────────────┤ +│ │ Autonomy (Ctrl+A) │ +│ │ Workers, queue depth │ +├───────────────────────────────────┤─────────────────────────┤ +│ System Log (Ctrl+L) │ Queues (Ctrl+U) │ +│ Debug output and system messages │ Priority/Autonomy depth │ +│ ├─────────────────────────┤ +│ │ MCP (Ctrl+M) │ +│ │ Server status, tools │ +├───────────────────────────────────┴─────────────────────────┤ +│ > Type a message... 
│ +└─────────────────────────────────────────────────────────────┘ +``` + +### Panel Details + +| Panel | Shortcut | Height | Content | +|-------|----------|--------|---------| +| **Dialog** | Ctrl+D | 2fr | User messages (cyan) and GLaDOS responses (yellow) | +| **System Log** | Ctrl+L | 1fr | System output, debug messages, print capture | +| **Status** | Ctrl+S | 10 lines | ASR/TTS state, autonomy, vision, mic level, speaking indicator | +| **Autonomy** | Ctrl+A | 7 lines | Enabled/disabled, workers, in-flight, queue depth, coalesce | +| **Queues** | Ctrl+U | 4 lines | Priority and autonomy queue depths with wait times | +| **MCP** | Ctrl+M | 5 lines | MCP server status (up to 6 servers: name, online/offline, tool count) | + +The right-side panels can all be toggled at once with **Ctrl+I** (or **Tab**). + +## Command Palette + +Press **Ctrl+P** to open the command palette. Type to filter commands, then press Enter to execute. + +### TUI Commands + +| Command | Description | +|---------|-------------| +| Theme | Switch TUI theme | +| Context | Show autonomy slot context | +| Messages | Show dialog history | +| Observability | Open live event log | +| Help | Show keyboard shortcuts | + +### Engine Commands + +Commands can also be typed directly in the input field with a `/` prefix. 
+ +| Command | Usage | Description | +|---------|-------|-------------| +| `/help` | `/help` | Show available commands | +| `/status` | `/status` | Show engine status | +| `/tts` | `/tts on\|off` | Control TTS output | +| `/mute-tts` | `/mute-tts` | Mute TTS | +| `/unmute-tts` | `/unmute-tts` | Unmute TTS | +| `/asr` | `/asr on\|off` | Control ASR input | +| `/mute-asr` | `/mute-asr` | Mute ASR | +| `/unmute-asr` | `/unmute-asr` | Unmute ASR | +| `/autonomy` | `/autonomy on\|off` | Toggle autonomy system | +| `/autonomy` | `/autonomy debounce on\|off` | Toggle tick coalescing | +| `/emotion` | `/emotion` | Show current PAD emotional state | +| `/slots` | `/slots` | Show autonomy slots | +| `/minds` | `/minds` | Show active minds (subagents) | +| `/agents` | `/agents` | Show registered subagents | +| `/mcp` | `/mcp status` | Show MCP server status | +| `/context` | `/context` | Show context/token usage | +| `/constitution` | `/constitution` | Show constitutional state and modifiers | +| `/preferences` | `/preferences` | Show user preferences | +| `/vision` | `/vision` | Show latest vision snapshot | +| `/config` | `/config` | Show config summary | +| `/knowledge` | `/knowledge add\|list\|set\|delete\|clear` | Manage local knowledge notes | +| `/memory` | `/memory` | Show long-term memory stats | +| `/observe` | `/observe` | Open observability screen | +| `/quit` | `/quit` | Quit GLaDOS (alias: `/exit`) | + +Some commands are hidden from the palette when not applicable (e.g., `/vision` when vision is disabled, `/emotion` when emotion agent is not running). 
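Input starting with `/` is routed to a command handler instead of becoming an LLM turn. A hypothetical sketch of that dispatch (the `dispatch` function and handlers are illustrative, not the actual TUI implementation):

```python
def dispatch(line: str, commands: dict) -> str:
    """Route '/'-prefixed input to a command handler; plain text goes to the LLM."""
    if not line.startswith("/"):
        return "llm"  # ordinary input becomes a normal LLM turn
    parts = line[1:].split()
    if not parts:
        return "empty command (try /help)"
    name, args = parts[0], parts[1:]
    handler = commands.get(name)
    if handler is None:
        return f"unknown command: /{name} (try /help)"
    return handler(args)


# Illustrative handlers for a couple of the commands above
commands = {
    "tts": lambda args: f"tts {'enabled' if args == ['on'] else 'disabled'}",
    "quit": lambda args: "shutting down",
}
commands["exit"] = commands["quit"]  # /exit is an alias for /quit
```

A registry keyed by command name also makes the conditional hiding described above cheap: commands like `/vision` or `/emotion` can simply be left out of the dict when their subsystem is disabled.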
+ +## Themes + +GLaDOS includes five built-in themes: + +| Theme | Primary | Background | Style | +|-------|---------|------------|-------| +| **aperture** | Orange/Gold | Dark gray | Default GLaDOS theme | +| **ice** | Light blue | Very dark blue | Cool, minimal | +| **matrix** | Bright green | Black | Terminal/hacker style | +| **mono** | Light gray | Very dark gray | Monochrome | +| **ember** | Orange/Red | Dark brown | Warm, fiery | + +### Selecting a Theme + +**Via command palette**: Press Ctrl+P, search "Theme", select from the list. + +**Via config**: +```yaml +Glados: + tui_theme: "aperture" +``` + +**Via CLI**: +```bash +uv run glados tui --theme matrix +``` + +## Modal Screens + +Several modal screens overlay the main interface: + +| Screen | Trigger | Description | +|--------|---------|-------------| +| **Help** | F1 | Keyboard shortcuts reference | +| **Context** | Ctrl+P → Context | Autonomy slot contents | +| **Messages** | Ctrl+P → Messages | Full dialog history (up to 500 messages) | +| **Observability** | Ctrl+P → Observability | Live event log with level, source, kind. Updates every 250ms | +| **Theme Picker** | Ctrl+P → Theme | Theme selection list | +| **Info** | Various commands | Generic scrollable output for command results | + +All modal screens close with **Esc**. + +## Status Indicators + +The status panel shows real-time system state: + +``` +ASR: ACTIVE TTS: ACTIVE +Autonomy: ON Jobs: OFF +Vision: OFF +Speaking: ● (green when speaking) +Microphone: ● 42.3 dB ████████░░░░░░░░ +``` + +The queue panel shows LLM pipeline health: + +``` +Priority: 0 queued wait 0ms +Autonomy: 1 queued wait 234ms +``` + +Panels refresh every 300ms from the engine's observability bus. + +## Splash Screen + +On startup, the TUI displays a splash screen with: +- GLaDOS ASCII art logo +- Model name and endpoint +- Tips for getting started +- "Initializing systems..." 
until the engine is ready + +The splash screen transitions to the main interface automatically when initialization completes. + +## Configuration + +```yaml +Glados: + tui_theme: "aperture" # Theme name (aperture, ice, matrix, mono, ember) +``` + +The theme can be overridden at runtime via `--theme` CLI flag or the command palette. + +## See Also + +- [README](../README.md) — Quick start and installation +- [configuration.md](./configuration.md) — Complete configuration reference +- [architecture.md](./architecture.md) — System architecture