
lschiavini (Collaborator) commented Nov 27, 2025

Summary

Adds LLM policy support allowing GPT, Claude, or local Ollama models to control MettaGrid agents.

The LLM observes the game state as structured JSON and outputs action decisions.
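The observe-as-JSON / answer-with-an-action loop can be sketched roughly as follows. This is a hypothetical illustration, not the code in llm_policy.py: the action names in ACTIONS, the observation fields, and the {"action": ...} reply format are all assumptions. The point is that the model's free-text reply is validated against a known action set, with a safe fallback when parsing fails.

```python
import json

# Hypothetical action vocabulary; the real MettaGrid action names may differ.
ACTIONS = ["noop", "move_north", "move_south", "move_east", "move_west"]


def build_observation_json(inventory: dict[str, int], nearby: list[dict]) -> str:
    """Serialize a simplified agent view as JSON for the prompt (assumed schema)."""
    return json.dumps({"inventory": inventory, "nearby_objects": nearby}, sort_keys=True)


def parse_action(reply: str, default: str = "noop") -> str:
    """Extract an action from an assumed {"action": ...} reply, else fall back to default."""
    try:
        action = json.loads(reply).get("action", default)
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or JSON that is not an object (e.g. a bare string or list).
        return default
    return action if action in ACTIONS else default
```

Falling back to a no-op keeps the episode running when a model returns malformed or out-of-vocabulary output.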

Features

New Policy Classes:

  • LLMMultiAgentPolicy - Base multi-agent LLM policy
  • LLMGPTMultiAgentPolicy (llm-gpt) - OpenAI GPT support
  • LLMClaudeMultiAgentPolicy (llm-claude) - Anthropic Claude support
  • LLMOllamaMultiAgentPolicy (llm-ollama) - Local Ollama models (free)
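One way to picture the base/provider split is a base class that owns the shared prompt logic and leaves only the model call to each provider subclass. The sketch below is an assumption about the structure, not the actual class hierarchy: LLMPolicySketch, call_model, EchoPolicy and the "llm-echo" registry key are hypothetical names, and EchoPolicy stands in for a GPT/Claude/Ollama client so the example runs offline.

```python
from abc import ABC, abstractmethod


class LLMPolicySketch(ABC):
    """Hypothetical base: shared prompt handling, provider-specific model call."""

    def step(self, observation_json: str) -> str:
        prompt = f"Observation:\n{observation_json}\nReply with one action name."
        # Subclasses only decide how the prompt reaches a model.
        return self.call_model(prompt).strip()

    @abstractmethod
    def call_model(self, prompt: str) -> str:
        """Send the prompt to a provider and return its raw text reply."""


class EchoPolicy(LLMPolicySketch):
    """Offline stand-in for a provider subclass, useful in tests."""

    def call_model(self, prompt: str) -> str:
        return " noop\n"


# Hypothetical registry mapping short names (like llm-gpt) to policy classes.
REGISTRY: dict[str, type[LLMPolicySketch]] = {"llm-echo": EchoPolicy}
```

Keeping the provider call behind one abstract method means adding another backend touches only a small subclass.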

Dynamic Prompt System:

  • LLMPromptBuilder - Context-aware prompt generation
  • Sends the full game rules only on the first step and then every N steps (N is the configurable context window)
  • Maintains conversation history for multi-turn interactions

Observation Debugging:

  • ObservationDebugger - Human-readable observation output
  • Shows agent inventory, nearby objects, walkable directions
  • Enabled via kw.debug_mode=true
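As a rough illustration of the kind of output described (inventory plus walkable directions), the sketch below assumes a simplified observation: a list of strings with the agent at the centre cell and "#" marking blocked cells. The grid format, direction offsets and function names are all hypothetical; the real ObservationDebugger works on MettaGrid's own observation tokens.

```python
# Hypothetical direction offsets as (d_row, d_col); real MettaGrid conventions may differ.
DIRECTIONS = {"north": (-1, 0), "south": (1, 0), "west": (0, -1), "east": (0, 1)}


def walkable_directions(grid: list[str], blocked: str = "#") -> list[str]:
    """Directions whose neighbouring cell (from the centre agent) is in bounds and not blocked."""
    height, width = len(grid), len(grid[0])
    agent_row, agent_col = height // 2, width // 2
    result = []
    for name, (d_row, d_col) in DIRECTIONS.items():
        r, c = agent_row + d_row, agent_col + d_col
        if 0 <= r < height and 0 <= c < width and grid[r][c] not in blocked:
            result.append(name)
    return result


def describe(inventory: dict[str, int], grid: list[str]) -> str:
    """Two-line human-readable summary of inventory and walkable directions."""
    items = ", ".join(f"{k}={v}" for k, v in sorted(inventory.items())) or "empty"
    return f"Inventory: {items}\nWalkable: {', '.join(walkable_directions(grid)) or 'none'}"
```

For the grid ["#.#", "..#", "#.#"], the agent at the centre can move north, south and west, but not east.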

Cost Tracking:

  • Automatic token usage and cost calculation (based on OpenAI's and Anthropic's pricing tables as of November 2025)
  • Prints a cost summary on exit for paid APIs (GPT/Claude)
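The cost calculation reduces to tokens times a per-million-token price, summed separately for input and output. The sketch below shows that arithmetic. The model name "example-model" and its prices are illustrative placeholders, not real OpenAI or Anthropic rates, and CostTracker is a hypothetical name.

```python
from dataclasses import dataclass

# Illustrative USD prices per 1M tokens; NOT real OpenAI/Anthropic rates.
PRICES_PER_MTOK = {"example-model": {"input": 1.00, "output": 4.00}}


@dataclass
class CostTracker:
    """Accumulate token usage across API calls and convert it to dollars."""

    model: str
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # Call once per API response with the usage counts it reports.
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    def cost_usd(self) -> float:
        price = PRICES_PER_MTOK[self.model]
        total = self.input_tokens * price["input"] + self.output_tokens * price["output"]
        return total / 1_000_000

    def summary(self) -> str:
        return (f"{self.model}: {self.input_tokens} in / {self.output_tokens} out tokens, "
                f"${self.cost_usd():.4f}")
```

Printing on exit could then be wired up with the standard library, e.g. atexit.register(lambda: print(tracker.summary())).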

Usage

  # Local Ollama (free)
  cogames play -m machina_1 -p class=llm-ollama

  # OpenAI GPT (requires OPENAI_API_KEY exported in your terminal)
  cogames play -m machina_1 -p class=llm-gpt

  # Anthropic Claude (requires ANTHROPIC_API_KEY exported in your terminal)
  cogames play -m machina_1 -p class=llm-claude

  # With debug mode
  cogames play -m machina_1 -p class=llm-gpt,kw.debug_mode=true --steps 50

  # With different context length
  cogames play -m machina_1 -p class=llm-gpt,kw.debug_mode=true --steps 50
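The -p argument in these commands combines a policy class name with kw.* keyword arguments. How cogames actually parses this string is not shown in this PR. The parser below is a hypothetical illustration of the syntax: it turns "true"/"false" into booleans and leaves every other value as a string.

```python
def parse_policy_spec(spec: str) -> tuple[str, dict[str, object]]:
    """Split 'class=NAME,kw.KEY=VALUE,...' into a class name and keyword arguments."""
    class_name, kwargs = "", {}
    for part in spec.split(","):
        key, _, value = part.partition("=")
        if key == "class":
            class_name = value
        elif key.startswith("kw."):
            lowered = value.lower()
            # Booleans are converted; other values are kept as raw strings.
            kwargs[key[3:]] = True if lowered == "true" else (False if lowered == "false" else value)
    return class_name, kwargs
```

For example, "class=llm-gpt,kw.debug_mode=true" yields ("llm-gpt", {"debug_mode": True}).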

Main Files Changed

  • mettagrid/policy/llm_policy.py - Main LLM policy implementation (~1300 lines)
  • mettagrid/policy/llm_prompt_builder.py - Dynamic prompt generation with context windows
  • mettagrid/policy/observation_debugger.py - Human-readable observation formatting
  • mettagrid/policy/loader.py - Updated to pass mg_cfg to policies that accept it
  • cogames/README.md - Added LLM policy example to Quick Start
  • tests/policy/test_llm_prompt_builder.py - Unit tests (18 tests)
  • tests/policy/test_llm_prompt_compatibility.py - Compatibility tests (10 tests)
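The loader change ("pass mg_cfg to policies that accept it") can be done by inspecting the constructor signature. The sketch below shows one such approach using the standard inspect module. instantiate_policy, PlainPolicy and ConfigAwarePolicy are hypothetical names, and the check the real loader.py uses may differ.

```python
import inspect
from typing import Any


def instantiate_policy(cls: type, mg_cfg: Any, **kwargs: Any) -> Any:
    """Pass mg_cfg only if the constructor declares it or accepts **kwargs."""
    params = inspect.signature(cls).parameters  # constructor params, without self
    accepts = "mg_cfg" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts:
        kwargs["mg_cfg"] = mg_cfg
    return cls(**kwargs)


class PlainPolicy:
    """Older-style policy that knows nothing about mg_cfg."""

    def __init__(self, debug_mode: bool = False):
        self.debug_mode = debug_mode


class ConfigAwarePolicy:
    """Policy that wants the game config (e.g. to describe rules to an LLM)."""

    def __init__(self, mg_cfg: Any, debug_mode: bool = False):
        self.mg_cfg, self.debug_mode = mg_cfg, debug_mode
```

This keeps existing policies working unchanged, while policies that need the game configuration can opt in just by declaring the parameter.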

Asana Task

lschiavini force-pushed the lschiavini/llm-policy branch from de868b5 to 4a65053 on November 27, 2025 20:14
lschiavini force-pushed the lschiavini/llm-policy branch from 4a65053 to ed7c8ab on December 1, 2025 14:38
lschiavini marked this pull request as ready for review December 1, 2025 14:38
Comment on lines +250 to +251
for col in range(obs_width):
    if row == agent_x and col == agent_y:

Coordinate system confusion in grid visualization: the loop iterates over rows (for row in range(obs_height)) and then columns (for col in range(obs_width)), but then checks if row == agent_x and col == agent_y.

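Since agent_x = obs_width // 2 is a column index and agent_y = obs_height // 2 is a row index, comparing row with agent_x is incorrect. The two only coincide when obs_width == obs_height; for a non-square view the agent marker is drawn in the wrong cell or not at all.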

Should be:

if row == agent_y and col == agent_x:

Or restructure to use consistent variable names where row_idx and col_idx are used instead of row/col.

Suggested change
-    for col in range(obs_width):
-        if row == agent_x and col == agent_y:
+    for col in range(obs_width):
+        if row == agent_y and col == agent_x:

Spotted by Graphite Agent
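A minimal self-contained version of the corrected check is shown below. render_grid is a hypothetical helper, not the function in llm_policy.py. It uses a non-square 3x5 view, where the original comparison would put the marker at row 2, column 1 instead of the centre.

```python
def render_grid(obs_height: int, obs_width: int) -> list[str]:
    """Render an obs_height x obs_width view with '@' at the agent's centre cell."""
    agent_x = obs_width // 2   # column index of the agent
    agent_y = obs_height // 2  # row index of the agent
    rows = []
    for row in range(obs_height):
        cells = []
        for col in range(obs_width):
            # Rows pair with y and columns pair with x.
            cells.append("@" if row == agent_y and col == agent_x else ".")
        rows.append("".join(cells))
    return rows
```

Here render_grid(3, 5) gives [".....", "..@..", "....."].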

