Add LLM-based policy for MettaGrid agents #4090

lschiavini · 2025-11-27T20:07:22Z

Summary

Adds LLM policy support allowing GPT, Claude, or local Ollama models to control MettaGrid agents.

The LLM observes the game state as structured JSON and outputs action decisions.

Features

New Policy Classes:

LLMMultiAgentPolicy - Base multi-agent LLM policy
LLMGPTMultiAgentPolicy (llm-gpt) - OpenAI GPT support
LLMClaudeMultiAgentPolicy (llm-claude) - Anthropic Claude support
LLMOllamaMultiAgentPolicy (llm-ollama) - Local Ollama models (free)

Dynamic Prompt System:

LLMPromptBuilder - Context-aware prompt generation
Sends full game rules only on the first step and every N steps (configurable context window)
Maintains conversation history for multi-turn interactions

Observation Debugging:

ObservationDebugger - Human-readable observation output
Shows agent inventory, nearby objects, walkable directions
Enabled via kw.debug_mode=true

Cost Tracking:

Automatic token usage and cost calculation (based on openai's and anthropic's cost tables as of november 2025)
Prints summary on exit for paid APIs (GPT/Claude)

Usage

  # Local Ollama (free)
  cogames play -m machina_1 -p class=llm-ollama

  # OpenAI GPT (requires OPENAI_API_KEY exported in your terminal)
  cogames play -m machina_1 -p class=llm-gpt

  # Anthropic Claude (requires ANTHROPIC_API_KEY exported in your terminal)
  cogames play -m machina_1 -p class=llm-claude

  # With debug mode
  cogames play -m machina_1 -p class=llm-gpt,kw.debug_mode=true --steps 50

  # With different context length
  cogames play -m machina_1 -p class=llm-gpt,kw.debug_mode=true --steps 50

Main Files Changed

File	Description
mettagrid/policy/llm_policy.py	Main LLM policy implementation (~1300 lines)
mettagrid/policy/llm_prompt_builder.py	Dynamic prompt generation with context windows
mettagrid/policy/observation_debugger.py	Human-readable observation formatting
mettagrid/policy/loader.py	Updated to pass mg_cfg to policies that accept it
cogames/README.md	Added LLM policy example to Quick Start
tests/policy/test_llm_prompt_builder.py	Unit tests (18 tests)
tests/policy/test_llm_prompt_compatibility.py	Compatibility tests (10 tests)

Asana Task

graphite-app · 2025-12-01T14:42:50Z

packages/mettagrid/python/src/mettagrid/policy/llm_prompt_builder.py

+            for col in range(obs_width):
+                if row == agent_x and col == agent_y:


Coordinate system confusion in grid visualization: The loop iterates for row in range(obs_height): for col in range(obs_width): then checks if row == agent_x and col == agent_y:.

Since agent_x = obs_width // 2 (column-based) and agent_y = obs_height // 2 (row-based), comparing row with agent_x is incorrect.

Should be:

if row == agent_y and col == agent_x:

Or restructure to use consistent variable names where row_idx and col_idx are used instead of row/col.

Suggested change

for col in range(obs_width):

if row == agent_x and col == agent_y:

for col in range(obs_width):

if row == agent_y and col == agent_x:

Spotted by Graphite Agent

Is this helpful? React 👍 or 👎 to let us know.

lschiavini force-pushed the lschiavini/llm-policy branch from de868b5 to 4a65053 Compare November 27, 2025 20:14

lschiavini added 22 commits November 29, 2025 11:18

init

aa9f91a

demo llm policy run

5aa4bd7

usage costs

9584f2f

dotenv

f25bd96

ollama

4f89369

wip

5fa6fad

wip picking models

e0dfd4e

get models and choose which to run

43fa130

cost summary

977b049

wip

5a20a4e

wip dynamic prompts

3d4398c

adding actions only once per context window

deb9ba3

send mettagrid config to policy

1b78bd9

dynamic prompt

4a652a1

removing logs when not debug, and change prompt

fa4c477

debug flag

f4c2893

updating readme

14b40f1

error when no api key set

a43199b

rollback

c504b55

minimal game rules

0668cc5

condensing tests

f658291

one final test

ed7c8ab

lschiavini force-pushed the lschiavini/llm-policy branch from 4a65053 to ed7c8ab Compare December 1, 2025 14:38

lschiavini marked this pull request as ready for review December 1, 2025 14:38

github-actions bot assigned lschiavini Dec 1, 2025

graphite-app bot reviewed Dec 1, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add LLM-based policy for MettaGrid agents #4090

Add LLM-based policy for MettaGrid agents #4090

lschiavini commented Nov 27, 2025 •

edited by github-actions bot

Loading

Uh oh!

graphite-app bot Dec 1, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		for col in range(obs_width):
		if row == agent_x and col == agent_y:

Add LLM-based policy for MettaGrid agents #4090

Are you sure you want to change the base?

Add LLM-based policy for MettaGrid agents #4090

Conversation

lschiavini commented Nov 27, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Features

Usage

Main Files Changed

Uh oh!

graphite-app bot Dec 1, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lschiavini commented Nov 27, 2025 •

edited by github-actions bot

Loading