Autonomous research loop for a Texas Hold'em poker AI. An agent modifies train.py, trains a DQN for 5 minutes, evaluates it against a random opponent, keeps or discards the change based on bb/hand (big blinds won per hand), and repeats indefinitely.
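The keep/discard cycle described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repo: `bb_per_hand`, `research_loop`, `propose`, `run_experiment`, and `max_iterations` are hypothetical names, the "keep only if strictly better" rule is an assumption, and the loop is bounded here so the example terminates (the real loop runs until stopped).

```python
from typing import Callable, List, Tuple, TypeVar

Config = TypeVar("Config")


def bb_per_hand(chip_deltas: List[float], big_blind: float) -> float:
    """Average winnings per hand, expressed in big blinds."""
    if not chip_deltas:
        raise ValueError("need at least one hand to compute bb/hand")
    return sum(chip_deltas) / (big_blind * len(chip_deltas))


def research_loop(
    propose: Callable[[Config], Config],        # hypothetical: agent edits the current best
    run_experiment: Callable[[Config], float],  # hypothetical: train, evaluate, return bb/hand
    baseline: Config,
    max_iterations: int,
) -> Tuple[Config, float, List[Tuple[Config, float, bool]]]:
    """Propose a change, score it, and keep it only if bb/hand improves."""
    best_config = baseline
    best_score = run_experiment(best_config)
    history = [(best_config, best_score, True)]

    for _ in range(max_iterations):
        candidate = propose(best_config)
        score = run_experiment(candidate)
        keep = score > best_score  # assumption: ties are discarded
        if keep:
            best_config, best_score = candidate, score
        history.append((candidate, score, keep))

    return best_config, best_score, history
```

In this sketch, `run_experiment` stands in for the whole "train for 5 minutes, then evaluate against the random opponent" step, and `history` records every attempt with its score and whether it was kept.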
uv sync
uv run train.py

prepare.py - Fixed: environment setup, evaluation harness, constants (DO NOT MODIFY)
train.py - Agent modifies this: DQN config, network, training loop
program.md - Agent instructions