ML reinforcement model for the game of Go

Reference: AlphaGo Zero paper, https://www.nature.com/articles/nature24270 (see `agz_unformatted_nature.pdf`).
This project implements a simplified version of DeepMind's AlphaZero algorithm to play the game of Go on a 5×5 board. Using reinforcement learning (RL) with no human data, the system learns entirely through self-play guided by Monte Carlo Tree Search (MCTS).
Key features:
- Inspired by Google DeepMind's AlphaZero, which achieved superhuman strength in board games.
- Pure RL approach: no human games or expert data used.
- Configurable board size (default 5×5, easily adjustable).
- Endgame and win conditions are tweaked for simplicity.
- Pretrained model (`model_19.pt`) included for instant play.
Go is an ancient two-player strategy board game with simple rules but immense complexity:
- Board: Traditional sizes are 19×19, 13×13, or 9×9; here we use 5×5 for faster training and experimentation.
- Stones: Black and White alternate placing stones on empty intersections.
- Groups & Liberties: Connected stones form a group; liberties are the empty points adjacent to it. A group with no liberties is captured and removed (see the sketch below).
- Objective: Surround more territory (empty points) and capture opponent stones. In this simplified version, the winner is decided by stone count when no legal moves remain or a line of identical stones dominates.
Despite its simple rules, Go's state space is enormous: even a 5×5 board already has on the order of 10¹¹ legal positions, and a full 19×19 board has roughly 10¹⁷⁰.
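To make the groups-and-liberties rule concrete, here is a minimal sketch of liberty counting via flood fill. It is independent of the notebook's `GoGameAlphaZero` class; the function name and board encoding below are illustrative assumptions.

```python
import numpy as np

def group_and_liberties(board, row, col):
    """Flood-fill from (row, col) to collect the connected group of same-colored
    stones and the set of its liberties (adjacent empty points).
    board: N×N array with 0 = empty, 1 = black, -1 = white (assumed encoding)."""
    color = board[row, col]
    n = board.shape[0]
    group, liberties = set(), set()
    stack = [(row, col)]
    while stack:
        r, c = stack.pop()
        if (r, c) in group:
            continue
        group.add((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n:
                if board[nr, nc] == 0:
                    liberties.add((nr, nc))      # empty neighbor -> liberty
                elif board[nr, nc] == color:
                    stack.append((nr, nc))       # same color -> part of the group
    return group, liberties

# Example on a 5×5 board: the white stone at (0, 1) has a single liberty left,
# so a black play at (0, 2) would capture it.
board = np.zeros((5, 5), dtype=int)
board[0, 0] = 1   # black
board[0, 1] = -1  # white
board[1, 1] = 1   # black
print(group_and_liberties(board, 0, 1))  # ({(0, 1)}, {(0, 2)})
```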
The code is organized into Jupyter notebook cells; here’s what each cell does:
- Cell 1 (`GoGameAlphaZero`):
  - Implements the Go game logic on an N×N board.
  - Manages state representation, valid move generation, capturing rules, and endgame evaluation.
- Cell 2 (`ResNet` & `ResBlock`):
  - Defines the neural network architecture: a convolutional residual network with policy and value heads (a minimal sketch appears after this list).
  - Takes a 3-channel encoded board state and outputs move probabilities and a game-value prediction.
- Cell 3 (Demo & Visualization):
  - Shows how to instantiate the game, make moves, encode the state, and run the `ResNet` forward pass.
  - Plots the policy distribution over the 25 actions using Matplotlib.
- Cell 4 (`Node` & `MCTS`):
  - Implements the MCTS logic: Upper Confidence Bound (UCB) selection, expansion, leaf evaluation via the neural network, and backpropagation of values (a minimal sketch appears after this list).
- Cell 5 (`AlphaZero` class):
  - Coordinates the self-play, training, and evaluation loops.
  - Saves the best model based on win-rate improvement and writes checkpoints after each iteration.
- Cell 6 (Interactive Play Script):
  - Provides a command-line interface for playing against the trained model.
  - Loads `model_19.pt` and uses MCTS for AI moves; supports human vs. AI games in the console.
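To illustrate the architecture described in Cell 2, here is a minimal sketch of a residual policy/value network for a 5×5 board. The input and output shapes follow the description above (3-channel state in, 25 move logits and a scalar value out), but the channel count, block count, and layer details are assumptions, not the notebook's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """One residual block: two 3×3 convolutions with a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)

class ResNet(nn.Module):
    """Residual tower with a policy head (move logits) and a value head (scalar in [-1, 1])."""
    def __init__(self, board_size=5, channels=64, num_blocks=4):
        super().__init__()
        # 3 input planes; the exact encoding (own stones / opponent stones / empty) is assumed.
        self.start = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )
        self.blocks = nn.Sequential(*[ResBlock(channels) for _ in range(num_blocks)])
        self.policy_head = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.Flatten(),
            nn.Linear(2 * board_size * board_size, board_size * board_size),
        )
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.Flatten(),
            nn.Linear(board_size * board_size, 1), nn.Tanh(),
        )

    def forward(self, x):
        x = self.blocks(self.start(x))
        return self.policy_head(x), self.value_head(x)

model = ResNet()
logits, value = model(torch.zeros(1, 3, 5, 5))  # one encoded board state
print(logits.shape, value.shape)                # torch.Size([1, 25]) torch.Size([1, 1])
```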
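Likewise, the core of Cell 4's search is a PUCT-style selection rule plus value backpropagation. The sketch below shows the general pattern under assumed node fields and sign conventions; the notebook's exact formula and attribute names may differ.

```python
import math

class Node:
    """One MCTS node: holds the policy prior plus visit statistics."""
    def __init__(self, prior, parent=None):
        self.prior = prior          # P(s, a) from the network's policy head
        self.parent = parent
        self.children = {}          # action -> Node
        self.visit_count = 0
        self.value_sum = 0.0        # accumulated from this node's player's perspective

    def value(self):
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def ucb_score(parent, child, C=2.0):
    """AlphaZero-style UCB: value term plus an exploration bonus scaled by the prior.
    The child's value is negated because it is stored from the opponent's perspective."""
    q = -child.value()
    u = C * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return q + u

def select_child(node, C=2.0):
    """Pick the (action, child) pair that maximizes the UCB score."""
    return max(node.children.items(), key=lambda item: ucb_score(node, item[1], C))

def backpropagate(node, value):
    """Propagate the network's leaf evaluation up the tree, flipping sign each ply."""
    while node is not None:
        node.visit_count += 1
        node.value_sum += value
        value = -value
        node = node.parent
```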
- Clone the repository.
- Install dependencies: `pip install torch numpy matplotlib tqdm`
- Place `model_19.pt` in the project root for immediate play.
- Run `python play.py` (uses `model_19.pt` by default) and follow the on-screen prompts to play as Black against the AI.
- Adjust parameters in `train.py` (e.g., board size, number of iterations).
- Run `python train.py`.
- New models and optimizers are saved after each iteration. To use a newly trained model, place its `.pt` file in the project root and run the play script (see the loading sketch below).
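Loading a trained checkpoint for play is standard PyTorch. A minimal sketch, assuming the checkpoint stores a `state_dict` and that the `ResNet` constructor takes the board size (both assumptions; adapt to the notebook's actual code):

```python
import torch

# Hypothetical loading step: rebuild the same architecture used for training,
# then restore the saved weights from the checkpoint in the project root.
model = ResNet(board_size=5)                    # constructor arguments are assumptions
state_dict = torch.load("model_19.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()                                    # inference mode for play
```

If the checkpoint was saved as a full model object rather than a `state_dict`, `torch.load("model_19.pt")` alone is enough.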
- Board Size: Change `board_size` in the `GoGameAlphaZero` initializer.
- AlphaZero Arguments (in code; an example configuration appears after this list):
  - `C`: exploration constant for UCB.
  - `num_searches`: MCTS rollouts per move.
  - `num_iterations`: meta-iterations of self-play + training.
  - `num_selfPlay_iterations`: self-play games per iteration.
  - `num_epochs`: training epochs per iteration.
  - `batch_size`: samples per training batch.
  - `evaluation_games`: games used to evaluate the win rate.
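For orientation, a configuration dictionary using these arguments might look like the following. The values are illustrative placeholders for a 5×5 board, not the settings used to train `model_19.pt`.

```python
args = {
    "C": 2,                          # UCB exploration constant
    "num_searches": 100,             # MCTS rollouts per move
    "num_iterations": 20,            # self-play + training meta-iterations
    "num_selfPlay_iterations": 50,   # self-play games per iteration
    "num_epochs": 4,                 # training epochs per iteration
    "batch_size": 64,                # samples per training batch
    "evaluation_games": 20,          # games used to estimate the win rate
}
```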
Feel free to:
- Scale to larger board sizes.
- Experiment with network depth/width.
- Move training to a GPU (a minimal sketch follows).
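Moving the model and its inputs to a GPU is standard PyTorch; a minimal sketch of the device handling, assuming the `ResNet` sketch above:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ResNet(board_size=5).to(device)            # move parameters to the GPU
states = torch.zeros(64, 3, 5, 5, device=device)   # keep input batches on the same device
policy_logits, values = model(states)
```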
This project is released under the MIT License. Enjoy exploring AlphaZero on small boards!