A modular implementation of the classic Snake game with reinforcement learning using Gymnasium and Stable Baselines3.
```
├── snake_game.py    # Core game logic + manual play
├── snake_env.py     # Gymnasium environment wrapper
├── train_snake.py   # Training script with SB3
├── pyproject.toml   # Project configuration and dependencies
└── README.md        # This file
```
- Initialize the project:

```bash
uv init
```

- Install dependencies:

```bash
uv sync
```

Before training the AI, you can play the game yourself:
```bash
# Basic play
uv run python snake_game.py

# Custom settings
uv run python snake_game.py --width 25 --height 20 --speed 10
```

Manual Controls:
- Arrow Keys: Move snake (Up, Right, Down, Left)
- SPACE: Pause/Unpause game
- R: Restart game
- ESC: Quit game
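The controls above can be sketched as a simple key-dispatch table. This is a self-contained illustration, not snake_game.py's actual code: plain key-name strings stand in for pygame key constants, and the returned command tuples are hypothetical.

```python
# Illustrative key dispatch for manual play; keys are plain strings here,
# whereas the real game would match pygame key constants in its event loop.
KEY_TO_DIRECTION = {"up": 0, "right": 1, "down": 2, "left": 3}

def handle_key(key):
    """Map a key press to a (command, argument) tuple (hypothetical API)."""
    if key in KEY_TO_DIRECTION:
        return ("turn", KEY_TO_DIRECTION[key])
    if key == "space":
        return ("toggle_pause", None)
    if key == "r":
        return ("restart", None)
    if key == "escape":
        return ("quit", None)
    return ("noop", None)
```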
Train with default settings (PPO, 100k steps):

```bash
uv run python train_snake.py
```

Train for more steps:

```bash
uv run python train_snake.py train 200000
```

Train with visual rendering (slower, but you can watch):

```bash
uv run python train_snake.py train 50000 --render
```

Watch the AI play after training:

```bash
uv run python train_snake.py play
```

Watch a specific model play 10 games:

```bash
uv run python train_snake.py play snake_model 10
```

Observation (11 values):
- Danger Detection (3): Straight ahead, right turn, left turn
- Direction (4): Current direction (up, right, down, left)
- Food Location (4): Food relative to head (up, down, left, right)
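Assembling those three groups into a single 11-element vector might look like the sketch below. The helper's name and arguments are assumptions for illustration, not snake_env.py's actual interface; it assumes grid coordinates with y increasing downward.

```python
import numpy as np

def make_observation(danger, direction, head, food):
    """Build the 11-value observation (illustrative).
    danger: (straight, right, left) booleans; direction: 0-3;
    head/food: (x, y) grid positions with y growing downward."""
    dir_onehot = [0.0] * 4
    dir_onehot[direction] = 1.0
    food_flags = [
        float(food[1] < head[1]),  # food is up
        float(food[1] > head[1]),  # food is down
        float(food[0] < head[0]),  # food is left
        float(food[0] > head[0]),  # food is right
    ]
    return np.array(list(map(float, danger)) + dir_onehot + food_flags,
                    dtype=np.float32)
```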
Actions (4, discrete):
- 0: Up
- 1: Right
- 2: Down
- 3: Left
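One way to apply an action index is a lookup table of movement deltas, as sketched below. This assumes y grows downward (the usual pygame convention); the actual encoding lives in snake_game.py and snake_env.py.

```python
# Illustrative action-to-delta table, assuming y increases downward.
ACTION_TO_DELTA = {
    0: (0, -1),  # up
    1: (1, 0),   # right
    2: (0, 1),   # down
    3: (-1, 0),  # left
}

def next_head(head, action):
    """Advance the head one cell in the chosen direction (sketch)."""
    dx, dy = ACTION_TO_DELTA[action]
    return (head[0] + dx, head[1] + dy)
```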
Rewards:
- +10: Eating food
- -10: Game over (collision)
- Penalty: Taking too long without eating (encourages food-seeking behavior)
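The reward scheme above could be implemented roughly as follows. The +10/-10 values come from this README; modeling the "too long without eating" penalty as a truncation with a fixed hunger limit is an assumption (the real env might instead apply a small per-step penalty).

```python
def compute_reward(ate_food, collided, steps_since_food, hunger_limit=100):
    """Return (reward, terminated, truncated) for one step (illustrative)."""
    if collided:
        return -10.0, True, False      # game over on collision
    if ate_food:
        return 10.0, False, False      # reward for eating food
    if steps_since_food > hunger_limit:
        return -10.0, False, True      # penalize dawdling; end the episode
    return 0.0, False, False
```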
snake_game.py is a pure Python implementation of the Snake game logic:
- No RL dependencies
- Handles game state, collisions, scoring
- Pygame rendering support
- Clean separation of game logic from AI
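The collision handling mentioned above amounts to a wall check plus a self-overlap check. A minimal, self-contained sketch (function name and signature are illustrative, not snake_game.py's actual API):

```python
def is_collision(head, body, width, height):
    """True if the head is off the grid or overlaps the body (sketch)."""
    x, y = head
    if x < 0 or x >= width or y < 0 or y >= height:
        return True        # wall collision
    return head in body    # self collision
```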
snake_env.py provides the Gymnasium environment wrapper:
- Implements standard Gym interface
- Handles observation/action space definitions
- Manages rendering modes
- Environment registration
train_snake.py contains the complete training pipeline:
- Algorithm support: PPO (Proximal Policy Optimization)
- Training progress monitoring
- Model saving/loading
- Command-line interface
Expected performance (approximate):
- Random Agent: ~0-2 score
- Trained Agent (50k steps): ~5-15 score
- Well-trained Agent (200k+ steps): ~15-30+ score