A collection of tutorials and demos for reinforcement learning (RL) with large language models (LLMs).
- Example scripts for training and playing with RL agents and LLMs
- Tic-Tac-Toe PPO demos
- Llama model learning scripts
- transformers (Hugging Face) for LLMs (Llama, TinyLlama)
- PyTorch for deep learning
- stable-baselines3 and sb3_contrib for RL algorithms (PPO, MaskablePPO)
- PettingZoo for multi-agent RL environments (Tic-Tac-Toe, Connect Four)
- Gymnasium for RL environment interface
- NumPy for numerical array operations
- Proximal Policy Optimization (PPO)
- Maskable PPO (PPO with invalid-action masking, for environments where some actions are illegal in a given state)
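
To make the two algorithm entries above more concrete, here is a minimal sketch of the ideas behind them: the per-sample PPO clipped surrogate objective, and invalid-action masking applied to a policy's logits. It is not the stable-baselines3 or sb3_contrib implementation and uses only the Python standard library. The function names (`masked_softmax`, `ppo_clip_objective`), the `clip_range` default of 0.2, and the example board are assumptions chosen for illustration.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over logits, where mask[i] is True for legal actions.

    Illegal actions get probability exactly 0, so they are never sampled.
    """
    if not any(mask):
        raise ValueError("at least one action must be legal")
    # Subtract the largest legal logit for numerical stability.
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def ppo_clip_objective(ratio, advantage, clip_range=0.2):
    """Per-sample clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    `ratio` is new_policy_prob / old_policy_prob for the action taken.
    """
    clipped = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    return min(ratio * advantage, clipped * advantage)


# Hypothetical Tic-Tac-Toe position: cells 0, 4 and 8 are already taken.
board_taken = [True, False, False, False, True, False, False, False, True]
legal = [not taken for taken in board_taken]

# With uniform logits, probability mass is spread over the six empty cells only.
probs = masked_softmax([0.0] * 9, legal)
```

In this sketch the mask zeroes out occupied cells before normalisation, which is the general idea behind action masking in board games such as Tic-Tac-Toe and Connect Four. The clipping step limits how far a single update can move the policy away from the one that collected the data.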
See the tutorials/ directory for example scripts and usage.