This is the course project for IERG 5350. In this project, we train an agent to play Super Mario Bros using the Proximal Policy Optimization (PPO) algorithm from the paper [Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347).
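For context, the core of PPO is the clipped surrogate objective. The snippet below is only an illustrative sketch in PyTorch (the function name is hypothetical, and the repo's actual loss likely also includes a value loss term and an entropy bonus):

```python
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Ratio of new to old action probabilities, r_t(theta) in the paper.
    ratio = torch.exp(log_probs - old_log_probs)
    # Clipped surrogate objective: take the pessimistic (minimum) of the
    # unclipped and clipped terms, then negate so we can minimize it.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```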
Below are the demos for all levels solved by Dec. 10, 2020. Note that our agent can be further trained to solve more levels without much effort; you can try it by simply changing the learning rate.
Currently solved levels (all stages in World 1)
We compare the performance of the trained agent with that of the human world record and average human players under warpless mode; the entries are completion times, so lower is better.
| Level | Trained Agent | World Record | Average Human Players |
|---|---|---|---|
| World 1, Stage 1 | 52 | 52 | 64 |
| World 1, Stage 2 | 53 | 50 | 63 |
| World 1, Stage 3 | 52 | 42 | 58 |
| World 1, Stage 4 | 54 | 40 | 57 |
- Train your model by running `python train.py`. For example: `python train.py --world 5 --stage 2 --lr 1e-4 --action_type complex` (the flags map onto the environment construction sketched after this list).
- Test your trained model by running `python test.py`. For example: `python test.py --world 5 --stage 2 --action_type complex --iter 100`
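For reference, the `--world`, `--stage`, and `--action_type` flags presumably correspond to how the scripts build the environment with gym-super-mario-bros. A minimal sketch of that mapping (not the repo's exact code):

```python
import gym_super_mario_bros
from nes_py.wrappers import JoypadSpace
from gym_super_mario_bros.actions import COMPLEX_MOVEMENT

# "SuperMarioBros-5-2-v0" corresponds to --world 5 --stage 2;
# COMPLEX_MOVEMENT corresponds to --action_type complex.
env = gym_super_mario_bros.make("SuperMarioBros-5-2-v0")
env = JoypadSpace(env, COMPLEX_MOVEMENT)
```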
On the CSE server, do not forget to add these two lines before running (tested):
```bash
Xvfb :0 -screen 0 1024x768x24 -ac +extension GLX +render -noreset &> xvfb.log &
export DISPLAY=:0
```
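To verify that the virtual display works, you can run a quick check (a sketch assuming gym-super-mario-bros is installed; the environment id is just an example):

```python
import gym_super_mario_bros

# Rendering should succeed once Xvfb is running on display :0.
env = gym_super_mario_bros.make("SuperMarioBros-1-1-v0")
env.reset()
env.render()
env.close()
```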
Build:
```bash
sudo docker build --network=host -t ppo .
```
Run:
```bash
docker run --runtime=nvidia -it --rm --volume="$PWD"/../Super-mario-bros-PPO-pytorch:/Super-mario-bros-PPO-pytorch --gpus device=0 ppo
```
Then, inside the Docker container, you can simply run the `train.py` or `test.py` scripts as described above.