This is the course project for IERG 5350. In this project, we train an agent to play Super Mario Bros using the Proximal Policy Optimization (PPO) algorithm from the paper [Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347).
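For context, the core of PPO is the clipped surrogate objective. The snippet below is only an illustrative sketch in PyTorch (the function name is hypothetical, and the repo's actual loss likely also includes a value loss term and an entropy bonus):

```python
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Ratio of new to old action probabilities, r_t(theta) in the paper.
    ratio = torch.exp(log_probs - old_log_probs)
    # Clipped surrogate objective: take the pessimistic (minimum) of the
    # unclipped and clipped terms, then negate so we can minimize it.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```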
Below are the demos for all levels solved by Dec. 10, 2020. Note that our agent can be further trained to solve more levels without much effort; you can try it by simply changing the learning rate.
Currently solved levels (all stages in World 1)
We compare the performance of the trained agent with that of the human world record and average human players under warpless mode; the entries are completion times, so lower is better.
| Level | Trained Agent | World Record | Average Human Players |
|---|---|---|---|
| World 1, Stage 1 | 52 | 52 | 64 |
| World 1, Stage 2 | 53 | 50 | 63 |
| World 1, Stage 3 | 52 | 42 | 58 |
| World 1, Stage 4 | 54 | 40 | 57 |
- Train your model by running `python train.py`. For example: `python train.py --world 5 --stage 2 --lr 1e-4 --action_type complex` (the flags map onto the environment construction sketched after this list).
- Test your trained model by running `python test.py`. For example: `python test.py --world 5 --stage 2 --action_type complex --iter 100`
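For reference, the `--world`, `--stage`, and `--action_type` flags presumably correspond to how the scripts build the environment with gym-super-mario-bros. A minimal sketch of that mapping (not the repo's exact code):

```python
import gym_super_mario_bros
from nes_py.wrappers import JoypadSpace
from gym_super_mario_bros.actions import COMPLEX_MOVEMENT

# "SuperMarioBros-5-2-v0" corresponds to --world 5 --stage 2;
# COMPLEX_MOVEMENT corresponds to --action_type complex.
env = gym_super_mario_bros.make("SuperMarioBros-5-2-v0")
env = JoypadSpace(env, COMPLEX_MOVEMENT)
```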
On the CSE server, do not forget to add these two lines before running (tested):
```bash
Xvfb :0 -screen 0 1024x768x24 -ac +extension GLX +render -noreset &> xvfb.log &
export DISPLAY=:0
```
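To verify that the virtual display works, you can run a quick check (a sketch assuming gym-super-mario-bros is installed; the environment id is just an example):

```python
import gym_super_mario_bros

# Rendering should succeed once Xvfb is running on display :0.
env = gym_super_mario_bros.make("SuperMarioBros-1-1-v0")
env.reset()
env.render()
env.close()
```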
Build:
```bash
sudo docker build --network=host -t ppo .
```
Run:
```bash
docker run --runtime=nvidia -it --rm --volume="$PWD"/../Super-mario-bros-PPO-pytorch:/Super-mario-bros-PPO-pytorch --gpus device=0 ppo
```
Then, inside the Docker container, you can simply run the `train.py` or `test.py` scripts as described above.