This repository contains the reference implementation of the popular Proximal Policy Optimization (PPO) algorithm and accompanies my Final Master's Project report on the topic.
While the aforementioned report explains the theory behind the PPO algorithm in a detailed, complete, and comprehensible way, the reference implementation provided in this repository illustrates the algorithm in practice.
For more information on the background of this work, the reader is referred to the corresponding report (link to be added later).
For an easy setup, it is recommended to install the requirements inside a virtual environment, e.g. using virtualenvwrapper.
After setting up and activating a virtual environment, proceed as follows:
- Open the terminal
- Navigate to the main directory of this repository, i.e. the directory containing the `requirements.txt` file
- Install the requirements using pip: `pip install -r requirements.txt`

When not using a virtual environment, add the `--user` flag to the pip command above.
This concludes the setup.
Training (as well as replaying the behavior of previously trained agents) is initiated via the `main.py` file.
Before training an agent, one has to set up a corresponding configuration file defining
the conditions under which training and testing are performed. For convenience, some
default configuration files are provided already. They can be found
here.
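To give a rough idea of what such a configuration specifies, the following is a purely illustrative Python sketch of common PPO training settings; the key names used here (`env_name`, `learning_rate`, `clip_range`, etc.) are hypothetical and do not necessarily match the keys expected by this repository, so the provided default configuration files remain the authoritative reference.

```python
# Purely illustrative sketch of typical PPO training settings.
# The key names below are hypothetical and NOT necessarily those expected
# by this repository; consult the provided default configuration files
# for the actual format.
config = {
    "env_name": "CartPole-v0",    # Gym environment to train on
    "learning_rate": 3e-4,        # optimizer step size
    "gamma": 0.99,                # discount factor
    "clip_range": 0.2,            # PPO clipping parameter epsilon
    "epochs_per_update": 10,      # gradient epochs per batch of rollouts
    "iterations": 500,            # number of training iterations
    "save_policy_model": True,    # save the policy network after training
    "save_val_net_model": False,  # save the state value network after training
}
```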
Once a configuration file has been created or selected from the examples, training can be initiated as follows.
Suppose we want to train an agent on the OpenAI Gym CartPole-v0 environment using one of the provided default configuration files. To train an agent using this configuration, proceed as follows:
- Open the terminal
- Navigate to the main directory of this repository, i.e. the directory containing the `main.py` file
- Run `python main.py -c='./default_config_files/config_cartpole.py'`. The `-c` flag stands for "config".
Upon completion of the training procedure, a new directory called `train_results` will be created inside the directory containing the `main.py` file. Each sub-directory inside `train_results` contains the saved data associated with a performed test run. Such a sub-directory, with a name like `2021_06_27__22_58_00__tsrOOxrwle`, contains the following elements:
- `config.json`: a copy of the config file used to train and test the agent
- `policy_model.pt`: the archive containing the saved policy network; it can be used to replay the agent later (only saved if requested by the user via the config file)
- `val_net_model.pt`: the archive containing the saved state value network (only saved if requested by the user via the config file)
- `train_stats.json`: the file containing the training and evaluation statistics
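As a small example of how such a results directory could be inspected afterwards, the snippet below loads `train_stats.json` with Python's standard `json` module. This is only a sketch: the directory name is taken from the example above, and the exact statistics stored in the file are determined by this repository's code.

```python
import json

# Path taken from the example directory name above; adjust it to your own run.
stats_path = "./train_results/2021_06_27__22_58_00__tsrOOxrwle/train_stats.json"

with open(stats_path) as f:
    stats = json.load(f)

# The exact contents depend on this repository's code; print the top-level
# keys to see which training and evaluation statistics were recorded.
print(list(stats.keys()) if isinstance(stats, dict) else type(stats))
```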
To replay a trained model while visually rendering the agent's environment, `main.py` has to be called with two arguments:
- The path to the configuration file specifying the setup of the agent
- The path to the archive containing the trained policy network
Suppose again that the aim is to replay the behavior of the trained agent stored in the directory `./train_results/2021_06_27__22_58_00__tsrOOxrwle`. Then, to render the agent's performance, do the following:
- Open the terminal
- Navigate to the main directory of this repository, i.e. the directory containing the `main.py` file
- Run `python main.py -c='./train_results/2021_06_27__22_58_00__tsrOOxrwle/config_cartpole.py' -d='./train_results/2021_06_27__22_58_00__tsrOOxrwle/policy_model.pt'`, where the `-c` flag stands for "config" again and the `-d` flag provides the path to the archive containing the trained policy network.
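Besides replaying via `main.py`, the saved policy archive can in principle also be inspected directly with PyTorch. The snippet below is only a sketch under the assumption that the network was serialized with `torch.save`; depending on how this repository actually saves the model, the corresponding model class may need to be importable for the call to succeed.

```python
import torch

# Assumes the policy network was serialized with torch.save; depending on how
# this repository saves the model, the corresponding model class may need to
# be importable for this call to succeed.
policy = torch.load(
    "./train_results/2021_06_27__22_58_00__tsrOOxrwle/policy_model.pt",
    map_location="cpu",
)
print(type(policy))  # inspect what kind of object was stored in the archive
```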