An attempt at replicating AlphaGo by DeepMind.
Taken from the Rochester-NRT/RocAlphaGo project.
Implements the data-generation functionality for the RL and value-network stages.
The functions RL_Playout(numGames, policyModel, opponentModel, filename=None) and Value_Playout(numGames, sl_model, rl_model, filename, U_MAX) wrap the corresponding single-game functions numGames times and store the results to the .hdf5 file specified via filename.
The functions Gym_DataGen(policyModel), RL_DataGen(policyModel, opponentModel), and valueDataGen(sl_model, rl_model, U_MAX) each implement a single pass through a simulation and return the data collected for that game.
The .hdf5 file contents for each function are as follows:
* RL_Playout() - 'states', 'actions', 'rewards' (actions are not one-hot encoded)
* Value_Playout() - 'states', 'rewards'
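The numGames wrapper pattern described above can be sketched as follows. This is an illustrative reconstruction, not the project's actual code: `play_one_game` stands in for a single-game generator such as RL_DataGen(policyModel, opponentModel), and the returned dict matches the 'states'/'actions'/'rewards' layout before it is handed to an hdf5 writer.

```python
def rl_playout(num_games, play_one_game):
    """Run play_one_game num_games times and aggregate the results.

    Each game is assumed to return (states, actions, reward); the final
    reward is broadcast so every (state, action) pair carries the outcome
    of its game, mirroring the 'states'/'actions'/'rewards' datasets.
    """
    all_states, all_actions, all_rewards = [], [], []
    for _ in range(num_games):
        states, actions, reward = play_one_game()
        all_states.extend(states)
        all_actions.extend(actions)  # integer move indices, not one-hot
        all_rewards.extend([reward] * len(states))
    return {"states": all_states, "actions": all_actions,
            "rewards": all_rewards}
```

The resulting dict can then be passed to a writer such as write2hdf5(filename, dict2store) when filename is given.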
Implements the Go player class.
Important Fields:
* self.states - a list of all states encountered while playing
* self.actions - a list of all actions made
* self.nnmodel - NN backend that makes the decisions
* self.color - NNGoPlayer.BLACK or NNGoPlayer.WHITE
* self.rocColor - Rocgo.BLACK or Rocgo.WHITE
* self.pachiColor - pachi_py.BLACK or pachi_py.WHITE
Important Functions:
* makemoveGym()
* makemoveRL(playRandom)
* makeRandomValidMove()
nn_vs_nnGame(rocEnv, playBlack, nnBlack, nnWhite) is also implemented; it plays out a game between two NNGoPlayer instances starting from the board configuration specified in rocEnv.
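The self-play loop behind a function like nn_vs_nnGame can be sketched as below. This is a minimal illustration, not the project's implementation: the player objects here expose a hypothetical make_move() that returns a move or None for a pass, and the game ends after two consecutive passes, as in standard Go.

```python
def nn_vs_nn_game(black_player, white_player):
    """Alternate moves between two player objects until both pass.

    Players expose make_move() -> move or None (pass); this interface is
    an assumption for the sketch, standing in for NNGoPlayer's move
    methods. Returns the list of moves played.
    """
    players = [black_player, white_player]  # Black moves first
    consecutive_passes, turn, moves = 0, 0, []
    while consecutive_passes < 2:
        move = players[turn % 2].make_move()
        if move is None:
            consecutive_passes += 1
        else:
            consecutive_passes = 0
            moves.append(move)
        turn += 1
    return moves
```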
Implements I/O related functions.
Useful Functions:
* write2hdf5(filename, dict2store)
* hdf52dict(hdf5Filename)
* hdf5Augment(filename, outfilename)
* pachiGameRecorder(filename)
Wrapper functions around the Rochester Go board implementation.
Useful Functions:
* initRocBoard()
* rocBoard2State(rocEnv)
* printRocBoard(rocEnv)
* returnRocBoard(rocEnv)
* get_legal_coords(rocEnv)
* intMove2rocMove(rocEnv)
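As an illustration of the kind of conversion a helper like intMove2rocMove performs, the sketch below maps a flat integer move index onto a (row, col) board coordinate. The board size and the row-major convention are assumptions for this example; the wrapper's actual convention may differ.

```python
BOARD_SIZE = 19  # standard Go board; assumed for this sketch

def int_move_to_coord(move):
    """Convert a flat move index in [0, BOARD_SIZE**2) to (row, col).

    Illustrative only: shows the flat-index-to-coordinate conversion
    that a wrapper like intMove2rocMove would need, under a row-major
    ordering assumption.
    """
    return divmod(move, BOARD_SIZE)
```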
A Monte-Carlo Tree Search implementation. The class MCNode represents a node in the search tree; MCTreeSearch() can be called to initiate the search.
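The core bookkeeping of such a tree node can be sketched with the standard UCB1 selection rule. This is a generic illustration, not the project's MCNode: the real class likely also carries board state and, as in AlphaGo, policy-network priors.

```python
import math

class MCNode:
    """Minimal Monte-Carlo tree node (illustrative sketch only)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}  # move -> MCNode
        self.visits = 0
        self.wins = 0

    def ucb1(self, c=1.4):
        """UCB1 score: exploitation term plus exploration bonus.

        Unvisited children score infinity so they are always tried first.
        """
        if self.visits == 0:
            return float("inf")
        return (self.wins / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

    def best_child(self):
        """Pick the (move, child) pair with the highest UCB1 score."""
        return max(self.children.items(), key=lambda kv: kv[1].ucb1())
```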