This project contains a working example of a contextual multi-armed bandit.
I wrote this after becoming interested in the contextual bandit problem for personalized recommendations and failing to find any working code. I built it to understand how the algorithm works :)
This repository indexes high on code and low on docs. Briefly, it contains:
- A driver IPython notebook, `contextual_bandit_sim.ipynb`. You should start here to understand the contents.
- A data generator. It initializes hidden contextual variables that are used to create synthetic samples. Let's call this set of latent contextual variables `L` and the synthetic data `X`.
- Two strategies / reinforcement functions `S` by which the bandits can be evaluated (binomial: `BinaryStrategy.py`, and continuous: `PositiveStrategy.py`).
- A simulator, `Simulator.py`, that holds a set of variables `M` that are estimates of `L`. By observing `X` and evaluating those observations with the strategies `S`, it updates the estimates `M` so that they converge to the hidden `L`.
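To make the relationship between `L`, `X`, `S`, and `M` concrete, here is a minimal, self-contained sketch of the loop described above. It is not the code in `Simulator.py`: the function name `simulate`, the epsilon-greedy arm selection, the logistic reward model, and the stochastic gradient update are all assumptions chosen for illustration, and it uses only the Python standard library.

```python
import math
import random


def simulate(n_arms=3, n_features=4, n_rounds=5000,
             epsilon=0.1, lr=0.05, seed=0):
    """Illustrative epsilon-greedy contextual bandit (hypothetical helper).

    Returns (L, M, average_reward), where L holds the hidden per-arm
    weights and M holds the learned estimates of those weights.
    """
    rng = random.Random(seed)

    # Hidden latent variables L: one weight vector per arm.
    # The learner never reads these directly.
    L = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_arms)]

    # Estimates M of L, initialized to zero.
    M = [[0.0] * n_features for _ in range(n_arms)]

    def dot(w, x):
        return sum(wi * xi for wi, xi in zip(w, x))

    def sigmoid(z):
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    total_reward = 0
    for _ in range(n_rounds):
        # One synthetic context sample from X.
        x = [rng.gauss(0, 1) for _ in range(n_features)]

        # Epsilon-greedy: explore a random arm, else exploit the current M.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: dot(M[a], x))

        # Binary strategy S (assumed form): reward is 1 with
        # probability sigmoid(L[arm] . x), else 0.
        reward = 1 if rng.random() < sigmoid(dot(L[arm], x)) else 0

        # Update only the pulled arm's estimate with one
        # gradient step on the logistic loss.
        error = reward - sigmoid(dot(M[arm], x))
        for j in range(n_features):
            M[arm][j] += lr * error * x[j]

        total_reward += reward

    return L, M, total_reward / n_rounds


if __name__ == "__main__":
    L, M, avg = simulate()
    print("average reward:", round(avg, 3))
    for arm, (hidden, estimate) in enumerate(zip(L, M)):
        print(arm, [round(v, 2) for v in hidden], [round(v, 2) for v in estimate])
```

Running the script prints the hidden weights next to the learned estimates for each arm, which is a quick way to see whether `M` is drifting toward `L`. A continuous reward, as in `PositiveStrategy.py`, would need a different reward draw and update rule than the binary one sketched here.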
If you find this useful, please share, retweet, and follow me @allenday on Twitter.
Have fun!