A simple tabular Q-learning agent using epsilon-greedy exploration on the FrozenLake ("frozen ice") environment from OpenAI Gym.
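A minimal sketch of the two core pieces (epsilon-greedy action selection and the tabular Q-learning update) follows. It assumes the Q-table is a list of lists, for example 16 states x 4 actions for the 4x4 FrozenLake map. The function names, the learning rate `alpha`, and the discount `gamma` are illustrative assumptions, not this project's code.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Greedy: index of the largest Q-value (ties go to the lowest index).
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    # Terminal transitions do not bootstrap from the next state's values.
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


# Usage: a 16-state x 4-action table initialised to zero (assumed 4x4 map).
q = [[0.0] * 4 for _ in range(16)]
action = epsilon_greedy(q[0], epsilon=1.0)
```

The update is off-policy: the target uses the greedy `max` over the next state's values even though the behaviour policy explores with epsilon-greedy.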
The red line represents the evolution of the epsilon value over time.
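The schedule behind the red line is not stated here. One common choice is sketched below: exponential decay from `eps_start` toward a floor `eps_min`. The schedule shape and all three parameter values are assumptions for illustration only.

```python
import math


def epsilon_at(episode, eps_start=1.0, eps_min=0.01, decay_rate=0.005):
    """Assumed schedule: epsilon decays exponentially from eps_start toward eps_min."""
    return eps_min + (eps_start - eps_min) * math.exp(-decay_rate * episode)


# Usage: values one could plot as the epsilon curve over 1000 episodes.
curve = [epsilon_at(e) for e in range(1000)]
```

A decaying epsilon shifts the agent from mostly random exploration early on to mostly greedy exploitation later.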
The blue line represents the average accuracy on the goal-reaching task (the fraction of episodes that reach the goal) over the last 20 episodes.
The x-axis represents the episode ID plus 20.
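The blue curve and its x-axis offset can be sketched as a 20-episode moving average. This assumes an episode counts as a success when its reward is positive (FrozenLake gives a reward of 1 only on reaching the goal). It also uses one possible reading of "episode ID plus 20": point i averages episodes i to i+19 and is plotted at x = i + 20. Both function names are hypothetical.

```python
from collections import deque


def rolling_success_rate(rewards, window=20):
    """Mean success over each full window of `window` consecutive episodes."""
    recent = deque(maxlen=window)  # keeps only the last `window` outcomes
    rates = []
    for reward in rewards:
        recent.append(1.0 if reward > 0 else 0.0)
        if len(recent) == window:
            rates.append(sum(recent) / window)
    return rates


def x_positions(rates, window=20):
    # Assumed reading of "episode ID + 20": point i is plotted at x = i + window.
    return [i + window for i in range(len(rates))]
```

No point is produced until the first 20 episodes have finished, which is why the curve starts at an offset rather than at episode 0.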
