The solution uses a Deep Reinforcement Learning (Deep Q-Learning) approach.
The repository contains a trained network (after 1000 training iterations).
Network inputs (six boolean features describing the snake's surroundings):
- is a wall/tail directly ahead
- is a wall/tail directly to the right
- is a wall/tail directly to the left
- is the snack ahead (at any distance)
- is the snack to the right (at any distance)
- is the snack to the left (at any distance)
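The six inputs above can be sketched as a state-encoding function. The data layout here is an assumption for illustration, not the repository's actual code: `snake` is a list of `(x, y)` cells with the head first, `direction` is a unit step `(dx, dy)` on a grid where y grows downward, and `snack` is the food cell.

```python
# Hypothetical encoding of the six boolean inputs as a state vector.
def encode_state(snake, direction, snack, width, height):
    head = snake[0]
    dx, dy = direction
    # Relative right/left of the current heading (y grows downward).
    right = (-dy, dx)
    left = (dy, -dx)

    def blocked(step):
        # True if the cell one step away is a wall or part of the tail.
        x, y = head[0] + step[0], head[1] + step[1]
        out_of_bounds = not (0 <= x < width and 0 <= y < height)
        return out_of_bounds or (x, y) in snake

    def snack_toward(step):
        # Positive projection of the head-to-snack vector onto `step`,
        # i.e. "the snack lies in that direction, no matter how far".
        vx, vy = snack[0] - head[0], snack[1] - head[1]
        return vx * step[0] + vy * step[1] > 0

    return [
        blocked(direction),       # wall/tail directly ahead
        blocked(right),           # wall/tail directly to the right
        blocked(left),            # wall/tail directly to the left
        snack_toward(direction),  # snack ahead (any distance)
        snack_toward(right),      # snack to the right (any distance)
        snack_toward(left),       # snack to the left (any distance)
    ]
```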
Actions (relative to the current direction):
- do nothing (keep moving in the current direction)
- turn right
- turn left
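Because the actions are relative, each one maps to a new absolute heading. A minimal sketch, assuming the same `(dx, dy)` unit-step convention as above (y grows downward) and action indices 0/1/2 for straight/right/left:

```python
# Hypothetical mapping from the three relative actions to a new heading.
# Action 0 = do nothing, 1 = turn right, 2 = turn left.
def apply_action(direction, action):
    dx, dy = direction
    if action == 1:      # turn right: rotate the heading 90 degrees clockwise
        return (-dy, dx)
    if action == 2:      # turn left: rotate 90 degrees counter-clockwise
        return (dy, -dx)
    return (dx, dy)      # do nothing: keep the current direction
```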
Rewards:
- +1 for eating the snack
- -1 for hitting a wall or the tail
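The reward signal is sparse: every step that neither eats the snack nor crashes is worth 0. A small sketch of that rule (the function name and flags are illustrative, not from the repository):

```python
# Hypothetical per-step reward, following the two rules above.
def reward_for(ate_snack, crashed):
    if crashed:           # hitting a wall or the tail ends the episode
        return -1
    return 1 if ate_snack else 0
```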
| Param | Value | Info |
|---|---|---|
| LEARNING_RATE | 0.001 | |
| GAMMA | 0.95 | Discount rate |
| EPSILON | 1.0 | Exploration rate |
| EPSILON_DECAY | 0.995 | |
| EPSILON_MIN | 0.01 | |
| MEMORY | 2000 | Experience replay |
| MINI_BATCH | 32 | |
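The table's parameters slot into the standard DQN loop: epsilon-greedy action selection, a bounded experience-replay memory, mini-batch sampling, and the Bellman target. The sketch below wires those pieces together with the values above; the Q-network itself is left abstract (a `predict` callable), since the repository's model code is not shown here, and `LEARNING_RATE` would configure that network's optimizer.

```python
import random
from collections import deque

# Hyperparameters from the table above.
LEARNING_RATE = 0.001   # would configure the network optimizer (not shown)
GAMMA = 0.95            # discount rate
EPSILON = 1.0           # initial exploration rate
EPSILON_DECAY = 0.995
EPSILON_MIN = 0.01
MEMORY = 2000           # experience-replay capacity
MINI_BATCH = 32

class ReplayAgent:
    """Sketch of the epsilon-greedy and experience-replay parts of a
    DQN agent; `predict` maps a state to a list of Q-values."""

    def __init__(self, n_actions, predict):
        self.n_actions = n_actions
        self.predict = predict
        self.memory = deque(maxlen=MEMORY)  # oldest transitions drop off
        self.epsilon = EPSILON

    def act(self, state):
        # Explore with probability epsilon, otherwise exploit the network.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        q = self.predict(state)
        return max(range(self.n_actions), key=lambda a: q[a])

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample_batch(self):
        # Uniform random mini-batch once enough transitions are stored.
        if len(self.memory) < MINI_BATCH:
            return []
        return random.sample(self.memory, MINI_BATCH)

    def td_target(self, reward, next_state, done):
        # Bellman target: r + gamma * max_a' Q(s', a') for non-terminal s'.
        if done:
            return reward
        return reward + GAMMA * max(self.predict(next_state))

    def decay_epsilon(self):
        # Anneal exploration toward EPSILON_MIN after each episode.
        self.epsilon = max(EPSILON_MIN, self.epsilon * EPSILON_DECAY)
```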

