Assignment 3: reinforcement learning. Learn the best policy for tic-tac-toe using Monte Carlo tree search.

geoffreyvd/RL-tictactoe-MCST

RL-tictactoe-MCST

Assignment 3: model-based reinforcement learning. Learn the best policy for tic-tac-toe using Monte Carlo tree search (MCTS).

The plot shows how the model converges for different values of the c parameter. A greedier value (closer to 1) converges much faster and also finds (almost) the best policy. Intuitively this makes sense: tic-tac-toe is a simple problem with many duplicate states, because the order in which moves are played does not matter for the resulting board, so extensive exploration is a waste of time.
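To make the role of c concrete, here is a minimal sketch of child selection in MCTS, assuming the standard UCB1 form in which c scales the exploration bonus. The function names and the `(wins, visits)` layout are hypothetical, and this repository may define c differently (the text above describes values closer to 1 as greedier), so read it as an illustration rather than the code behind the plot.

```python
import math

def uct_score(child_wins, child_visits, parent_visits, c):
    """UCB1 score for one child; unvisited children are always tried first."""
    if child_visits == 0:
        return math.inf
    exploit = child_wins / child_visits  # average reward seen so far
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

def select_child(stats, c):
    """stats: list of (wins, visits), one entry per child (assumed layout).

    Returns the index of the child with the highest UCB1 score.
    """
    parent_visits = sum(visits for _, visits in stats)
    scores = [uct_score(w, v, parent_visits, c) for w, v in stats]
    return max(range(len(stats)), key=scores.__getitem__)
```

In this form, `c = 0` always picks the child with the best average reward, while a larger `c` shifts visits towards children that have been tried less often.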

RL-shortestpath-Qlearning

Assignment 4: a model-free (value-based) reinforcement learning solution to the shortest path problem. I implemented Q-learning with some optimizations to find the end point (40,0) in a 50x50 grid, where each state transition is stochastic (the route may or may not be congested). It learns the fastest path by looking at the maximum Q-value over the actions available from a given starting point.

The image shows the Q-value decreasing with distance from the end point.
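The tabular Q-learning update described in this section can be sketched roughly as follows. Only the 50x50 grid and the end point (40,0) come from the text; the four-neighbour moves, the step costs, the congestion probability, and the values of `alpha` and `gamma` are assumptions made for illustration, and the optimizations mentioned above are not included.

```python
import random

ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # assumed 4-neighbour moves
GOAL = (40, 0)  # end point named in the text
SIZE = 50       # 50x50 grid

def step(state, action, rng, p_congested=0.2):
    """Hypothetical environment: move one cell, clipped to the grid.

    A congested step costs more, which makes the transition reward stochastic.
    """
    x = min(max(state[0] + action[0], 0), SIZE - 1)
    y = min(max(state[1] + action[1], 0), SIZE - 1)
    cost = 5.0 if rng.random() < p_congested else 1.0
    return (x, y), -cost

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One Q-learning update; Q maps (state, action) to a value, default 0."""
    if next_state == GOAL:
        best_next = 0.0  # nothing left to pay once the goal is reached
    else:
        best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return Q[(state, action)]

def greedy_action(Q, state):
    """Pick the action with the maximum Q-value in the given state."""
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
```

With negative step costs like these, states further from the goal accumulate more cost, so their maximum Q-value is lower, which matches the trend in the image.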
