RL-tictactoe-MCST

Assignment 3: model-based reinforcement learning. The agent learns the best policy for tic-tac-toe using Monte Carlo tree search.

The plot shows the convergence of the model for different values of the c parameter. We find that a greedier value (closer to 1) converges much faster and also (almost) finds the best policy. Intuitively this makes sense: tic-tac-toe is a simple problem with many duplicate states (the order in which moves are played does not change the resulting board), so exploring extensively is a waste of time.
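To make the role of c concrete, here is a minimal sketch of a UCB1-style selection rule, assuming c is the exploration constant in front of the bonus term. The repository contains tictactoeUCB.py, but the exact formula used there is not shown in this README, so the function names and the `(total_value, visits)` layout below are hypothetical. In this standard form, a small c favours the child with the best average so far, and a large c favours rarely visited children; the scaling of the author's c may differ.

```python
import math

def ucb1_score(total_value, visits, parent_visits, c):
    """UCB1 score of a child node: average value plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_child(children, c):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of (total_value, visits) tuples (hypothetical layout).
    """
    parent_visits = sum(visits for _, visits in children)
    scores = [ucb1_score(w, n, parent_visits, c) for w, n in children]
    return max(range(len(children)), key=scores.__getitem__)

# Three children: a well-explored one, a promising one, a barely tried one.
children = [(6.0, 10), (1.5, 2), (0.0, 1)]
print(select_child(children, c=0.0))  # purely greedy: best average wins
print(select_child(children, c=2.0))  # exploratory: least-visited child wins
```

With c=0.0 the rule picks the child with the highest average (index 1); with c=2.0 the bonus dominates and the child visited only once (index 2) is picked.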

RL-shortestpath-Qlearning

Assignment 4: model-free (value-based) reinforcement learning solution to the shortest path problem. I have implemented Q-learning with some optimizations to find the end point (40,0) in a 50x50 grid, where each state transition is a stochastic process (it may or may not be congested). It learns the fastest path by looking at the maximum Q value over the state-action pairs for a given starting point.
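As an illustration of the idea, here is a minimal tabular Q-learning sketch for a grid with stochastic congestion. It is not the code from QlearningShortestPath.py: the travel costs (1 normally, 5 when congested), the congestion probability, the hyperparameters, and all function names are assumptions made for this example, and none of the optimizations mentioned above are included.

```python
import random

ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # moves in (x, y)

def step(state, action, size, goal, p_congested, rng):
    """Move one cell (clipped to the grid); the cost is higher when congested."""
    x = min(max(state[0] + action[0], 0), size - 1)
    y = min(max(state[1] + action[1], 0), size - 1)
    cost = 5.0 if rng.random() < p_congested else 1.0  # assumed travel times
    return (x, y), -cost, (x, y) == goal

def best_value(q, state):
    """Maximum Q value over all actions in `state` (unseen pairs count as 0)."""
    return max(q.get((state, i), 0.0) for i in range(len(ACTIONS)))

def q_learning(size, goal, episodes=2000, alpha=0.1, gamma=1.0,
               epsilon=0.1, p_congested=0.2, max_steps=10_000, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration and random starts."""
    rng = random.Random(seed)
    q = {}  # maps (state, action_index) -> estimated value
    for _ in range(episodes):
        state = (rng.randrange(size), rng.randrange(size))
        for _ in range(max_steps):
            if state == goal:
                break
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = max(range(len(ACTIONS)),
                        key=lambda i: q.get((state, i), 0.0))  # exploit
            nxt, reward, done = step(state, ACTIONS[a], size, goal,
                                     p_congested, rng)
            target = reward + (0.0 if done else gamma * best_value(q, nxt))
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (target - old)
            state = nxt
    return q

# The README's setting would be size=50, goal=(40, 0); a small grid keeps
# this demo fast.
q = q_learning(size=6, goal=(4, 0), episodes=3000)
print("next to the goal:", best_value(q, (3, 0)))
print("far corner:      ", best_value(q, (0, 5)))
```

Because every step has a negative reward, the best Q value of a state approximates minus the expected travel time to the end point, so it becomes more negative the further the state is from the goal.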

The image shows the Q value decreasing with distance from the end point.

Folders and files

qlearningShortestPath
DPRL_3.pdf
QlearningShortestPath.py
QlearningShortestPathWithSubRewards.py
README.md
plotParamterC.png
tictactoeRL.py
tictactoeRL2.py
tictactoeUCB.py

geoffreyvd/RL-tictactoe-MCST
