Solve Hex! and figure out hyperparams #29
Comments
What is your intuition about this initial attempt?
I think the ratio of self-play to training is too high. In general, self-play is much more computationally heavy than training, and we want to get the most out of each data entry, so let's train A LOT!
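To make the ratio concrete, here is a minimal sketch of an AlphaZero-style iteration loop where `epochs_per_iteration` is the knob this comment argues should be raised relative to the self-play budget. All names and defaults here are illustrative, not taken from the project's actual code.

```python
def run_iteration(net, buffer, selfplay_fn, train_step,
                  games_num=100, epochs_per_iteration=10):
    """One iteration: generate self-play data, then train on it.

    Self-play (MCTS rollouts per move) dominates the cost, so running
    many training epochs per iteration amortizes that cost across more
    gradient updates per generated position.
    """
    for _ in range(games_num):
        buffer.extend(selfplay_fn(net))    # expensive: full MCTS games
    for _ in range(epochs_per_iteration):  # cheap relative to self-play
        train_step(net, buffer)
```

The point is simply that `epochs_per_iteration` and `games_num` can be tuned independently; raising the former squeezes more learning out of each (costly) data entry.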
I want to think more about this, but meanwhile see the attached results with the above config. Also, importantly, this is 4x4 Hex, so that somewhat affects the numbers you mentioned and the intuition (though it is almost the same order of magnitude).
If the learning stops that fast, maybe our learning rate is too high. Also, a policy accuracy of 0.3 is not terrible when you have 16 options, but I suspect the network is very limited. Would you like to try the ConvNetV1 instead? Another point about the number of training entries: if latest_data_entries=1000, this basically says you only learn from the last iteration's data. I really think we should either increase latest_data_entries and iteration_data_entries, or decrease games_num to ~10 and do 1000 iterations.
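A quick back-of-the-envelope check of the replay-window point. The names mirror the config keys mentioned above, but the per-iteration numbers (games per iteration, average game length) are assumptions for illustration only:

```python
# Assumed values, not the project's actual config:
games_num = 100        # self-play games per iteration (assumed)
avg_game_length = 12   # plausible move count for a 4x4 Hex game (assumed)
positions_per_iter = games_num * avg_game_length  # data entries generated

latest_data_entries = 1000  # replay-buffer window from the comment

# How many *complete* past iterations fit in the buffer window:
iters_in_window = latest_data_entries // positions_per_iter
```

Under these assumptions one iteration already produces ~1200 positions, so a 1000-entry window cannot even hold one full iteration's data, which is exactly the "you only learn from the last iteration" problem.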
Nevertheless, this is super exciting! Can't wait to play against it.
Train a two headed network until the engine wins against us consistently.
Understand how long such training requires, and with which hyperparameters:
learning rate
games generated in self play
temperature for the softmax, and for how many moves we should use softmax sampling
model structure
should we take a single position from each game or more
a lot of this can be taken from lc0
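On the temperature question in the list above: the usual AlphaZero/lc0-style scheme samples moves proportionally to tempered MCTS visit counts for the first few plies, then switches to playing the most-visited move. A minimal sketch, with all names and defaults (`temp_moves=8`, `temperature=1.0`) as assumptions rather than the project's actual values:

```python
import numpy as np

def select_move(visit_counts, move_number, temp_moves=8, temperature=1.0):
    """Pick a move index from MCTS visit counts.

    For the first `temp_moves` plies, sample proportionally to
    counts ** (1 / temperature) to keep opening diversity; after
    that, play the most-visited move deterministically.
    """
    counts = np.asarray(visit_counts, dtype=np.float64)
    if move_number < temp_moves:
        probs = counts ** (1.0 / temperature)
        probs /= probs.sum()
        return int(np.random.choice(len(counts), p=probs))
    return int(counts.argmax())
```

Lower temperatures sharpen the distribution toward the best move; temperature → 0 recovers the deterministic argmax behavior. How many opening moves to keep stochastic is one of the hyperparameters to tune.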