Solve Hex! and figure out hyperparams #29

Open
barakugav opened this issue Aug 18, 2022 · 5 comments

@barakugav
Collaborator

Train a two-headed network until the engine wins against us consistently.
Understand how long such training takes, and with what hyperparameters:
learning rate
games generated in self-play
temperature for the softmax, and for how many moves we should use it
model structure
whether we should take a single position from each game or more

A lot of this can be taken from lc0; a hypothetical example is sketched below.
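For concreteness, here is a minimal sketch (as a plain Python dict) of the kind of knobs listed above. Every field name and value is an illustrative placeholder for discussion, not the actual contents of train/config.json:

```python
# Hypothetical hyperparameter set for lc0-style self-play training on Hex.
# All names and values below are placeholders, not the repo's real config.
config = {
    "learning_rate": 1e-3,          # optimizer step size
    "games_num": 100,               # self-play games generated per iteration
    "mcts_simulations": 1000,       # MCTS simulations per move during self-play
    "softmax_temperature": 1.0,     # temperature applied to root visit counts
    "softmax_moves": 10,            # sample from the softmax only for the first N moves, then play greedily
    "model": "two_headed_convnet",  # network structure (shared trunk, policy head + value head)
    "positions_per_game": "all",    # alternative: sample a single position per game
}
```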

@poja
Owner

poja commented Oct 25, 2022

What is your intuition about this initial attempt?
https://github.com/poja/RL/blob/smaller-hex-2/train/config.json

@barakugav
Collaborator Author

I think the ratio of self-play to training is too high.
In each iteration we will play 100 games with ~60 positions each and 1,000 simulations per move, 6,000,000 network evaluations in total.
If we generate 6,000 positions each iteration, I think we should train on at least 20,000 entries, and we can choose them from the latest 100,000 entries.

In general, self-play is much more computationally heavy, and we want to get the most out of each data entry, so let's train A LOT!
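The same back-of-the-envelope arithmetic written out, using only the numbers quoted above (nothing here is read from the actual config):

```python
# Rough cost of one self-play iteration with the numbers quoted in this comment.
games_per_iteration = 100
positions_per_game = 60        # ~average game length assumed above
simulations_per_move = 1_000   # each MCTS simulation needs one network evaluation

network_evals = games_per_iteration * positions_per_game * simulations_per_move
new_entries = games_per_iteration * positions_per_game
print(f"{network_evals:,} network evaluations per iteration")  # 6,000,000
print(f"{new_entries:,} new training entries per iteration")   # 6,000

# Suggested training volume: ~20,000 entries per iteration,
# sampled from a window of the latest ~100,000 entries.
train_entries = 20_000
print(f"trained entries per newly generated entry: {train_entries / new_entries:.1f}")  # ~3.3
```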

@poja
Owner

poja commented Oct 26, 2022

I want to think more about this, but meanwhile see the attached results with the above config.
Notice how at some point the learning stops (for the policy before the value).
And the last row seems "lucky" to me in the value loss, i.e. the next row wouldn't necessarily be as good.

Also, importantly, this is 4x4 Hex, so it somewhat affects the numbers you mentioned and the intuition (though it is roughly the same order of magnitude).

221025_175846.txt

@barakugav
Collaborator Author

If the learning stops so fast, maybe our learning rate is too high.
What do you think about 10^-3 for the first 50 iterations and 10^-4 for the remaining 50?
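A minimal sketch of that two-phase schedule (the function name and how it would hook into the training loop are assumptions, not existing repo code):

```python
def learning_rate_for(iteration: int) -> float:
    """Two-phase schedule proposed above: 1e-3 for the first 50 iterations,
    then drop to 1e-4 for the rest."""
    return 1e-3 if iteration < 50 else 1e-4
```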

Also, a policy accuracy of 0.3 is not terrible when you have 16 options, but I suspect the network is very limited. Would you like to try ConvNetV1 instead?

And another point about the number of training entries: if latest_data_entries=1000, this basically means we only learn from the last iteration's data. I really think we should either increase latest_data_entries and iteration_data_entries, or decrease games_num to ~10 and do 1,000 iterations.
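To make that concrete, a rough look at how many iterations of data the replay window covers under each option (the ~12 positions per 4x4 game is my guess at an average game length; the other numbers are the ones discussed in this thread):

```python
def iterations_covered(latest_data_entries, games_num, positions_per_game):
    # How many iterations' worth of self-play data fit in the replay window.
    return latest_data_entries / (games_num * positions_per_game)

# Current setting: the window holds less than one iteration of data.
print(iterations_covered(1_000, games_num=100, positions_per_game=12))    # ~0.8

# Option A: enlarge the window (and iteration_data_entries with it).
print(iterations_covered(100_000, games_num=100, positions_per_game=12))  # ~83

# Option B: fewer games per iteration, many more iterations.
print(iterations_covered(1_000, games_num=10, positions_per_game=12))     # ~8.3
```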

@barakugav
Collaborator Author

Nevertheless, this is super exciting! Can't wait to play against it.
