Data format? #26

Akababa opened this issue Dec 14, 2017 · 3 comments
Akababa commented Dec 14, 2017

Could someone write a quick documentation of the input planes?
Here's what I think it is (rough sketch of building these below):

- The last 8 board positions, each one 8x8x12
- Current state, also 8x8x12
- Side to move, 8x8 constant
- Move number, 8x8 constant
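For concreteness, here is a minimal sketch of how planes like the ones above could be assembled with python-chess and numpy. This is my own illustration, not the repo's code; the plane ordering, the use of `fullmove_number` for the move counter, and the `history` list of `chess.Board` objects are all assumptions.

```python
import chess
import numpy as np

PIECE_TYPES = [chess.PAWN, chess.KNIGHT, chess.BISHOP,
               chess.ROOK, chess.QUEEN, chess.KING]

def board_to_planes(board):
    """Encode one position as an 8x8x12 binary tensor (6 piece types x 2 colors)."""
    planes = np.zeros((8, 8, 12), dtype=np.float32)
    for color_idx, color in enumerate([chess.WHITE, chess.BLACK]):
        for type_idx, piece_type in enumerate(PIECE_TYPES):
            for square in board.pieces(piece_type, color):
                rank, file = divmod(square, 8)
                planes[rank, file, 6 * color_idx + type_idx] = 1.0
    return planes

def make_input(history):
    """Stack the last 8 positions plus the current one (up to 9 x 12 channels),
    then a side-to-move plane and a move-number plane, along the channel axis."""
    current = history[-1]
    stacked = [board_to_planes(b) for b in history[-9:]]  # real code would zero-pad short histories
    side_to_move = np.full((8, 8, 1), float(current.turn), dtype=np.float32)
    move_number = np.full((8, 8, 1), float(current.fullmove_number), dtype=np.float32)
    return np.concatenate(stacked + [side_to_move, move_number], axis=-1)
```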

I think the move number is a typo: it's using the halfmove clock instead of the 50-move-rule counter (which I assume is the intention).
We also theoretically don't need the side to move, because we can flip the board and invert the colors so that the side to move is always on the bottom and has its king on the right. Alternatively, we can augment the dataset 2x by applying this transformation, but I think the dimensionality reduction with a 2x learning rate is at least equivalent (and probably better). (It doesn't work for Go because of the 7.5 komi rule.)
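To illustrate the flip (my own sketch, assuming python-chess): `Board.mirror()` flips the position vertically and swaps the colors, so the player to move is always the one playing "up the board".

```python
import chess

def canonicalize(board):
    """Return a copy of the board in which the side to move always plays up the board.

    Assumes python-chess: mirror() flips the position vertically, swaps piece
    colors, and swaps the turn, castling rights and en passant square accordingly.
    """
    return board.copy() if board.turn == chess.WHITE else board.mirror()
```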

I think we're also missing castling.

Another idea: shuffle the training data to avoid overfitting to one game.
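A quick illustration of that shuffle (the array names are hypothetical; it assumes positions from all games have already been pooled into flat arrays):

```python
import numpy as np

def shuffle_positions(inputs, policies, values, seed=0):
    """Permute positions from all games together so that consecutive
    minibatches are not drawn from a single game."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(inputs))
    return inputs[perm], policies[perm], values[perm]
```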

How is the policy vector represented?

@evalon32

Regarding the move number, I can't speak to the intention, but for what it's worth, the AlphaZero paper lists both the move number and the "no progress counter" as inputs (although I can't imagine why: the only way the move number plays any role is in tournament time controls, and times aren't inputs anyway).

In addition to castling availability (4 bits), I think we're missing en passant availability (16 bits, or just 8 bits with the color-flipping dimensionality reduction). In theory, en passant availability can be derived from the previous board state, but I suspect it's better to have it as a direct input.
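A hedged sketch of what those extra inputs could look like (assuming python-chess; the exact layout is my own choice, not something the repo defines):

```python
import chess
import numpy as np

def castling_bits(board):
    """Four castling-availability features: [white O-O, white O-O-O, black O-O, black O-O-O]."""
    return np.array([
        board.has_kingside_castling_rights(chess.WHITE),
        board.has_queenside_castling_rights(chess.WHITE),
        board.has_kingside_castling_rights(chess.BLACK),
        board.has_queenside_castling_rights(chess.BLACK),
    ], dtype=np.float32)

def en_passant_plane(board):
    """One 8x8 plane marking the en passant target square, if any.

    With the color-flipping reduction discussed above, the capture direction is
    fixed, so an 8-entry file indicator would carry the same information.
    """
    plane = np.zeros((8, 8), dtype=np.float32)
    if board.ep_square is not None:
        rank, file = divmod(board.ep_square, 8)
        plane[rank, file] = 1.0
    return plane
```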

Lastly, I've been trying to understand the motivation for using previous states as input. It seems like a huge additional cost for a dubious benefit. Let me make a strawman suggestion:

- Network A is trained on all the inputs that AlphaZero used.
- Network B is trained on the current board state plus castling/en passant availability, and no other inputs.

Can anyone explain what the advantage of A over B is?


Akababa commented Dec 17, 2017

I don't know what goes on in NNs between the first and last layer (and I think very few people do...), so it's good to try everything.
With that being said: I think it gives the network a sort of "history heuristic" in case the inputs display some sort of temporal locality; also, the last 8 positions help to encode en passant and threefold repetition (most of the time). I believe heuristics like this make training faster at the beginning, but the network should rule them out asymptotically (no difference in the long run). In my opinion, the same goes for any feature with positive correlation but no causation.
There are a few people trying out different inputs; for example, I'm using a "side to move always on bottom" representation, and I think @benediamond is emulating the most recent AZ as closely as possible.

Also: I like to keep in mind how many weights are in each layer. I was surprised the first time I actually counted and found out that 90% of them are in the last layer! Empirically, I guess this means it rarely hurts to add more inputs, because there aren't too many weights up front anyway.
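A back-of-the-envelope version of that count (all sizes here are made up for illustration, not taken from the repo):

```python
# Hypothetical layer sizes, just to show why a dense policy head dominates the weight count.
filters = 128
conv_params = 3 * 3 * filters * filters        # one 3x3 conv block: ~147k weights
flattened = 8 * 8 * filters                    # 8192 features feeding the head
policy_outputs = 1968                          # e.g. one logit per move label (assumption)
head_params = flattened * policy_outputs       # ~16.1M weights in the final dense layer

tower = 10 * conv_params                       # a 10-block conv tower: ~1.5M weights
print(head_params / (head_params + tower))     # roughly 0.92, i.e. ~90% of weights in the last layer
```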

@benediamond

Yes, I think the main idea is drawing by threefold repetition, though this too is strange, because each frame in the 8-state history also includes how many times that position has repeated. I sympathize with the point.
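For what it's worth, here is one way such a per-position repetition count could be computed (my own sketch with python-chess, not the repo's code):

```python
import chess

def repetition_count(board):
    """How many times the current position (pieces, turn, castling, en passant)
    has occurred so far in the game, including the present occurrence."""
    key = (board.board_fen(), board.turn, board.castling_rights, board.ep_square)
    replay = board.copy()   # copy() keeps the move stack, so we can unwind it
    count = 1
    while replay.move_stack:
        replay.pop()
        if (replay.board_fen(), replay.turn, replay.castling_rights, replay.ep_square) == key:
            count += 1
    return count
```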
