Data format? #26
Regarding the move number, I can't speak to the intention, but for what it's worth, the AlphaZero paper lists both the move number and the "no progress counter" as inputs (although I can't imagine why: the only way the move number plays any role is in tournament time controls, and times aren't inputs anyway).

In addition to castling availability (4 bits), I think we're missing en passant availability (16 bits, or just 8 bits with the color-flipping dimensionality reduction). In theory, en passant availability is derivable from the previous board state, but I suspect it's better to have it as a direct input.

Lastly, I've been trying to understand the motivation for using previous states as input. It seems like a huge additional cost for a dubious benefit. Let me make a strawman suggestion:
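The castling and en passant bits mentioned above could be fed in as constant planes. Here is a minimal sketch; the function name, plane ordering, and the 4 + 8 layout are my assumptions, not this project's actual format:

```python
import numpy as np

def aux_planes(castling_rights, ep_file):
    """Hypothetical encoding of castling availability (4 bits) and
    en passant availability (8 bits, after color-flip reduction)
    as constant 8x8 planes.

    castling_rights: 4 booleans (our K-side, our Q-side, their K-side, their Q-side)
    ep_file: file index 0-7 of a capturable en passant pawn, or None
    """
    planes = np.zeros((12, 8, 8), dtype=np.float32)
    for i, available in enumerate(castling_rights):
        if available:
            planes[i, :, :] = 1.0   # one constant plane per castling right
    if ep_file is not None:
        planes[4 + ep_file, :, :] = 1.0  # one constant plane per file
    return planes
```

These would be concatenated onto the piece planes; the 8-bit en passant variant assumes the board has already been color-flipped so only one side's files matter.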
I don't know what goes on in NNs between the first and last layer (and I think very few people do...), so it's good to try everything. Also, I like to keep in mind how many weights are in each layer. I was surprised the first time I actually counted and found out 90% of them are in the last layer! Empirically, I guess this means it rarely hurts to add more inputs, because there aren't too many weights up front anyway.
Yes, I think the main idea is drawing by threefold repetition, though this too is strange, because each frame in the 8-state history also includes how many times that position has repeated. I sympathize with the point.
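The per-frame repetition count mentioned above could be tracked incrementally as the game is replayed. A small sketch, assuming each position is identified by a hashable key (the function name and representation are hypothetical):

```python
from collections import Counter

def repetition_counts(position_keys):
    """For a sequence of position keys (e.g. Zobrist hashes) in game
    order, return how many times each position has occurred so far,
    including the current occurrence."""
    seen = Counter()
    counts = []
    for key in position_keys:
        seen[key] += 1
        counts.append(seen[key])
    return counts
```

Each count would then be broadcast to a constant 8x8 plane inside that frame's slice of the input stack.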
Could someone write some quick documentation of the input planes?
Here's what I think it is:
- The last 8 board positions, each one 8x8x12
- Current state, also 8x8x12
- Side to move, 8x8 constant
- Move number, 8x8 constant
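Assuming the layout above (8 history frames plus the current state, each 8x8x12, plus two constant planes), stacking them could look like this sketch; the names and plane order are my guesses, not the project's actual format:

```python
import numpy as np

def build_input(history, current, side_to_move, move_number):
    """Stack hypothetical input planes into a (110, 8, 8) tensor:
    8 history frames * 12 + current 12 + side-to-move + move number."""
    assert len(history) == 8
    piece_planes = np.concatenate(list(history) + [current])  # (108, 8, 8)
    stm = np.full((1, 8, 8), float(side_to_move), dtype=np.float32)
    mv = np.full((1, 8, 8), float(move_number), dtype=np.float32)
    return np.concatenate([piece_planes, stm, mv])
```

That gives 110 planes total; if the color-flip reduction discussed below were adopted, the side-to-move plane would drop out.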
I think the move number is a typo: it's using the halfmove clock instead of the 50-move rule counter (which I assume is the intention).
We also theoretically don't need the side to move, because we can flip the board and invert the colors so that the side to move is always on the bottom with its king on the right. Alternatively, we can augment the dataset 2x by applying this transformation, but I think the dimensionality reduction with a 2x learning rate is at least equivalent (and probably better). (This doesn't work for Go because of the 7.5-komi rule.)
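The flip-and-invert transformation could look like the following sketch, assuming a (12, 8, 8) piece array where planes 0-5 hold white's pieces and 6-11 hold black's (my assumption about the plane order):

```python
import numpy as np

def flip_colors(planes):
    """Mirror the board vertically and swap the white/black piece
    planes, so the side to move is always 'at the bottom'.
    planes: (12, 8, 8), planes 0-5 = white pieces, 6-11 = black."""
    flipped = planes[:, ::-1, :]                        # mirror ranks
    return np.concatenate([flipped[6:], flipped[:6]])   # swap colors
```

Applying it twice returns the original position, which is a handy sanity check. Note that only ranks are mirrored; files are untouched, which is what keeps the king on a consistent side (chess has no left-right symmetry because of castling).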
I think we're also missing castling.
Another idea: shuffle training data to avoid overfitting to one game
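Shuffling across games could be as simple as flattening the per-game sample lists and permuting the result, so consecutive minibatch samples rarely come from the same game. A sketch with hypothetical names:

```python
import random

def interleave_positions(games, seed=0):
    """games: list of games, each a list of (position, target) samples.
    Returns all samples in one list, shuffled across game boundaries."""
    samples = [sample for game in games for sample in game]
    rng = random.Random(seed)  # seeded for reproducible epochs
    rng.shuffle(samples)
    return samples
```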
How is the policy vector represented?