-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Advanced topics
Both approaches assign a value to a position that is used in alpha-beta (PVS) search to find the best move. The classical evaluation computes this value as a function of various chess concepts, handcrafted by experts, tested and tuned using fishtest. The NNUE evaluation computes this value with a neural network based on basic inputs (e.g. piece positions only). The network is optimized and trained on the evaluations of millions of positions at moderate search depth.
The NNUE evaluation was first introduced in shogi, and ported to Stockfish afterward. It can be evaluated efficiently on CPUs, and exploits the fact that only parts of the neural network need to be updated after a typical chess move. The nodchip repository provided the first version of the needed tools to train and develop the NNUE networks. Today, more advanced training tools are available in the nnue-pytorch repository, while data generation tools are available in a dedicated branch.
On CPUs supporting modern vector instructions (avx2 and similar), the NNUE evaluation results in much stronger playing strength, even if the nodes per second computed by the engine is somewhat lower (roughly 50% of nps is typical).
Notes:
-
the NNUE evaluation depends on the Stockfish binary and the network parameter file (see the EvalFile UCI option). Not every parameter file is compatible with a given Stockfish binary, but the default value of the EvalFile UCI option is the name of a network that is guaranteed to be compatible with that binary.
-
to use the NNUE evaluation, the additional data file with neural network parameters needs to be available. Normally, this file is already embedded in the binary or it can be downloaded. The filename for the default (recommended) net can be found as the default value of the
EvalFile
UCI option, with the formatnn-[SHA256 first 12 digits].nnue
(for instance,nn-c157e0a5755b.nnue
). This file can be downloaded fromhttps://tests.stockfishchess.org/api/nn/[filename]
replacing[filename]
as needed.
If the engine is searching a position that is not in the tablebases (e.g. a position with 8 pieces), it will access the tablebases (TB) during the search. If the engine reports a very large score (near 200.00), this means it has found a winning line into a tablebase position. For general game play, TB bring only a limited increase in strength (see our Elo measurements).
If the engine is given a position to search that is in the tablebases, it will use the tablebases at the beginning of the search to preselect all good moves, i.e. all moves that preserve the win or preserve the draw while taking into account the 50-move rule. It will then perform a search only on those moves. The engine will not move immediately, unless there is only a single good move. The engine will search for the shortest possible mate, even though the position is known to be won.
It is therefore clear that this behaviour is not identical to what one might be used to with Nalimov tablebases. There are technical reasons for this difference, the main technical reason being that Nalimov tablebases use the DTM metric (distance-to-mate), while the Syzygy tablebases use a variation of the DTZ metric (distance-to-zero, zero meaning any move that resets the 50-move counter). This special metric is one of the reasons that the Syzygy tablebases are more compact than Nalimov tablebases, while still storing all information needed for optimal play and in addition being able to take into account the 50-move rule.
Note that the use of TB 7 is possible with SF, it assumes that about 17TB of storage is available. Furthermore, the system must allow a single process to open all 1511 files, which sometimes requires increasing the default limit (e.g. ulimit -n
).
Stockfish supports large pages on Linux and Windows. Large pages make the hash access more efficient, improving the engine speed, especially on large hash sizes.
The support is automatic, Stockfish attempts to use large pages when available and will fall back to regular memory allocation when this is not the case.
Typical increases are 5-10% in terms of nodes per second, but speed increases up to 30% have been measured.
Large page support on Linux is obtained by the Linux kernel transparent huge pages functionality. Typically, transparent huge pages are already enabled, and no configuration is needed.
The use of large pages requires "Lock Pages in Memory" privilege. See Enable the Lock Pages in Memory Option (Windows) on how to enable this privilege, then run RAMMap to double-check that large pages are used.
We suggest that you reboot your computer after you have enabled large pages, because long Windows sessions suffer from memory fragmentation, which may prevent Stockfish from getting large pages: a fresh session is better in this regard.
The "speed of Stockfish" is the number of nodes (positions) Stockfish can search per second. Nodes per second (nps) is a useful benchmark number as the same version of Stockfish playing will play stronger with larger nps. Different versions of Stockfish will play at different nps, for example, if the NNUE network architecture changes, but in this case the nps difference is not related to the strength difference.
Notes:
- Stop all other applications when measuring the speedup of Stockfish
- Run at least 20 default benches (depth 13) for each build of Stockfish to have accurate measures
- A speedup of 0.3% could be meaningless (i.e. within the measurement noise)
To measure the speed of several builds of Stockfish, use one of these applications:
-
All OS:
-
bash script
bench_parallel.sh
(runbash bench_parallel.sh
for the help):Click to view
#!/bin/bash _bench () { ${1} << EOF > /dev/null 2>> ${2} bench 16 1 ${depth} default depth EOF } # _bench function customization example # setoption name SyzygyPath value C:\table_bases\wdl345;C:\table_bases\dtz345 # bench 128 4 ${depth} default depth if [[ ${#} -ne 4 ]]; then cat << EOF usage: ${0} ./stockfish_base ./stockfish_test depth n_runs fast bench: ${0} ./stockfish_base ./stockfish_test 13 10 slow bench: ${0} ./stockfish_base ./stockfish_test 20 10 EOF exit 1 fi sf_base=${1} sf_test=${2} depth=${3} n_runs=${4} # preload of CPU/cache/memory printf "preload CPU" (_bench ${sf_base} sf_base0.txt)& (_bench ${sf_test} sf_test0.txt)& wait # temporary files initialization : > sf_base0.txt : > sf_test0.txt : > sf_temp0.txt # bench loop: SMP bench with background subshells for ((k=1; k<=${n_runs}; k++)); do printf "\rrun %3d /%3d" ${k} ${n_runs} # swap the execution order to avoid bias if [ $((k%2)) -eq 0 ]; then (_bench ${sf_base} sf_base0.txt)& (_bench ${sf_test} sf_test0.txt)& wait else (_bench ${sf_test} sf_test0.txt)& (_bench ${sf_base} sf_base0.txt)& wait fi done # text processing to extract nps values cat sf_base0.txt | grep second | grep -Eo '[0-9]{1,}' > sf_base1.txt cat sf_test0.txt | grep second | grep -Eo '[0-9]{1,}' > sf_test1.txt for ((k=1; k<=${n_runs}; k++)); do echo ${k} >> sf_temp0.txt done printf "\rrun sf_base sf_test diff\n" paste sf_temp0.txt sf_base1.txt sf_test1.txt | awk '{printf "%3d %8d %8d %8+d\n", $1, $2, $3, $3-$2}' #paste sf_temp0.txt sf_base1.txt sf_test1.txt | awk '{printf "%3d\t%8d\t%8d\t%7+d\n", $1, $2, $3, $3-$2}' paste sf_base1.txt sf_test1.txt | awk '{printf "%d\t%d\t%d\n", $1, $2, $2-$1}' > sf_temp0.txt # compute: sample mean, 1.96 * std of sample mean (95% of samples), speedup # std of sample mean = sqrt(NR/(NR-1)) * (std population) / sqrt(NR) cat sf_temp0.txt | awk '{sum1 += $1 ; sumq1 += $1**2 ;sum2 += $2 ; sumq2 += $2**2 ;sum3 += $3 ; sumq3 += $3**2 } END {printf "\nsf_base = %8d +/- %6d (95%)\nsf_test = %8d +/- %6d (95%)\ndiff = %8d +/- %6d (95%)\nspeedup = %.5f% +/- %.3f% (95%)\n\n", sum1/NR , 1.96 * sqrt(sumq1/NR - (sum1/NR)**2)/sqrt(NR-1) , sum2/NR , 1.96 * sqrt(sumq2/NR - (sum2/NR)**2)/sqrt(NR-1) , sum3/NR , 1.96 * sqrt(sumq3/NR - (sum3/NR)**2)/sqrt(NR-1) , 100*(sum2 - sum1)/sum1 , 100 * (1.96 * sqrt(sumq3/NR - (sum3/NR)**2)/sqrt(NR-1)) / (sum1/NR) }' # remove temporary files rm -f sf_base0.txt sf_test0.txt sf_temp0.txt sf_base1.txt sf_test1.txt
-
Windows only:
- FishBench: Latest release Fishbench v6.0
- Buildtester: Latest release Buildtester 1.4.7.0
There is a branch developed with a MPI cluster implementation of Stockfish, allowing stockfish to run on clusters of compute nodes connected with a high-speed network. See https://github.com/official-stockfish/Stockfish/pull/1571 for some discussion of the initial implementation and https://github.com/official-stockfish/Stockfish/pull/1931 for some early performance results.
Feedback on this branch is welcome! Here are some git commands for people interested to test this MPI/Cluster idea:
-
If you don't have the cluster branch yet on your local git repository, you can download the latest state of the
official-stockfish/cluster
branch (See also https://github.com/official-stockfish/Stockfish/tree/cluster), then switch to it with the following commands:git fetch official cluster:cluster git checkout -f cluster
-
After switching to the cluster branch as above, see the README.md for detailed instructions on how to compile and run the branch. TL;DR:
make clean make -j ARCH=x86-64-avx2 COMPILER=mpic++ build mpirun -np 4 ./stockfish bench