covers the most important cases from the user perspective:
It embeds the default net in the binary, so a download of that binary will result
in a working engine with the default net. The engine will be functional in the default mode
without any additional user action.
It allows non-default nets to be used, which will be looked for in up to
three directories (working directory, location of the binary, and optionally a specific default directory).
This mechanism is also kept for those developers that use MSVC,
the one compiler that doesn't have an easy mechanism for embedding data.
It is possible to disable embedding, and instead specify a specific directory, e.g. linux distros might want to use
CXXFLAGS="-DNNUE_EMBEDDING_OFF -DDEFAULT_NNUE_DIRECTORY=/usr/share/games/stockfish/" make -j ARCH=x86-64 profile-build
passed STC non-regression:
https://tests.stockfishchess.org/tests/view/5f4a581c150f0aef5f8ae03a
LLR: 2.95 (-2.94,2.94) {-1.25,-0.25}
Total: 66928 W: 7202 L: 7147 D: 52579
Ptnml(0-2): 291, 5309, 22211, 5360, 293
closes https://github.com/official-stockfish/Stockfish/pull/3070
fixes https://github.com/official-stockfish/Stockfish/issues/3030
No functional change.
This patch removes the EvalList structure from the Position object and generally simplifies the interface between do_move() and the NNUE code.
The NNUE evaluation function first calculates the "accumulator". The accumulator consists of two halves: one for white's perspective, one for black's perspective.
If the "friendly king" has moved or the accumulator for the parent position is not available, the accumulator for this half has to be calculated from scratch. To do this, the NNUE node needs to know the positions and types of all non-king pieces and the position of the friendly king. This information can easily be obtained from the Position object.
If the "friendly king" has not moved, its half of the accumulator can be calculated by incrementally updating the accumulator for the previous position. For this, the NNUE code needs to know which pieces have been added to which squares and which pieces have been removed from which squares. In principle this information can be derived from the Position object and StateInfo struct (in the same way as undo_move() does this). However, it is probably a bit faster to prepare this information in do_move(), so I have kept the DirtyPiece struct. Since the DirtyPiece struct now stores the squares rather than "PieceSquare" indices, there are now at most three "dirty pieces" (previously two). A promotion move that captures a piece removes the capturing pawn and the captured piece from the board (to SQ_NONE) and moves the promoted piece to the promotion square (from SQ_NONE).
An STC test has confirmed a small speedup:
https://tests.stockfishchess.org/tests/view/5f43f06b5089a564a10d850a
LLR: 2.94 (-2.94,2.94) {-0.25,1.25}
Total: 87704 W: 9763 L: 9500 D: 68441
Ptnml(0-2): 426, 6950, 28845, 7197, 434
closes https://github.com/official-stockfish/Stockfish/pull/3068
No functional change
due to downclocking on current chips (tested up to cascade lake)
supporting avx512 and vnni512, it is better to use avx2 or vnni256
in multithreaded (in particular hyperthreaded) engine use.
In single threaded use, the picture is different.
gcc compilation for vnni256 requires a toolchain for gcc >= 9.
closes https://github.com/official-stockfish/Stockfish/pull/3038
No functional change
Clang-10.0.0 poses as gcc-4.2:
$ clang++ -E -dM - </dev/null | grep GNUC
This means that Clang is using the workaround for the alignment bug of gcc-8
even though it does not have the bug (as far as I know).
This patch should speed up AVX2 and AVX512 compiles on Windows (when using Clang),
because it disables (for Clang) the gcc workaround we had introduced in this commit:
875183b310
closes https://github.com/official-stockfish/Stockfish/pull/3050
No functional change.
This patch fixes the byte order when reading 16- and 32-bit values from the network file on a big-endian machine.
Bytes are ordered in read_le() using unsigned arithmetic, which doesn't need tricks to determine the endianness of the machine. Unfortunately the compiler doesn't seem to be able to optimise the ordering operation, but reading in the weights is not a time-critical operation and the extra time it takes should not be noticeable.
Big endian systems are still untested with NNUE.
fixes#3007
closes https://github.com/official-stockfish/Stockfish/pull/3009
No functional change.
Adds support for Vector Neural Network Instructions (avx512), as available on Intel Cascade Lake
The _mm512_dpbusd_epi32() intrinsic (vpdpbusd instruction) is taylor made for NNUE.
on a cascade lake CPU (AWS C5.24x.large, gcc 10) NNUE eval is at roughly 78% nps of classical
(single core test)
bench 1024 1 24 default depth:
target classical NNUE ratio
vnni 2207232 1725987 78.20
avx512 2216789 1671734 75.41
avx2 2194006 1611263 73.44
modern 2185001 1352469 61.90
closes https://github.com/official-stockfish/Stockfish/pull/2987
No functional change
this workaround is possibly rather a windows & gcc specific problem. See e.g.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412#c25
on Linux with gcc 8 this patch brings roughly a 8% speedup.
However, probably needs some testing in the wild.
includes a workaround for an old msys make (3.81) installation (fixes#2984)
No functional change
This patch allows old x86 CPUs, from AMD K8 (which the x86-64 baseline
targets) all the way down to the Pentium MMX, to benefit from NNUE with
comparable performance hit versus hand-written eval as on more modern
processors.
NPS of the bench with NNUE enabled on a Pentium III 1.13 GHz (using the
MMX code):
master: 38951
this patch: 80586
NPS of the bench with NNUE enabled using baseline x86-64 arch, which is
how linux distros are likely to package stockfish, on a modern CPU
(using the SSE2 code):
master: 882584
this patch: 1203945
closes https://github.com/official-stockfish/Stockfish/pull/2956
No functional change.
despite usage of alignas, the generated (avx2/avx512) code with older compilers needs to use
unaligned loads with older gcc (e.g. confirmed crash with gcc 7.3/mingw on abrok).
Better performance thus requires gcc >= 9 on hardware supporting avx2/avx512
closes https://github.com/official-stockfish/Stockfish/pull/2969
No functional change
This patch ports the efficiently updatable neural network (NNUE) evaluation to Stockfish.
Both the NNUE and the classical evaluations are available, and can be used to
assign a value to a position that is later used in alpha-beta (PVS) search to find the
best move. The classical evaluation computes this value as a function of various chess
concepts, handcrafted by experts, tested and tuned using fishtest. The NNUE evaluation
computes this value with a neural network based on basic inputs. The network is optimized
and trained on the evalutions of millions of positions at moderate search depth.
The NNUE evaluation was first introduced in shogi, and ported to Stockfish afterward.
It can be evaluated efficiently on CPUs, and exploits the fact that only parts
of the neural network need to be updated after a typical chess move.
[The nodchip repository](https://github.com/nodchip/Stockfish) provides additional
tools to train and develop the NNUE networks.
This patch is the result of contributions of various authors, from various communities,
including: nodchip, ynasu87, yaneurao (initial port and NNUE authors), domschl, FireFather,
rqs, xXH4CKST3RXx, tttak, zz4032, joergoster, mstembera, nguyenpham, erbsenzaehler,
dorzechowski, and vondele.
This new evaluation needed various changes to fishtest and the corresponding infrastructure,
for which tomtor, ppigazzini, noobpwnftw, daylen, and vondele are gratefully acknowledged.
The first networks have been provided by gekkehenker and sergiovieri, with the latter
net (nn-97f742aaefcd.nnue) being the current default.
The evaluation function can be selected at run time with the `Use NNUE` (true/false) UCI option,
provided the `EvalFile` option points the the network file (depending on the GUI, with full path).
The performance of the NNUE evaluation relative to the classical evaluation depends somewhat on
the hardware, and is expected to improve quickly, but is currently on > 80 Elo on fishtest:
60000 @ 10+0.1 th 1
https://tests.stockfishchess.org/tests/view/5f28fe6ea5abc164f05e4c4c
ELO: 92.77 +-2.1 (95%) LOS: 100.0%
Total: 60000 W: 24193 L: 8543 D: 27264
Ptnml(0-2): 609, 3850, 9708, 10948, 4885
40000 @ 20+0.2 th 8
https://tests.stockfishchess.org/tests/view/5f290229a5abc164f05e4c58
ELO: 89.47 +-2.0 (95%) LOS: 100.0%
Total: 40000 W: 12756 L: 2677 D: 24567
Ptnml(0-2): 74, 1583, 8550, 7776, 2017
At the same time, the impact on the classical evaluation remains minimal, causing no significant
regression:
sprt @ 10+0.1 th 1
https://tests.stockfishchess.org/tests/view/5f2906a2a5abc164f05e4c5b
LLR: 2.94 (-2.94,2.94) {-6.00,-4.00}
Total: 34936 W: 6502 L: 6825 D: 21609
Ptnml(0-2): 571, 4082, 8434, 3861, 520
sprt @ 60+0.6 th 1
https://tests.stockfishchess.org/tests/view/5f2906cfa5abc164f05e4c5d
LLR: 2.93 (-2.94,2.94) {-6.00,-4.00}
Total: 10088 W: 1232 L: 1265 D: 7591
Ptnml(0-2): 49, 914, 3170, 843, 68
The needed networks can be found at https://tests.stockfishchess.org/nns
It is recommended to use the default one as indicated by the `EvalFile` UCI option.
Guidelines for testing new nets can be found at
https://github.com/glinscott/fishtest/wiki/Creating-my-first-test#nnue-net-tests
Integration has been discussed in various issues:
https://github.com/official-stockfish/Stockfish/issues/2823https://github.com/official-stockfish/Stockfish/issues/2728
The integration branch will be closed after the merge:
https://github.com/official-stockfish/Stockfish/pull/2825https://github.com/official-stockfish/Stockfish/tree/nnue-player-wip
closes https://github.com/official-stockfish/Stockfish/pull/2912
This will be an exciting time for computer chess, looking forward to seeing the evolution of
this approach.
Bench: 4746616