covers the most important cases from the user perspective:
It embeds the default net in the binary, so a download of that binary will result
in a working engine with the default net. The engine will be functional in the default mode
without any additional user action.
It allows non-default nets to be used, which will be looked for in up to
three directories (working directory, location of the binary, and optionally a specific default directory).
This mechanism is also kept for those developers that use MSVC,
the one compiler that doesn't have an easy mechanism for embedding data.
It is possible to disable embedding, and instead specify a specific directory, e.g. linux distros might want to use
CXXFLAGS="-DNNUE_EMBEDDING_OFF -DDEFAULT_NNUE_DIRECTORY=/usr/share/games/stockfish/" make -j ARCH=x86-64 profile-build
passed STC non-regression:
https://tests.stockfishchess.org/tests/view/5f4a581c150f0aef5f8ae03a
LLR: 2.95 (-2.94,2.94) {-1.25,-0.25}
Total: 66928 W: 7202 L: 7147 D: 52579
Ptnml(0-2): 291, 5309, 22211, 5360, 293
closes https://github.com/official-stockfish/Stockfish/pull/3070
fixes https://github.com/official-stockfish/Stockfish/issues/3030
No functional change.
Adds support for Vector Neural Network Instructions (avx512), as available on Intel Cascade Lake
The _mm512_dpbusd_epi32() intrinsic (vpdpbusd instruction) is taylor made for NNUE.
on a cascade lake CPU (AWS C5.24x.large, gcc 10) NNUE eval is at roughly 78% nps of classical
(single core test)
bench 1024 1 24 default depth:
target classical NNUE ratio
vnni 2207232 1725987 78.20
avx512 2216789 1671734 75.41
avx2 2194006 1611263 73.44
modern 2185001 1352469 61.90
closes https://github.com/official-stockfish/Stockfish/pull/2987
No functional change
fails to build on that target, because of missing Intel Intrinsics.
macOS has posix_memalign() since ~2014 so we can simplify the code and just use that for all Apple platforms.
closes https://github.com/official-stockfish/Stockfish/pull/2985
No functional change.
This patch allows old x86 CPUs, from AMD K8 (which the x86-64 baseline
targets) all the way down to the Pentium MMX, to benefit from NNUE with
comparable performance hit versus hand-written eval as on more modern
processors.
NPS of the bench with NNUE enabled on a Pentium III 1.13 GHz (using the
MMX code):
master: 38951
this patch: 80586
NPS of the bench with NNUE enabled using baseline x86-64 arch, which is
how linux distros are likely to package stockfish, on a modern CPU
(using the SSE2 code):
master: 882584
this patch: 1203945
closes https://github.com/official-stockfish/Stockfish/pull/2956
No functional change.
Makefile targets x86-64-sse42, x86-sse3 are removed; x86-64-sse41
is renamed to x86-64-sse41-popcnt (it did enable popcnt).
Makefile variables sse3, sse42, their associated compilation flags
and code in misc.cpp are removed.
closes https://github.com/official-stockfish/Stockfish/pull/2922
No functional change
This patch ports the efficiently updatable neural network (NNUE) evaluation to Stockfish.
Both the NNUE and the classical evaluations are available, and can be used to
assign a value to a position that is later used in alpha-beta (PVS) search to find the
best move. The classical evaluation computes this value as a function of various chess
concepts, handcrafted by experts, tested and tuned using fishtest. The NNUE evaluation
computes this value with a neural network based on basic inputs. The network is optimized
and trained on the evalutions of millions of positions at moderate search depth.
The NNUE evaluation was first introduced in shogi, and ported to Stockfish afterward.
It can be evaluated efficiently on CPUs, and exploits the fact that only parts
of the neural network need to be updated after a typical chess move.
[The nodchip repository](https://github.com/nodchip/Stockfish) provides additional
tools to train and develop the NNUE networks.
This patch is the result of contributions of various authors, from various communities,
including: nodchip, ynasu87, yaneurao (initial port and NNUE authors), domschl, FireFather,
rqs, xXH4CKST3RXx, tttak, zz4032, joergoster, mstembera, nguyenpham, erbsenzaehler,
dorzechowski, and vondele.
This new evaluation needed various changes to fishtest and the corresponding infrastructure,
for which tomtor, ppigazzini, noobpwnftw, daylen, and vondele are gratefully acknowledged.
The first networks have been provided by gekkehenker and sergiovieri, with the latter
net (nn-97f742aaefcd.nnue) being the current default.
The evaluation function can be selected at run time with the `Use NNUE` (true/false) UCI option,
provided the `EvalFile` option points the the network file (depending on the GUI, with full path).
The performance of the NNUE evaluation relative to the classical evaluation depends somewhat on
the hardware, and is expected to improve quickly, but is currently on > 80 Elo on fishtest:
60000 @ 10+0.1 th 1
https://tests.stockfishchess.org/tests/view/5f28fe6ea5abc164f05e4c4c
ELO: 92.77 +-2.1 (95%) LOS: 100.0%
Total: 60000 W: 24193 L: 8543 D: 27264
Ptnml(0-2): 609, 3850, 9708, 10948, 4885
40000 @ 20+0.2 th 8
https://tests.stockfishchess.org/tests/view/5f290229a5abc164f05e4c58
ELO: 89.47 +-2.0 (95%) LOS: 100.0%
Total: 40000 W: 12756 L: 2677 D: 24567
Ptnml(0-2): 74, 1583, 8550, 7776, 2017
At the same time, the impact on the classical evaluation remains minimal, causing no significant
regression:
sprt @ 10+0.1 th 1
https://tests.stockfishchess.org/tests/view/5f2906a2a5abc164f05e4c5b
LLR: 2.94 (-2.94,2.94) {-6.00,-4.00}
Total: 34936 W: 6502 L: 6825 D: 21609
Ptnml(0-2): 571, 4082, 8434, 3861, 520
sprt @ 60+0.6 th 1
https://tests.stockfishchess.org/tests/view/5f2906cfa5abc164f05e4c5d
LLR: 2.93 (-2.94,2.94) {-6.00,-4.00}
Total: 10088 W: 1232 L: 1265 D: 7591
Ptnml(0-2): 49, 914, 3170, 843, 68
The needed networks can be found at https://tests.stockfishchess.org/nns
It is recommended to use the default one as indicated by the `EvalFile` UCI option.
Guidelines for testing new nets can be found at
https://github.com/glinscott/fishtest/wiki/Creating-my-first-test#nnue-net-tests
Integration has been discussed in various issues:
https://github.com/official-stockfish/Stockfish/issues/2823https://github.com/official-stockfish/Stockfish/issues/2728
The integration branch will be closed after the merge:
https://github.com/official-stockfish/Stockfish/pull/2825https://github.com/official-stockfish/Stockfish/tree/nnue-player-wip
closes https://github.com/official-stockfish/Stockfish/pull/2912
This will be an exciting time for computer chess, looking forward to seeing the evolution of
this approach.
Bench: 4746616
Do not send the following info string on the first call to
aligned_ttmem_alloc() on Windows:
info string Hash table allocation: Windows large pages [not] used.
The first call occurs before the 'uci' command has been received. This
confuses some GUIs, which expect the first engine-sent command to be
'id' as the response to the 'uci' command. (see https://github.com/official-stockfish/Stockfish/issues/2681)
closes https://github.com/official-stockfish/Stockfish/pull/2689
No functional change.
for users that set the needed privilige "Lock Pages in Memory"
large pages will be automatically enabled (see Readme.md).
This expert setting might improve speed, 5% - 30%, depending
on the hardware, the number of threads and hash size. More for
large hashes, large number of threads and NUMA. If the operating
system can not allocate large pages (easier after a reboot), default
allocation is used automatically. The engine log provides details.
closes https://github.com/official-stockfish/Stockfish/pull/2656
fixes https://github.com/official-stockfish/Stockfish/issues/2619
No functional change
Align the TT allocation by 2M to make it huge page friendly and advise the
kernel to use huge pages.
Benchmarks on my i7-8700K (6C/12T) box: (3 runs per bench per config)
vanilla (nps) hugepages (nps) avg
==================================================================================
bench | 3012490 3024364 3036331 3071052 3067544 3071052 +1.5%
bench 16 12 20 | 19237932 19050166 19085315 19266346 19207025 19548758 +1.1%
bench 16384 12 20 | 18182313 18371581 18336838 19381275 19738012 19620225 +7.0%
On my box, huge pages have a significant perf impact when using a big
hash size. They also speed up TT initialization big time:
vanilla (s) huge pages (s) speed-up
=======================================================================
time stockfish bench 16384 1 1 | 5.37 1.48 3.6x
In practice, huge pages with auto-defrag may always be enabled in the
system, in which case this patch has no effect. This
depends on the values in /sys/kernel/mm/transparent_hugepage/enabled
and /sys/kernel/mm/transparent_hugepage/defrag.
closes https://github.com/official-stockfish/Stockfish/pull/2463
No functional change
Official release version of Stockfish 11.
Bench: 5156767
-----------------------
It is our pleasure to release Stockfish 11 to our fans and supporters.
Downloads are freely available at http://stockfishchess.org/download/
This version 11 of Stockfish is 50 Elo stronger than the last version, and
150 Elo stronger than the version which famously lost a match to AlphaZero
two years ago. This makes Stockfish the strongest chess engine running on
your smartphone or normal desktop PC, and we estimate that on a modern four
cores CPU, Stockfish 11 could give 1:1000 time odds to the human chess champion
having classical time control, and be on par with him. More specific data,
including nice cumulative curves for the progression of Stockfish strength
over the last seven years, can be found on [our progression page][1], at
[Stefan Pohl site][2] or at [NextChessMove][3].
In October 2019 Stockfish has regained its crown in the TCEC competition,
beating in the superfinal of season 16 an evolution of the neural-network
engine Leela that had won the previous season. This clash of style between an
alpha-beta and an neural-network engine produced spectacular chess as always,
with Stockfish [emerging victorious this time][0].
Compared to Stockfish 10, we have made hundreds of improvements to the
[codebase][4], from the evaluation function (improvements in king attacks,
middlegame/endgame transitions, and many more) to the search algorithm (some
innovative coordination methods for the searching threads, better pruning of
unsound tactical lines, etc), and fixed a couple of bugs en passant.
Our testing framework [Fishtest][5] has also seen its share of improvements
to continue propelling Stockfish forward. Along with a lot of small enhancements,
Fishtest has switched to new SPRT bounds to increase the chance of catching Elo
gainers, along with a new testing book and the use of pentanomial statistics to
be more resource-efficient.
Overall the Stockfish project is an example of open-source at its best, as
its buzzing community of programmers sharing ideas and daily reviewing their
colleagues' patches proves to be an ideal form to develop innovative ideas for
chess programming, while the mathematical accuracy of the testing framework
allows us an unparalleled level of quality control for each patch we put in
the engine. If you wish, you too can help our ongoing efforts to keep improving
it, just [get involved][6] :-)
Stockfish is also special in that every chess fan, even if not a programmer,
[can easily help][7] the team to improve the engine by connecting their PC to
Fishtest and let it play some games in the background to test new patches.
Individual contributions vary from 1 to 32 cores, but this year Bojun Guo
made it a little bit special by plugging a whole data center during the whole
year: it was a vertiginous experience to see Fishtest spikes with 17466 cores
connected playing [25600 games/minute][8]. Thanks Guo!
The Stockfish team
[0]: <http://mytcecexperience.blogspot.com/2019/10/season-16-superfinal-games-91-100.html>
[1]: <https://github.com/glinscott/fishtest/wiki/Regression-Tests>
[2]: <https://www.sp-cc.de/index.htm>
[3]: <https://nextchessmove.com/dev-builds>
[4]: <https://github.com/official-stockfish/Stockfish>
[5]: <https://tests.stockfishchess.org/tests>
[6]: <https://stockfishchess.org/get-involved/>
[7]: <https://github.com/glinscott/fishtest/wiki>
[8]: <https://groups.google.com/forum/?fromgroups=#!topic/fishcooking/lebEmG5vgng%5B1-25%5D>
This patch shows a description of the compiler used to compile Stockfish,
when starting from the console.
Usage:
```
./stockfish
compiler
```
Example of output:
```
Stockfish 120120 64 POPCNT by T. Romstad, M. Costalba, J. Kiiski, G. Linscott
Compiled by clang++ 9.0.0 on Apple
__VERSION__ macro expands to: 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.38)
```
No functional change
Stockfish crashes immediately if users enter a wrong file name (or even an existing
folder name) for debug log file. It may be hard for users to find out since it prints
nothing. If they enter the string via a chess GUI, the chess GUI may remember and
auto-send to Stockfish next time, makes Stockfish crashes all the time. Bug report by
Nguyen Hong Pham in this issue: https://github.com/official-stockfish/Stockfish/issues/2365
This patch avoids the crash and instead prefers to exit gracefully with a error
message on std:cerr, like we do with the fenFile for instance.
Closes https://github.com/official-stockfish/Stockfish/pull/2366
No functional change.
Revert the previous patch now that the binary for the super-final
of TCEC season 16 has been sent.
Maybe the feature of showing the name of compiler will be added to the
master branch in the future. But we may use a cleaner way to code it, see
some ideas using the Makefile approach at the end of pull request #2327 :
https://github.com/official-stockfish/Stockfish/pull/2327
Bench: 3618154
This patch shows a description of the compiler used to compile Stockfish,
when starting from the console.
Usage:
```
./stockfish
compiler
```
Example of output:
```
Stockfish 240919 64 POPCNT by T. Romstad, M. Costalba, J. Kiiski, G. Linscott
Compiled by clang++ 9.0.0 on Apple
__VERSION__ macro expands to: 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.38)
```
No functional change
This pull request fixes a rare crashing bug on Windows inside our NUMA code, first
reported by Dann Corbit in the following forum thread (thanks!):
https://groups.google.com/forum/?fromgroups=#!topic/fishcooking/gA6aoMEuOwg
The fix is to not use structure member beyond known size when iterating through
'SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX' structure. We note that the Microsoft
API is guaranteed to provide us at least one element upon successful, and no
element in the structure can have a zero size.
No functional change.
Official release version of Stockfish 10.
This is also the 10th anniversary version of the Stockfish project, which
started exactly ten years ago! I wish to extend a huge thank you to
all contributors and authors in our amazing community :-)
Bench: 3939338
Preparation commit for the upcoming Stockfish 10 version, giving a chance to catch last minute feature bugs and evaluation regression during the one-week code freeze period. Also changing the copyright dates to include 2019.
No functional change
This patch implements some idea by Alain Savard and Mike Whiteley taken from the perpertual renaming/reformatting thread.
This is a pure code cleaning patch (so no change in functionality), but I use it as a pretext to correct the bogus bench number that I introduced in the previous commit.
Bench: 4413383
Silences the following warnings when compiling with GCC 8.
The fix is to use an intermediate pointer to anonymous function:
```
misc.cpp: In function 'int WinProcGroup::get_group(size_t)':
misc.cpp:241:77: warning: cast between incompatible function types from 'FARPROC' {aka 'long long int (*)()'} to 'fun1_t' {aka 'bool (*)(_LOGICAL_PROCESSOR_RELATIONSHIP, _SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX*, long unsigned int*)'} [-Wcast-function-type]
auto fun1 = (fun1_t)GetProcAddress(k32, "GetLogicalProcessorInformationEx");
^
misc.cpp: In function 'void WinProcGroup::bindThisThread(size_t)':
misc.cpp:309:71: warning: cast between incompatible function types from 'FARPROC' {aka 'long long int (*)()'} to 'fun2_t' {aka 'bool (*)(short unsigned int, _GROUP_AFFINITY*)'} [-Wcast-function-type]
auto fun2 = (fun2_t)GetProcAddress(k32, "GetNumaNodeProcessorMaskEx");
^
misc.cpp:310:67: warning: cast between incompatible function types from 'FARPROC' {aka 'long long int (*)()'} to 'fun3_t' {aka 'bool (*)(void*, const _GROUP_AFFINITY*, _GROUP_AFFINITY*)'} [-Wcast-function-type]
auto fun3 = (fun3_t)GetProcAddress(k32, "SetThreadGroupAffinity");
^
```
No functional change.
The heuristic to avoid thread binding if less than 8 threads are requested resulted in the first 7 threads not being bound.
The branch was verified to yield a roughly 13% speedup by @CoffeeOne on the appropriate hardware and OS, and an earlier version of this patch tested well on his machine:
http://tests.stockfishchess.org/tests/view/5a3693480ebc590ccbb8be5a
ELO: 9.24 +-4.6 (95%) LOS: 100.0%
Total: 5000 W: 634 L: 501 D: 3865
To make sure all threads (including mainThread) are bound as soon as the total number exceeds 7, recreate all threads on a change of thread number.
To do this, unify Threads::init, Threads::exit and Threads::set are unified in a single Threads::set function that goes through the needed steps.
The code includes several suggestions from @joergoster.
Fixes issue #1312
No functional change
Some warnings after a run of:
$ clang-tidy-3.8 -checks='modernize-*' *.cpp syzygy/*.cpp -header-filter=.* -- -std=c++11
I have not fixed all suggestions, for instance I still prefer
to declare the type instead of a spread use of 'auto'. I also
perfer good old 'typedef' to the new 'using' form.
I have not fixed some warnings in the last functions of
syzygy code because those are still the original functions
and need to be completely rewritten anyhow.
Thanks to erbsenzaehler for the original idea.
No functional change.
The needed Windows API for processor groups could be missed from old Windows
versions, so instead of calling them directly (forcing the linker to resolve
the calls at compile time), try to load them at runtime. To do this we need
first to define the corresponding function pointers.
Also don't interfere with running fishtest on numa hardware with Windows.
Avoid all stockfish one-threaded processes will run on the same node
No functional change.
Under Windows it is not possible for a process to run on more than one
logical processor group. This usually means to be limited to use max 64
cores. To overcome this, some special platform specific API should be
called to set group affinity for each thread. Original code from Texel by
Peter sterlund.
Tested by Jean-Paul Vael on a Xeon E7-8890 v4 with 88 threads and confimed
speed up between 44 and 88 threads is about 30%, as expected.
No functional change.