fixes minor merge conflicts, and a first quick testing looks OK:
4mpix4th vs 4th at 30+0.3s:
Score of cluster vs master: 3 - 0 - 37 [0.537] 40
Elo difference: 26.1 +/- 28.5, LOS: 95.8 %, DrawRatio: 92.5 %
No functional change.
add new ARCH targets
x86-32-sse41-popcnt > x86 32-bit with sse41 and popcnt support
x86-32-sse2 > x86 32-bit with sse2 support
x86-32 > x86 32-bit generic (with mmx and sse support)
retire x86-32-old (use general-32)
closes https://github.com/official-stockfish/Stockfish/pull/3022
No functional change.
The easiest way to use the NDK in conjunction with this Makefile (tested on linux-x86_64):
1. Download the latest NDK (r21d) from Google from https://developer.android.com/ndk/downloads
2. Place and unzip the NDK in $HOME/ndk folder
3. Export the path variable e.g., `export PATH=$PATH:$HOME/ndk/android-ndk-r21d/toolchains/llvm/prebuilt/linux-x86_64/bin`
4. cd to your Stockfish/src dir
5. Issue `make -j ARCH=armv8 COMP=ndk build` (use `ARCH=armv7` or `ARCH=armv7-neon` for older CPUs)
6. Optionally `make -j ARCH=armv8 COMP=ndk strip`
7. That's all. Enjoy!
Improves support from Raspberry Pi (incomplete?) and compiling on arm in general
closes https://github.com/official-stockfish/Stockfish/pull/3015
fixes https://github.com/official-stockfish/Stockfish/issues/2860
fixes https://github.com/official-stockfish/Stockfish/issues/2641
Support is still fragile as we're missing CI on these targets. Nevertheless tested with:
```bash
# build crosses from ubuntu 20.04 on x86 to various arch/OS combos
# tested with suitable packages installed
# (build-essentials, mingw-w64, g++-arm-linux-gnueabihf, NDK (r21d) from google)
# cross to Android
export PATH=$HOME/ndk/android-ndk-r21d/toolchains/llvm/prebuilt/linux-x86_64/bin:$PATH
make clean && make -j build ARCH=armv7 COMP=ndk && make -j build ARCH=armv7 COMP=ndk strip
make clean && make -j build ARCH=armv7-neon COMP=ndk && make -j build ARCH=armv7-neon COMP=ndk strip
make clean && make -j build ARCH=armv8 COMP=ndk && make -j build ARCH=armv8 COMP=ndk strip
# cross to Raspberry Pi
make clean && make -j build ARCH=armv7 COMP=gcc COMPILER=arm-linux-gnueabihf-g++
make clean && make -j build ARCH=armv7-neon COMP=gcc COMPILER=arm-linux-gnueabihf-g++
# cross to Windows
make clean && make -j build ARCH=x86-64-modern COMP=mingw
```
No functional change
This patch fixes the byte order when reading 16- and 32-bit values from the network file on a big-endian machine.
Bytes are ordered in read_le() using unsigned arithmetic, which doesn't need tricks to determine the endianness of the machine. Unfortunately the compiler doesn't seem to be able to optimise the ordering operation, but reading in the weights is not a time-critical operation and the extra time it takes should not be noticeable.
Big endian systems are still untested with NNUE.
fixes#3007
closes https://github.com/official-stockfish/Stockfish/pull/3009
No functional change.
Increases the use of NNUE evaluation in positions without captures/pawn moves,
by increasing the NNUEThreshold threshold with rule50_count.
This patch will force Stockfish to use NNUE eval more and more in materially
unbalanced positions, when it seems that the classical eval is struggling to
win and only manages to shuffle. This will ask the (slower) NNUE eval to
double-check the potential fortress branches of the search tree, but only
when necessary.
passed STC:
https://tests.stockfishchess.org/tests/view/5f36f1bf11a9b1a1dbf192d8
LLR: 2.93 (-2.94,2.94) {-0.50,1.50}
Total: 51824 W: 5836 L: 5653 D: 40335
Ptnml(0-2): 264, 4356, 16512, 4493, 287
passed LTC:
https://tests.stockfishchess.org/tests/view/5f37836111a9b1a1dbf1936d
LLR: 2.93 (-2.94,2.94) {0.25,1.75}
Total: 29768 W: 1747 L: 1590 D: 26431
Ptnml(0-2): 33, 1347, 11977, 1484, 43
closes https://github.com/official-stockfish/Stockfish/pull/3011
Bench: 4173967
Do not show the details of the default architecture for a simple "make help"
invocation, as the details are most likely to confuse beginners. Instead we
make it clear which architecture is the default and put an example at the end
of the Makefile as an incentative to use "make help ARCH=blah" to discover
the flags used by the different architectures.
```
make help
make help ARCH=x86-64-ssse3
```
Also clean-up and modernize a bit the Makefile examples while at it.
closes https://github.com/official-stockfish/Stockfish/pull/2996
No functional change
Adds support for Vector Neural Network Instructions (avx512), as available on Intel Cascade Lake
The _mm512_dpbusd_epi32() intrinsic (vpdpbusd instruction) is taylor made for NNUE.
on a cascade lake CPU (AWS C5.24x.large, gcc 10) NNUE eval is at roughly 78% nps of classical
(single core test)
bench 1024 1 24 default depth:
target classical NNUE ratio
vnni 2207232 1725987 78.20
avx512 2216789 1671734 75.41
avx2 2194006 1611263 73.44
modern 2185001 1352469 61.90
closes https://github.com/official-stockfish/Stockfish/pull/2987
No functional change
fails to build on that target, because of missing Intel Intrinsics.
macOS has posix_memalign() since ~2014 so we can simplify the code and just use that for all Apple platforms.
closes https://github.com/official-stockfish/Stockfish/pull/2985
No functional change.
this workaround is possibly rather a windows & gcc specific problem. See e.g.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412#c25
on Linux with gcc 8 this patch brings roughly a 8% speedup.
However, probably needs some testing in the wild.
includes a workaround for an old msys make (3.81) installation (fixes#2984)
No functional change
Change condition from three friendly pieces to two. This now means that we only extend castling on the king side if there are no other friendly pieces aside from king and rook. For the queen side, we only extend if there is only a rook and another friendly piece or if there is only a single rook and no other friendly piece but this is very rare.
STC:
LLR: 3.20 (-2.94,2.94) {-0.50,1.50}
Total: 31144 W: 4086 L: 3903 D: 23155
Ptnml(0-2): 227, 2843, 9278, 2968, 256
https://tests.stockfishchess.org/tests/view/5f31487f9081672066537516
LTC:
LLR: 2.93 (-2.94,2.94) {0.25,1.75}
Total: 57816 W: 3786 L: 3538 D: 50492
Ptnml(0-2): 92, 2991, 22488, 3251, 86
https://tests.stockfishchess.org/tests/view/5f3167c3908167206653753d
closes https://github.com/official-stockfish/Stockfish/pull/2980
Bench: 4244812
Joint work gvreuls / vondele
* Download the default NNUE net in AppVeyor
* Download net in travis CI `make net`
* Adjust tests to cover more archs, speedup instrumented testing
* Introduce 'mixed' bench as default, with further options:
classical, NNUE, mixed.
mixed (default) and NNUE require the default net to be present,
which can be obtained with
```
make net
```
Further examples (first is equivalent to `./stockfish bench`):
```
./stockfish bench 16 1 13 default depth mixed
./stockfish bench 16 1 13 default depth classical
./stockfish bench 16 1 13 default depth NNUE
```
The net is now downloaded automatically if needed for `profile-build`
(usual `build` works fine without net present)
PGO gives a nice speedup on fishtest:
passed STC:
LLR: 2.93 (-2.94,2.94) {-0.50,1.50}
Total: 3360 W: 469 L: 343 D: 2548
Ptnml(0-2): 20, 246, 1030, 356, 28
https://tests.stockfishchess.org/tests/view/5f31b5499081672066537569
passed LTC:
LLR: 2.97 (-2.94,2.94) {0.25,1.75}
Total: 8824 W: 609 L: 502 D: 7713
Ptnml(0-2): 8, 430, 3438, 519, 17
https://tests.stockfishchess.org/tests/view/5f31c87b908167206653757c
closes https://github.com/official-stockfish/Stockfish/pull/2931
fixes https://github.com/official-stockfish/Stockfish/issues/2907
requires fishtest updates before commit
Bench: 4290577
This patch allows old x86 CPUs, from AMD K8 (which the x86-64 baseline
targets) all the way down to the Pentium MMX, to benefit from NNUE with
comparable performance hit versus hand-written eval as on more modern
processors.
NPS of the bench with NNUE enabled on a Pentium III 1.13 GHz (using the
MMX code):
master: 38951
this patch: 80586
NPS of the bench with NNUE enabled using baseline x86-64 arch, which is
how linux distros are likely to package stockfish, on a modern CPU
(using the SSE2 code):
master: 882584
this patch: 1203945
closes https://github.com/official-stockfish/Stockfish/pull/2956
No functional change.
Makefile targets x86-64-sse42, x86-sse3 are removed; x86-64-sse41
is renamed to x86-64-sse41-popcnt (it did enable popcnt).
Makefile variables sse3, sse42, their associated compilation flags
and code in misc.cpp are removed.
closes https://github.com/official-stockfish/Stockfish/pull/2922
No functional change
despite usage of alignas, the generated (avx2/avx512) code with older compilers needs to use
unaligned loads with older gcc (e.g. confirmed crash with gcc 7.3/mingw on abrok).
Better performance thus requires gcc >= 9 on hardware supporting avx2/avx512
closes https://github.com/official-stockfish/Stockfish/pull/2969
No functional change
The idea of this patch is that positions are usually more complex and hard to evaluate even if there are more pawns.
This patch adjusts NNUE threshold usage depending on number of pawns in position, if pawn count is <3 we use the
classical evaluation more often, for pawn count = 3 patch the is non-functional,
with pawn count > 3 NNUE evaluation is used more often.
passed STC
https://tests.stockfishchess.org/tests/view/5f2f02d09081672066536b1f
LLR: 2.96 (-2.94,2.94) {-0.50,1.50}
Total: 36520 W: 5011 L: 4823 D: 26686
Ptnml(0-2): 299, 3482, 10548, 3594, 337
passed LTC
https://tests.stockfishchess.org/tests/view/5f2f4c329081672066536b5c
LLR: 2.98 (-2.94,2.94) {0.25,1.75}
Total: 39272 W: 2630 L: 2433 D: 34209
Ptnml(0-2): 53, 2066, 15218, 2229, 70
closes https://github.com/official-stockfish/Stockfish/pull/2960
bench 4084753
This patch tries to run multiple LTO threads in parallel, speeding up
the build process of optimized builds if the -j make parameter is used.
This mitigates the longer linking times of optimized builds since the
integration of the NNUE code. Roughly 2x build speedup.
I've tried a similar patch some two years ago but it ran into trouble
with old compiler versions then. Since we're on the C++17 standard now
these old compilers should be obsolete.
closes https://github.com/official-stockfish/Stockfish/pull/2943
No functional change.