This patch improves the codegen in the AffineTransform::forward function for architectures >=SSSE3. Current code works directly on memory and the compiler cannot see that the stores through outptr do not alias the loads through weights and input32. The solution implemented is to perform the affine transform with local variables as accumulators and only store the result to memory at the end. The number of accumulators required is OutputDimensions / OutputSimdWidth, which means that for the 1024->16 affine transform it requires 4 registers with SSSE3, 2 with AVX2, 1 with AVX512. It also cuts the number of stores required by NumRegs * 256 for each node evaluated. The local accumulators are expected to be assigned to registers, but even if this cannot be done in some case due to register pressure it will help the compiler to see that there is no aliasing between the loads and stores and may still result in better codegen.
See https://godbolt.org/z/59aTKbbYc for codegen comparison.
passed STC:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 140328 W: 10635 L: 10358 D: 119335
Ptnml(0-2): 302, 8339, 52636, 8554, 333
closes https://github.com/official-stockfish/Stockfish/pull/3634
No functional change
combined work by Serio Vieri, Michael Byrne, and Jonathan D (aka SFisGod) based on top of previous developments, by restarts from good nets.
Sergio generated the net https://tests.stockfishchess.org/api/nn/nn-d8609abe8caf.nnue:
The initial net nn-d8609abe8caf.nnue is trained by generating around 16B of training data from the last master net nn-9e3c6298299a.nnue, then trained, continuing from the master net, with lambda=0.2 and sampling ratio of 1. Starting with LR=2e-3, dropping LR with a factor of 0.5 until it reaches LR=5e-4. in_scaling is set to 361. No other significant changes made to the pytorch trainer.
Training data gen command (generates in chunks of 200k positions):
generate_training_data min_depth 9 max_depth 11 count 200000 random_move_count 10 random_move_max_ply 80 random_multi_pv 12 random_multi_pv_diff 100 random_multi_pv_depth 8 write_min_ply 10 eval_limit 1500 book noob_3moves.epd output_file_name gendata/$(date +"%Y%m%d-%H%M")_${HOSTNAME}.binpack
PyTorch trainer command (Note that this only trains for 20 epochs, repeatedly train until convergence):
python train.py --features "HalfKAv2^" --max_epochs 20 --smart-fen-skipping --random-fen-skipping 500 --batch-size 8192 --default_root_dir $dir --seed $RANDOM --threads 4 --num-workers 32 --gpus $gpuids --track_grad_norm 2 --gradient_clip_val 0.05 --lambda 0.2 --log_every_n_steps 50 $resumeopt $data $val
See https://github.com/sergiovieri/Stockfish/tree/tools_mod/rl for the scripts used to generate data.
Based on that Michael generated nn-76a8a7ffb820.nnue in the following way:
The net being submitted was trained with the pytorch trainer: https://github.com/glinscott/nnue-pytorch
python train.py i:/bin/all.binpack i:/bin/all.binpack --gpus 1 --threads 4 --num-workers 30 --batch-size 16384 --progress_bar_refresh_rate 30 --smart-fen-skipping --random-fen-skipping 3 --features=HalfKAv2^ --auto_lr_find True --lambda=1.0 --max_epochs=240 --seed %random%%random% --default_root_dir exp/run_109 --resume-from-model ./pt/nn-d8609abe8caf.pt
This run is thus started from Segio Vieri's net nn-d8609abe8caf.nnue
all.binpack equaled 4 parts Wrong_NNUE_2.binpack https://drive.google.com/file/d/1seGNOqcVdvK_vPNq98j-zV3XPE5zWAeq/view?usp=sharing plus two parts of Training_Data.binpack https://drive.google.com/file/d/1RFkQES3DpsiJqsOtUshENtzPfFgUmEff/view?usp=sharing
Each set was concatenated together - making one large Wrong_NNUE 2 binpack and one large Training so the were approximately equal in size. They were then interleaved together. The idea was to give Wrong_NNUE.binpack closer to equal weighting with the Training_Data binpack
model.py modifications:
loss = torch.pow(torch.abs(p - q), 2.6).mean()
LR = 8.0e-5 calculated as follows: 1.5e-3*(.992^360) - the idea here was to take a highly trained net and just use all.binpack as a finishing micro refinement touch for the last 2 Elo or so. This net was discovered on the 59th epoch.
optimizer = ranger.Ranger(train_params, betas=(.90, 0.999), eps=1.0e-7, gc_loc=False, use_gc=False)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.992)
For this micro optimization, I had set the period to "5" in train.py. This changes the checkpoint output so that every 5th checkpoint file is created
The final touches were to adjust the NNUE scale, as was done by Jonathan in tests running at the same time.
passed LTC
https://tests.stockfishchess.org/tests/view/60fa45aed8a6b65b2f3a77a4
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 53040 W: 1732 L: 1575 D: 49733
Ptnml(0-2): 14, 1432, 23474, 1583, 17
passed STC
https://tests.stockfishchess.org/tests/view/60f9fee2d8a6b65b2f3a7775
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 37928 W: 3178 L: 3001 D: 31749
Ptnml(0-2): 100, 2446, 13695, 2623, 100.
closes https://github.com/official-stockfish/Stockfish/pull/3626
Bench: 5169957
The main idea is that illegal moves influencing search or
qsearch obviously can't be any sort of good. The only reason
why initially legality checks for search and qsearch were done
after they actually can influence some heuristics is because
legality check is expensive computationally. Eventually in
search it was moved to the place where it makes sure that
illegal moves can't influence search.
This patch shows that the same can be done for qsearch + it
passed STC with elo-gaining bounds + it removes 3 lines of code
because one no longer needs to increment/decrement movecount
on illegal moves.
passed STC with elo-gaining bounds
https://tests.stockfishchess.org/tests/view/60f20aefd1189bed71812da0
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 61512 W: 4688 L: 4492 D: 52332
Ptnml(0-2): 139, 3730, 22848, 3874, 165
The same version functionally but with moving condition ever earlier
passed LTC with simplification bounds.
https://tests.stockfishchess.org/tests/view/60f292cad1189bed71812de9
LLR: 2.98 (-2.94,2.94) <-2.50,0.50>
Total: 60944 W: 1724 L: 1685 D: 57535
Ptnml(0-2): 11, 1556, 27298, 1597, 10
closes https://github.com/official-stockfish/Stockfish/pull/3618
bench 4709569
Not all linux users will have libatomic installed.
When using clang as the system compiler with compiler-rt as the default
runtime library instead of libgcc, atomic builtins may be provided by compiler-rt.
This change allows such users to pass RTLIB=compiler-rt to make sure
the build doesn't error out on the missing (unnecessary) libatomic.
closes https://github.com/official-stockfish/Stockfish/pull/3597
No functional change
Official release version of Stockfish 14
Bench: 4770936
---
Today, we have the pleasure to announce Stockfish 14.
As usual, downloads will be freely available at https://stockfishchess.org
The engine is now significantly stronger than just a few months ago,
and wins four times more game pairs than it loses against the previous
release version [0]. Stockfish 14 is now at least 400 Elo ahead of
Stockfish 7, a top engine in 2016 [1]. During the last five years,
Stockfish has thus gained about 80 Elo per year.
Stockfish 14 evaluates positions more accurately than Stockfish 13 as
a result of two major steps forward in defining and training the
efficiently updatable neural network (NNUE) that provides the evaluation
for positions.
First, the collaboration with the Leela Chess Zero team - announced
previously [2] - has come to fruition. The LCZero team has provided a
collection of billions of positions evaluated by Leela that we have
combined with billions of positions evaluated by Stockfish to train the
NNUE net that powers Stockfish 14. The fact that we could use and combine
these datasets freely was essential for the progress made and demonstrates
the power of open source and open data [3].
Second, the architecture of the NNUE network was significantly updated:
the new network is not only larger, but more importantly, it deals better
with large material imbalances and can specialize for multiple phases of
the game [4]. A new project, kick-started by Gary Linscott and
Tomasz Sobczyk, led to a GPU accelerated net trainer written in
pytorch.[5] This tool allows for training high-quality nets in a couple
of hours.
Finally, this release features some search refinements, minor bug
fixes and additional improvements. For example, Stockfish is now about
90 Elo stronger for chess960 (Fischer random chess) at short time control.
The Stockfish project builds on a thriving community of enthusiasts
(thanks everybody!) that contribute their expertise, time, and resources
to build a free and open-source chess engine that is robust, widely
available, and very strong. We invite our chess fans to join the fishtest
testing framework and programmers to contribute to the project on
github [6].
Stay safe and enjoy chess!
The Stockfish team
[0] https://tests.stockfishchess.org/tests/view/60dae5363beab81350aca077
[1] https://nextchessmove.com/dev-builds
[2] https://stockfishchess.org/blog/2021/stockfish-13/
[3] https://lczero.org/blog/2021/06/the-importance-of-open-data/
[4] https://github.com/official-stockfish/Stockfish/commit/e8d64af1
[5] https://github.com/glinscott/nnue-pytorch/
[6] https://stockfishchess.org/get-involved/
In the so-called "hybrid" method of evaluation of current master, we use the
classical eval (because of its speed) instead of the NNUE eval when the classical
material balance approximation hints that the position is "winning enough" to
rely on the classical eval.
This trade-off idea between speed and accuracy works well in general, but in
some fortress positions the classical eval is just bad. So in shuffling branches
of the search tree, we (slowly) increase the thresehold so that eventually we
don't trust classical anymore and switch to NNUE evaluation.
This patch increases that threshold faster, so that we switch to NNUE quicker
in shuffling branches. Idea is to incite Stockfish to spend less time in fortresses
lines in the search tree, and spend more time searching the critical lines.
passed STC:
LLR: 2.96 (-2.94,2.94) <-0.50,2.50>
Total: 47872 W: 3908 L: 3720 D: 40244
Ptnml(0-2): 122, 3053, 17419, 3199, 143
https://tests.stockfishchess.org/tests/view/60cef34b457376eb8bcab79d
passed LTC:
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 73616 W: 2326 L: 2143 D: 69147
Ptnml(0-2): 21, 1940, 32705, 2119, 23
https://tests.stockfishchess.org/tests/view/60cf6d842114332881e73528
Retested at LTC against lastest master:
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 18264 W: 642 L: 532 D: 17090
Ptnml(0-2): 6, 479, 8055, 583, 9
https://tests.stockfishchess.org/tests/view/60d18cd540925195e7a6c351
closes https://github.com/official-stockfish/Stockfish/pull/3578
Bench: 5139233
This patch removes the UCI option for setting Contempt in classical evaluation.
It is exactly equivalent to using Contempt=0 for the UCI contempt value and keeping
the dynamic part in the algo (renaming this dynamic part `trend` to better describe
what it does). We have tried quite hard to implement a working Contempt feature for
NNUE but nothing really worked, so it is probably time to give up.
Interested chess fans wishing to keep playing with the UCI option for Contempt and
use it with the classical eval are urged to download the version tagged "SF_Classical"
of Stockfish (dated 31 July 2020), as it was the last version where our search
algorithm was tuned for the classical eval and is probably our strongest classical
player ever: https://github.com/official-stockfish/Stockfish/tags
Passed STC:
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 72904 W: 6228 L: 6175 D: 60501
Ptnml(0-2): 221, 5006, 25971, 5007, 247
https://tests.stockfishchess.org/tests/view/60c98bf9457376eb8bcab18d
Passed LTC:
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 45168 W: 1601 L: 1547 D: 42020
Ptnml(0-2): 38, 1331, 19786, 1397, 32
https://tests.stockfishchess.org/tests/view/60c9c7fa457376eb8bcab1bb
closes https://github.com/official-stockfish/Stockfish/pull/3575
Bench: 4947716
This patch increase the weight of pawns and pieces from 28 to 32
in the scaling formula we apply to the output of the NNUE pure eval.
Increasing this gradient for pawns and pieces means that Stockfish
will try a little harder to keep material when she has the advantage,
and try a little bit harder to escape into an endgame when she is
under pressure.
STC:
LLR: 2.93 (-2.94,2.94) <-0.50,2.50>
Total: 53168 W: 4371 L: 4177 D: 44620
Ptnml(0-2): 160, 3389, 19283, 3601, 151
https://tests.stockfishchess.org/tests/view/60cefd1d457376eb8bcab7ab
LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 10888 W: 386 L: 288 D: 10214
Ptnml(0-2): 3, 260, 4821, 356, 4
https://tests.stockfishchess.org/tests/view/60cf709d2114332881e7352b
closes https://github.com/official-stockfish/Stockfish/pull/3571
Bench: 4965430
trained with the Python command
c:\nnue>python train.py i:/bin/all.binpack i:/bin/all.binpack --gpus 1 --threads 4 --num-workers 30 --batch-size 16384 --progress_bar_refresh_rate 300 --smart-fen-skipping --random-fen-skipping 3 --features=HalfKAv2^ --lambda=1.0 --max_epochs=440 --seed %random%%random% --default_root_dir exp/run_10 --resume-from-model ./pt/nn-3b20abec10c1.pt
`
all.binpack equaled 4 parts Wrong_NNUE_2.binpack https://drive.google.com/file/d/1seGNOqcVdvK_vPNq98j-zV3XPE5zWAeq/view?usp=sharing plus two parts of Training_Data.binpack https://drive.google.com/file/d/1RFkQES3DpsiJqsOtUshENtzPfFgUmEff/view?usp=sharing
Each set was concatenated together - making one large Wrong_NNUE 2 binpack and one large Training so the were approximately equal in size. They were then interleaved together. The idea was to give Wrong_NNUE.binpack closer to equal weighting with the Training_Data binpack .
Net nn-3b20abec10c1.nnue was chosen as the --resume-from-model with the idea that through learning, the manually hex edited values will be learned and will not need to be manually adjusted going forward. They would also be fine tuned by the learning process.
passed STC:
https://tests.stockfishchess.org/tests/view/60cdf91e457376eb8bcab66f
LLR: 2.95 (-2.94,2.94) <-0.50,2.50>
Total: 18256 W: 1639 L: 1479 D: 15138
Ptnml(0-2): 59, 1179, 6505, 1313, 72
passed LTC:
https://tests.stockfishchess.org/tests/view/60ce2166457376eb8bcab6e1
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 18792 W: 654 L: 542 D: 17596
Ptnml(0-2): 9, 490, 8291, 592, 14
closes https://github.com/official-stockfish/Stockfish/pull/3570
Bench: 5020972
The Cygwin environment has two g++ compilers, each with a different problem
for compiling Stockfish at the moment:
(a) g++.exe : full posix build compiler, linked to cygwin dll.
=> This one has a problem embedding the net.
(b) x86_64-w64-mingw32-g++.exe : native Windows build compiler.
=> This one manages to embed the net, but has a problem related to libgcov
when we use the profile-build target of Stockfish.
This patch solves the problem for compiler (b), so that our recommended command line
if you want to build an optimized version of Stockfish on Cygwin becomes something
like the following (you can change the ARCH value to whatever you want, but note
the COMP and CXX variables pointing at the right compiler):
```
make -j profile-build ARCH=x86-64-modern COMP=mingw CXX=x86_64-w64-mingw32-c++.exe
```
closes https://github.com/official-stockfish/Stockfish/pull/3569
No functional change
move to github actions to replace travis CI.
First version, testing on linux using gcc and clang.
gcc build with sanitizers and valgrind.
No functional change
Optimization of vondele's nn-33c9d39e5eb6.nnue using SPSA
https://tests.stockfishchess.org/tests/view/60ca68be457376eb8bcab28b
Setting: ck values are default based on how large the parameters are
The new values for this net are the raw values at the end of the tuning (80k games)
The significant changes are in buckets 1 and 2 (5-12 pieces) so the main difference is in playing endgames if we compare it to nn-33c9. There is also change in bucket 7 (29-32 pieces) but not as substantial as the changes in buckets 1 and 2. If we interpret the changes based on an experiment a few months ago, this new net plays more optimistically during endgames and less optimistically during openings.
STC:
LLR: 2.93 (-2.94,2.94) <-0.50,2.50>
Total: 49504 W: 4246 L: 4053 D: 41205
Ptnml(0-2): 140, 3282, 17749, 3407, 174
https://tests.stockfishchess.org/tests/view/60cbd752457376eb8bcab478
LTC:
LLR: 2.95 (-2.94,2.94) <0.50,3.50>
Total: 88720 W: 4926 L: 4651 D: 79143
Ptnml(0-2): 105, 4048, 35793, 4295, 119
https://tests.stockfishchess.org/tests/view/60cc7828457376eb8bcab4fa
closes https://github.com/official-stockfish/Stockfish/pull/3566
Bench: 4758885
This net was created by @pleomati, who manually edited with an hex editor
10 values randomly chosen in the LCSFNet10 net (nn-6ad41a9207d0.nnue) to
create this one. The LCSFNet10 net was trained by Joost VandeVondele from
a dataset combining Stockfish games and Leela games (16x10^9 positions from
SF self-play at depth 9, and 6.3x10^9 positions from Leela games, so overall
72% of Stockfish positions and 28% of Leela positions).
passed STC 10+0.1:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 50888 W: 5881 L: 5654 D: 39353
Ptnml(0-2): 281, 4290, 16085, 4497, 291
https://tests.stockfishchess.org/tests/view/60cbfa68457376eb8bcab49a
passed LTC 60+0.6:
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 25480 W: 1498 L: 1338 D: 22644
Ptnml(0-2): 36, 1155, 10193, 1325, 31
https://tests.stockfishchess.org/tests/view/60cc4af8457376eb8bcab4d4
closes https://github.com/official-stockfish/Stockfish/pull/3564
Bench: 4904930
This reverts commit "Fix for Cygwin's environment build-profile", as it was
giving errors for "make clean" on some Windows environments. See comments in
68bf362ea2
Possibly somebody can propose a solution that would fix Cygwin builds and
not break on other system too, stay tuned! :-)
No functional change
The Cygwin environment has two g++ compilers, each with a different problem
for compiling Stockfish at the moment:
(a) g++.exe : full posix build compiler, linked to cygwin dll.
=> This one has a problem embedding the net.
(b) x86_64-w64-mingw32-g++.exe : native Windows build compiler.
=> This one manages to embed the net, but has a problem related to libgcov
when we use the profile-build target of Stockfish.
This patch solves the problem for compiler (b), so that our recommended command line
if you want to build an optimized version of Stockfish on Cygwin becomes something
like the following (you can change the ARCH value to whatever you want, but note
the COMP and CXX variables pointing at the right compiler):
```
make -j profile-build ARCH=x86-64-modern COMP=mingw CXX=x86_64-w64-mingw32-c++.exe
```
closes https://github.com/official-stockfish/Stockfish/pull/3463
No functional change
of a root move leading to a 3-fold repetition.
With this small fix a draw ranking and thus a draw score is being applied.
This works for both, ranking by dtz or wdl tables.
Fixes https://github.com/official-stockfish/Stockfish/issues/3542
(No functional change without TBs.)
Bench: 4877339
This net is the result of training on data used by the Leela project. More precisely,
we shuffled T60 and T74 data kindly provided by borg (for different Tnn, the data is
a result of Leela selfplay with differently sized Leela nets).
The data is available at vondele's google drive:
https://drive.google.com/drive/folders/1mftuzYdl9o6tBaceR3d_VBQIrgKJsFpl.
The Leela data comes in small chunks of .binpack files. To shuffle them, we simply
used a small python script to randomly rename the files, and then concatenated them
using `cat`. As validation data we picked a file of T60 data. We will further investigate
T74 data.
The training for the NNUE architecture used 200 epochs with the Python trainer from
the Stockfish project. Unlike the previous run we tried with this data, this run does
not have adjusted scaling — not because we didn't want to, but because we forgot.
However, this training randomly skips 40% more positions than previous run. The loss
was very spiky and decreased slower than it does usually.
Training loss: https://github.com/official-stockfish/images/blob/main/training-loss-8e47cf062333.png
Validation loss: https://github.com/official-stockfish/images/blob/main/validation-loss-8e47cf062333.png
This is the exact training command:
python train.py --smart-fen-skipping --random-fen-skipping 14 --batch-size 16384 --threads 4 --num-workers 4 --gpus 1 trainingdata\training_data.binpack validationdata\val.binpack
---
10k STC result:
ELO: 3.61 +-3.3 (95%) LOS: 98.4%
Total: 10000 W: 1241 L: 1137 D: 7622
Ptnml(0-2): 68, 841, 3086, 929, 76
https://tests.stockfishchess.org/tests/view/60c67e50457376eb8bcaae70
10k LTC result:
ELO: 2.71 +-2.4 (95%) LOS: 98.8%
Total: 10000 W: 659 L: 581 D: 8760
Ptnml(0-2): 22, 485, 3900, 579, 14
https://tests.stockfishchess.org/tests/view/60c69deb457376eb8bcaae98
Passed LTC:
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 9648 W: 685 L: 545 D: 8418
Ptnml(0-2): 22, 448, 3740, 596, 18
https://tests.stockfishchess.org/tests/view/60c6d41c457376eb8bcaaecf
---
closes https://github.com/official-stockfish/Stockfish/pull/3550
Bench: 4877339
Compute optimal register count for feature transformer accumulation dynamically.
This also introduces a change where AVX512 would only use 8 registers instead of 16
(now possible due to a 2x increase in feature transformer size).
closes https://github.com/official-stockfish/Stockfish/pull/3543
No functional change
This patch restricts LMR extensions (of non-transposition table moves) from being
used when the transposition table move was extended by two plies via singular
extension. This may serve to limit search explosions in certain positions.
This makes a lot of sense because the precondition for the tt-move to have been
singular extended by two plies is that the result of the alternate search (with
excluded the tt-move) has been a hard fail low: it is natural to later search less
for non tt-moves in this situation.
The current state of depth/extensions/reductions management is getting quite tricky
in our search algo, see https://github.com/official-stockfish/Stockfish/pull/3546#issuecomment-860174549
for some discussion. Suggestions welcome!
Passed STC
https://tests.stockfishchess.org/tests/view/60c3f293457376eb8bcaac8d
LLR: 2.95 (-2.94,2.94) <-0.50,2.50>
Total: 117984 W: 9698 L: 9430 D: 98856
Ptnml(0-2): 315, 7708, 42703, 7926, 340
passed LTC
https://tests.stockfishchess.org/tests/view/60c46ea5457376eb8bcaacc7
LLR: 2.97 (-2.94,2.94) <0.50,3.50>
Total: 11280 W: 401 L: 302 D: 10577
Ptnml(0-2): 2, 271, 4998, 364, 5
closes https://github.com/official-stockfish/Stockfish/pull/3546
Bench: 4709974