1
0
Fork 0
mirror of https://github.com/sockspls/badfish synced 2025-04-30 08:43:09 +00:00
Commit graph

182 commits

Author SHA1 Message Date
Sebastian Buchwald
31acd6bab7 Warn if a global function has no previous declaration
If a global function has no previous declaration, either the declaration
is missing in the corresponding header file or the function should be
declared static. Static functions are local to the translation unit,
which allows the compiler to apply some optimizations earlier (when
compiling the translation unit rather than during link-time
optimization).

The commit enables the warning for gcc, clang, and mingw. It also fixes
the reported warnings by declaring the functions static or by adding a
header file (benchmark.h).

closes https://github.com/official-stockfish/Stockfish/pull/4325

No functional change
2023-01-09 20:18:39 +01:00
Sebastian Buchwald
b60f9cc451 Update copyright years
Happy New Year!

closes https://github.com/official-stockfish/Stockfish/pull/4315

No functional change
2023-01-02 19:07:38 +01:00
Joost VandeVondele
ad2aa8c06f Normalize evaluation
Normalizes the internal value as reported by evaluate or search
to the UCI centipawn result used in output. This value is derived from
the win_rate_model() such that Stockfish outputs an advantage of
"100 centipawns" for a position if the engine has a 50% probability to win
from this position in selfplay at fishtest LTC time control.

The reason to introduce this normalization is that our evaluation is, since NNUE,
no longer related to the classical parameter PawnValueEg (=208). This leads to
the current evaluation changing quite a bit from release to release, for example,
the eval needed to have 50% win probability at fishtest LTC (in cp and internal Value):

June 2020  :   113cp (237)
June 2021  :   115cp (240)
April 2022 :   134cp (279)
July 2022  :   167cp (348)

With this patch, a 100cp advantage will have a fixed interpretation,
i.e. a 50% win chance. To keep this value steady, it will be needed to update the win_rate_model()
from time to time, based on fishtest data. This analysis can be performed with
a set of scripts currently available at https://github.com/vondele/WLD_model

fixes https://github.com/official-stockfish/Stockfish/issues/4155
closes https://github.com/official-stockfish/Stockfish/pull/4216

No functional change
2022-11-05 09:15:53 +01:00
Clausable
8333b2a94c Fix README typos, update AUTHORS
closes https://github.com/official-stockfish/Stockfish/pull/4208

No functional change
2022-10-27 08:15:46 +02:00
mstembera
93f71ecfe1 Optimize make_index() using templates and lookup tables.
https://tests.stockfishchess.org/tests/view/634517e54bc7650f07542f99
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 642672 W: 171819 L: 170658 D: 300195
Ptnml(0-2): 2278, 68077, 179416, 69336, 2229

this also introduces `-flto-partition=one` as suggested by MinetaS (Syine Mineta)
to avoid linking errors due to LTO on 32 bit mingw. This change was tested in isolation as well

https://tests.stockfishchess.org/tests/view/634aacf84bc7650f0755188b
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 119352 W: 31986 L: 31862 D: 55504
Ptnml(0-2): 439, 12624, 33400, 12800, 413

closes https://github.com/official-stockfish/Stockfish/pull/4199

No functional change
2022-10-16 11:42:19 +02:00
mstembera
82bb21dc7a Optimize AVX2 path in NNUE evaluation
always selecting AffineTransform specialization for small inputs.

A related patch was tested as

Initially tested as a simplification
STC https://tests.stockfishchess.org/tests/view/6317c3f437f41b13973d6dff
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 58072 W: 15619 L: 15425 D: 27028
Ptnml(0-2): 241, 6191, 15992, 6357, 255

Elo gain speedup test
STC https://tests.stockfishchess.org/tests/view/63181c1b37f41b13973d79dc
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 184496 W: 49922 L: 49401 D: 85173
Ptnml(0-2): 851, 19397, 51208, 19964, 828

and this patch gained in testing

speedup        = +0.0071
P(speedup > 0) =  1.0000
on CPU: 16 x AMD Ryzen 9 3950X

closes https://github.com/official-stockfish/Stockfish/pull/4158

No functional change
2022-09-11 14:19:57 +02:00
Dubslow
442c40b43d Use NNUE complexity in search, retune related parameters
This builds on ideas of xoto10 and mstembera to use more output from NNUE in the search algorithm.

passed STC:
https://tests.stockfishchess.org/tests/view/62ae454fe7ee5525ef88a957
LLR: 2.95 (-2.94,2.94) <0.00,2.50>
Total: 89208 W: 24127 L: 23753 D: 41328
Ptnml(0-2): 400, 9886, 23642, 10292, 384

passed LTC:
https://tests.stockfishchess.org/tests/view/62acc6ddd89eb6cf1e0750a1
LLR: 2.93 (-2.94,2.94) <0.50,3.00>
Total: 56352 W: 15430 L: 15115 D: 25807
Ptnml(0-2): 44, 5501, 16782, 5794, 55

closes https://github.com/official-stockfish/Stockfish/pull/4088

bench 5332964
2022-06-20 08:30:57 +02:00
xoto10
7f1333ccf8 Blend nnue complexity with classical.
Following mstembera's test of the complexity value derived from nnue values,
this change blends that idea with the old complexity calculation.

STC 10+0.1:
LLR: 2.95 (-2.94,2.94) <0.00,2.50>
Total: 42320 W: 11436 L: 11148 D: 19736
Ptnml(0-2): 209, 4585, 11263, 4915, 188
https://tests.stockfishchess.org/tests/live_elo/6295c9239c8c2fcb2bad7fd9

LTC 60+0.6:
LLR: 2.98 (-2.94,2.94) <0.50,3.00>
Total: 34600 W: 9393 L: 9125 D: 16082
Ptnml(0-2): 32, 3323, 10319, 3597, 29
https://tests.stockfishchess.org/tests/view/6295fd5d9c8c2fcb2bad88cf

closes https://github.com/official-stockfish/Stockfish/pull/4046

Bench 6078140
2022-06-02 07:47:23 +02:00
Giacomo Lorenzetti
f7d1491b3d Assorted small cleanups
closes https://github.com/official-stockfish/Stockfish/pull/3973

No functional change
2022-05-29 18:42:48 +02:00
Tomasz Sobczyk
c079acc26f Update NNUE architecture to SFNNv5. Update network to nn-3c0aa92af1da.nnue.
Architecture changes:

    Duplicated activation after the 1024->15 layer with squared crelu (so 15->15*2). As proposed by vondele.

Trainer changes:

    Added bias to L1 factorization, which was previously missing (no measurable improvement but at least neutral in principle)
    For retraining linearly reduce lambda parameter from 1.0 at epoch 0 to 0.75 at epoch 800.
    reduce max_skipping_rate from 15 to 10 (compared to vondele's outstanding PR)

Note: This network was trained with a ~0.8% error in quantization regarding the newly added activation function.
      This will be fixed in the released trainer version. Expect a trainer PR tomorrow.

Note: The inference implementation cuts a corner to merge results from two activation functions.
       This could possibly be resolved nicer in the future. AVX2 implementation likely not necessary, but NEON is missing.

First training session invocation:

python3 train.py \
    ../nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
    ../nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
    --gpus "$3," \
    --threads 4 \
    --num-workers 8 \
    --batch-size 16384 \
    --progress_bar_refresh_rate 20 \
    --random-fen-skipping 3 \
    --features=HalfKAv2_hm^ \
    --lambda=1.0 \
    --max_epochs=400 \
    --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2

Second training session invocation:

python3 train.py \
    ../nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
    ../nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
    --gpus "$3," \
    --threads 4 \
    --num-workers 8 \
    --batch-size 16384 \
    --progress_bar_refresh_rate 20 \
    --random-fen-skipping 3 \
    --features=HalfKAv2_hm^ \
    --start-lambda=1.0 \
    --end-lambda=0.75 \
    --gamma=0.995 \
    --lr=4.375e-4 \
    --max_epochs=800 \
    --resume-from-model /data/sopel/nnue/nnue-pytorch-training/data/exp367/nn-exp367-run3-epoch399.pt \
    --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2

Passed STC:
LLR: 2.95 (-2.94,2.94) <0.00,2.50>
Total: 27288 W: 7445 L: 7178 D: 12665
Ptnml(0-2): 159, 3002, 7054, 3271, 158
https://tests.stockfishchess.org/tests/view/627e8c001919125939623644

Passed LTC:
LLR: 2.95 (-2.94,2.94) <0.50,3.00>
Total: 21792 W: 5969 L: 5727 D: 10096
Ptnml(0-2): 25, 2152, 6294, 2406, 19
https://tests.stockfishchess.org/tests/view/627f2a855734b18b2e2ece47

closes https://github.com/official-stockfish/Stockfish/pull/4020

Bench: 6481017
2022-05-14 12:47:22 +02:00
Topologist
471d93063a Play more positional in endgames
This patch chooses the delta value (which skews the nnue evaluation between positional and materialistic)
depending on the material: If the material is low, delta will be higher and the evaluation is shifted
to the positional value. If the material is high, the evaluation will be shifted to the psqt value.
I don't think slightly negative values of delta should be a concern.

Passed STC:
https://tests.stockfishchess.org/tests/view/62418513b3b383e86185766f
LLR: 2.94 (-2.94,2.94) <0.00,2.50>
Total: 28808 W: 7832 L: 7564 D: 13412
Ptnml(0-2): 147, 3186, 7505, 3384, 182

Passed LTC:
https://tests.stockfishchess.org/tests/view/62419137b3b383e861857842
LLR: 2.96 (-2.94,2.94) <0.50,3.00>
Total: 58632 W: 15776 L: 15450 D: 27406
Ptnml(0-2): 42, 5889, 17149, 6173, 63

closes https://github.com/official-stockfish/Stockfish/pull/3971

Bench: 7588855
2022-03-28 22:43:52 +02:00
Ben Chaney
270a0e737f Generalize the feature transform to use vec_t macros
This commit generalizes the feature transform to use vec_t macros
that are architecture defined instead of using a seperate code path for each one.

It should make some old architectures (MMX, including improvements by Fanael) faster
and make further such improvements easier in the future.

Includes some corrections to CI for mingw.

closes https://github.com/official-stockfish/Stockfish/pull/3955
closes https://github.com/official-stockfish/Stockfish/pull/3928

No functional change
2022-03-02 23:39:08 +01:00
Tomasz Sobczyk
174b038bf3 Use dynamic allocation for evaluation scratch TLS buffer.
fixes #3946 an issue related with the toolchain as found in xcode 12 on macOS,
related to previous commit 5f781d36.

closes https://github.com/official-stockfish/Stockfish/pull/3950

No functional change
2022-03-01 17:51:02 +01:00
mstembera
5f781d366e Clean up and simplify some nnue code.
Remove some unnecessary code and it's execution during inference. Also the change on line 49 in nnue_architecture.h results in a more efficient SIMD code path through ClippedReLU::propagate().

passed STC:
https://tests.stockfishchess.org/tests/view/6217d3bfda649bba32ef25d5
LLR: 2.94 (-2.94,2.94) <-2.25,0.25>
Total: 12056 W: 3281 L: 3092 D: 5683
Ptnml(0-2): 55, 1213, 3312, 1384, 64

passed STC SMP:
https://tests.stockfishchess.org/tests/view/6217f344da649bba32ef295e
LLR: 2.94 (-2.94,2.94) <-2.25,0.25>
Total: 27376 W: 7295 L: 7137 D: 12944
Ptnml(0-2): 52, 2859, 7715, 3003, 59

closes https://github.com/official-stockfish/Stockfish/pull/3944

No functional change

bench: 6820724
2022-02-25 08:37:57 +01:00
Tomasz Sobczyk
cb9c2594fc Update architecture to "SFNNv4". Update network to nn-6877cd24400e.nnue.
Architecture:

The diagram of the "SFNNv4" architecture:
https://user-images.githubusercontent.com/8037982/153455685-cbe3a038-e158-4481-844d-9d5fccf5c33a.png

The most important architectural changes are the following:

* 1024x2 [activated] neurons are pairwise, elementwise multiplied (not quite pairwise due to implementation details, see diagram), which introduces a non-linearity that exhibits similar benefits to previously tested sigmoid activation (quantmoid4), while being slightly faster.
* The following layer has therefore 2x less inputs, which we compensate by having 2 more outputs. It is possible that reducing the number of outputs might be beneficial (as we had it as low as 8 before). The layer is now 1024->16.
* The 16 outputs are split into 15 and 1. The 1-wide output is added to the network output (after some necessary scaling due to quantization differences). The 15-wide is activated and follows the usual path through a set of linear layers. The additional 1-wide output is at least neutral, but has shown a slightly positive trend in training compared to networks without it (all 16 outputs through the usual path), and allows possibly an additional stage of lazy evaluation to be introduced in the future.

Additionally, the inference code was rewritten and no longer uses a recursive implementation. This was necessitated by the splitting of the 16-wide intermediate result into two, which was impossible to do with the old implementation with ugly hacks. This is hopefully overall for the better.

First session:

The first session was training a network from scratch (random initialization). The exact trainer used was slightly different (older) from the one used in the second session, but it should not have a measurable effect. The purpose of this session is to establish a strong network base for the second session. Small deviations in strength do not harm the learnability in the second session.

The training was done using the following command:

python3 train.py \
    /home/sopel/nnue/nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
    /home/sopel/nnue/nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
    --gpus "$3," \
    --threads 4 \
    --num-workers 4 \
    --batch-size 16384 \
    --progress_bar_refresh_rate 20 \
    --random-fen-skipping 3 \
    --features=HalfKAv2_hm^ \
    --lambda=1.0 \
    --gamma=0.992 \
    --lr=8.75e-4 \
    --max_epochs=400 \
    --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2

Every 20th net was saved and its playing strength measured against some baseline at 25k nodes per move with pure NNUE evaluation (modified binary). The exact setup is not important as long as it's consistent. The purpose is to sift good candidates from bad ones.

The dataset can be found https://drive.google.com/file/d/1UQdZN_LWQ265spwTBwDKo0t1WjSJKvWY/view

Second session:

The second training session was done starting from the best network (as determined by strength testing) from the first session. It is important that it's resumed from a .pt model and NOT a .ckpt model. The conversion can be performed directly using serialize.py

The LR schedule was modified to use gamma=0.995 instead of gamma=0.992 and LR=4.375e-4 instead of LR=8.75e-4 to flatten the LR curve and allow for longer training. The training was then running for 800 epochs instead of 400 (though it's possibly mostly noise after around epoch 600).

The training was done using the following command:

The training was done using the following command:

python3 train.py \
        /data/sopel/nnue/nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
        /data/sopel/nnue/nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
        --gpus "$3," \
        --threads 4 \
        --num-workers 4 \
        --batch-size 16384 \
        --progress_bar_refresh_rate 20 \
        --random-fen-skipping 3 \
        --features=HalfKAv2_hm^ \
        --lambda=1.0 \
        --gamma=0.995 \
        --lr=4.375e-4 \
        --max_epochs=800 \
        --resume-from-model /data/sopel/nnue/nnue-pytorch-training/data/exp295/nn-epoch399.pt \
        --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$run_id

In particular note that we now use lambda=1.0 instead of lambda=0.8 (previous nets), because tests show that WDL-skipping introduced by vondele performs better with lambda=1.0. Nets were being saved every 20th epoch. In total 16 runs were made with these settings and the best nets chosen according to playing strength at 25k nodes per move with pure NNUE evaluation - these are the 4 nets that have been put on fishtest.

The dataset can be found either at ftp://ftp.chessdb.cn/pub/sopel/data_sf/T60T70wIsRightFarseerT60T74T75T76.binpack in its entirety (download might be painfully slow because hosted in China) or can be assembled in the following way:

Get the 5640ad48ae/script/interleave_binpacks.py script.
Download T60T70wIsRightFarseer.binpack https://drive.google.com/file/d/1_sQoWBl31WAxNXma2v45004CIVltytP8/view
Download farseerT74.binpack http://trainingdata.farseer.org/T74-May13-End.7z
Download farseerT75.binpack http://trainingdata.farseer.org/T75-June3rd-End.7z
Download farseerT76.binpack http://trainingdata.farseer.org/T76-Nov10th-End.7z
Run python3 interleave_binpacks.py T60T70wIsRightFarseer.binpack farseerT74.binpack farseerT75.binpack farseerT76.binpack T60T70wIsRightFarseerT60T74T75T76.binpack

Tests:

STC: https://tests.stockfishchess.org/tests/view/6203fb85d71106ed12a407b7
LLR: 2.94 (-2.94,2.94) <0.00,2.50>
Total: 16952 W: 4775 L: 4521 D: 7656
Ptnml(0-2): 133, 1818, 4318, 2076, 131

LTC: https://tests.stockfishchess.org/tests/view/62041e68d71106ed12a40e85
LLR: 2.94 (-2.94,2.94) <0.50,3.00>
Total: 14944 W: 4138 L: 3907 D: 6899
Ptnml(0-2): 21, 1499, 4202, 1728, 22

closes https://github.com/official-stockfish/Stockfish/pull/3927

Bench: 4919707
2022-02-10 19:54:31 +01:00
Brad Knox
ad926d34c0 Update copyright years
Happy New Year!

closes https://github.com/official-stockfish/Stockfish/pull/3881

No functional change
2022-01-06 15:45:45 +01:00
Stéphane Nicolet
74776dbcd5 Simplification in evaluate_nnue.cpp
Removes the test on non-pawn-material before applying the positional/materialistic bonus.

Passed STC:
LLR: 2.94 (-2.94,2.94) <-2.25,0.25>
Total: 46904 W: 12197 L: 12059 D: 22648
Ptnml(0-2): 170, 5243, 12479, 5399, 161
https://tests.stockfishchess.org/tests/view/61be57cf57a0d0f327c3999d

Passed LTC:
LLR: 2.95 (-2.94,2.94) <-2.25,0.25>
Total: 18760 W: 4958 L: 4790 D: 9012
Ptnml(0-2): 14, 1942, 5301, 2108, 15
https://tests.stockfishchess.org/tests/view/61bed1fb57a0d0f327c3afa9

closes https://github.com/official-stockfish/Stockfish/pull/3866

Bench: 4826206
2021-12-19 15:44:01 +01:00
Tomasz Sobczyk
4766dfc395 Optimize FT activation and affine transform for NEON.
This patch optimizes the NEON implementation in two ways.

    The activation layer after the feature transformer is rewritten to make it easier for the compiler to see through dependencies and unroll. This in itself is a minimal, but a positive improvement. Other architectures could benefit from this too in the future. This is not an algorithmic change.
    The affine transform for large matrices (first layer after FT) on NEON now utilizes the same optimized code path as >=SSSE3, which makes the memory accesses more sequential and makes better use of the available registers, which allows for code that has longer dependency chains.

Benchmarks from Redshift#161, profile-build with apple clang

george@Georges-MacBook-Air nets % ./stockfish-b82d93 bench 2>&1 | tail -4 (current master)
===========================
Total time (ms) : 2167
Nodes searched  : 4667742
Nodes/second    : 2154011
george@Georges-MacBook-Air nets % ./stockfish-7377b8 bench 2>&1 | tail -4 (this patch)
===========================
Total time (ms) : 1842
Nodes searched  : 4667742
Nodes/second    : 2534061

This is a solid 18% improvement overall, larger in a bench with NNUE-only, not mixed.

Improvement is also observed on armv7-neon (Raspberry Pi, and older phones), around 5% speedup.

No changes for architectures other than NEON.

closes https://github.com/official-stockfish/Stockfish/pull/3837

No functional changes.
2021-12-07 18:08:54 +01:00
Michael Ortmann
4b86ef8c4f Fix typos in comments, adjust readme
closes https://github.com/official-stockfish/Stockfish/pull/3822

also adjusts readme as requested in https://github.com/official-stockfish/Stockfish/pull/3816

No functional change
2021-12-01 18:07:30 +01:00
hengyu
64f21ecdae Small clean-up
remove unneeded calculation.

closes https://github.com/official-stockfish/Stockfish/pull/3807

No functional change.
2021-12-01 17:59:20 +01:00
Stefano Cardanobile
2214fcecf7 Rewrite NNUE evaluation adjustments
Make the eval code in the evaluate_nnue.cpp more similar to the rest of the codebase:

* remove multiple variable assignment
* make if conditions explicit and indent on multiple lines

passed STC
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 59032 W: 14834 L: 14751 D: 29447
Ptnml(0-2): 176, 6310, 16459, 6397, 174
https://tests.stockfishchess.org/tests/view/616f250540f619782fd4f76d

closes https://github.com/official-stockfish/Stockfish/pull/3753

No functional change
2021-10-23 12:22:02 +02:00
mstembera
644f6d4790 Simplify away ValueListInserter
plus minor cleanups

STC: https://tests.stockfishchess.org/tests/view/616f059b40f619782fd4f73f
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 84992 W: 21244 L: 21197 D: 42551
Ptnml(0-2): 279, 9005, 23868, 9078, 266

closes https://github.com/official-stockfish/Stockfish/pull/3749

No functional change
2021-10-23 12:21:17 +02:00
xoto10
f21a66f70d Small clean-up, Sept 2021
Closes https://github.com/official-stockfish/Stockfish/pull/3485

No functional change
2021-10-07 09:41:57 +02:00
Michael Chaly
e8788d1b32 Combo of various parameter tweaks
Combination of parameter tweaks in search, evaluation and time management.
Original patches by snicolet xoto10 lonfom169 and Vizvezdenec.

Includes:

* Use bigger grain of positional evaluation more frequently (up to 1 exchange difference in non-pawn-material);
* More extra time according to increment;
* Increase margin for singular extensions;
* Do more aggresive parent node futility pruning.

Passed STC
https://tests.stockfishchess.org/tests/view/6147deab3733d0e0dd9f313d
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 45488 W: 11691 L: 11450 D: 22347
Ptnml(0-2): 145, 5208, 11824, 5395, 172

Passed LTC
https://tests.stockfishchess.org/tests/view/6147f1d53733d0e0dd9f3141
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 62520 W: 15808 L: 15482 D: 31230
Ptnml(0-2): 43, 6439, 17960, 6785, 33

closes https://github.com/official-stockfish/Stockfish/pull/3710

bench 5575265
2021-09-21 19:48:40 +02:00
Tomasz Sobczyk
18dcf1f097 Optimize and tidy up affine transform code.
The new network caused some issues initially due to the very narrow neuron set between the first two FC layers. Necessary changes were hacked together to make it work. This patch is a mature approach to make the affine transform code faster, more readable, and easier to maintain should the layer sizes change again.

The following changes were made:

* ClippedReLU always produces a multiple of 32 outputs. This is about as good of a solution for AffineTransform's SIMD requirements as it can get without a bigger rewrite.

* All self-contained simd helpers are moved to a separate file (simd.h). Inline asm is utilized to work around GCC's issues with code generation and register assignment. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693, https://godbolt.org/z/da76fY1n7

* AffineTransform has 2 specializations. While it's more lines of code due to the boilerplate, the logic in both is significantly reduced, as these two are impossible to nicely combine into one.
 1) The first specialization is for cases when there's >=128 inputs. It uses a different approach to perform the affine transform and can make full use of AVX512 without any edge cases. Furthermore, it has higher theoretical throughput because less loads are needed in the hot path, requiring only a fixed amount of instructions for horizontal additions at the end, which are amortized by the large number of inputs.
 2) The second specialization is made to handle smaller layers where performance is still necessary but edge cases need to be handled. AVX512 implementation for this was ommited by mistake, a remnant from the temporary implementation for the new... This could be easily reintroduced if needed. A slightly more detailed description of both implementations is in the code.

Overall it should be a minor speedup, as shown on fishtest:

passed STC:
LLR: 2.96 (-2.94,2.94) <-0.50,2.50>
Total: 51520 W: 4074 L: 3888 D: 43558
Ptnml(0-2): 111, 3136, 19097, 3288, 128

and various tests shown in the pull request

closes https://github.com/official-stockfish/Stockfish/pull/3663

No functional change
2021-08-20 08:50:25 +02:00
Tomasz Sobczyk
d61d38586e New NNUE architecture and net
Introduces a new NNUE network architecture and associated network parameters

The summary of the changes:

* Position for each perspective mirrored such that the king is on e..h files. Cuts the feature transformer size in half, while preserving enough knowledge to be good. See https://docs.google.com/document/d/1gTlrr02qSNKiXNZ_SuO4-RjK4MXBiFlLE6jvNqqMkAY/edit#heading=h.b40q4rb1w7on.
* The number of neurons after the feature transformer increased two-fold, to 1024x2. This is possibly mostly due to the now very optimized feature transformer update code.
* The number of neurons after the second layer is reduced from 16 to 8, to reduce the speed impact. This, perhaps surprisingly, doesn't harm the strength much. See https://docs.google.com/document/d/1gTlrr02qSNKiXNZ_SuO4-RjK4MXBiFlLE6jvNqqMkAY/edit#heading=h.6qkocr97fezq

The AffineTransform code did not work out-of-the box with the smaller number of neurons after the second layer, so some temporary changes have been made to add a special case for InputDimensions == 8. Also additional 0 padding is added to the output for some archs that cannot process inputs by <=8 (SSE2, NEON). VNNI uses an implementation that can keep all outputs in the registers while reducing the number of loads by 3 for each 16 inputs, thanks to the reduced number of output neurons. However GCC is particularily bad at optimization here (and perhaps why the current way the affine transform is done even passed sprt) (see https://docs.google.com/document/d/1gTlrr02qSNKiXNZ_SuO4-RjK4MXBiFlLE6jvNqqMkAY/edit# for details) and more work will be done on this in the following days. I expect the current VNNI implementation to be improved and extended to other architectures.

The network was trained with a slightly modified version of the pytorch trainer (https://github.com/glinscott/nnue-pytorch); the changes are in https://github.com/glinscott/nnue-pytorch/pull/143

The training utilized 2 datasets.

    dataset A - https://drive.google.com/file/d/1VlhnHL8f-20AXhGkILujnNXHwy9T-MQw/view?usp=sharing
    dataset B - as described in ba01f4b954

The training process was as following:

    train on dataset A for 350 epochs, take the best net in terms of elo at 20k nodes per move (it's fine to take anything from later stages of training).
    convert the .ckpt to .pt
    --resume-from-model from the .pt file, train on dataset B for <600 epochs, take the best net. Lambda=0.8, applied before the loss function.

The first training command:

python3 train.py \
    ../nnue-pytorch-training/data/large_gensfen_multipvdiff_100_d9.binpack \
    ../nnue-pytorch-training/data/large_gensfen_multipvdiff_100_d9.binpack \
    --gpus "$3," \
    --threads 1 \
    --num-workers 1 \
    --batch-size 16384 \
    --progress_bar_refresh_rate 20 \
    --smart-fen-skipping \
    --random-fen-skipping 3 \
    --features=HalfKAv2_hm^ \
    --lambda=1.0 \
    --max_epochs=600 \
    --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2

The second training command:

python3 serialize.py \
    --features=HalfKAv2_hm^ \
    ../nnue-pytorch-training/experiment_131/run_6/default/version_0/checkpoints/epoch-499.ckpt \
    ../nnue-pytorch-training/experiment_$1/base/base.pt

python3 train.py \
    ../nnue-pytorch-training/data/michael_commit_b94a65.binpack \
    ../nnue-pytorch-training/data/michael_commit_b94a65.binpack \
    --gpus "$3," \
    --threads 1 \
    --num-workers 1 \
    --batch-size 16384 \
    --progress_bar_refresh_rate 20 \
    --smart-fen-skipping \
    --random-fen-skipping 3 \
    --features=HalfKAv2_hm^ \
    --lambda=0.8 \
    --max_epochs=600 \
    --resume-from-model ../nnue-pytorch-training/experiment_$1/base/base.pt \
    --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2

STC: https://tests.stockfishchess.org/tests/view/611120b32a8a49ac5be798c4

LLR: 2.97 (-2.94,2.94) <-0.50,2.50>
Total: 22480 W: 2434 L: 2251 D: 17795
Ptnml(0-2): 101, 1736, 7410, 1865, 128

LTC: https://tests.stockfishchess.org/tests/view/611152b32a8a49ac5be798ea

LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 9776 W: 442 L: 333 D: 9001
Ptnml(0-2): 5, 295, 4180, 402, 6

closes https://github.com/official-stockfish/Stockfish/pull/3646

bench: 5189338
2021-08-15 12:05:43 +02:00
Tomasz Sobczyk
26edf9534a Avoid unnecessary stores in the affine transform
This patch improves the codegen in the AffineTransform::forward function for architectures >=SSSE3. Current code works directly on memory and the compiler cannot see that the stores through outptr do not alias the loads through weights and input32. The solution implemented is to perform the affine transform with local variables as accumulators and only store the result to memory at the end. The number of accumulators required is OutputDimensions / OutputSimdWidth, which means that for the 1024->16 affine transform it requires 4 registers with SSSE3, 2 with AVX2, 1 with AVX512. It also cuts the number of stores required by NumRegs * 256 for each node evaluated. The local accumulators are expected to be assigned to registers, but even if this cannot be done in some case due to register pressure it will help the compiler to see that there is no aliasing between the loads and stores and may still result in better codegen.

See https://godbolt.org/z/59aTKbbYc for codegen comparison.

passed STC:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 140328 W: 10635 L: 10358 D: 119335
Ptnml(0-2): 302, 8339, 52636, 8554, 333

closes https://github.com/official-stockfish/Stockfish/pull/3634

No functional change
2021-07-30 17:15:52 +02:00
Stéphane Nicolet
b51b094419 Simplify format_cp_aligned_dot()
closes https://github.com/official-stockfish/Stockfish/pull/3583

No functional change
2021-07-03 09:25:16 +02:00
Joost VandeVondele
2e2865d34b Fix build error on OSX
directly use integer version for cp calculation.

fixes https://github.com/official-stockfish/Stockfish/issues/3573

closes https://github.com/official-stockfish/Stockfish/pull/3574

No functional change
2021-06-21 23:14:58 +02:00
Tomasz Sobczyk
2e745956c0 Change trace with NNUE eval support
This patch adds some more output to the `eval` command. It adds a board display
with estimated piece values (method is remove-piece, evaluate, put-piece), and
splits the NNUE evaluation with (psqt,layers) for each bucket for the NNUE net.

Example:

```

./stockfish
position fen 3Qb1k1/1r2ppb1/pN1n2q1/Pp1Pp1Pr/4P2p/4BP2/4B1R1/1R5K b - - 11 40
eval

 Contributing terms for the classical eval:
+------------+-------------+-------------+-------------+
|    Term    |    White    |    Black    |    Total    |
|            |   MG    EG  |   MG    EG  |   MG    EG  |
+------------+-------------+-------------+-------------+
|   Material |  ----  ---- |  ----  ---- | -0.73 -1.55 |
|  Imbalance |  ----  ---- |  ----  ---- | -0.21 -0.17 |
|      Pawns |  0.35 -0.00 |  0.19 -0.26 |  0.16  0.25 |
|    Knights |  0.04 -0.08 |  0.12 -0.01 | -0.08 -0.07 |
|    Bishops | -0.34 -0.87 | -0.17 -0.61 | -0.17 -0.26 |
|      Rooks |  0.12  0.00 |  0.08  0.00 |  0.04  0.00 |
|     Queens |  0.00  0.00 | -0.27 -0.07 |  0.27  0.07 |
|   Mobility |  0.84  1.76 |  0.01  0.66 |  0.83  1.10 |
|King safety | -0.99 -0.17 | -0.72 -0.10 | -0.27 -0.07 |
|    Threats |  0.27  0.27 |  0.73  0.86 | -0.46 -0.59 |
|     Passed |  0.00  0.00 |  0.79  0.82 | -0.79 -0.82 |
|      Space |  0.61  0.00 |  0.24  0.00 |  0.37  0.00 |
|   Winnable |  ----  ---- |  ----  ---- |  0.00 -0.03 |
+------------+-------------+-------------+-------------+
|      Total |  ----  ---- |  ----  ---- | -1.03 -2.14 |
+------------+-------------+-------------+-------------+

 NNUE derived piece values:
+-------+-------+-------+-------+-------+-------+-------+-------+
|       |       |       |   Q   |   b   |       |   k   |       |
|       |       |       | +12.4 | -1.62 |       |       |       |
+-------+-------+-------+-------+-------+-------+-------+-------+
|       |   r   |       |       |   p   |   p   |   b   |       |
|       | -3.89 |       |       | -0.84 | -1.19 | -3.32 |       |
+-------+-------+-------+-------+-------+-------+-------+-------+
|   p   |   N   |       |   n   |       |       |   q   |       |
| -1.81 | +3.71 |       | -4.82 |       |       | -5.04 |       |
+-------+-------+-------+-------+-------+-------+-------+-------+
|   P   |   p   |       |   P   |   p   |       |   P   |   r   |
| +1.16 | -0.91 |       | +0.55 | +0.12 |       | +0.50 | -4.02 |
+-------+-------+-------+-------+-------+-------+-------+-------+
|       |       |       |       |   P   |       |       |   p   |
|       |       |       |       | +2.33 |       |       | +1.17 |
+-------+-------+-------+-------+-------+-------+-------+-------+
|       |       |       |       |   B   |   P   |       |       |
|       |       |       |       | +4.79 | +1.54 |       |       |
+-------+-------+-------+-------+-------+-------+-------+-------+
|       |       |       |       |   B   |       |   R   |       |
|       |       |       |       | +4.54 |       | +6.03 |       |
+-------+-------+-------+-------+-------+-------+-------+-------+
|       |   R   |       |       |       |       |       |   K   |
|       | +4.81 |       |       |       |       |       |       |
+-------+-------+-------+-------+-------+-------+-------+-------+

 NNUE network contributions (Black to move)
+------------+------------+------------+------------+
|   Bucket   |  Material  | Positional |   Total    |
|            |   (PSQT)   |  (Layers)  |            |
+------------+------------+------------+------------+
|  0         |  +  0.32   |  -  1.46   |  -  1.13   |
|  1         |  +  0.25   |  -  0.68   |  -  0.43   |
|  2         |  +  0.46   |  -  1.72   |  -  1.25   |
|  3         |  +  0.55   |  -  1.80   |  -  1.25   |
|  4         |  +  0.48   |  -  1.77   |  -  1.29   |
|  5         |  +  0.40   |  -  2.00   |  -  1.60   |
|  6         |  +  0.57   |  -  2.12   |  -  1.54   | <-- this bucket is used
|  7         |  +  3.38   |  -  2.00   |  +  1.37   |
+------------+------------+------------+------------+

Classical evaluation   -1.00 (white side)
NNUE evaluation        +1.54 (white side)
Final evaluation       +2.38 (white side) [with scaled NNUE, hybrid, ...]

```

Also renames the export_net() function to save_eval() while there.

closes https://github.com/official-stockfish/Stockfish/pull/3562

No functional change
2021-06-19 11:57:01 +02:00
Tomasz Sobczyk
900f249f59 Reduce the number of accumulator states
Reduce from 3 to 2. Make the intent of the states clearer.

STC: https://tests.stockfishchess.org/tests/view/60c50111457376eb8bcaad03
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 61888 W: 5007 L: 4944 D: 51937
Ptnml(0-2): 164, 3947, 22649, 4030, 154

LTC: https://tests.stockfishchess.org/tests/view/60c52b1c457376eb8bcaad2c
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 20248 W: 688 L: 618 D: 18942
Ptnml(0-2): 7, 551, 8946, 605, 15

closes https://github.com/official-stockfish/Stockfish/pull/3548

No functional change.
2021-06-14 11:22:08 +02:00
Tomasz Sobczyk
ce4c523ad3 Register count for feature transformer
Compute optimal register count for feature transformer accumulation dynamically.
This also introduces a change where AVX512 would only use 8 registers instead of 16
(now possible due to a 2x increase in feature transformer size).

closes https://github.com/official-stockfish/Stockfish/pull/3543

No functional change
2021-06-13 13:10:56 +02:00
Stéphane Nicolet
7819412002 Clarify use of UCI options
Update README.md to clarify use of UCI options

closes https://github.com/official-stockfish/Stockfish/pull/3540

No functional change
2021-06-13 10:02:43 +02:00
Tomasz Sobczyk
b84fa04db6 Read NNUE net faster
Load feature transformer weights in bulk on little-endian machines.
This is in particular useful to test new nets with c-chess-cli,
see https://github.com/lucasart/c-chess-cli/issues/44

```
$ time ./stockfish.exe uci

Before : 0m0.914s
After  : 0m0.483s
```

No functional change
2021-06-13 09:39:03 +02:00
Stéphane Nicolet
8f081c86f7 Clean SIMD code a bit
Cleaner vector code structure in feature transformer. This patch just
regroups the parts of the inner loop for each SIMD instruction set.

Tested for non-regression:
LLR: 2.96 (-2.94,2.94) <-2.50,0.50>
Total: 115760 W: 9835 L: 9831 D: 96094
Ptnml(0-2): 326, 7776, 41715, 7694, 369
https://tests.stockfishchess.org/tests/view/60b96b39457376eb8bcaa26e

It would be nice if a future patch could use some of the macros at
the top of the file to unify the code between the distincts SIMD
instruction sets (of course, unifying the Relu will be the challenge).

closes https://github.com/official-stockfish/Stockfish/pull/3506

No functional change
2021-06-04 14:07:46 +02:00
Tomasz Sobczyk
5448cad29e Fix export of the feature transformer.
PSQT export was missing.

fixes #3507

closes https://github.com/official-stockfish/Stockfish/pull/3508

No functional change
2021-05-30 21:31:58 +02:00
Stéphane Nicolet
f193778446 Do not use lazy evaluation inside NNUE
This simplification patch implements two changes:

1. it simplifies away the so-called "lazy" path in the NNUE evaluation internals,
   where we trusted the psqt head alone to avoid the costly "positional" head in
   some cases;
2. it raises a little bit the NNUEThreshold1 in evaluate.cpp (from 682 to 800),
   which increases the limit where we switched from NNUE eval to Classical eval.

Both effects increase the number of positional evaluations done by our new net
architecture, but the results of our tests below seem to indicate that the loss
of speed will be compensated by the gain of eval quality.

STC:
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 26280 W: 2244 L: 2137 D: 21899
Ptnml(0-2): 72, 1755, 9405, 1810, 98
https://tests.stockfishchess.org/tests/view/60ae73f112066fd299795a51

LTC:
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 20592 W: 750 L: 677 D: 19165
Ptnml(0-2): 9, 614, 8980, 681, 12
https://tests.stockfishchess.org/tests/view/60ae88e812066fd299795a82

closes https://github.com/official-stockfish/Stockfish/pull/3503

Bench: 3817907
2021-05-27 01:21:56 +02:00
Tomasz Sobczyk
9d53129075 Expose the lazy threshold for the feature transformer PSQT as a parameter.
Definition of the lazy threshold moved to evaluate.cpp where all others are.
Lazy threshold only used for real searches, not used for the "eval" call.
This preserves the purity of NNUE evaluation, which is useful to verify
consistency between the engine and the NNUE trainer.

closes https://github.com/official-stockfish/Stockfish/pull/3499

No functional change
2021-05-25 21:40:51 +02:00
Stéphane Nicolet
a2f01c07eb Sometimes change the (materialist, positional) balance
Our new nets output two values for the side to move in the last layer.
We can interpret the first value as a material evaluation of the
position, and the second one as the dynamic, positional value of the
location of pieces.

This patch changes the balance for the (materialist, positional) parts
of the score from (128, 128) to (121, 135) when the piece material is
equal between the two players, but keeps the standard (128, 128) balance
when one player is at least an exchange up.

Passed STC:
LLR: 2.93 (-2.94,2.94) <-0.50,2.50>
Total: 15936 W: 1421 L: 1266 D: 13249
Ptnml(0-2): 37, 1037, 5694, 1134, 66
https://tests.stockfishchess.org/tests/view/60a82df9ce8ea25a3ef0408f

Passed LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 13904 W: 516 L: 410 D: 12978
Ptnml(0-2): 4, 374, 6088, 484, 2
https://tests.stockfishchess.org/tests/view/60a8bbf9ce8ea25a3ef04101

closes https://github.com/official-stockfish/Stockfish/pull/3492

Bench: 3856635
2021-05-22 21:09:22 +02:00
Fanael Linithien
038487f954 Use packed 32-bit MMX operations for updating the PSQT accumulator
This improves the speed of NNUE by a bit on old hardware that code path
is intended for, like a Pentium III 1.13 GHz:

10 repeats of "./stockfish bench 16 1 13 default depth NNUE":

Before:
54 642 504 897 cycles (± 0.12%)
62 301 937 829 instructions (± 0.03%)

After:
54 320 821 928 cycles (± 0.13%)
62 084 742 699 instructions (± 0.02%)

Speed of go depth 20 from startpos:

Before: 53103 nps
After: 53856 nps

closes https://github.com/official-stockfish/Stockfish/pull/3476

No functional change.
2021-05-19 19:34:44 +02:00
Tomasz Sobczyk
e8d64af123 New NNUE architecture and net
Introduces a new NNUE network architecture and associated network parameters,
as obtained by a new pytorch trainer.

The network is already very strong at short TC, without regression at longer TC,
and has potential for further improvements.

https://tests.stockfishchess.org/tests/view/60a159c65085663412d0921d
TC: 10s+0.1s, 1 thread
ELO: 21.74 +-3.4 (95%) LOS: 100.0%
Total: 10000 W: 1559 L: 934 D: 7507
Ptnml(0-2): 38, 701, 2972, 1176, 113

https://tests.stockfishchess.org/tests/view/60a187005085663412d0925b
TC: 60s+0.6s, 1 thread
ELO: 5.85 +-1.7 (95%) LOS: 100.0%
Total: 20000 W: 1381 L: 1044 D: 17575
Ptnml(0-2): 27, 885, 7864, 1172, 52

https://tests.stockfishchess.org/tests/view/60a2beede229097940a03806
TC: 20s+0.2s, 8 threads
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 34272 W: 1610 L: 1452 D: 31210
Ptnml(0-2): 30, 1285, 14350, 1439, 32

https://tests.stockfishchess.org/tests/view/60a2d687e229097940a03c72
TC: 60s+0.6s, 8 threads
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 45544 W: 1262 L: 1214 D: 43068
Ptnml(0-2): 12, 1129, 20442, 1177, 12

The network has been trained (by vondele) using the https://github.com/glinscott/nnue-pytorch/ trainer (started by glinscott),
specifically the branch https://github.com/Sopel97/nnue-pytorch/tree/experiment_56.
The data used are in 64 billion positions (193GB total) generated and scored with the current master net
d8: https://drive.google.com/file/d/1hOOYSDKgOOp38ZmD0N4DV82TOLHzjUiF/view?usp=sharing
d9: https://drive.google.com/file/d/1VlhnHL8f-20AXhGkILujnNXHwy9T-MQw/view?usp=sharing
d10: https://drive.google.com/file/d/1ZC5upzBYMmMj1gMYCkt6rCxQG0GnO3Kk/view?usp=sharing
fishtest_d9: https://drive.google.com/file/d/1GQHt0oNgKaHazwJFTRbXhlCN3FbUedFq/view?usp=sharing

This network also contains a few architectural changes with respect to the current master:

    Size changed from 256x2-32-32-1 to 512x2-16-32-1
        ~15-20% slower
        ~2x larger
        adds a special path for 16 valued ClippedReLU
        fixes affine transform code for 16 inputs/outputs, buy using InputDimensions instead of PaddedInputDimensions
            this is safe now because the inputs are processed in groups of 4 in the current affine transform code
    The feature set changed from HalfKP to HalfKAv2
        Includes information about the kings like HalfKA
        Packs king features better, resulting in 8% size reduction compared to HalfKA
    The board is flipped for the black's perspective, instead of rotated like in the current master
    PSQT values for each feature
        the feature transformer now outputs a part that is fowarded directly to the output and allows learning piece values more directly than the previous network architecture. The effect is visible for high imbalance positions, where the current master network outputs evaluations skewed towards zero.
        8 PSQT values per feature, chosen based on (popcount(pos.pieces()) - 1) / 4
        initialized to classical material values on the start of the training
    8 subnetworks (512x2->16->32->1), chosen based on (popcount(pos.pieces()) - 1) / 4
        only one subnetwork is evaluated for any position, no or marginal speed loss

A diagram of the network is available: https://user-images.githubusercontent.com/8037982/118656988-553a1700-b7eb-11eb-82ef-56a11cbebbf2.png
A more complete description: https://github.com/glinscott/nnue-pytorch/blob/master/docs/nnue.md

closes https://github.com/official-stockfish/Stockfish/pull/3474

Bench: 3806488
2021-05-18 18:06:23 +02:00
Stéphane Nicolet
f90274d8ce Small clean-ups
- Comment for Countemove pruning -> Continuation history
- Fix comment in input_slice.h
- Shorter lines in Makefile
- Comment for scale factor
- Fix comment for pinners in see_ge()
- Change Thread.id() signature to size_t
- Trailing space in reprosearch.sh
- Add Douglas Matos Gomes to the AUTHORS file
- Introduce comment for undo_null_move()
- Use Stockfish coding style for export_net()
- Change date in AUTHORS file

closes https://github.com/official-stockfish/Stockfish/pull/3416

No functional change
2021-05-17 10:47:14 +02:00
Tomasz Sobczyk
58054fd0fa Exporting the currently loaded network file
This PR adds an ability to export any currently loaded network.
The export_net command now takes an optional filename parameter.
If the loaded net is not the embedded net the filename parameter is required.

Two changes were required to support this:

* the "architecture" string, which is really just a some kind of description in the net, is now saved into netDescription on load and correctly saved on export.
* the AffineTransform scrambles weights for some architectures and sparsifies them, such that retrieving the index is hard. This is solved by having a temporary scrambled<->unscrambled index lookup table when loading the network, and the actual index is saved for each individual weight that makes it to canSaturate16. This increases the size of the canSaturate16 entries by 6 bytes.

closes https://github.com/official-stockfish/Stockfish/pull/3456

No functional change
2021-05-11 19:36:11 +02:00
Tomasz Sobczyk
b748b46714 Cleanup and simplify NNUE code.
A lot of optimizations happend since the NNUE was introduced
and since then some parts of the code were left unused. This
got to the point where asserts were have to be made just to
let people know that modifying something will not have any
effects or may even break everything due to the assumptions
being made. Removing these parts removes those inexisting
"false dependencies". Additionally:

 * append_changed_indices now takes the king pos and stateinfo
   explicitly, no more misleading pos parameter
 * IndexList is removed in favor of a generic ValueList.
   Feature transformer just instantiates the type it needs.
 * The update cost and refresh requirement is deferred to the
   feature set once again, but now doesn't go through the whole
   FeatureSet machinery and just calls HalfKP directly.
 * accumulator no longer has a singular dimension.
 * The PS constants and the PieceSquareIndex array are made local
   to the HalfKP feature set because they are specific to it and
   DO differ for other feature sets.
 * A few names are changed to more descriptive

Passed STC non-regression:
https://tests.stockfishchess.org/tests/view/608421dd95e7f1852abd2790
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 180008 W: 16186 L: 16258 D: 147564
Ptnml(0-2): 587, 12593, 63725, 12503, 596

closes https://github.com/official-stockfish/Stockfish/pull/3441

No functional change
2021-04-25 13:16:30 +02:00
Tomasz Sobczyk
fbbd4adc3c Unify naming convention of the NNUE code
matches the rest of the stockfish code base

closes https://github.com/official-stockfish/Stockfish/pull/3437

No functional change
2021-04-24 12:49:29 +02:00
Tomasz Sobczyk
255514fb29 Documentation patch: AppendChangedIndices
Clarify the assumptions on the position passed to the AppendChangedIndices().

Closes https://github.com/official-stockfish/Stockfish/pull/3428

No functional change
2021-04-15 12:21:30 +02:00
Stéphane Nicolet
83eac08e75 Small cleanups (march 2021)
With help of @BM123499, @mstembera, @gvreuls, @noobpwnftw and @Fanael
Thanks!

Closes https://github.com/official-stockfish/Stockfish/pull/3405

No functional change
2021-03-24 17:11:06 +01:00
Guy Vreuls
ec42154ef2 Use reference instead of pointer for pop_lsb() signature
This patch changes the pop_lsb() signature from Square pop_lsb(Bitboard*) to
Square pop_lsb(Bitboard&). This is more idomatic for C++ style signatures.

Passed a non-regression STC test:
LLR: 2.93 (-2.94,2.94) {-1.25,0.25}
Total: 21280 W: 1928 L: 1847 D: 17505
Ptnml(0-2): 71, 1427, 7558, 1518, 66
https://tests.stockfishchess.org/tests/view/6053a1e22433018de7a38e2f

We have verified that the generated binary is identical on gcc-10.

Closes https://github.com/official-stockfish/Stockfish/pull/3404

No functional change.
2021-03-19 20:28:57 +01:00
Dieter Dobbelaere
7ffae17f85 Add Stockfish namespace.
fixes #3350 and is a small cleanup that might make it easier to use SF
in separate projects, like a NNUE trainer or similar.

closes https://github.com/official-stockfish/Stockfish/pull/3370

No functional change.
2021-03-07 14:26:54 +01:00
MaximMolchanov
303713b560 Affine transform robust implementation
Size of the weights in the last layer is less than 512 bits. It leads to wrong data access for AVX512. There is no error because in current implementation it is guaranteed that there is an array of zeros after weights so zero multiplied by something is returned and sum is correct. It is a mistake that can lead to unexpected bugs in the future. Used AVX2 instructions for smaller input size.

No measurable slowdown on avx512.

closes https://github.com/official-stockfish/Stockfish/pull/3298

No functional change.
2021-01-11 18:54:18 +01:00