mirror of https://github.com/sockspls/badfish synced 2025-04-30 08:43:09 +00:00
Commit graph

182 commits

cj5716
d6a5c2b085 Small formatting improvements
Changes some C-style casts to C++-style casts, and fixes some incorrect comments and variable names.
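
For illustration, a minimal sketch of the kind of change (invented example, not code from the patch):

```cpp
// Hypothetical example of the cast-style change.
int truncate(double d) {
    // old: return (int)d;         // C-style cast
    return static_cast<int>(d);    // C++-style cast, preferred
}
```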

closes #4845

No functional change
2023-10-24 17:42:13 +02:00
Disservin
a105978bbd remove blank line between function and its description
- remove the blank line between the declaration of a function and its
  comment; this leads to better IDE support when hovering over a function to
  see its description (see the sketch below)
- remove the unnecessary duplication of the function name in the function's
  description
- slightly refactor the code for lsb and msb in bitboard.h. A few things can
  still be improved later on: move the description of a function to where it
  is declared (instead of where it is implemented), and add descriptions to
  functions that are behind macro ifdefs
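
A sketch of the first point, with the types reduced to minimal stand-ins (the real Bitboard/Square types differ):

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;  // stand-in for the real type
using Square   = int;            // stand-in for the real enum

// Bad: a blank line separates the comment from the declaration,
// so an IDE hover over lsb() may not show the description.

// Returns the least significant set bit of b.

Square lsb(Bitboard b);

// Good: the comment sits directly above the declaration.
// Returns the most significant set bit of b.
Square msb(Bitboard b);
```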

closes https://github.com/official-stockfish/Stockfish/pull/4840

No functional change
2023-10-23 20:39:48 +02:00
Disservin
2d0237db3f add clang-format
This introduces clang-format to enforce a consistent code style for Stockfish.

Having a documented and consistent style across the code will make contributing easier
for new developers, and will make larger changes to the codebase easier to make.

To facilitate formatting, this PR includes a Makefile target (`make format`) to format the code;
this requires clang-format (currently version 17) to be installed locally.
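
A hedged illustration of the kind of normalization a formatter enforces (snippet invented, not taken from the project's actual .clang-format rules):

```cpp
// Before formatting (cramped, inconsistent):
//   int clamp_int(int v,int lo,int hi){if(v<lo)return lo;if(v>hi)return hi;return v;}

// After formatting (uniform spacing and braces):
int clamp_int(int v, int lo, int hi) {
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}
```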

Installing clang-format is straightforward on most OSes and distros
(e.g. via https://apt.llvm.org/, brew install clang-format, etc.), as it is part of a
commonly used suite of tools and compilers (LLVM/clang).

Additionally, a CI action is present that will verify whether the code requires formatting,
and comment on the PR as needed. Initially, correct formatting is not required; it will be
done by maintainers as part of the merge or in later commits, but it is obviously encouraged.

fixes https://github.com/official-stockfish/Stockfish/issues/3608
closes https://github.com/official-stockfish/Stockfish/pull/4790

Co-Authored-By: Joost VandeVondele <Joost.VandeVondele@gmail.com>
2023-10-22 16:06:27 +02:00
mstembera
d3d0c69dc1 Remove outdated Tile naming.
Clean up variable naming after #4816.

closes #4833

No functional change
2023-10-21 10:28:55 +02:00
FauziAkram
edb4ab924f Standardize Comments
Use double slashes (//) as the only comment style.

closes #4820

No functional change.
2023-10-21 10:25:03 +02:00
mstembera
c17a657b04 Optimize the most common update accumulator cases w/o tiling
In the most common case, where we only update a single state,
it is faster not to use temporary accumulation registers and tiling.
(Also includes a couple of small cleanups.)
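
A rough conceptual sketch of the difference, in plain scalar code with invented names and sizes (the real code operates on SIMD registers):

```cpp
#include <cstdint>

constexpr int N    = 1024;  // assumed accumulator width
constexpr int Tile = 256;   // assumed tile size

// Tiled: stage results in temporary storage before writing back.
void update_tiled(std::int16_t* acc, const std::int16_t* w) {
    for (int t = 0; t < N; t += Tile) {
        std::int16_t tmp[Tile];  // temporary accumulation "registers"
        for (int i = 0; i < Tile; ++i)
            tmp[i] = std::int16_t(acc[t + i] + w[t + i]);
        for (int i = 0; i < Tile; ++i)
            acc[t + i] = tmp[i];
    }
}

// Direct: for the single-state update, add straight into the accumulator.
void update_direct(std::int16_t* acc, const std::int16_t* w) {
    for (int i = 0; i < N; ++i)
        acc[i] = std::int16_t(acc[i] + w[i]);
}
```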

passed STC
https://tests.stockfishchess.org/tests/view/651918e3cff46e538ee0023b
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 34944 W: 8989 L: 8687 D: 17268
Ptnml(0-2): 88, 3743, 9512, 4037, 92

A simpler version
https://tests.stockfishchess.org/tests/view/65190dfacff46e538ee00155
also passed but this version is stronger still
https://tests.stockfishchess.org/tests/view/6519b95fcff46e538ee00fa2

closes https://github.com/official-stockfish/Stockfish/pull/4816

No functional change
2023-10-08 07:42:39 +02:00
mstembera
8a912951de Remove handcrafted MMX code
too small a benefit to maintain this old target

closes https://github.com/official-stockfish/Stockfish/pull/4804

No functional change
2023-10-08 07:37:01 +02:00
Sebastian Buchwald
4f0fecad8a Use C++17 variable templates for type traits
The C++17 variable templates are slightly more readable and allow us to
remove the typename keyword in a few cases.
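
For illustration, a minimal sketch of the difference (standard type traits; the surrounding function is invented):

```cpp
#include <type_traits>

template<typename T>
constexpr T twice(T v) {
    static_assert(std::is_arithmetic<T>::value, "pre-C++17: ::value member");
    static_assert(std::is_arithmetic_v<T>, "C++17: variable template");
    return v + v;
}
```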

closes https://github.com/official-stockfish/Stockfish/pull/4806

No functional change
2023-09-29 22:22:40 +02:00
Linmiao Xu
70ba9de85c Update NNUE architecture to SFNNv8: L1-2560 nn-ac1dbea57aa3.nnue
Creating this net involved:
- a 6-stage training process from scratch. The datasets used in stages 1-5 were fully minimized.
- permuting L1 weights with https://github.com/official-stockfish/nnue-pytorch/pull/254

A strong epoch after each training stage was chosen for the next. The 6 stages were:

```
1. 400 epochs, lambda 1.0, default LR and gamma
   UHOx2-wIsRight-multinet-dfrc-n5000 (135G)
     nodes5000pv2_UHO.binpack
     data_pv-2_diff-100_nodes-5000.binpack
     wrongIsRight_nodes5000pv2.binpack
     multinet_pv-2_diff-100_nodes-5000.binpack
     dfrc_n5000.binpack

2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12
   LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G)
     T60T70wIsRightFarseerT60T74T75T76.binpack
     test78-junjulaug2022-16tb7p.no-db.min.binpack
     test79-mar2022-16tb7p.no-db.min.binpack
     test80-dec2022-16tb7p.no-db.min.binpack

3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20
   leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack
     leela93-filt-v1.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test78-janfeb2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-apr2022-16tb7p.min.binpack
     test80-may2022-16tb7p.min.binpack

4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24
   leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack
     leela96-filt-v2.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test79-may2022-16tb7p.filter-v6-dd.min.binpack
     test80-jun2022-16tb7p.filter-v6-dd.min.binpack
     test80-sep2022-16tb7p.filter-v6-dd.min.binpack
     test80-nov2022-16tb7p.filter-v6-dd.min.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-mar2023-2tb7p.v6-sk16.min.binpack
     test60-novdec2021-16tb7p.min.binpack
     test77-dec2021-16tb7p.min.binpack
     test78-aprmay2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-may2023-2tb7p.min.binpack

5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 960 near the end of the first 800 epochs
   5af11540bbfe dataset: https://github.com/official-stockfish/Stockfish/pull/4635

6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 1000 near the end of the first 800 epochs
   1ee1aba5ed dataset: https://github.com/official-stockfish/Stockfish/pull/4782
```

L1 weights permuted with:
```bash
python3 serialize.py $nnue $nnue_permuted \
  --features=HalfKAv2_hm \
  --ft_optimize \
  --ft_optimize_data=/data/fishpack32.binpack \
  --ft_optimize_count=10000
```

Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2:
```
sf_base =  1329051 +/-   2224 (95%)
sf_test =  1163344 +/-   2992 (95%)
diff    =  -165706 +/-   4913 (95%)
speedup = -12.46807% +/- 0.370% (95%)
```

Training data can be found at:
https://robotmoon.com/nnue-training-data/

Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue)
ep959 : 16.2 +/- 2.3

Failed 10+0.1 STC:
https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21
LLR: -2.92 (-2.94,2.94) <0.00,2.00>
Total: 13184 W: 3285 L: 3535 D: 6364
Ptnml(0-2): 85, 1662, 3334, 1440, 71

Failed 180+1.8 VLTC:
https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 64248 W: 16224 L: 16374 D: 31650
Ptnml(0-2): 26, 6788, 18640, 6650, 20

Passed 60+0.6 th 8 VLTC SMP (STC bounds):
https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 90630 W: 23372 L: 23033 D: 44225
Ptnml(0-2): 13, 8490, 27968, 8833, 11

Passed 60+0.6 th 8 VLTC SMP:
https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 137804 W: 35764 L: 35276 D: 66764
Ptnml(0-2): 31, 13006, 42326, 13522, 17

closes https://github.com/official-stockfish/Stockfish/pull/4795

bench 1246812
2023-09-22 19:26:16 +02:00
mstembera
95fe2b9a9d Reduce SIMD register count from 32 to 16
in the case of avx512 and vnni512 archs.

Up to 17% speedup, depending on the compiler, e.g.

```
AMD pro 7840u (zen4 phoenix apu 4nm)
bash bench_parallel.sh ./stockfish_avx512_gcc13 ./stockfish_avx512_pr_gcc13 20 10
sf_base =  1077737 +/-   8446 (95%)
sf_test =  1264268 +/-   8543 (95%)
diff    =   186531 +/-   4280 (95%)
speedup =  17.308% +/- 0.397% (95%)
```

Prior to this patch, gcc appears to spill registers.

closes https://github.com/official-stockfish/Stockfish/pull/4796

No functional change
2023-09-22 19:15:34 +02:00
cj5716
fce4cc1829 Make casting styles consistent
Make casting styles consistent with the rest of the code.

closes https://github.com/official-stockfish/Stockfish/pull/4793

No functional change
2023-09-22 19:14:29 +02:00
mstembera
97f706ecc1 Sparse impl of affine_transform_non_ssse3()
deal with the general case

About an 8.6% speedup (for the general arch)

Results for 200 tests for each version:

            Base      Test      Diff
    Mean    141741    153998    -12257
    StDev   2990      3042      3742

p-value: 0.999
speedup: 0.086
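
A scalar sketch of the idea behind a sparse affine transform (invented signature and weight layout; the actual implementation differs):

```cpp
#include <cstdint>

// With clipped-ReLU activations most inputs are zero, so iterating
// only over the nonzero inputs skips most of the multiply-adds.
void affine_sparse(std::int32_t* out, const std::uint8_t* in,
                   const std::int8_t* weights, const std::int32_t* biases,
                   int in_dim, int out_dim) {
    for (int j = 0; j < out_dim; ++j)
        out[j] = biases[j];
    for (int i = 0; i < in_dim; ++i) {
        if (in[i] == 0)
            continue;  // the common case after clipped ReLU
        const std::int8_t* col = weights + i * out_dim;
        for (int j = 0; j < out_dim; ++j)
            out[j] += in[i] * col[j];
    }
}
```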

closes https://github.com/official-stockfish/Stockfish/pull/4786

No functional change
2023-09-22 19:03:47 +02:00
Tomasz Sobczyk
1461d861c8 Prevent usage of AVX-512 for the last layer.
Add more static checks regarding the SIMD width match.

STC: https://tests.stockfishchess.org/tests/view/64f5c568a9bc5a78c669e70e
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 125216 W: 31756 L: 31636 D: 61824
Ptnml(0-2): 327, 13993, 33848, 14113, 327

Fixes a bug introduced in 2f2f45f, where with AVX-512 the weights and input to
the last layer were being read out of bounds. Now AVX-512 is only used for the
layers it can be used for. Additional static assertions have been added to
prevent more errors like this in the future.
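
A hedged sketch of the flavor of such a check (constants invented; the real assertions live in the NNUE layer templates):

```cpp
// Make sure the chosen register width actually divides the layer's
// padded input size.
constexpr int SimdWidthBytes   = 32;  // e.g. AVX2; AVX-512 would be 64
constexpr int PaddedInputBytes = 32;  // hypothetical small last layer

static_assert(PaddedInputBytes % SimdWidthBytes == 0,
              "SIMD register width must match the layer size");
```

With the 64-byte AVX-512 width, a check like this fails to compile for the small last layer, which is exactly the class of mistake the patch guards against.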

closes https://github.com/official-stockfish/Stockfish/pull/4773

No functional change
2023-09-11 22:11:30 +02:00
Disservin
3c0e86a91e Cleanup includes
Reorder a few includes, include "position.h" where it was previously missing,
and apply include-what-you-use suggestions. Also make the order of the includes
consistent, in the following way (see the example after the list):

1. Related header (for .cpp files)
2. A blank line
3. C/C++ headers
4. A blank line
5. All other header files
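
A hypothetical foo.cpp skeleton following this convention (file names invented):

```cpp
#include "foo.h"         // 1. related header

#include <cstdint>       // 3. C/C++ headers
#include <string>

#include "bitboard.h"    // 5. all other header files
#include "position.h"
```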

closes https://github.com/official-stockfish/Stockfish/pull/4763
fixes https://github.com/official-stockfish/Stockfish/issues/4707

No functional change
2023-09-03 08:24:51 +02:00
Gian-Carlo Pascutto
c6f62363a6 Simplify Square Clipped ReLU code.
Squared numbers are never negative, so barring any wraparound there
is no need to clamp to 0. From reading the code, there's no obvious
way to get wraparound, so the entire operation can be simplified
away. Updated original truncated code comments to be sensible.
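
A scalar sketch of the simplification (names and clamp limit invented):

```cpp
// A squared value is never negative, so the lower clamp is redundant;
// only the upper clamp remains.
int square_clipped_relu(int v) {
    int s = v * v;
    return s > 127 ? 127 : s;  // before: also clamped to >= 0
}
```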

Verified by running ./stockfish bench 128 1 24 and by the following test:

STC: https://tests.stockfishchess.org/tests/view/64da4db95b17f7c21c0eabe7
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 60224 W: 15425 L: 15236 D: 29563
Ptnml(0-2): 195, 6576, 16382, 6763, 196

closes https://github.com/official-stockfish/Stockfish/pull/4751

No functional change
2023-08-22 11:14:19 +02:00
Matthies
9abef246a9 Allow compilation on Raspi (for ARMv8)
Current master fails to compile for ARMv8 on the Raspberry Pi because gcc (version 10.2.1)
does not allow casting between signed and unsigned vector types. This patch
fixes it by using an unsigned vector pointer for ARM to avoid the implicit cast.

closes https://github.com/official-stockfish/Stockfish/pull/4752

No functional change
2023-08-22 10:43:51 +02:00
Joost VandeVondele
8192945870 Improve testing coverage, remove unused code
a) Add further tests to CI to cover most features. This uncovered a potential race
in case setoption was sent between two searches. As the UCI protocol requires
this to be sent while the engine is not searching, setoption now ensures that
this is the case.

b) Remove some unused code

closes https://github.com/official-stockfish/Stockfish/pull/4730

No functional change
2023-08-11 19:27:46 +02:00
AndrovT
a6d9a302b8 Implement AffineTransformSparseInput for armv8
Implements the AffineTransformSparseInput layer for the NNUE evaluation
on the armv8 and armv8-dotprod architectures. We measured some nice
speed improvements via 10 runs of our benchmark:

armv8, Cortex-X1                  :   18.5% speed-up
armv8, Cortex-A76                 :   13.2% speed-up
armv8-dotprod, Cortex-X1          :   27.1% speed-up
armv8-dotprod, Cortex-A76         :   12.1% speed-up
armv8, Cortex-A72, Raspberry Pi 4 :    8.2% speed-up (thanks Torom!)

closes https://github.com/official-stockfish/Stockfish/pull/4719

No functional change
2023-08-06 21:22:37 +02:00
Stéphane Nicolet
f84eb1f3ef Improve some comments
- clarify the examples for the bench command
- fix a typo in search.cpp

closes https://github.com/official-stockfish/Stockfish/pull/4710

No functional change
2023-07-28 23:38:49 +02:00
mstembera
cb22520a9c Remove unused return type from propagate()
Also make two get_weight_index() static methods constexpr, for
consistency with the other static get_hash_value() method right above.
Tested for speed by user Torom (thanks).

closes https://github.com/official-stockfish/Stockfish/pull/4708

No functional change
2023-07-28 23:24:42 +02:00
mstembera
1444837887 Remove inline assembly
closes https://github.com/official-stockfish/Stockfish/pull/4698

No functional change
2023-07-19 21:39:51 +02:00
Joost VandeVondele
e8a5c64988 Consolidate to centipawns conversion
avoid doing this calculation in multiple places.

closes https://github.com/official-stockfish/Stockfish/pull/4686

No functional change
2023-07-16 15:14:50 +02:00
AndrovT
a42ab95e1f remove large input specialization
Removes the unused large-input specialization for the dense affine transform. It has been obsolete since #4612 was merged.

closes https://github.com/official-stockfish/Stockfish/pull/4684

No functional change
2023-07-16 15:12:21 +02:00
mstembera
529d3be8e2 More simplifications and cleanup in affine_transform_sparse_input.h
closes https://github.com/official-stockfish/Stockfish/pull/4677

No functional change
2023-07-13 08:20:33 +02:00
Joost VandeVondele
af110e02ec Remove classical evaluation
Since the introduction of NNUE (first released with Stockfish 12), we
have maintained the classical evaluation as part of SF in frozen form.
The idea that this code could lead to further inputs to the NN or
search did not materialize. Now, after five releases, this PR removes
the classical evaluation from SF. Even though this evaluation is
probably the best of its class, it has become unimportant for the
engine's strength, and there is little need to maintain this
code (roughly 25% of SF) going forward, or to expend resources on
trying to improve its integration in the NNUE eval.

Indeed, it still had a very limited use in the current SF, namely
for the evaluation of positions that are nearly decided based on
material difference, where the speed of the classical evaluation
outweighs its inaccuracies. The impact on strength is small,
roughly 2 Elo, and probably decreasing in importance as the TC grows.

Potentially, removal of this code could lead to the development of
techniques to have faster, but less accurate NN evaluation,
for certain positions.

STC
https://tests.stockfishchess.org/tests/view/64a320173ee09aa549c52157
Elo: -2.35 ± 1.1 (95%) LOS: 0.0%
Total: 100000 W: 24916 L: 25592 D: 49492
Ptnml(0-2): 287, 12123, 25841, 11477, 272
nElo: -4.62 ± 2.2 (95%) PairsRatio: 0.95

LTC
https://tests.stockfishchess.org/tests/view/64a320293ee09aa549c5215b
 Elo: -1.74 ± 1.0 (95%) LOS: 0.0%
Total: 100000 W: 25010 L: 25512 D: 49478
Ptnml(0-2): 44, 11069, 28270, 10579, 38
nElo: -3.72 ± 2.2 (95%) PairsRatio: 0.96

VLTC SMP
https://tests.stockfishchess.org/tests/view/64a3207c3ee09aa549c52168
 Elo: -1.70 ± 0.9 (95%) LOS: 0.0%
Total: 100000 W: 25673 L: 26162 D: 48165
Ptnml(0-2): 8, 9455, 31569, 8954, 14
nElo: -3.95 ± 2.2 (95%) PairsRatio: 0.95

closes https://github.com/official-stockfish/Stockfish/pull/4674

Bench: 1444646
2023-07-11 22:56:49 +02:00
mstembera
f8e65d82eb Simplify away lookup_count.
https://tests.stockfishchess.org/tests/view/64a3c1a93ee09aa549c53167
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 32832 W: 8497 L: 8280 D: 16055
Ptnml(0-2): 80, 3544, 8967, 3729, 96

closes https://github.com/official-stockfish/Stockfish/pull/4662

No functional change
2023-07-06 23:02:11 +02:00
mstembera
80564bcfcd Simplify lookup_count and clean up pieces().
closes https://github.com/official-stockfish/Stockfish/pull/4656

No functional change
2023-07-03 18:20:10 +02:00
Linmiao Xu
915532181f Update NNUE architecture to SFNNv7 with larger L1 size of 2048
Creating this net involved:
- a 5-step training process from scratch
- greedy permuting L1 weights with https://github.com/official-stockfish/Stockfish/pull/4620
- leb128 compression with https://github.com/glinscott/nnue-pytorch/pull/251
- greedy 2- and 3- cycle permuting with https://github.com/official-stockfish/Stockfish/pull/4640

The 5 training steps were:

1. 400 epochs, lambda 1.0, lr 9.75e-4
   UHOx2-wIsRight-multinet-dfrc-n5000-largeGensfen-d9.binpack (178G)
     nodes5000pv2_UHO.binpack
     data_pv-2_diff-100_nodes-5000.binpack
     wrongIsRight_nodes5000pv2.binpack
     multinet_pv-2_diff-100_nodes-5000.binpack
     dfrc_n5000.binpack
     large_gensfen_multipvdiff_100_d9.binpack
   ep399 chosen as start model for step2

2. 800 epochs, end-lambda 0.75, skip 16
   LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G)
     T60T70wIsRightFarseerT60T74T75T76.binpack
     test78-junjulaug2022-16tb7p.no-db.min.binpack
     test79-mar2022-16tb7p.no-db.min.binpack
     test80-dec2022-16tb7p.no-db.min.binpack
   ep559 chosen as start model for step3

3. 800 epochs, end-lambda 0.725, skip 20
   leela96-dfrc99-v2-T80dectofeb-sk20-mar-v6-T77decT78janfebT79apr.binpack (223G)
     leela96-filt-v2.min.binpack
     dfrc99-16tb7p-eval-filt-v2.min.binpack
     test80-dec2022-16tb7p-filter-v6-sk20.min-mar2023.binpack
     test80-jan2023-16tb7p-filter-v6-sk20.min-mar2023.binpack
     test80-feb2023-16tb7p-filter-v6-sk20.min-mar2023.binpack
     test80-mar2023-2tb7p-filter-v6.min.binpack
     test77-dec2021-16tb7p.no-db.min.binpack
     test78-janfeb2022-16tb7p.no-db.min.binpack
     test79-apr2022-16tb7p.no-db.min.binpack
   ep499 chosen as start model for step4

4. 800 epochs, end-lambda 0.7, skip 24
   0dd1cebea57 dataset https://github.com/official-stockfish/Stockfish/pull/4606
   ep599 chosen as start model for step5

5. 800 epochs, end-lambda 0.7, skip 28
   same dataset as step4
   ep619 became nn-1b951f8b449d.nnue

For the final step5 training:

```bash
python3 easy_train.py \
  --experiment-name L1-2048-S5-sameData-sk28-S4-0dd1cebea57-shuffled-S3-leela96-dfrc99-v2-T80dectofeb-sk20-mar-v6-T77decT78janfebT79apr-sk20-S2-LeelaFarseerT78T79T80-ep399-S1-UHOx2-wIsRight-multinet-dfrc-n5000-largeGensfen-d9 \
  --training-dataset /data/leela96-dfrc99-T60novdec-v2-T80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd-T80apr.binpack \
  --early-fen-skipping 28 \
  --nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes-L1-2048 \
  --engine-test-branch linrock/Stockfish/L1-2048 \
  --start-from-engine-test-net False \
  --start-from-model /data/experiments/experiment_L1-2048-S4-0dd1cebea57-shuffled-S3-leela96-dfrc99-v2-T80dectofeb-sk20-mar-v6-T77decT78janfebT79apr-sk20-S2-LeelaFarseerT78T79T80-ep399-S1-UHOx2-wIsRight-multinet-dfrc-n5000-largeGensfen-d9/training/run_0/nn-epoch599.nnue \
  --max_epoch 800 \
  --lr 4.375e-4 \
  --gamma 0.995 \
  --start-lambda 1.0 \
  --end-lambda 0.7 \
  --tui False \
  --seed $RANDOM \
  --gpus 0
```

SF training data components for the step1 dataset:
https://drive.google.com/drive/folders/1yLCEmioC3Xx9KQr4T7uB6GnLm5icAYGU

Leela training data for steps 2-5 can be found at:
https://robotmoon.com/nnue-training-data/

Due to the larger L1 size and the slower inference it entails, the speed penalty
loses Elo at STC. Measurements from 100 bench runs at depth 13 with x86-64-modern
on an Intel Core i5-1038NG7 2.00GHz:

sf_base =  1240730  +/-   3443 (95%)
sf_test =  1153341  +/-   2832 (95%)
diff    =   -87388  +/-   1616 (95%)
speedup = -7.04330% +/- 0.130% (95%)

Local elo at 25k nodes per move (vs. L1-1536 nn-fdc1d0fe6455.nnue):
nn-epoch619.nnue : 21.1 +/- 3.2

Failed STC:
https://tests.stockfishchess.org/tests/view/6498ee93dc7002ce609cf979
LLR: -2.95 (-2.94,2.94) <0.00,2.00>
Total: 11680 W: 3058 L: 3299 D: 5323
Ptnml(0-2): 44, 1422, 3149, 1181, 44

LTC:
https://tests.stockfishchess.org/tests/view/649b32f5dc7002ce609d20cf
Elo: 0.68 ± 1.5 (95%) LOS: 80.5%
Total: 40000 W: 10887 L: 10809 D: 18304
Ptnml(0-2): 36, 3938, 11958, 4048, 20
nElo: 1.50 ± 3.4 (95%) PairsRatio: 1.02

Passed VLTC 180+1.8:
https://tests.stockfishchess.org/tests/view/64992b43dc7002ce609cfd20
LLR: 3.06 (-2.94,2.94) <0.00,2.00>
Total: 38086 W: 10612 L: 10338 D: 17136
Ptnml(0-2): 9, 3316, 12115, 3598, 5

Passed VLTC SMP 60+0.6 th 8:
https://tests.stockfishchess.org/tests/view/649a21fedc7002ce609d0c7d
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 38936 W: 11091 L: 10820 D: 17025
Ptnml(0-2): 1, 2948, 13305, 3207, 7

closes https://github.com/official-stockfish/Stockfish/pull/4646

Bench: 2505168
2023-07-01 13:34:30 +02:00
Stéphane Nicolet
e355c70594 Document the LEB128 patch
Add some comments and harmonize style for the LEB128 patch.

closes https://github.com/official-stockfish/Stockfish/pull/4642

No functional change
2023-07-01 13:01:28 +02:00
maxim
a46087ee30 Compressed network parameters
Implemented LEB128 (de)compression for the feature transformer.
Reduces the embedded network size from 70 MiB to 39 MiB.
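
For flavor, a minimal unsigned-LEB128 decoder (the patch itself handles signed values and bulk arrays; this sketch is illustrative only):

```cpp
#include <cstdint>

// Each byte carries 7 payload bits; a set high bit means "more bytes follow".
std::uint64_t read_uleb128(const std::uint8_t*& p) {
    std::uint64_t result = 0;
    int shift = 0;
    while (true) {
        std::uint8_t byte = *p++;
        result |= std::uint64_t(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0)
            return result;
        shift += 7;
    }
}
```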

The new nn-78bacfcee510.nnue corresponds to the master net compressed.

closes https://github.com/official-stockfish/Stockfish/pull/4617

No functional change
2023-06-19 21:37:23 +02:00
Andreas Matthies
7922e07af8 Fix for MSVC compilation.
MSVC needs two more explicit casts to compile the new affine_transform_sparse_input.
See https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_castsi256_ps
and https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_castsi128_ps
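
The two cast intrinsics referenced above reinterpret integer vectors as float vectors; MSVC will not do this conversion implicitly. The wrapper functions below are invented for illustration and assume an AVX-capable toolchain:

```cpp
#include <immintrin.h>

// Reinterpret integer vectors as float vectors via the cast intrinsics.
__m256 as_ps_256(__m256i v) { return _mm256_castsi256_ps(v); }
__m128 as_ps_128(__m128i v) { return _mm_castsi128_ps(v); }
```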

closes https://github.com/official-stockfish/Stockfish/pull/4616

No functional change
2023-06-13 08:45:25 +02:00
AndrovT
38e61663d8 Use block sparse input for the first layer.
Use block sparse input for the first fully connected layer on architectures with at least SSSE3.

Depending on the CPU architecture, this yields a speedup of up to 10%, e.g.

```
Result of 100 runs of 'bench 16 1 13 default depth NNUE'

base (...ockfish-base) =     959345  +/- 7477
test (...ckfish-patch) =    1054340  +/- 9640
diff                   =     +94995  +/- 3999

speedup        = +0.0990
P(speedup > 0) =  1.0000

CPU: 8 x AMD Ryzen 7 5700U with Radeon Graphics
Hyperthreading: on
```

Passed STC:
https://tests.stockfishchess.org/tests/view/6485aa0965ffe077ca12409c
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 8864 W: 2479 L: 2223 D: 4162
Ptnml(0-2): 13, 829, 2504, 1061, 25

This commit includes a net with reordered weights, to increase the likelihood of block sparse inputs,
but otherwise equivalent to the previous master net (nn-ea57bea57e32.nnue).

Activation data collected with https://github.com/AndrovT/Stockfish/tree/log-activations, running bench 16 1 13 varied_1000.epd depth NNUE on this data. Net parameters permuted with https://gist.github.com/AndrovT/9e3fbaebb7082734dc84d27e02094cb3.

closes https://github.com/official-stockfish/Stockfish/pull/4612

No functional change
2023-06-12 20:41:27 +02:00
Linmiao Xu
c1fff71650 Update NNUE architecture to SFNNv6 with larger L1 size of 1536
Created by training a new net from scratch with L1 size increased from 1024 to 1536.
Thanks to Vizvezdenec for the idea of exploring larger net sizes after recent
training data improvements.

A new net was first trained with lambda 1.0 and constant LR 8.75e-4. Then a strong net
from a later epoch in the training run was chosen for retraining with start-lambda 1.0
and initial LR 4.375e-4 decaying with gamma 0.995. Retraining was performed a total of
3 times, for this 4-step process:

1. 400 epochs, lambda 1.0 on filtered T77+T79 v6 deduplicated data
2. 800 epochs, end-lambda 0.75 on T60T70wIsRightFarseerT60T74T75T76.binpack
3. 800 epochs, end-lambda 0.75 and early-fen-skipping 28 on the master dataset
4. 800 epochs, end-lambda 0.7 and early-fen-skipping 28 on the master dataset

In the training sequence that reached the new nn-8d69132723e2.nnue net,
the epochs used for the 3x retraining runs were:

1. epoch 379 trained on T77T79-filter-v6-dd.min.binpack
2. epoch 679 trained on T60T70wIsRightFarseerT60T74T75T76.binpack
3. epoch 799 trained on the master dataset

For training from scratch:

```bash
python3 easy_train.py \
  --experiment-name new-L1-1536-T77T79-filter-v6dd \
  --training-dataset /data/T77T79-filter-v6-dd.min.binpack \
  --max_epoch 400 \
  --lambda 1.0 \
  --start-from-engine-test-net False \
  --engine-test-branch linrock/Stockfish/L1-1536 \
  --nnue-pytorch-branch linrock/Stockfish/misc-fixes-L1-1536 \
  --tui False \
  --gpus "0," \
  --seed $RANDOM
```

Retraining commands were similar to each other. For the 3rd retraining run:

```bash
python3 easy_train.py \
  --experiment-name L1-1536-T77T79-v6dd-Re1-LeelaFarseer-Re2-masterDataset-Re3-sameData \
  --training-dataset /data/leela96-dfrc99-v2-T60novdecT80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd.binpack \
  --early-fen-skipping 28 \
  --max_epoch 800 \
  --start-lambda 1.0 \
  --end-lambda 0.7 \
  --lr 4.375e-4 \
  --gamma 0.995 \
  --start-from-engine-test-net False \
  --start-from-model /data/L1-1536-T77T79-v6dd-Re1-LeelaFarseer-Re2-masterDataset-nn-epoch799.nnue \
  --engine-test-branch linrock/Stockfish/L1-1536 \
  --nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes-L1-1536 \
  --tui False \
  --gpus "0," \
  --seed $RANDOM
```

The T77+T79 data used is a subset of the master dataset available at:
https://robotmoon.com/nnue-training-data/

T60T70wIsRightFarseerT60T74T75T76.binpack is available at:
https://drive.google.com/drive/folders/1S9-ZiQa_3ApmjBtl2e8SyHxj4zG4V8gG

Local elo at 25k nodes per move vs. nn-e1fb1ade4432.nnue (L1 size 1024):
nn-epoch759.nnue : 26.9 +/- 1.6

Failed STC
https://tests.stockfishchess.org/tests/view/64742485d29264e4cfa75f97
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 13728 W: 3588 L: 3829 D: 6311
Ptnml(0-2): 71, 1661, 3610, 1482, 40

Failing LTC
https://tests.stockfishchess.org/tests/view/64752d7c4a36543c4c9f3618
LLR: -1.91 (-2.94,2.94) <0.50,2.50>
Total: 35424 W: 9522 L: 9603 D: 16299
Ptnml(0-2): 24, 3579, 10585, 3502, 22

Passed VLTC 180+1.8
https://tests.stockfishchess.org/tests/view/64752df04a36543c4c9f3638
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 47616 W: 13174 L: 12863 D: 21579
Ptnml(0-2): 13, 4261, 14952, 4566, 16

Passed VLTC SMP 60+0.6 th 8
https://tests.stockfishchess.org/tests/view/647446ced29264e4cfa761e5
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 19942 W: 5694 L: 5451 D: 8797
Ptnml(0-2): 6, 1504, 6707, 1749, 5

closes https://github.com/official-stockfish/Stockfish/pull/4593

bench 2222567
2023-05-31 08:51:22 +02:00
Maxim Masiutin
2f2f45f9f4 Made two specializations for affine transform easier to understand.
Added AVX-512 support to the specialization for small inputs.

closes https://github.com/official-stockfish/Stockfish/pull/4502

No functional change
2023-04-10 09:29:52 +02:00
Sebastian Buchwald
43108a6198 Reuse existing functions to read/write array of network parameters
closes https://github.com/official-stockfish/Stockfish/pull/4463

No functional change
2023-03-29 21:36:27 +02:00
MinetaS
587bc647d7 Remove non_pawn_material in NNUE::evaluate
After "Use NNUE complexity in search, retune related parameters" commit,
the effect of non-pawn material adjustment has been nearly diminished.
This patch removes pos.non_pawn_material as a simplification, which
passed non-regression tests with both STC and LTC.

Passed non-regression STC:
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 75152 W: 20030 L: 19856 D: 35266
Ptnml(0-2): 215, 8281, 20459, 8357, 264
https://tests.stockfishchess.org/tests/view/641ab471db43ab2ba6f7bc58

Passed non-regression LTC:
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 193864 W: 51870 L: 51829 D: 90165
Ptnml(0-2): 86, 18968, 58794, 18987, 97
https://tests.stockfishchess.org/tests/view/641b4fe6db43ab2ba6f7db96

closes https://github.com/official-stockfish/Stockfish/pull/4461

Bench: 5020718
2023-03-25 09:25:49 +01:00
disservin
af4b62a593 NNUE namespace cleanup
This patch moves the nnue namespace into the appropriate header corresponding to the definition.
It also makes navigation a bit easier.

closes https://github.com/official-stockfish/Stockfish/pull/4445

No functional change
2023-03-19 11:27:15 +01:00
pb00067
f0556dcbe3 Small cleanups
Remove some unneeded assignments, fix typos and incorrect comments, and add an AUTHORS entry.

closes https://github.com/official-stockfish/Stockfish/pull/4417

no functional change
2023-03-14 08:38:02 +01:00
Sebastian Buchwald
564456a6a8 Unify type alias declarations
The commit unifies the declaration of type aliases by replacing all
typedefs with corresponding using statements.
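
A minimal sketch of the mechanical change (alias names invented):

```cpp
#include <cstdint>

typedef std::uint64_t KeyOld;    // before: typedef
using KeyNew = std::uint64_t;    // after: using statement
```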

closes https://github.com/official-stockfish/Stockfish/pull/4412

No functional change
2023-02-27 08:29:47 +01:00
Sebastian Buchwald
29b5ad5dea Fix typo in method name
closes https://github.com/official-stockfish/Stockfish/pull/4404

No functional change
2023-02-24 20:12:53 +01:00
Joost VandeVondele
08385527dd Introduce a function to compute NNUE accumulator
This patch introduces `hint_common_parent_position()` to signal that potentially several child nodes will require an NNUE eval. By populating explicitly the accumulator, these subsequent evaluations can be performed more efficiently.

This was based on the observation that calculating the evaluation in an excluded move position yielded a significant Elo gain, even though the evaluation itself was already available (work by pb00067).

Sopel wrote the code to perform just the accumulator update. This PR is based on cleaned up code that

passed STC:
https://tests.stockfishchess.org/tests/view/63f62f9be74a12625bcd4aa0
 LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 110368 W: 29607 L: 29167 D: 51594
Ptnml(0-2): 41, 10551, 33572, 10967, 53

and in an earlier (equivalent) version

passed STC:
https://tests.stockfishchess.org/tests/view/63f3c3fee74a12625bcce2a6
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 47552 W: 12786 L: 12467 D: 22299
Ptnml(0-2): 120, 5107, 12997, 5438, 114

passed LTC:
https://tests.stockfishchess.org/tests/view/63f45cc2e74a12625bccfa63
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 110368 W: 29607 L: 29167 D: 51594
Ptnml(0-2): 41, 10551, 33572, 10967, 53

closes https://github.com/official-stockfish/Stockfish/pull/4402

Bench: 3726250
2023-02-23 13:25:35 +01:00
Sebastian Buchwald
77dfcbedce Remove unused macros
closes https://github.com/official-stockfish/Stockfish/pull/4397

No functional change
2023-02-23 13:24:37 +01:00
Sebastian Buchwald
b4ad3a3c4b Add support for ARM dot product instructions
The sdot instruction computes (and accumulates) a signed dot product,
which is quite handy for Stockfish's NNUE code. The instruction is
optional for Armv8.2 and Armv8.3, and mandatory for Armv8.4 and above.

The commit adds a new 'arm-dotprod' architecture with enabled dot
product support. It also enables dot product support for the existing
'apple-silicon' architecture, which is at least Armv8.5.
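
For flavor, the instruction exposed through its NEON intrinsic (the intrinsic is real; the wrapper is invented and assumes compilation with dot-product support, e.g. -march=armv8.2-a+dotprod):

```cpp
#include <arm_neon.h>

// vdotq_s32 accumulates four 4-element int8 dot products into an
// int32x4_t in a single instruction.
int32x4_t dot_acc(int32x4_t acc, int8x16_t a, int8x16_t b) {
    return vdotq_s32(acc, a, b);
}
```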

The following local speed test was performed on an Apple M1 with
ARCH=apple-silicon. I had to remove CPU pinning from the benchmark
script. However, the results were still consistent: Checking both
binaries against themselves reported a speedup of +0.0000 and +0.0005,
respectively.

```
Result of 100 runs
==================
base (...ish.037ef3e1) =    1917997  +/- 7152
test (...fish.dotprod) =    2159682  +/- 9066
diff                   =    +241684  +/- 2923

speedup        = +0.1260
P(speedup > 0) =  1.0000

CPU: 10 x arm
Hyperthreading: off
```

Fixes #4193

closes https://github.com/official-stockfish/Stockfish/pull/4400

No functional change
2023-02-23 13:22:03 +01:00
MinetaS
2c36d1e7e7 Fix overflow in add_dpbusd_epi32x2
This patch fixes a 16-bit overflow in the *_add_dpbusd_epi32x2 functions
that can be triggered in rare cases depending on the NNUE weights.
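
A conceptual sketch of the failure mode, not the SIMD code itself: adding two 16-bit intermediate dot-product sums with plain 16-bit arithmetic can wrap around.

```cpp
#include <cstdint>

// Two in-range int16 partial sums can exceed INT16_MAX when combined.
std::int16_t add16(std::int16_t a, std::int16_t b) {
    return std::int16_t(a + b);  // e.g. 30000 + 30000 wraps to -5536
}
```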

While the code leads to some slowdown on affected architectures
(most notably avx2), the fix is simpler than some of the other
options discussed in
https://github.com/official-stockfish/Stockfish/pull/4394

Code suggested by Sopel97.

Result of "bench 4096 1 30 default depth nnue":

| Architecture        | master    | patch (gcc) | patch (clang) |
|---------------------|-----------|-------------|---------------|
| x86-64-vnni512      | 762122798 | 762122798   | 762122798     |
| x86-64-avx512       | 769723503 | 762122798   | 762122798     |
| x86-64-bmi2         | 769723503 | 762122798   | 762122798     |
| x86-64-ssse3        | 769723503 | 762122798   | 762122798     |
| x86-64              | 762122798 | 762122798   | 762122798     |

The following architectures will experience a ~4% slowdown due to an
additional instruction in the middle of the hot path:

* x86-64-avx512
* x86-64-bmi2
* x86-64-avx2
* x86-64-sse41-popcnt (x86-64-modern)
* x86-64-ssse3
* x86-32-sse41-popcnt

This patch clearly loses Elo against master with both STC and LTC.

Failed non-regression STC (256bit fix only):
LLR: -2.95 (-2.94,2.94) <-1.75,0.25>
Total: 33528 W: 8769 L: 9049 D: 15710
Ptnml(0-2): 96, 3616, 9600, 3376, 76
https://tests.stockfishchess.org/tests/view/63e6a5b44299542b1e26a485

60+0.6 @ 30000 games:
Elo: -1.67 +-1.7 (95%) LOS: 2.8%
Total: 30000 W: 7848 L: 7992 D: 14160
Ptnml(0-2): 12, 2847, 9436, 2683, 22
nElo: -3.84 +-3.9 (95%) PairsRatio: 0.95
https://tests.stockfishchess.org/tests/view/63e7ac716d0e1db55f35a660

However, a test against nn-a3dc078bafc7.nnue, which is the latest "safe"
network not causing the bug, passed with regular bounds.

Passed STC:
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 160456 W: 42658 L: 42175 D: 75623
Ptnml(0-2): 487, 17638, 43469, 18173, 461
https://tests.stockfishchess.org/tests/view/63e89836d62a5d02b0fa82c8

closes https://github.com/official-stockfish/Stockfish/pull/4391
closes https://github.com/official-stockfish/Stockfish/pull/4394

No functional change
2023-02-18 13:23:18 +01:00
Sebastian Buchwald
2f67409506 Remove redundant const qualifiers
The const qualifiers are already implied by the constexpr qualifiers.

closes https://github.com/official-stockfish/Stockfish/pull/4359

No functional change
2023-01-28 16:49:27 +01:00
Sebastian Buchwald
2167942b6e Simplify functions to read/write network parameters
closes https://github.com/official-stockfish/Stockfish/pull/4358

No functional change
2023-01-28 16:47:52 +01:00
Sebastian Buchwald
da5bcec481 Fix asm modifiers in add_dpbusd_epi32x2 implementations
The accumulator should be an earlyclobber because it is written before
all input operands are read. Otherwise, the asm code computes a wrong
result if the accumulator shares a register with one of the other input
operands (which happens if we pass in the same expression for the
accumulator and the operand).
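
A minimal sketch of the constraint in question (invented example; GCC/Clang inline asm on x86-64 only):

```cpp
// "=&r" marks the output as earlyclobber: it is written (by movq) before
// all inputs are read (by addq), so it must not share a register with b.
static inline long add2(long a, long b) {
    long out;
    asm("movq %1, %0\n\t"
        "addq %2, %0"
        : "=&r"(out)
        : "r"(a), "r"(b));
    return out;
}
```

Without the "&", the compiler may allocate out and b to the same register, and the first instruction would clobber b before it is read.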

Closes https://github.com/official-stockfish/Stockfish/pull/4339

No functional change
2023-01-22 10:51:02 +01:00
Sebastian Buchwald
4f4e652eca Avoid unnecessary string copies
closes https://github.com/official-stockfish/Stockfish/pull/4326

also fixes typo, closes https://github.com/official-stockfish/Stockfish/pull/4332

No functional change
2023-01-09 20:32:58 +01:00
Sebastian Buchwald
e9e7a7b83f Replace some std::string occurrences with std::string_view
std::string_view is more lightweight than std::string. Furthermore,
std::string_view variables can be declared constexpr.
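
A minimal illustration (example constant invented):

```cpp
#include <string_view>

// Unlike std::string, a string_view constant can be constexpr and
// involves no allocation or copying.
constexpr std::string_view kEngineName = "Stockfish";
```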

closes https://github.com/official-stockfish/Stockfish/pull/4328

No functional change
2023-01-09 20:28:24 +01:00
Stefano Di Martino
5a88c5bb9b Modernize code base a little bit
Remove sprintf(), which generated a security warning.
Replace NULL with nullptr.
Replace typedef with using.
Do not inherit from std::vector; use composition instead.
Optimize mutex unlocking.

closes https://github.com/official-stockfish/Stockfish/pull/4327

No functional change
2023-01-09 20:25:13 +01:00