1
0
Fork 0
mirror of https://github.com/sockspls/badfish synced 2025-05-02 17:49:35 +00:00
Commit graph

27 commits

Author SHA1 Message Date
Viren6
0716b845fd Update NNUE architecture to SFNNv9 and net nn-ae6a388e4a1a.nnue
Part 1: PyTorch Training, linrock

Trained with a 10-stage sequence from scratch, starting in May 2023:
https://github.com/linrock/nnue-tools/blob/master/exp-sequences/3072-10stage-SFNNv9.yml

While the training methods were similar to the L1-2560 training sequence,
the last two stages introduced min-v2 binpacks,
where bestmove capture and in-check position scores were not zeroed during minimization,
for compatibility with skipping SEE >= 0 positions and future research.

Training data can be found at:
https://robotmoon.com/nnue-training-data

This net was tested at epoch 679 of the 10th training stage:
https://tests.stockfishchess.org/tests/view/65f32e460ec64f0526c48dbc

Part 2: SPSA Training, Viren6

The net was then SPSA tuned.
This consisted of the output weights (32 * 8) and biases (8)
as well as the L3 biases (32 * 8) and L2 biases (16 * 8), totalling 648 params in total.

The SPSA tune can be found here:
https://tests.stockfishchess.org/tests/view/65fc33ba0ec64f0526c512e3

With the help of Disservin , the initial weights were extracted with:
https://github.com/Viren6/Stockfish/tree/new228

The net was saved with the tuned weights using:
https://github.com/Viren6/Stockfish/tree/new241

Earlier nets of the SPSA failed STC compared to the base 3072 net of part 1:
https://tests.stockfishchess.org/tests/view/65ff356e0ec64f0526c53c98
Therefore it is suspected that the SPSA at VVLTC has
added extra scaling on top of the scaling of increasing the L1 size.

Passed VVLTC 1:
https://tests.stockfishchess.org/tests/view/6604a9020ec64f0526c583da
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 53042 W: 13554 L: 13256 D: 26232
Ptnml(0-2): 12, 5147, 15903, 5449, 10

Passed VVLTC 2:
https://tests.stockfishchess.org/tests/view/660ad1b60ec64f0526c5dd23
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 17506 W: 4574 L: 4315 D: 8617
Ptnml(0-2): 1, 1567, 5362, 1818, 5

STC Elo estimate:
https://tests.stockfishchess.org/tests/view/660b834d01aaec5069f87cb0
Elo: -7.66 ± 3.8 (95%) LOS: 0.0%
Total: 9618 W: 2440 L: 2652 D: 4526
Ptnml(0-2): 80, 1281, 2261, 1145, 42
nElo: -13.94 ± 6.9 (95%) PairsRatio: 0.87

closes https://tests.stockfishchess.org/tests/view/660b834d01aaec5069f87cb0

bench 1823302

Co-Authored-By: Linmiao Xu <lin@robotmoon.com>
2024-04-02 08:49:48 +02:00
Disservin
1a26d698de Refactor Network Usage
Continuing from PR #4968, this update improves how Stockfish handles network
usage, making it easier to manage and modify networks in the future.

With the introduction of a dedicated Network class, creating networks has become
straightforward. See uci.cpp:
```cpp
NN::NetworkBig({EvalFileDefaultNameBig, "None", ""}, NN::embeddedNNUEBig)
```

The new `Network` encapsulates all network-related logic, significantly reducing
the complexity previously required to support multiple network types, such as
the distinction between small and big networks #4915.

Non-Regression STC:
https://tests.stockfishchess.org/tests/view/65edd26c0ec64f0526c43584
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 33760 W: 8887 L: 8661 D: 16212
Ptnml(0-2): 143, 3795, 8808, 3961, 173

Non-Regression SMP STC:
https://tests.stockfishchess.org/tests/view/65ed71970ec64f0526c42fdd
LLR: 2.96 (-2.94,2.94) <-1.75,0.25>
Total: 59088 W: 15121 L: 14931 D: 29036
Ptnml(0-2): 110, 6640, 15829, 6880, 85

Compiled with `make -j profile-build`
```
bash ./bench_parallel.sh ./stockfish ./stockfish-nnue 13 50

sf_base =  1568540 +/-   7637 (95%)
sf_test =  1573129 +/-   7301 (95%)
diff    =     4589 +/-   8720 (95%)
speedup = 0.29260% +/- 0.556% (95%)
```

Compiled with `make -j build`
```
bash ./bench_parallel.sh ./stockfish ./stockfish-nnue 13 50

sf_base =  1472653 +/-   7293 (95%)
sf_test =  1491928 +/-   7661 (95%)
diff    =    19275 +/-   7154 (95%)
speedup = 1.30886% +/- 0.486% (95%)
```

closes https://github.com/official-stockfish/Stockfish/pull/5100

No functional change
2024-03-12 16:41:08 +01:00
Disservin
99cdb920fc Cleanup Evalfile handling
This cleans up the EvalFile handling after the merge of #4915,
which has become a bit confusing on what it is actually doing.

closes https://github.com/official-stockfish/Stockfish/pull/4971

No functional change
2024-01-08 18:33:38 +01:00
Linmiao Xu
584d9efedc Dual NNUE with L1-128 smallnet
Credit goes to @mstembera for:
- writing the code enabling dual NNUE:
  https://github.com/official-stockfish/Stockfish/pull/4898
- the idea of trying L1-128 trained exclusively on high simple eval
  positions

The L1-128 smallnet is:
- epoch 399 of a single-stage training from scratch
- trained only on positions from filtered data with high material
  difference
  - defined by abs(simple_eval) > 1000

```yaml
experiment-name: 128--S1-only-hse-v2

training-dataset:
  - /data/hse/S3/dfrc99-16tb7p-eval-filt-v2.min.high-simple-eval-1k.binpack
  - /data/hse/S3/leela96-filt-v2.min.high-simple-eval-1k.binpack
  - /data/hse/S3/test80-apr2022-16tb7p.min.high-simple-eval-1k.binpack

  - /data/hse/S7/test60-2020-2tb7p.v6-3072.high-simple-eval-1k.binpack
  - /data/hse/S7/test60-novdec2021-12tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack

  - /data/hse/S7/test77-nov2021-2tb7p.v6-3072.min.high-simple-eval-1k.binpack
  - /data/hse/S7/test77-dec2021-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack
  - /data/hse/S7/test77-jan2022-2tb7p.high-simple-eval-1k.binpack

  - /data/hse/S7/test78-jantomay2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack
  - /data/hse/S7/test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack

  - /data/hse/S7/test79-apr2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack
  - /data/hse/S7/test79-may2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack

  # T80 2022
  - /data/hse/S7/test80-may2022-16tb7p.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-jun2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-jul2022-16tb7p.v6-dd.min.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-aug2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-sep2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-oct2022-16tb7p.v6-dd.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-nov2022-16tb7p-v6-dd.min.high-simple-eval-1k.binpack

  # T80 2023
  - /data/hse/S7/test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-feb2023-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-mar2023-2tb7p.v6-sk16.min.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-apr2023-2tb7p-filter-v6-sk16.min.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-may2023-2tb7p.v6.min.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-jun2023-2tb7p.v6-3072.min.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-jul2023-2tb7p.v6-3072.min.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-aug2023-2tb7p.v6.min.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-sep2023-2tb7p.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-oct2023-2tb7p.high-simple-eval-1k.binpack

start-from-engine-test-net: False

nnue-pytorch-branch: linrock/nnue-pytorch/L1-128
engine-test-branch: linrock/Stockfish/L1-128-nolazy
engine-base-branch: linrock/Stockfish/L1-128

num-epochs: 500
lambda: 1.0
```

Experiment yaml configs converted to easy_train.sh commands with:
https://github.com/linrock/nnue-tools/blob/4339954/yaml_easy_train.py

Binpacks interleaved at training time with:
https://github.com/official-stockfish/nnue-pytorch/pull/259

Data filtered for high simple eval positions with:
https://github.com/linrock/nnue-data/blob/32d6a68/filter_high_simple_eval_plain.py
https://github.com/linrock/Stockfish/blob/61dbfe/src/tools/transform.cpp#L626-L655

Training data can be found at:
https://robotmoon.com/nnue-training-data/

Local elo at 25k nodes per move of
L1-128 smallnet (nnue-only eval) vs. L1-128 trained on standard S1 data:
nn-epoch399.nnue : -318.1 +/- 2.1

Passed STC:
https://tests.stockfishchess.org/tests/view/6574cb9d95ea6ba1fcd49e3b
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 62432 W: 15875 L: 15521 D: 31036
Ptnml(0-2): 177, 7331, 15872, 7633, 203

Passed LTC:
https://tests.stockfishchess.org/tests/view/6575da2d4d789acf40aaac6e
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 64830 W: 16118 L: 15738 D: 32974
Ptnml(0-2): 43, 7129, 17697, 7497, 49

closes https://github.com/official-stockfish/Stockfish/pulls

Bench: 1330050

Co-Authored-By: mstembera <5421953+mstembera@users.noreply.github.com>
2024-01-07 21:15:52 +01:00
FauziAkram
8b4583bce7 Remove redundant int cast
Remove a redundant int cast in the calculation of fwdOut. The variable
OutputType is already defined as std::int32_t, which is an integer type, making
the cast unnecessary.

closes https://github.com/official-stockfish/Stockfish/pull/4961

No functional change
2024-01-04 15:56:53 +01:00
Disservin
444f03ee95 Update copyright year
closes https://github.com/official-stockfish/Stockfish/pull/4954

No functional change
2024-01-04 15:47:10 +01:00
Joost VandeVondele
ec02714b62 Cleanup comments and some code reorg.
passed STC:
https://tests.stockfishchess.org/tests/view/6536dc7dcc309ae83955b04d
LLR: 2.93 (-2.94,2.94) <-1.75,0.25>
Total: 58048 W: 14693 L: 14501 D: 28854
Ptnml(0-2): 200, 6399, 15595, 6669, 161

closes https://github.com/official-stockfish/Stockfish/pull/4846

No functional change
2023-10-24 17:43:05 +02:00
Disservin
2d0237db3f add clang-format
This introduces clang-format to enforce a consistent code style for Stockfish.

Having a documented and consistent style across the code will make contributing easier
for new developers, and will make larger changes to the codebase easier to make.

To facilitate formatting, this PR includes a Makefile target (`make format`) to format the code,
this requires clang-format (version 17 currently) to be installed locally.

Installing clang-format is straightforward on most OS and distros
(e.g. with https://apt.llvm.org/, brew install clang-format, etc), as this is part of quite commonly
used suite of tools and compilers (llvm / clang).

Additionally, a CI action is present that will verify if the code requires formatting,
and comment on the PR as needed. Initially, correct formatting is not required, it will be
done by maintainers as part of the merge or in later commits, but obviously this is encouraged.

fixes https://github.com/official-stockfish/Stockfish/issues/3608
closes https://github.com/official-stockfish/Stockfish/pull/4790

Co-Authored-By: Joost VandeVondele <Joost.VandeVondele@gmail.com>
2023-10-22 16:06:27 +02:00
Linmiao Xu
70ba9de85c Update NNUE architecture to SFNNv8: L1-2560 nn-ac1dbea57aa3.nnue
Creating this net involved:
- a 6-stage training process from scratch. The datasets used in stages 1-5 were fully minimized.
- permuting L1 weights with https://github.com/official-stockfish/nnue-pytorch/pull/254

A strong epoch after each training stage was chosen for the next. The 6 stages were:

```
1. 400 epochs, lambda 1.0, default LR and gamma
   UHOx2-wIsRight-multinet-dfrc-n5000 (135G)
     nodes5000pv2_UHO.binpack
     data_pv-2_diff-100_nodes-5000.binpack
     wrongIsRight_nodes5000pv2.binpack
     multinet_pv-2_diff-100_nodes-5000.binpack
     dfrc_n5000.binpack

2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12
   LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G)
     T60T70wIsRightFarseerT60T74T75T76.binpack
     test78-junjulaug2022-16tb7p.no-db.min.binpack
     test79-mar2022-16tb7p.no-db.min.binpack
     test80-dec2022-16tb7p.no-db.min.binpack

3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20
   leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack
     leela93-filt-v1.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test78-janfeb2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-apr2022-16tb7p.min.binpack
     test80-may2022-16tb7p.min.binpack

4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24
   leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack
     leela96-filt-v2.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test79-may2022-16tb7p.filter-v6-dd.min.binpack
     test80-jun2022-16tb7p.filter-v6-dd.min.binpack
     test80-sep2022-16tb7p.filter-v6-dd.min.binpack
     test80-nov2022-16tb7p.filter-v6-dd.min.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-mar2023-2tb7p.v6-sk16.min.binpack
     test60-novdec2021-16tb7p.min.binpack
     test77-dec2021-16tb7p.min.binpack
     test78-aprmay2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-may2023-2tb7p.min.binpack

5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 960 near the end of the first 800 epochs
   5af11540bbfe dataset: https://github.com/official-stockfish/Stockfish/pull/4635

6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 1000 near the end of the first 800 epochs
   1ee1aba5ed dataset: https://github.com/official-stockfish/Stockfish/pull/4782
```

L1 weights permuted with:
```bash
python3 serialize.py $nnue $nnue_permuted \
  --features=HalfKAv2_hm \
  --ft_optimize \
  --ft_optimize_data=/data/fishpack32.binpack \
  --ft_optimize_count=10000
```

Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2:
```
sf_base =  1329051 +/-   2224 (95%)
sf_test =  1163344 +/-   2992 (95%)
diff    =  -165706 +/-   4913 (95%)
speedup = -12.46807% +/- 0.370% (95%)
```

Training data can be found at:
https://robotmoon.com/nnue-training-data/

Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue)
ep959 : 16.2 +/- 2.3

Failed 10+0.1 STC:
https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21
LLR: -2.92 (-2.94,2.94) <0.00,2.00>
Total: 13184 W: 3285 L: 3535 D: 6364
Ptnml(0-2): 85, 1662, 3334, 1440, 71

Failed 180+1.8 VLTC:
https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 64248 W: 16224 L: 16374 D: 31650
Ptnml(0-2): 26, 6788, 18640, 6650, 20

Passed 60+0.6 th 8 VLTC SMP (STC bounds):
https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 90630 W: 23372 L: 23033 D: 44225
Ptnml(0-2): 13, 8490, 27968, 8833, 11

Passed 60+0.6 th 8 VLTC SMP:
https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 137804 W: 35764 L: 35276 D: 66764
Ptnml(0-2): 31, 13006, 42326, 13522, 17

closes https://github.com/official-stockfish/Stockfish/pull/4795

bench 1246812
2023-09-22 19:26:16 +02:00
Disservin
3c0e86a91e Cleanup includes
Reorder a few includes, include "position.h" where it was previously missing
and apply include-what-you-use suggestions. Also make the order of the includes
consistent, in the following way:

1. Related header (for .cpp files)
2. A blank line
3. C/C++ headers
4. A blank line
5. All other header files

closes https://github.com/official-stockfish/Stockfish/pull/4763
fixes https://github.com/official-stockfish/Stockfish/issues/4707

No functional change
2023-09-03 08:24:51 +02:00
Linmiao Xu
915532181f Update NNUE architecture to SFNNv7 with larger L1 size of 2048
Creating this net involved:
- a 5-step training process from scratch
- greedy permuting L1 weights with https://github.com/official-stockfish/Stockfish/pull/4620
- leb128 compression with https://github.com/glinscott/nnue-pytorch/pull/251
- greedy 2- and 3- cycle permuting with https://github.com/official-stockfish/Stockfish/pull/4640

The 5 training steps were:

1. 400 epochs, lambda 1.0, lr 9.75e-4
   UHOx2-wIsRight-multinet-dfrc-n5000-largeGensfen-d9.binpack (178G)
     nodes5000pv2_UHO.binpack
     data_pv-2_diff-100_nodes-5000.binpack
     wrongIsRight_nodes5000pv2.binpack
     multinet_pv-2_diff-100_nodes-5000.binpack
     dfrc_n5000.binpack
     large_gensfen_multipvdiff_100_d9.binpack
   ep399 chosen as start model for step2

2. 800 epochs, end-lambda 0.75, skip 16
   LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G)
     T60T70wIsRightFarseerT60T74T75T76.binpack
     test78-junjulaug2022-16tb7p.no-db.min.binpack
     test79-mar2022-16tb7p.no-db.min.binpack
     test80-dec2022-16tb7p.no-db.min.binpack
   ep559 chosen as start model for step3

3. 800 epochs, end-lambda 0.725, skip 20
   leela96-dfrc99-v2-T80dectofeb-sk20-mar-v6-T77decT78janfebT79apr.binpack (223G)
     leela96-filt-v2.min.binpack
     dfrc99-16tb7p-eval-filt-v2.min.binpack
     test80-dec2022-16tb7p-filter-v6-sk20.min-mar2023.binpack
     test80-jan2023-16tb7p-filter-v6-sk20.min-mar2023.binpack
     test80-feb2023-16tb7p-filter-v6-sk20.min-mar2023.binpack
     test80-mar2023-2tb7p-filter-v6.min.binpack
     test77-dec2021-16tb7p.no-db.min.binpack
     test78-janfeb2022-16tb7p.no-db.min.binpack
     test79-apr2022-16tb7p.no-db.min.binpack
   ep499 chosen as start model for step4

4. 800 epochs, end-lambda 0.7, skip 24
   0dd1cebea57 dataset https://github.com/official-stockfish/Stockfish/pull/4606
   ep599 chosen as start model for step5

5. 800 epochs, end-lambda 0.7, skip 28
   same dataset as step4
   ep619 became nn-1b951f8b449d.nnue

For the final step5 training:

python3 easy_train.py \
  --experiment-name L1-2048-S5-sameData-sk28-S4-0dd1cebea57-shuffled-S3-leela96-dfrc99-v2-T80dectofeb-sk20-mar-v6-T77decT78janfebT79apr-sk20-S2-LeelaFarseerT78T79T80-ep399-S1-UHOx2-wIsRight-multinet-dfrc-n5000-largeGensfen-d9 \
  --training-dataset /data/leela96-dfrc99-T60novdec-v2-T80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd-T80apr.binpack \
  --early-fen-skipping 28 \
  --nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes-L1-2048 \
  --engine-test-branch linrock/Stockfish/L1-2048 \
  --start-from-engine-test-net False \
  --start-from-model /data/experiments/experiment_L1-2048-S4-0dd1cebea57-shuffled-S3-leela96-dfrc99-v2-T80dectofeb-sk20-mar-v6-T77decT78janfebT79apr-sk20-S2-LeelaFarseerT78T79T80-ep399-S1-UHOx2-wIsRight-multinet-dfrc-n5000-largeGensfen-d9/training/run_0/nn-epoch599.nnue
  --max_epoch 800 \
  --lr 4.375e-4 \
  --gamma 0.995 \
  --start-lambda 1.0 \
  --end-lambda 0.7 \
  --tui False \
  --seed $RANDOM \
  --gpus 0

SF training data components for the step1 dataset:
https://drive.google.com/drive/folders/1yLCEmioC3Xx9KQr4T7uB6GnLm5icAYGU

Leela training data for steps 2-5 can be found at:
https://robotmoon.com/nnue-training-data/

Due to larger L1 size and slower inference, the speed penalty loses elo
at STC. Measurements from 100 bench runs at depth 13 with x86-64-modern
on Intel Core i5-1038NG7 2.00GHz:

sf_base =  1240730  +/-   3443 (95%)
sf_test =  1153341  +/-   2832 (95%)
diff    =   -87388  +/-   1616 (95%)
speedup = -7.04330% +/- 0.130% (95%)

Local elo at 25k nodes per move (vs. L1-1536 nn-fdc1d0fe6455.nnue):
nn-epoch619.nnue : 21.1 +/- 3.2

Failed STC:
https://tests.stockfishchess.org/tests/view/6498ee93dc7002ce609cf979
LLR: -2.95 (-2.94,2.94) <0.00,2.00>
Total: 11680 W: 3058 L: 3299 D: 5323
Ptnml(0-2): 44, 1422, 3149, 1181, 44

LTC:
https://tests.stockfishchess.org/tests/view/649b32f5dc7002ce609d20cf
Elo: 0.68 ± 1.5 (95%) LOS: 80.5%
Total: 40000 W: 10887 L: 10809 D: 18304
Ptnml(0-2): 36, 3938, 11958, 4048, 20
nElo: 1.50 ± 3.4 (95%) PairsRatio: 1.02

Passed VLTC 180+1.8:
https://tests.stockfishchess.org/tests/view/64992b43dc7002ce609cfd20
LLR: 3.06 (-2.94,2.94) <0.00,2.00>
Total: 38086 W: 10612 L: 10338 D: 17136
Ptnml(0-2): 9, 3316, 12115, 3598, 5

Passed VLTC SMP 60+0.6 th 8:
https://tests.stockfishchess.org/tests/view/649a21fedc7002ce609d0c7d
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 38936 W: 11091 L: 10820 D: 17025
Ptnml(0-2): 1, 2948, 13305, 3207, 7

closes https://github.com/official-stockfish/Stockfish/pull/4646

Bench: 2505168
2023-07-01 13:34:30 +02:00
AndrovT
38e61663d8 Use block sparse input for the first layer.
Use block sparse input for the first fully connected layer on architectures with at least SSSE3.

Depending on the CPU architecture, this yields a speedup of up to 10%, e.g.

```
Result of 100 runs of 'bench 16 1 13 default depth NNUE'

base (...ockfish-base) =     959345  +/- 7477
test (...ckfish-patch) =    1054340  +/- 9640
diff                   =     +94995  +/- 3999

speedup        = +0.0990
P(speedup > 0) =  1.0000

CPU: 8 x AMD Ryzen 7 5700U with Radeon Graphics
Hyperthreading: on
```

Passed STC:
https://tests.stockfishchess.org/tests/view/6485aa0965ffe077ca12409c
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 8864 W: 2479 L: 2223 D: 4162
Ptnml(0-2): 13, 829, 2504, 1061, 25

This commit includes a net with reordered weights, to increase the likelihood of block sparse inputs,
but otherwise equivalent to the previous master net (nn-ea57bea57e32.nnue).

Activation data collected with https://github.com/AndrovT/Stockfish/tree/log-activations, running bench 16 1 13 varied_1000.epd depth NNUE on this data. Net parameters permuted with https://gist.github.com/AndrovT/9e3fbaebb7082734dc84d27e02094cb3.

closes https://github.com/official-stockfish/Stockfish/pull/4612

No functional change
2023-06-12 20:41:27 +02:00
Linmiao Xu
c1fff71650 Update NNUE architecture to SFNNv6 with larger L1 size of 1536
Created by training a new net from scratch with L1 size increased from 1024 to 1536.
Thanks to Vizvezdenec for the idea of exploring larger net sizes after recent
training data improvements.

A new net was first trained with lambda 1.0 and constant LR 8.75e-4. Then a strong net
from a later epoch in the training run was chosen for retraining with start-lambda 1.0
and initial LR 4.375e-4 decaying with gamma 0.995. Retraining was performed a total of
3 times, for this 4-step process:

1. 400 epochs, lambda 1.0 on filtered T77+T79 v6 deduplicated data
2. 800 epochs, end-lambda 0.75 on T60T70wIsRightFarseerT60T74T75T76.binpack
3. 800 epochs, end-lambda 0.75 and early-fen-skipping 28 on the master dataset
4. 800 epochs, end-lambda 0.7 and early-fen-skipping 28 on the master dataset

In the training sequence that reached the new nn-8d69132723e2.nnue net,
the epochs used for the 3x retraining runs were:

1. epoch 379 trained on T77T79-filter-v6-dd.min.binpack
2. epoch 679 trained on T60T70wIsRightFarseerT60T74T75T76.binpack
3. epoch 799 trained on the master dataset

For training from scratch:

python3 easy_train.py \
  --experiment-name new-L1-1536-T77T79-filter-v6dd \
  --training-dataset /data/T77T79-filter-v6-dd.min.binpack \
  --max_epoch 400 \
  --lambda 1.0 \
  --start-from-engine-test-net False \
  --engine-test-branch linrock/Stockfish/L1-1536 \
  --nnue-pytorch-branch linrock/Stockfish/misc-fixes-L1-1536 \
  --tui False \
  --gpus "0," \
  --seed $RANDOM

Retraining commands were similar to each other. For the 3rd retraining run:

python3 easy_train.py \
  --experiment-name L1-1536-T77T79-v6dd-Re1-LeelaFarseer-Re2-masterDataset-Re3-sameData \
  --training-dataset /data/leela96-dfrc99-v2-T60novdecT80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd.binpack \
  --early-fen-skipping 28 \
  --max_epoch 800 \
  --start-lambda 1.0 \
  --end-lambda 0.7 \
  --lr 4.375e-4 \
  --gamma 0.995 \
  --start-from-engine-test-net False \
  --start-from-model /data/L1-1536-T77T79-v6dd-Re1-LeelaFarseer-Re2-masterDataset-nn-epoch799.nnue \
  --engine-test-branch linrock/Stockfish/L1-1536 \
  --nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes-L1-1536 \
  --tui False \
  --gpus "0," \
  --seed $RANDOM

The T77+T79 data used is a subset of the master dataset available at:
https://robotmoon.com/nnue-training-data/

T60T70wIsRightFarseerT60T74T75T76.binpack is available at:
https://drive.google.com/drive/folders/1S9-ZiQa_3ApmjBtl2e8SyHxj4zG4V8gG

Local elo at 25k nodes per move vs. nn-e1fb1ade4432.nnue (L1 size 1024):
nn-epoch759.nnue : 26.9 +/- 1.6

Failed STC
https://tests.stockfishchess.org/tests/view/64742485d29264e4cfa75f97
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 13728 W: 3588 L: 3829 D: 6311
Ptnml(0-2): 71, 1661, 3610, 1482, 40

Failing LTC
https://tests.stockfishchess.org/tests/view/64752d7c4a36543c4c9f3618
LLR: -1.91 (-2.94,2.94) <0.50,2.50>
Total: 35424 W: 9522 L: 9603 D: 16299
Ptnml(0-2): 24, 3579, 10585, 3502, 22

Passed VLTC 180+1.8
https://tests.stockfishchess.org/tests/view/64752df04a36543c4c9f3638
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 47616 W: 13174 L: 12863 D: 21579
Ptnml(0-2): 13, 4261, 14952, 4566, 16

Passed VLTC SMP 60+0.6 th 8
https://tests.stockfishchess.org/tests/view/647446ced29264e4cfa761e5
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 19942 W: 5694 L: 5451 D: 8797
Ptnml(0-2): 6, 1504, 6707, 1749, 5

closes https://github.com/official-stockfish/Stockfish/pull/4593

bench 2222567
2023-05-31 08:51:22 +02:00
Sebastian Buchwald
2167942b6e Simplify functions to read/write network parameters
closes https://github.com/official-stockfish/Stockfish/pull/4358

No functional change
2023-01-28 16:47:52 +01:00
Sebastian Buchwald
b60f9cc451 Update copyright years
Happy New Year!

closes https://github.com/official-stockfish/Stockfish/pull/4315

No functional change
2023-01-02 19:07:38 +01:00
Tomasz Sobczyk
c079acc26f Update NNUE architecture to SFNNv5. Update network to nn-3c0aa92af1da.nnue.
Architecture changes:

    Duplicated activation after the 1024->15 layer with squared crelu (so 15->15*2). As proposed by vondele.

Trainer changes:

    Added bias to L1 factorization, which was previously missing (no measurable improvement but at least neutral in principle)
    For retraining linearly reduce lambda parameter from 1.0 at epoch 0 to 0.75 at epoch 800.
    reduce max_skipping_rate from 15 to 10 (compared to vondele's outstanding PR)

Note: This network was trained with a ~0.8% error in quantization regarding the newly added activation function.
      This will be fixed in the released trainer version. Expect a trainer PR tomorrow.

Note: The inference implementation cuts a corner to merge results from two activation functions.
       This could possibly be resolved nicer in the future. AVX2 implementation likely not necessary, but NEON is missing.

First training session invocation:

python3 train.py \
    ../nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
    ../nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
    --gpus "$3," \
    --threads 4 \
    --num-workers 8 \
    --batch-size 16384 \
    --progress_bar_refresh_rate 20 \
    --random-fen-skipping 3 \
    --features=HalfKAv2_hm^ \
    --lambda=1.0 \
    --max_epochs=400 \
    --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2

Second training session invocation:

python3 train.py \
    ../nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
    ../nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
    --gpus "$3," \
    --threads 4 \
    --num-workers 8 \
    --batch-size 16384 \
    --progress_bar_refresh_rate 20 \
    --random-fen-skipping 3 \
    --features=HalfKAv2_hm^ \
    --start-lambda=1.0 \
    --end-lambda=0.75 \
    --gamma=0.995 \
    --lr=4.375e-4 \
    --max_epochs=800 \
    --resume-from-model /data/sopel/nnue/nnue-pytorch-training/data/exp367/nn-exp367-run3-epoch399.pt \
    --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2

Passed STC:
LLR: 2.95 (-2.94,2.94) <0.00,2.50>
Total: 27288 W: 7445 L: 7178 D: 12665
Ptnml(0-2): 159, 3002, 7054, 3271, 158
https://tests.stockfishchess.org/tests/view/627e8c001919125939623644

Passed LTC:
LLR: 2.95 (-2.94,2.94) <0.50,3.00>
Total: 21792 W: 5969 L: 5727 D: 10096
Ptnml(0-2): 25, 2152, 6294, 2406, 19
https://tests.stockfishchess.org/tests/view/627f2a855734b18b2e2ece47

closes https://github.com/official-stockfish/Stockfish/pull/4020

Bench: 6481017
2022-05-14 12:47:22 +02:00
Tomasz Sobczyk
174b038bf3 Use dynamic allocation for evaluation scratch TLS buffer.
fixes #3946 an issue related with the toolchain as found in xcode 12 on macOS,
related to previous commit 5f781d36.

closes https://github.com/official-stockfish/Stockfish/pull/3950

No functional change
2022-03-01 17:51:02 +01:00
mstembera
5f781d366e Clean up and simplify some nnue code.
Remove some unnecessary code and it's execution during inference. Also the change on line 49 in nnue_architecture.h results in a more efficient SIMD code path through ClippedReLU::propagate().

passed STC:
https://tests.stockfishchess.org/tests/view/6217d3bfda649bba32ef25d5
LLR: 2.94 (-2.94,2.94) <-2.25,0.25>
Total: 12056 W: 3281 L: 3092 D: 5683
Ptnml(0-2): 55, 1213, 3312, 1384, 64

passed STC SMP:
https://tests.stockfishchess.org/tests/view/6217f344da649bba32ef295e
LLR: 2.94 (-2.94,2.94) <-2.25,0.25>
Total: 27376 W: 7295 L: 7137 D: 12944
Ptnml(0-2): 52, 2859, 7715, 3003, 59

closes https://github.com/official-stockfish/Stockfish/pull/3944

No functional change

bench: 6820724
2022-02-25 08:37:57 +01:00
Tomasz Sobczyk
cb9c2594fc Update architecture to "SFNNv4". Update network to nn-6877cd24400e.nnue.
Architecture:

The diagram of the "SFNNv4" architecture:
https://user-images.githubusercontent.com/8037982/153455685-cbe3a038-e158-4481-844d-9d5fccf5c33a.png

The most important architectural changes are the following:

* 1024x2 [activated] neurons are pairwise, elementwise multiplied (not quite pairwise due to implementation details, see diagram), which introduces a non-linearity that exhibits similar benefits to previously tested sigmoid activation (quantmoid4), while being slightly faster.
* The following layer has therefore 2x less inputs, which we compensate by having 2 more outputs. It is possible that reducing the number of outputs might be beneficial (as we had it as low as 8 before). The layer is now 1024->16.
* The 16 outputs are split into 15 and 1. The 1-wide output is added to the network output (after some necessary scaling due to quantization differences). The 15-wide is activated and follows the usual path through a set of linear layers. The additional 1-wide output is at least neutral, but has shown a slightly positive trend in training compared to networks without it (all 16 outputs through the usual path), and allows possibly an additional stage of lazy evaluation to be introduced in the future.

Additionally, the inference code was rewritten and no longer uses a recursive implementation. This was necessitated by the splitting of the 16-wide intermediate result into two, which was impossible to do with the old implementation with ugly hacks. This is hopefully overall for the better.

First session:

The first session was training a network from scratch (random initialization). The exact trainer used was slightly different (older) from the one used in the second session, but it should not have a measurable effect. The purpose of this session is to establish a strong network base for the second session. Small deviations in strength do not harm the learnability in the second session.

The training was done using the following command:

python3 train.py \
    /home/sopel/nnue/nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
    /home/sopel/nnue/nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
    --gpus "$3," \
    --threads 4 \
    --num-workers 4 \
    --batch-size 16384 \
    --progress_bar_refresh_rate 20 \
    --random-fen-skipping 3 \
    --features=HalfKAv2_hm^ \
    --lambda=1.0 \
    --gamma=0.992 \
    --lr=8.75e-4 \
    --max_epochs=400 \
    --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2

Every 20th net was saved and its playing strength measured against some baseline at 25k nodes per move with pure NNUE evaluation (modified binary). The exact setup is not important as long as it's consistent. The purpose is to sift good candidates from bad ones.

The dataset can be found https://drive.google.com/file/d/1UQdZN_LWQ265spwTBwDKo0t1WjSJKvWY/view

Second session:

The second training session was done starting from the best network (as determined by strength testing) from the first session. It is important that it's resumed from a .pt model and NOT a .ckpt model. The conversion can be performed directly using serialize.py

The LR schedule was modified to use gamma=0.995 instead of gamma=0.992 and LR=4.375e-4 instead of LR=8.75e-4 to flatten the LR curve and allow for longer training. The training was then running for 800 epochs instead of 400 (though it's possibly mostly noise after around epoch 600).

The training was done using the following command:

The training was done using the following command:

python3 train.py \
        /data/sopel/nnue/nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
        /data/sopel/nnue/nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
        --gpus "$3," \
        --threads 4 \
        --num-workers 4 \
        --batch-size 16384 \
        --progress_bar_refresh_rate 20 \
        --random-fen-skipping 3 \
        --features=HalfKAv2_hm^ \
        --lambda=1.0 \
        --gamma=0.995 \
        --lr=4.375e-4 \
        --max_epochs=800 \
        --resume-from-model /data/sopel/nnue/nnue-pytorch-training/data/exp295/nn-epoch399.pt \
        --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$run_id

In particular note that we now use lambda=1.0 instead of lambda=0.8 (previous nets), because tests show that WDL-skipping introduced by vondele performs better with lambda=1.0. Nets were being saved every 20th epoch. In total 16 runs were made with these settings and the best nets chosen according to playing strength at 25k nodes per move with pure NNUE evaluation - these are the 4 nets that have been put on fishtest.

The dataset can be found either at ftp://ftp.chessdb.cn/pub/sopel/data_sf/T60T70wIsRightFarseerT60T74T75T76.binpack in its entirety (download might be painfully slow because hosted in China) or can be assembled in the following way:

Get the 5640ad48ae/script/interleave_binpacks.py script.
Download T60T70wIsRightFarseer.binpack https://drive.google.com/file/d/1_sQoWBl31WAxNXma2v45004CIVltytP8/view
Download farseerT74.binpack http://trainingdata.farseer.org/T74-May13-End.7z
Download farseerT75.binpack http://trainingdata.farseer.org/T75-June3rd-End.7z
Download farseerT76.binpack http://trainingdata.farseer.org/T76-Nov10th-End.7z
Run python3 interleave_binpacks.py T60T70wIsRightFarseer.binpack farseerT74.binpack farseerT75.binpack farseerT76.binpack T60T70wIsRightFarseerT60T74T75T76.binpack

Tests:

STC: https://tests.stockfishchess.org/tests/view/6203fb85d71106ed12a407b7
LLR: 2.94 (-2.94,2.94) <0.00,2.50>
Total: 16952 W: 4775 L: 4521 D: 7656
Ptnml(0-2): 133, 1818, 4318, 2076, 131

LTC: https://tests.stockfishchess.org/tests/view/62041e68d71106ed12a40e85
LLR: 2.94 (-2.94,2.94) <0.50,3.00>
Total: 14944 W: 4138 L: 3907 D: 6899
Ptnml(0-2): 21, 1499, 4202, 1728, 22

closes https://github.com/official-stockfish/Stockfish/pull/3927

Bench: 4919707
2022-02-10 19:54:31 +01:00
Brad Knox
ad926d34c0 Update copyright years
Happy New Year!

closes https://github.com/official-stockfish/Stockfish/pull/3881

No functional change
2022-01-06 15:45:45 +01:00
Tomasz Sobczyk
d61d38586e New NNUE architecture and net
Introduces a new NNUE network architecture and associated network parameters

The summary of the changes:

* Position for each perspective mirrored such that the king is on e..h files. Cuts the feature transformer size in half, while preserving enough knowledge to be good. See https://docs.google.com/document/d/1gTlrr02qSNKiXNZ_SuO4-RjK4MXBiFlLE6jvNqqMkAY/edit#heading=h.b40q4rb1w7on.
* The number of neurons after the feature transformer increased two-fold, to 1024x2. This is possibly mostly due to the now very optimized feature transformer update code.
* The number of neurons after the second layer is reduced from 16 to 8, to reduce the speed impact. This, perhaps surprisingly, doesn't harm the strength much. See https://docs.google.com/document/d/1gTlrr02qSNKiXNZ_SuO4-RjK4MXBiFlLE6jvNqqMkAY/edit#heading=h.6qkocr97fezq

The AffineTransform code did not work out-of-the box with the smaller number of neurons after the second layer, so some temporary changes have been made to add a special case for InputDimensions == 8. Also additional 0 padding is added to the output for some archs that cannot process inputs by <=8 (SSE2, NEON). VNNI uses an implementation that can keep all outputs in the registers while reducing the number of loads by 3 for each 16 inputs, thanks to the reduced number of output neurons. However GCC is particularily bad at optimization here (and perhaps why the current way the affine transform is done even passed sprt) (see https://docs.google.com/document/d/1gTlrr02qSNKiXNZ_SuO4-RjK4MXBiFlLE6jvNqqMkAY/edit# for details) and more work will be done on this in the following days. I expect the current VNNI implementation to be improved and extended to other architectures.

The network was trained with a slightly modified version of the pytorch trainer (https://github.com/glinscott/nnue-pytorch); the changes are in https://github.com/glinscott/nnue-pytorch/pull/143

The training utilized 2 datasets.

    dataset A - https://drive.google.com/file/d/1VlhnHL8f-20AXhGkILujnNXHwy9T-MQw/view?usp=sharing
    dataset B - as described in ba01f4b954

The training process was as following:

    train on dataset A for 350 epochs, take the best net in terms of elo at 20k nodes per move (it's fine to take anything from later stages of training).
    convert the .ckpt to .pt
    --resume-from-model from the .pt file, train on dataset B for <600 epochs, take the best net. Lambda=0.8, applied before the loss function.

The first training command:

python3 train.py \
    ../nnue-pytorch-training/data/large_gensfen_multipvdiff_100_d9.binpack \
    ../nnue-pytorch-training/data/large_gensfen_multipvdiff_100_d9.binpack \
    --gpus "$3," \
    --threads 1 \
    --num-workers 1 \
    --batch-size 16384 \
    --progress_bar_refresh_rate 20 \
    --smart-fen-skipping \
    --random-fen-skipping 3 \
    --features=HalfKAv2_hm^ \
    --lambda=1.0 \
    --max_epochs=600 \
    --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2

The second training command:

python3 serialize.py \
    --features=HalfKAv2_hm^ \
    ../nnue-pytorch-training/experiment_131/run_6/default/version_0/checkpoints/epoch-499.ckpt \
    ../nnue-pytorch-training/experiment_$1/base/base.pt

python3 train.py \
    ../nnue-pytorch-training/data/michael_commit_b94a65.binpack \
    ../nnue-pytorch-training/data/michael_commit_b94a65.binpack \
    --gpus "$3," \
    --threads 1 \
    --num-workers 1 \
    --batch-size 16384 \
    --progress_bar_refresh_rate 20 \
    --smart-fen-skipping \
    --random-fen-skipping 3 \
    --features=HalfKAv2_hm^ \
    --lambda=0.8 \
    --max_epochs=600 \
    --resume-from-model ../nnue-pytorch-training/experiment_$1/base/base.pt \
    --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2

STC: https://tests.stockfishchess.org/tests/view/611120b32a8a49ac5be798c4

LLR: 2.97 (-2.94,2.94) <-0.50,2.50>
Total: 22480 W: 2434 L: 2251 D: 17795
Ptnml(0-2): 101, 1736, 7410, 1865, 128

LTC: https://tests.stockfishchess.org/tests/view/611152b32a8a49ac5be798ea

LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 9776 W: 442 L: 333 D: 9001
Ptnml(0-2): 5, 295, 4180, 402, 6

closes https://github.com/official-stockfish/Stockfish/pull/3646

bench: 5189338
2021-08-15 12:05:43 +02:00
Tomasz Sobczyk
e8d64af123 New NNUE architecture and net
Introduces a new NNUE network architecture and associated network parameters,
as obtained by a new pytorch trainer.

The network is already very strong at short TC, without regression at longer TC,
and has potential for further improvements.

https://tests.stockfishchess.org/tests/view/60a159c65085663412d0921d
TC: 10s+0.1s, 1 thread
ELO: 21.74 +-3.4 (95%) LOS: 100.0%
Total: 10000 W: 1559 L: 934 D: 7507
Ptnml(0-2): 38, 701, 2972, 1176, 113

https://tests.stockfishchess.org/tests/view/60a187005085663412d0925b
TC: 60s+0.6s, 1 thread
ELO: 5.85 +-1.7 (95%) LOS: 100.0%
Total: 20000 W: 1381 L: 1044 D: 17575
Ptnml(0-2): 27, 885, 7864, 1172, 52

https://tests.stockfishchess.org/tests/view/60a2beede229097940a03806
TC: 20s+0.2s, 8 threads
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 34272 W: 1610 L: 1452 D: 31210
Ptnml(0-2): 30, 1285, 14350, 1439, 32

https://tests.stockfishchess.org/tests/view/60a2d687e229097940a03c72
TC: 60s+0.6s, 8 threads
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 45544 W: 1262 L: 1214 D: 43068
Ptnml(0-2): 12, 1129, 20442, 1177, 12

The network has been trained (by vondele) using the https://github.com/glinscott/nnue-pytorch/ trainer (started by glinscott),
specifically the branch https://github.com/Sopel97/nnue-pytorch/tree/experiment_56.
The data used are in 64 billion positions (193GB total) generated and scored with the current master net
d8: https://drive.google.com/file/d/1hOOYSDKgOOp38ZmD0N4DV82TOLHzjUiF/view?usp=sharing
d9: https://drive.google.com/file/d/1VlhnHL8f-20AXhGkILujnNXHwy9T-MQw/view?usp=sharing
d10: https://drive.google.com/file/d/1ZC5upzBYMmMj1gMYCkt6rCxQG0GnO3Kk/view?usp=sharing
fishtest_d9: https://drive.google.com/file/d/1GQHt0oNgKaHazwJFTRbXhlCN3FbUedFq/view?usp=sharing

This network also contains a few architectural changes with respect to the current master:

    Size changed from 256x2-32-32-1 to 512x2-16-32-1
        ~15-20% slower
        ~2x larger
        adds a special path for 16 valued ClippedReLU
        fixes affine transform code for 16 inputs/outputs, buy using InputDimensions instead of PaddedInputDimensions
            this is safe now because the inputs are processed in groups of 4 in the current affine transform code
    The feature set changed from HalfKP to HalfKAv2
        Includes information about the kings like HalfKA
        Packs king features better, resulting in 8% size reduction compared to HalfKA
    The board is flipped for the black's perspective, instead of rotated like in the current master
    PSQT values for each feature
        the feature transformer now outputs a part that is fowarded directly to the output and allows learning piece values more directly than the previous network architecture. The effect is visible for high imbalance positions, where the current master network outputs evaluations skewed towards zero.
        8 PSQT values per feature, chosen based on (popcount(pos.pieces()) - 1) / 4
        initialized to classical material values on the start of the training
    8 subnetworks (512x2->16->32->1), chosen based on (popcount(pos.pieces()) - 1) / 4
        only one subnetwork is evaluated for any position, no or marginal speed loss

A diagram of the network is available: https://user-images.githubusercontent.com/8037982/118656988-553a1700-b7eb-11eb-82ef-56a11cbebbf2.png
A more complete description: https://github.com/glinscott/nnue-pytorch/blob/master/docs/nnue.md

closes https://github.com/official-stockfish/Stockfish/pull/3474

Bench: 3806488
2021-05-18 18:06:23 +02:00
Tomasz Sobczyk
b748b46714 Cleanup and simplify NNUE code.
A lot of optimizations happend since the NNUE was introduced
and since then some parts of the code were left unused. This
got to the point where asserts were have to be made just to
let people know that modifying something will not have any
effects or may even break everything due to the assumptions
being made. Removing these parts removes those inexisting
"false dependencies". Additionally:

 * append_changed_indices now takes the king pos and stateinfo
   explicitly, no more misleading pos parameter
 * IndexList is removed in favor of a generic ValueList.
   Feature transformer just instantiates the type it needs.
 * The update cost and refresh requirement is deferred to the
   feature set once again, but now doesn't go through the whole
   FeatureSet machinery and just calls HalfKP directly.
 * accumulator no longer has a singular dimension.
 * The PS constants and the PieceSquareIndex array are made local
   to the HalfKP feature set because they are specific to it and
   DO differ for other feature sets.
 * A few names are changed to more descriptive

Passed STC non-regression:
https://tests.stockfishchess.org/tests/view/608421dd95e7f1852abd2790
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 180008 W: 16186 L: 16258 D: 147564
Ptnml(0-2): 587, 12593, 63725, 12503, 596

closes https://github.com/official-stockfish/Stockfish/pull/3441

No functional change
2021-04-25 13:16:30 +02:00
Tomasz Sobczyk
fbbd4adc3c Unify naming convention of the NNUE code
matches the rest of the stockfish code base

closes https://github.com/official-stockfish/Stockfish/pull/3437

No functional change
2021-04-24 12:49:29 +02:00
Dieter Dobbelaere
7ffae17f85 Add Stockfish namespace.
fixes #3350 and is a small cleanup that might make it easier to use SF
in separate projects, like a NNUE trainer or similar.

closes https://github.com/official-stockfish/Stockfish/pull/3370

No functional change.
2021-03-07 14:26:54 +01:00
Joost VandeVondele
c4d67d77c9 Update copyright years
No functional change
2021-01-08 17:04:23 +01:00
nodchip
84f3e86790 Add NNUE evaluation
This patch ports the efficiently updatable neural network (NNUE) evaluation to Stockfish.

Both the NNUE and the classical evaluations are available, and can be used to
assign a value to a position that is later used in alpha-beta (PVS) search to find the
best move. The classical evaluation computes this value as a function of various chess
concepts, handcrafted by experts, tested and tuned using fishtest. The NNUE evaluation
computes this value with a neural network based on basic inputs. The network is optimized
and trained on the evalutions of millions of positions at moderate search depth.

The NNUE evaluation was first introduced in shogi, and ported to Stockfish afterward.
It can be evaluated efficiently on CPUs, and exploits the fact that only parts
of the neural network need to be updated after a typical chess move.
[The nodchip repository](https://github.com/nodchip/Stockfish) provides additional
tools to train and develop the NNUE networks.

This patch is the result of contributions of various authors, from various communities,
including: nodchip, ynasu87, yaneurao (initial port and NNUE authors), domschl, FireFather,
rqs, xXH4CKST3RXx, tttak, zz4032, joergoster, mstembera, nguyenpham, erbsenzaehler,
dorzechowski, and vondele.

This new evaluation needed various changes to fishtest and the corresponding infrastructure,
for which tomtor, ppigazzini, noobpwnftw, daylen, and vondele are gratefully acknowledged.

The first networks have been provided by gekkehenker and sergiovieri, with the latter
net (nn-97f742aaefcd.nnue) being the current default.

The evaluation function can be selected at run time with the `Use NNUE` (true/false) UCI option,
provided the `EvalFile` option points the the network file (depending on the GUI, with full path).

The performance of the NNUE evaluation relative to the classical evaluation depends somewhat on
the hardware, and is expected to improve quickly, but is currently on > 80 Elo on fishtest:

60000 @ 10+0.1 th 1
https://tests.stockfishchess.org/tests/view/5f28fe6ea5abc164f05e4c4c
ELO: 92.77 +-2.1 (95%) LOS: 100.0%
Total: 60000 W: 24193 L: 8543 D: 27264
Ptnml(0-2): 609, 3850, 9708, 10948, 4885

40000 @ 20+0.2 th 8
https://tests.stockfishchess.org/tests/view/5f290229a5abc164f05e4c58
ELO: 89.47 +-2.0 (95%) LOS: 100.0%
Total: 40000 W: 12756 L: 2677 D: 24567
Ptnml(0-2): 74, 1583, 8550, 7776, 2017

At the same time, the impact on the classical evaluation remains minimal, causing no significant
regression:

sprt @ 10+0.1 th 1
https://tests.stockfishchess.org/tests/view/5f2906a2a5abc164f05e4c5b
LLR: 2.94 (-2.94,2.94) {-6.00,-4.00}
Total: 34936 W: 6502 L: 6825 D: 21609
Ptnml(0-2): 571, 4082, 8434, 3861, 520

sprt @ 60+0.6 th 1
https://tests.stockfishchess.org/tests/view/5f2906cfa5abc164f05e4c5d
LLR: 2.93 (-2.94,2.94) {-6.00,-4.00}
Total: 10088 W: 1232 L: 1265 D: 7591
Ptnml(0-2): 49, 914, 3170, 843, 68

The needed networks can be found at https://tests.stockfishchess.org/nns
It is recommended to use the default one as indicated by the `EvalFile` UCI option.

Guidelines for testing new nets can be found at
https://github.com/glinscott/fishtest/wiki/Creating-my-first-test#nnue-net-tests

Integration has been discussed in various issues:
https://github.com/official-stockfish/Stockfish/issues/2823
https://github.com/official-stockfish/Stockfish/issues/2728

The integration branch will be closed after the merge:
https://github.com/official-stockfish/Stockfish/pull/2825
https://github.com/official-stockfish/Stockfish/tree/nnue-player-wip

closes https://github.com/official-stockfish/Stockfish/pull/2912

This will be an exciting time for computer chess, looking forward to seeing the evolution of
this approach.

Bench: 4746616
2020-08-06 16:37:45 +02:00