The commit adds a CI workflow that uses the include-what-you-use (IWYU)
tool to check for missing or superfluous includes in .cpp files and
their corresponding .h files. This means that some .h files (especially
in the nnue folder) are not checked yet.
The CI setup looks like this:
- We build IWYU from source to include some yet unreleased fixes.
This IWYU version targets LLVM 17. Thus, we get the latest release
candidate of LLVM 17 from LLVM's nightly packages.
- The Makefile now has an analyze target that just builds the object
files (without linking).
- The CI uses the analyze target with the IWYU tool as compiler to
analyze the compiled .cpp file and its corresponding .h file.
- If IWYU suggests a change, the build fails (-Xiwyu --error).
- To avoid false positives, we use LLVM's libc++ as the standard library.
- We have a custom mappings file that adds some mappings that are
missing in IWYU's default mappings.
We also had to add one IWYU pragma to prevent a false positive in
movegen.h.
https://github.com/official-stockfish/Stockfish/pull/4783
No functional change
Created by retraining the master net on a dataset composed by:
- adding Leela data from T60 jul-dec 2020, T77 nov 2021, T80 jun-jul 2023
- deduplicating and unminimizing parts of the dataset before interleaving
Trained initially with max epoch 800, then increased twice near the end of
training: first to 960, then to 1200. After training, post-processing involved:
- greedy permuting L1 weights with https://github.com/official-stockfish/Stockfish/pull/4620
- greedy 2- and 3- cycle permuting with https://github.com/official-stockfish/Stockfish/pull/4640
python3 easy_train.py \
--experiment-name 2048-retrain-S6-sk28 \
--training-dataset /data/S6.binpack \
--early-fen-skipping 28 \
--start-from-engine-test-net True \
--max_epoch 1200 \
--lr 4.375e-4 \
--gamma 0.995 \
--start-lambda 1.0 \
--end-lambda 0.7 \
--tui False \
--seed $RANDOM \
--gpus 0
In the list of datasets below, periods in the filename represent the sequence of
steps applied to arrive at the particular binpack. For example:
test77-dec2021-16tb7p.filter-v6-dd.min-mar2023.unminimized.binpack
1. test77 dec2021 data rescored with 16 TB of syzygy tablebases during data conversion
2. filtered with csv_filter_v6_dd.py - v6 filtering and deduplication in one step
3. minimized with the original mar2023 implementation of `minimize_binpack` in
the tools branch
4. unminimized by removing all positions with score == 32002 (`VALUE_NONE`)
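A minimal sketch of reading such a filename back into its steps, assuming the
`.binpack` extension is the only suffix that is not a step. The helper name
`binpack_steps` is hypothetical and not part of any tool mentioned here.

```python
def binpack_steps(filename):
    """Split a binpack filename into its period-separated processing steps."""
    suffix = ".binpack"
    if not filename.endswith(suffix):
        raise ValueError("expected a .binpack filename")
    # Drop the extension, then split on the periods that separate the steps.
    return filename[: -len(suffix)].split(".")


steps = binpack_steps(
    "test77-dec2021-16tb7p.filter-v6-dd.min-mar2023.unminimized.binpack"
)
for number, step in enumerate(steps, start=1):
    print(number, step)
```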
Binpacks were:
- filtered with: https://github.com/linrock/nnue-data
- unminimized with: https://github.com/linrock/Stockfish/tree/tools-unminify
- deduplicated with: https://github.com/linrock/Stockfish/tree/tools-dd
DATASETS=(
leela96-filt-v2.min.unminimized.binpack
dfrc99-16tb7p-eval-filt-v2.min.unminimized.binpack
# most of the 0dd1cebea57 v6-dd dataset (without test80-jul2022)
# https://github.com/official-stockfish/Stockfish/pull/4606
test60-novdec2021-12tb7p.filter-v6-dd.min-mar2023.unminimized.binpack
test77-dec2021-16tb7p.filter-v6-dd.min-mar2023.unminimized.binpack
test78-jantomay2022-16tb7p.filter-v6-dd.min-mar2023.unminimized.binpack
test78-juntosep2022-16tb7p.filter-v6-dd.min-mar2023.unminimized.binpack
test79-apr2022-16tb7p.filter-v6-dd.min-mar2023.unminimized.binpack
test79-may2022-16tb7p.filter-v6-dd.min-mar2023.unminimized.binpack
test80-jun2022-16tb7p.filter-v6-dd.min-mar2023.unminimized.binpack
test80-aug2022-16tb7p.filter-v6-dd.min-mar2023.unminimized.binpack
test80-sep2022-16tb7p.filter-v6-dd.min-mar2023.unminimized.binpack
test80-oct2022-16tb7p.filter-v6-dd.min.binpack
test80-nov2022-16tb7p.filter-v6-dd.min.binpack
test80-jan2023-3of3-16tb7p.filter-v6-dd.min-mar2023.unminimized.binpack
test80-feb2023-16tb7p.filter-v6-dd.min-mar2023.unminimized.binpack
# older Leela data, recently converted
test60-octnovdec2020-2tb7p.min.unminimized.binpack
test60-julaugsep2020-2tb7p.min.binpack
test77-nov2021-2tb7p.min.dd.binpack
# newer Leela data
test80-mar2023-2tb7p.min.unminimized.binpack
test80-apr2023-2tb7p.filter-v6-sk16.min.unminimized.binpack
test80-may2023-2tb7p.min.dd.binpack
test80-jun2023-2tb7p.min.binpack
test80-jul2023-2tb7p.binpack
)
python3 interleave_binpacks.py ${DATASETS[@]} /data/S6.binpack
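For intuition, interleaving can be sketched as repeatedly drawing the next chunk
from one of the inputs, with probability proportional to how much of that input
remains, so that each input keeps its internal order. This is an assumption about
the general approach, not a description of what interleave_binpacks.py does
internally; the function name and the plain lists standing in for binpack chunks
are illustrative.

```python
import random


def interleave(streams, rng):
    """Merge several sequences in random order, keeping each one's own order."""
    cursors = [0] * len(streams)
    remaining = [len(stream) for stream in streams]
    merged = []
    while sum(remaining) > 0:
        # Pick a source with probability proportional to what it still holds.
        source = rng.choices(range(len(streams)), weights=remaining)[0]
        merged.append(streams[source][cursors[source]])
        cursors[source] += 1
        remaining[source] -= 1
    return merged


merged = interleave([["a1", "a2", "a3"], ["b1", "b2"]], random.Random(0))
print(merged)
```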
Training data can be found at:
https://robotmoon.com/nnue-training-data/
Local elo at 25k nodes per move:
nn-epoch1059 : 2.7 +/- 1.6
Passed STC:
https://tests.stockfishchess.org/tests/view/64fc8d705dab775b5359db42
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 168352 W: 43216 L: 42704 D: 82432
Ptnml(0-2): 599, 19672, 43134, 20160, 611
Passed LTC:
https://tests.stockfishchess.org/tests/view/64fd44a75dab775b5359f065
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 154194 W: 39436 L: 38881 D: 75877
Ptnml(0-2): 78, 16577, 43238, 17120, 84
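As a reading aid for result blocks like the ones above, a small sketch that turns
the W/L/D totals into an average score and a logistic Elo estimate, and checks
that the Ptnml game-pair counts (0, 0.5, 1, 1.5, 2 points per pair) give the same
score. The function names are illustrative, and this is not the statistics code
used by the testing framework.

```python
import math


def score_from_wld(wins, losses, draws):
    """Average score per game, counting a draw as half a point."""
    return (wins + 0.5 * draws) / (wins + losses + draws)


def score_from_ptnml(ptnml):
    """Average score per game from game-pair counts worth 0 to 2 points."""
    points = sum(0.5 * i * count for i, count in enumerate(ptnml))
    return points / (2 * sum(ptnml))


def logistic_elo(score):
    """Elo difference implied by an average score under the logistic model."""
    return 400 * math.log10(score / (1 - score))


# Numbers taken from the STC result above.
stc = score_from_wld(43216, 42704, 82432)
stc_pairs = score_from_ptnml([599, 19672, 43134, 20160, 611])
print(round(logistic_elo(stc), 2), abs(stc - stc_pairs) < 1e-12)
```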
closes https://github.com/official-stockfish/Stockfish/pull/4782
Bench: 1603079
This patch implements the pure materialistic evaluation called simple_eval()
to gain a speed-up during Stockfish search.
We use the so-called lazy evaluation trick: replace the accurate but slow
NNUE network evaluation by the super-fast simple_eval() if the position
seems to be already won (high material advantage). To guard against some
of the most obvious blunders introduced by this idea, this patch uses the
following features which will raise the lazy evaluation threshold in some
situations:
- avoid lazy evals on shuffling branches in the search tree
- avoid lazy evals if the position at root already has a material imbalance
- avoid lazy evals if the search value at root is already winning/losing.
Moreover, we add a small random noise to the simple_eval() term. This idea
(stochastic mobility in the minimax tree) was worth about 200 Elo in the pure
simple_eval() player on Lichess.
Overall, the current implementation in this patch evaluates about 2% of the
leaves in the search tree lazily.
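The decision described above can be sketched roughly as follows. This is an
illustration only: the piece values, the base threshold, the shuffling weight and
all function names are assumptions made for the example, not the constants or
code of this patch.

```python
import random

# Illustrative piece values, not the engine's actual constants.
PIECE_VALUE = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 900}


def simple_eval(own_pieces, enemy_pieces, noise=0, rng=random):
    """Material balance for the side to move, plus optional small noise."""
    material = sum(PIECE_VALUE[p] for p in own_pieces) - sum(
        PIECE_VALUE[p] for p in enemy_pieces
    )
    return material + (rng.randint(-noise, noise) if noise else 0)


def use_lazy_eval(simple, shuffling, root_simple, root_value, base=800):
    """Decide whether the cheap evaluation may replace the network evaluation."""
    # Each guard (shuffling, root imbalance, root search value) raises the bar.
    threshold = base + 16 * shuffling + abs(root_simple) + abs(root_value)
    return abs(simple) >= threshold


def evaluate(own, enemy, shuffling, root_simple, root_value, nnue_eval):
    """Return the noisy material eval when lazy, else call the slow evaluator."""
    simple = simple_eval(own, enemy)
    if use_lazy_eval(simple, shuffling, root_simple, root_value):
        return simple_eval(own, enemy, noise=2)
    return nnue_eval()
```

For example, `evaluate("QRP", "R", 0, 0, 0, nnue_eval)` stays on the cheap path
in this sketch, while the same material with a root imbalance of 300 falls back
to `nnue_eval()`.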
--------------------------------------------
STC:
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 60352 W: 15585 L: 15234 D: 29533
Ptnml(0-2): 216, 6906, 15578, 7263, 213
https://tests.stockfishchess.org/tests/view/64f1d9bcbd9967ffae366209
LTC:
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 35106 W: 8990 L: 8678 D: 17438
Ptnml(0-2): 14, 3668, 9887, 3960, 24
https://tests.stockfishchess.org/tests/view/64f25204f5b0c54e3f04c0e7
verification run at VLTC:
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 74362 W: 19088 L: 18716 D: 36558
Ptnml(0-2): 6, 7226, 22348, 7592, 9
https://tests.stockfishchess.org/tests/view/64f2ecdbf5b0c54e3f04d3ae
All three tests above were run with adjudication off. We also verified that
there was no regression on matetracker (thanks Disservin!).
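The LLR bounds (-2.94,2.94) quoted in these results match what Wald's sequential
probability ratio test gives for error rates of 5% each. A quick check, assuming
alpha = beta = 0.05 (the function name is illustrative):

```python
import math


def sprt_bounds(alpha, beta):
    """Wald's SPRT stopping bounds for the log-likelihood ratio."""
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    return lower, upper


lower, upper = sprt_bounds(0.05, 0.05)
print(f"{lower:.2f} {upper:.2f}")
```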
----------------------------------------------
closes https://github.com/official-stockfish/Stockfish/pull/4771
Bench: 1393714
faster permutation of master net weights
Activation data taken from https://drive.google.com/drive/folders/1Ec9YuuRx4N03GPnVPoQOW70eucOKngQe?usp=sharing
Permutation found using 836387a0e5/ftperm.py
See also https://github.com/glinscott/nnue-pytorch/pull/254
The algorithm greedily selects 2- and 3-cycles that can be permuted to increase the number of runs of zeroes. The percent of zero runs from the master net increased from 68.46 to 70.11 from 2-cycles and only increased to 70.32 when considering 3-cycles. Interestingly, allowing both halves of L1 to intermix when creating zero runs can give another 0.5% zero-run density increase with this method.
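A toy sketch of the greedy 2-cycle step, assuming activations are given as rows
of 0/1 values per sample and that a "zero run" means an aligned block of 4
consecutive zeros. Both are assumptions for the example; this is not ftperm.py,
and 3-cycles are left out.

```python
def zero_blocks(rows, block=4):
    """Count aligned groups of `block` consecutive zeros over all rows."""
    count = 0
    for row in rows:
        for start in range(0, len(row) - block + 1, block):
            if not any(row[start:start + block]):
                count += 1
    return count


def swap_columns(rows, i, j):
    for row in rows:
        row[i], row[j] = row[j], row[i]


def greedy_two_cycles(rows, block=4):
    """Greedily keep column swaps (2-cycles) that raise the zero-block count."""
    rows = [list(row) for row in rows]
    width = len(rows[0])
    order = list(range(width))  # order[k] = original column now at position k
    best = zero_blocks(rows, block)
    improved = True
    while improved:
        improved = False
        for i in range(width):
            for j in range(i + 1, width):
                swap_columns(rows, i, j)
                score = zero_blocks(rows, block)
                if score > best:
                    best, improved = score, True
                    order[i], order[j] = order[j], order[i]
                else:
                    swap_columns(rows, i, j)  # undo a swap that did not help
    return order, best


samples = [[1, 0, 0, 0, 0, 1, 0, 0], [1, 0, 0, 0, 0, 1, 0, 0]]
order, best = greedy_two_cycles(samples)
print(zero_blocks(samples), best, order)
```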
Measured speedup:
```
CPU: 16 x AMD Ryzen 9 3950X 16-Core Processor
Result of 50 runs
base (./stockfish.master ) = 1561556 +/- 5439
test (./stockfish.patch ) = 1575788 +/- 5427
diff = +14231 +/- 2636
speedup = +0.0091
P(speedup > 0) = 1.0000
```
closes https://github.com/official-stockfish/Stockfish/pull/4640
No functional change
Implemented LEB128 (de)compression for the feature transformer.
Reduces embedded network size from 70 MiB to 39 MiB.
The new nn-78bacfcee510.nnue corresponds to the master net compressed.
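LEB128 stores an integer in groups of 7 bits, with the high bit of each byte
marking whether more bytes follow, so values of small magnitude take a single
byte. A minimal sketch of the signed variant follows; that the feature
transformer parameters use the signed form, and the function names, are
assumptions rather than a description of the code in this patch.

```python
def encode_sleb128(value):
    """Encode a Python int as signed LEB128 bytes."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift: negative values converge to -1
        sign_bit = bool(byte & 0x40)
        if (value == 0 and not sign_bit) or (value == -1 and sign_bit):
            out.append(byte)  # last byte: continuation bit left clear
            return bytes(out)
        out.append(byte | 0x80)  # more bytes follow


def decode_sleb128(data):
    """Decode one signed LEB128 value; return (value, bytes consumed)."""
    result = 0
    shift = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:  # sign-extend a negative value
                result -= 1 << shift
            return result, index + 1
    raise ValueError("truncated LEB128 sequence")


print(encode_sleb128(-123456).hex(), decode_sleb128(encode_sleb128(-123456)))
```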
closes https://github.com/official-stockfish/Stockfish/pull/4617
No functional change
Use block sparse input for the first fully connected layer on architectures with at least SSSE3.
Depending on the CPU architecture, this yields a speedup of up to 10%, e.g.
```
Result of 100 runs of 'bench 16 1 13 default depth NNUE'
base (...ockfish-base) = 959345 +/- 7477
test (...ckfish-patch) = 1054340 +/- 9640
diff = +94995 +/- 3999
speedup = +0.0990
P(speedup > 0) = 1.0000
CPU: 8 x AMD Ryzen 7 5700U with Radeon Graphics
Hyperthreading: on
```
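The idea can be sketched as follows: the input vector is scanned in small aligned
blocks, and blocks that are entirely zero are skipped, so only the weight columns
belonging to non-zero blocks are touched. This is a scalar illustration with
assumed names and an assumed block size of 4; the SIMD code of the patch is not
reproduced here.

```python
def dense_affine(inputs, weights, biases):
    """Reference: out[o] = biases[o] + sum_i weights[o][i] * inputs[i]."""
    return [
        bias + sum(w * x for w, x in zip(row, inputs))
        for row, bias in zip(weights, biases)
    ]


def block_sparse_affine(inputs, weights, biases, block=4):
    """Same result as dense_affine, but all-zero input blocks are skipped."""
    out = list(biases)
    for start in range(0, len(inputs), block):
        chunk = inputs[start:start + block]
        if not any(chunk):
            continue  # nothing in this block contributes to the output
        for offset, x in enumerate(chunk):
            for o, row in enumerate(weights):
                out[o] += row[start + offset] * x
    return out


inputs = [0, 0, 0, 0, 3, 0, 1, 0]
weights = [[1, 2, 3, 4, 5, 6, 7, 8], [8, 7, 6, 5, 4, 3, 2, 1]]
biases = [10, -10]
print(block_sparse_affine(inputs, weights, biases))
```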
Passed STC:
https://tests.stockfishchess.org/tests/view/6485aa0965ffe077ca12409c
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 8864 W: 2479 L: 2223 D: 4162
Ptnml(0-2): 13, 829, 2504, 1061, 25
This commit includes a net with reordered weights, to increase the likelihood of block sparse inputs,
but otherwise equivalent to the previous master net (nn-ea57bea57e32.nnue).
Activation data collected with https://github.com/AndrovT/Stockfish/tree/log-activations, running bench 16 1 13 varied_1000.epd depth NNUE on this data. Net parameters permuted with https://gist.github.com/AndrovT/9e3fbaebb7082734dc84d27e02094cb3.
closes https://github.com/official-stockfish/Stockfish/pull/4612
No functional change
Created by retraining an earlier epoch (ep659) of the experiment that led to the first SFNNv6 net:
- First retrained on the nn-0dd1cebea573 dataset
- Then retrained with skip 20 on a smaller dataset containing unfiltered Leela data
- And then retrained again with skip 27 on the nn-0dd1cebea573 dataset
The equivalent 7-step training sequence from scratch that led here was:
1. max-epoch 400, lambda 1.0, constant LR 9.75e-4, T79T77-filter-v6-dd.min.binpack
ep379 chosen for retraining in step2
2. max-epoch 800, end-lambda 0.75, T60T70wIsRightFarseerT60T74T75T76.binpack
ep679 chosen for retraining in step3
3. max-epoch 800, end-lambda 0.75, skip 28, nn-e1fb1ade4432 dataset
ep799 chosen for retraining in step4
4. max-epoch 800, end-lambda 0.7, skip 28, nn-e1fb1ade4432 dataset
ep759 became nn-8d69132723e2.nnue (first SFNNv6 net)
ep659 chosen for retraining in step5
5. max-epoch 800, end-lambda 0.7, skip 28, nn-0dd1cebea573 dataset
ep759 chosen for retraining in step6
6. max-epoch 800, end-lambda 0.7, skip 20, leela-dfrc-v2-T77decT78janfebT79aprT80apr.binpack
ep639 chosen for retraining in step7
7. max-epoch 800, end-lambda 0.7, skip 27, nn-0dd1cebea573 dataset
ep619 became nn-ea57bea57e32.nnue
For the last retraining (step7):
python3 easy_train.py \
--experiment-name L1-1536-Re6-masterShuffled-ep639-sk27-Re5-leela-dfrc-v2-T77toT80small-Re4-masterShuffled-ep659-Re3-sameAs-Re2-leela96-dfrc99-16t-v2-T60novdecT80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd-Re1-LeelaFarseer-new-T77T79 \
--training-dataset /data/leela96-dfrc99-T60novdec-v2-T80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd-T80apr.binpack \
--nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes-L1-1536 \
--early-fen-skipping 27 \
--start-lambda 1.0 \
--end-lambda 0.7 \
--max_epoch 800 \
--start-from-engine-test-net False \
--start-from-model /data/L1-1536-Re5-leela-dfrc-v2-T77toT80small-epoch639.nnue \
--lr 4.375e-4 \
--gamma 0.995 \
--tui False \
--seed $RANDOM \
--gpus "0,"
For preparing the step6 leela-dfrc-v2-T77decT78janfebT79aprT80apr.binpack dataset:
python3 interleave_binpacks.py \
leela96-filt-v2.binpack \
dfrc99-16tb7p-eval-filt-v2.binpack \
test77-dec2021-16tb7p.no-db.min-mar2023.binpack \
test78-janfeb2022-16tb7p.no-db.min-mar2023.binpack \
test79-apr2022-16tb7p-filter-v6-dd.binpack \
test80-apr2022-16tb7p.no-db.min-mar2023.binpack \
/data/leela-dfrc-v2-T77decT78janfebT79aprT80apr.binpack
The unfiltered Leela data used for the step6 dataset can be found at:
https://robotmoon.com/nnue-training-data
Local elo at 25k nodes per move:
nn-epoch619.nnue : 2.3 +/- 1.9
Passed STC:
https://tests.stockfishchess.org/tests/view/6480d43c6e6ce8d9fc6d7cc8
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 40992 W: 11017 L: 10706 D: 19269
Ptnml(0-2): 113, 4400, 11170, 4689, 124
Passed LTC:
https://tests.stockfishchess.org/tests/view/648119ac6e6ce8d9fc6d8208
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 129174 W: 35059 L: 34579 D: 59536
Ptnml(0-2): 66, 12548, 38868, 13050, 55
closes https://github.com/official-stockfish/Stockfish/pull/4611
bench: 2370027
Created by retraining an earlier epoch of the experiment leading to the first SFNNv6 net
on a more-randomized version of the nn-e1fb1ade4432.nnue dataset mixed with unfiltered
T80 apr2023 data. Trained using early-fen-skipping 28 and max-epoch 960.
The trainer settings and epochs used in the 5-step training sequence leading here were:
1. train from scratch for 400 epochs, lambda 1.0, constant LR 9.75e-4, T79T77-filter-v6-dd.min.binpack
2. retrain ep379, max-epoch 800, end-lambda 0.75, T60T70wIsRightFarseerT60T74T75T76.binpack
3. retrain ep679, max-epoch 800, end-lambda 0.75, skip 28, nn-e1fb1ade4432 dataset
4. retrain ep799, max-epoch 800, end-lambda 0.7, skip 28, nn-e1fb1ade4432 dataset
5. retrain ep439, max-epoch 960, end-lambda 0.7, skip 28, shuffled nn-e1fb1ade4432 + T80 apr2023
This net was epoch 559 of the final (step 5) retraining:
```bash
python3 easy_train.py \
--experiment-name L1-1536-Re4-leela96-dfrc99-T60novdec-v2-T80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd-T80apr-shuffled-sk28 \
--training-dataset /data/leela96-dfrc99-T60novdec-v2-T80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd-T80apr.binpack \
--nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes-L1-1536 \
--early-fen-skipping 28 \
--start-lambda 1.0 \
--end-lambda 0.7 \
--max_epoch 960 \
--start-from-engine-test-net False \
--start-from-model /data/L1-1536-Re3-nn-epoch439.nnue \
--engine-test-branch linrock/Stockfish/L1-1536 \
--lr 4.375e-4 \
--gamma 0.995 \
--tui False \
--seed $RANDOM \
--gpus "0,"
```
During data preparation, most binpacks were unminimized by removing positions with
score 32002 (`VALUE_NONE`). This makes the tradeoff of increasing dataset filesize
on disk to increase the randomness of positions in interleaved datasets.
The code used for unminimizing is at:
https://github.com/linrock/Stockfish/tree/tools-unminify
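As a sketch of the filtering rule only (reading and writing the binpack format is
out of scope here), assuming positions are available as (fen, score) pairs. The
function name and the placeholder position strings are hypothetical.

```python
VALUE_NONE = 32002  # score that marks positions skipped during training


def drop_value_none(positions):
    """Keep only positions whose score is a real evaluation."""
    return [(fen, score) for fen, score in positions if score != VALUE_NONE]


sample = [("position-a", 35), ("position-b", VALUE_NONE), ("position-c", -120)]
print(drop_value_none(sample))
```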
For preparing the dataset used in this experiment:
```bash
python3 interleave_binpacks.py \
leela96-filt-v2.binpack \
dfrc99-16tb7p-eval-filt-v2.binpack \
filt-v6-dd-min/test60-novdec2021-12tb7p-filter-v6-dd.min-mar2023.unmin.binpack \
filt-v6-dd-min/test80-aug2022-16tb7p-filter-v6-dd.min-mar2023.unmin.binpack \
filt-v6-dd-min/test80-sep2022-16tb7p-filter-v6-dd.min-mar2023.unmin.binpack \
filt-v6-dd-min/test80-jun2022-16tb7p-filter-v6-dd.min-mar2023.unmin.binpack \
filt-v6-dd/test80-jul2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test80-oct2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test80-nov2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd-min/test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.unmin.binpack \
filt-v6-dd-min/test80-feb2023-16tb7p-filter-v6-dd.min-mar2023.unmin.binpack \
filt-v6-dd/test79-apr2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test79-may2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd-min/test78-jantomay2022-16tb7p-filter-v6-dd.min-mar2023.unmin.binpack \
filt-v6-dd/test78-juntosep2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test77-dec2021-16tb7p-filter-v6-dd.binpack \
test80-apr2023-2tb7p.binpack \
/data/leela96-dfrc99-T60novdec-v2-T80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd-T80apr.binpack
```
T80 apr2023 data was converted using lc0-rescorer with ~2tb of tablebases and can be found at:
https://robotmoon.com/nnue-training-data/
Local elo at 25k nodes per move vs. nn-e1fb1ade4432.nnue (L1 size 1024):
nn-epoch559.nnue : 25.7 +/- 1.6
Passed STC:
https://tests.stockfishchess.org/tests/view/647cd3b87cf638f0f53f9cbb
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 59200 W: 16000 L: 15660 D: 27540
Ptnml(0-2): 159, 6488, 15996, 6768, 189
Passed LTC:
https://tests.stockfishchess.org/tests/view/647d58de726f6b400e4085d8
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 58800 W: 16002 L: 15657 D: 27141
Ptnml(0-2): 44, 5607, 17748, 5962, 39
closes https://github.com/official-stockfish/Stockfish/pull/4606
bench 2141197
Created by training a new net from scratch with L1 size increased from 1024 to 1536.
Thanks to Vizvezdenec for the idea of exploring larger net sizes after recent
training data improvements.
A new net was first trained with lambda 1.0 and constant LR 8.75e-4. Then a strong net
from a later epoch in the training run was chosen for retraining with start-lambda 1.0
and initial LR 4.375e-4 decaying with gamma 0.995. Retraining was performed a total of
3 times, for this 4-step process:
1. 400 epochs, lambda 1.0 on filtered T77+T79 v6 deduplicated data
2. 800 epochs, end-lambda 0.75 on T60T70wIsRightFarseerT60T74T75T76.binpack
3. 800 epochs, end-lambda 0.75 and early-fen-skipping 28 on the master dataset
4. 800 epochs, end-lambda 0.7 and early-fen-skipping 28 on the master dataset
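To make the schedule concrete, a small sketch assuming the learning rate is
multiplied by gamma once per epoch and lambda is interpolated linearly from
start-lambda to end-lambda over the maximum number of epochs. Both are
assumptions about the trainer, and the function names are illustrative.

```python
def learning_rate(epoch, initial_lr=4.375e-4, gamma=0.995):
    """Learning rate after `epoch` multiplicative decay steps."""
    return initial_lr * gamma ** epoch


def lambda_at(epoch, max_epoch, start_lambda=1.0, end_lambda=0.7):
    """Linear interpolation between start and end lambda."""
    return start_lambda + (end_lambda - start_lambda) * epoch / max_epoch


for epoch in (0, 400, 800):
    print(epoch, learning_rate(epoch), lambda_at(epoch, 800))
```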
In the training sequence that reached the new nn-8d69132723e2.nnue net,
the epochs used for the 3x retraining runs were:
1. epoch 379 trained on T77T79-filter-v6-dd.min.binpack
2. epoch 679 trained on T60T70wIsRightFarseerT60T74T75T76.binpack
3. epoch 799 trained on the master dataset
For training from scratch:
python3 easy_train.py \
--experiment-name new-L1-1536-T77T79-filter-v6dd \
--training-dataset /data/T77T79-filter-v6-dd.min.binpack \
--max_epoch 400 \
--lambda 1.0 \
--start-from-engine-test-net False \
--engine-test-branch linrock/Stockfish/L1-1536 \
--nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes-L1-1536 \
--tui False \
--gpus "0," \
--seed $RANDOM
Retraining commands were similar to each other. For the 3rd retraining run:
python3 easy_train.py \
--experiment-name L1-1536-T77T79-v6dd-Re1-LeelaFarseer-Re2-masterDataset-Re3-sameData \
--training-dataset /data/leela96-dfrc99-v2-T60novdecT80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd.binpack \
--early-fen-skipping 28 \
--max_epoch 800 \
--start-lambda 1.0 \
--end-lambda 0.7 \
--lr 4.375e-4 \
--gamma 0.995 \
--start-from-engine-test-net False \
--start-from-model /data/L1-1536-T77T79-v6dd-Re1-LeelaFarseer-Re2-masterDataset-nn-epoch799.nnue \
--engine-test-branch linrock/Stockfish/L1-1536 \
--nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes-L1-1536 \
--tui False \
--gpus "0," \
--seed $RANDOM
The T77+T79 data used is a subset of the master dataset available at:
https://robotmoon.com/nnue-training-data/
T60T70wIsRightFarseerT60T74T75T76.binpack is available at:
https://drive.google.com/drive/folders/1S9-ZiQa_3ApmjBtl2e8SyHxj4zG4V8gG
Local elo at 25k nodes per move vs. nn-e1fb1ade4432.nnue (L1 size 1024):
nn-epoch759.nnue : 26.9 +/- 1.6
Failed STC
https://tests.stockfishchess.org/tests/view/64742485d29264e4cfa75f97
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 13728 W: 3588 L: 3829 D: 6311
Ptnml(0-2): 71, 1661, 3610, 1482, 40
Failing LTC
https://tests.stockfishchess.org/tests/view/64752d7c4a36543c4c9f3618
LLR: -1.91 (-2.94,2.94) <0.50,2.50>
Total: 35424 W: 9522 L: 9603 D: 16299
Ptnml(0-2): 24, 3579, 10585, 3502, 22
Passed VLTC 180+1.8
https://tests.stockfishchess.org/tests/view/64752df04a36543c4c9f3638
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 47616 W: 13174 L: 12863 D: 21579
Ptnml(0-2): 13, 4261, 14952, 4566, 16
Passed VLTC SMP 60+0.6 th 8
https://tests.stockfishchess.org/tests/view/647446ced29264e4cfa761e5
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 19942 W: 5694 L: 5451 D: 8797
Ptnml(0-2): 6, 1504, 6707, 1749, 5
closes https://github.com/official-stockfish/Stockfish/pull/4593
bench 2222567
Created by retraining nn-dabb1ed23026.nnue with a dataset composed of:
* The previous best dataset (nn-1ceb1a57d117.nnue dataset)
* Adding de-duplicated T80 data from feb2023 and the last 10 days of jan2023, filtered with v6-dd
Initially trained with the same options as the recent master net (nn-1ceb1a57d117.nnue).
Around epoch 890, training was manually stopped and max epoch increased to 1000.
```
python3 easy_train.py \
--experiment-name leela96-dfrc99-T60novdec-v2-T80augsep-v6-T80junjuloctnovjanfebT79aprmayT78jantosepT77dec-v6dd \
--training-dataset /data/leela96-dfrc99-T60novdec-v2-T80augsep-v6-T80junjuloctnovjanfebT79aprmayT78jantosepT77dec-v6dd.binpack \
--nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes \
--start-from-engine-test-net True \
--early-fen-skipping 30 \
--start-lambda 1.0 \
--end-lambda 0.7 \
--max_epoch 900 \
--lr 4.375e-4 \
--gamma 0.995 \
--tui False \
--gpus "0," \
--seed $RANDOM
```
The same v6-dd filtering and binpack minimizer were used as for preparing the recent nn-1ceb1a57d117.nnue dataset.
```
python3 interleave_binpacks.py \
leela96-filt-v2.binpack \
dfrc99-filt-v2.binpack \
T60-nov2021-12tb7p-eval-filt-v2.binpack \
T60-dec2021-12tb7p-eval-filt-v2.binpack \
filt-v6/test80-aug2022-16tb7p-filter-v6.min-mar2023.binpack \
filt-v6/test80-sep2022-16tb7p-filter-v6.min-mar2023.binpack \
filt-v6-dd/test80-jun2022-16tb7p-filter-v6-dd.min-mar2023.binpack \
filt-v6-dd/test80-jul2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test80-oct2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test80-nov2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack \
filt-v6-dd/test80-feb2023-16tb7p-filter-v6-dd.min-mar2023.binpack \
filt-v6-dd/test79-apr2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test79-may2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test78-jantomay2022-16tb7p-filter-v6-dd.min-mar2023.binpack \
filt-v6-dd/test78-juntosep2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test77-dec2021-16tb7p-filter-v6-dd.binpack \
/data/leela96-dfrc99-T60novdec-v2-T80augsep-v6-T80junjuloctnovjanfebT79aprmayT78jantosepT77dec-v6dd.binpack
```
Links for downloading the training data components can be found at:
https://robotmoon.com/nnue-training-data/
Local elo at 25k nodes per move:
nn-epoch919.nnue : 2.6 +/- 2.8
Passed STC vs. nn-dabb1ed23026.nnue
https://tests.stockfishchess.org/tests/view/644420df94ff3db5625f2af5
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 125960 W: 33898 L: 33464 D: 58598
Ptnml(0-2): 351, 13920, 34021, 14320, 368
Passed LTC vs. nn-1ceb1a57d117.nnue
https://tests.stockfishchess.org/tests/view/64469f128d30316529b3dc46
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 24544 W: 6817 L: 6542 D: 11185
Ptnml(0-2): 8, 2252, 7488, 2505, 19
closes https://github.com/official-stockfish/Stockfish/pull/4546
bench 3714847
* Extending v6 filtering to data from T77 dec2021, T79 may2022, and T80 nov2022
* Reducing the number of duplicate positions, prioritizing position scores seen later in time
* Using a binpack minimizer to reduce the overall data size
Trained the same way as the previous master net, aside from the dataset changes:
```
python3 easy_train.py \
--experiment-name leela96-dfrc99-T60novdec-v2-T80augsep-v6-T80junjuloctnovT79aprmayT78jantosepT77dec-v6dd \
--training-dataset /data/leela96-dfrc99-T60novdec-v2-T80augsep-v6-T80junjuloctnovT79aprmayT78jantosepT77dec-v6dd.binpack \
--nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes \
--start-from-engine-test-net True \
--early-fen-skipping 30 \
--start-lambda 1.0 \
--end-lambda 0.7 \
--max_epoch 900 \
--lr 4.375e-4 \
--gamma 0.995 \
--tui False \
--gpus "0," \
--seed $RANDOM
```
The new v6-dd filtering reduces duplicate positions by iterating over hourly data files within leela test runs, starting with the most recent, then keeping positions the first time they're seen and ignoring positions that are seen again. This ordering was done with the assumption that position scores seen later in time are generally more accurate than scores seen earlier in the test run. Positions are de-duplicated based on piece orientations, the first token in fen strings.
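The ordering rule can be sketched like this, assuming each hourly file is a list
of (fen, score) pairs and the files are given oldest first. The function name is
hypothetical, and the real filtering script works on binpack data rather than
Python lists.

```python
def dedup_newest_first(hourly_files):
    """Keep one entry per piece placement, preferring the most recent file."""
    seen = set()
    kept = []
    for positions in reversed(hourly_files):  # most recent file first
        for fen, score in positions:
            key = fen.split(" ", 1)[0]  # piece placement, the first FEN token
            if key in seen:
                continue  # already kept from a more recent occurrence
            seen.add(key)
            kept.append((fen, score))
    return kept


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
older = [(start + " w KQkq - 0 1", 10), ("8/8/8/4k3/8/8/4K3/8 w - - 0 1", 0)]
newer = [(start + " w KQkq - 2 2", 25)]
print(dedup_newest_first([older, newer]))
```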
The binpack minimizer was run with default settings after first merging monthly data into single binpacks.
```
python3 interleave_binpacks.py \
leela96-filt-v2.binpack \
dfrc99-filt-v2.binpack \
T60-nov2021-12tb7p-eval-filt-v2.binpack \
T60-dec2021-12tb7p-eval-filt-v2.binpack \
filt-v6/test80-aug2022-16tb7p-filter-v6.min-mar2023.binpack \
filt-v6/test80-sep2022-16tb7p-filter-v6.min-mar2023.binpack \
filt-v6-dd/test80-jun2022-16tb7p-filter-v6-dd.min-mar2023.binpack \
filt-v6-dd/test80-jul2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test80-oct2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test80-nov2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test79-apr2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test79-may2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test78-jantomay2022-16tb7p-filter-v6-dd.min-mar2023.binpack \
filt-v6-dd/test78-juntosep2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test77-dec2021-16tb7p-filter-v6-dd.binpack \
/data/leela96-dfrc99-T60novdec-v2-T80augsep-v6-T80junjuloctnovT79aprmayT78jantosepT77dec-v6dd.binpack
```
The code for v6-dd filtering is available along with training data preparation scripts at:
https://github.com/linrock/nnue-data
Links for downloading the training data components:
https://robotmoon.com/nnue-training-data/
The binpack minimizer is from: #4447
Local elo at 25k nodes per move:
nn-epoch859.nnue : 1.2 +/- 2.6
Passed STC:
https://tests.stockfishchess.org/tests/view/643aad7db08900ff1bc5a832
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 565040 W: 150225 L: 149162 D: 265653
Ptnml(0-2): 1875, 62137, 153229, 63608, 1671
Passed LTC:
https://tests.stockfishchess.org/tests/view/643ecf2fa43cf30e719d2042
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 1014840 W: 274645 L: 272456 D: 467739
Ptnml(0-2): 515, 98565, 306970, 100956, 414
closes https://github.com/official-stockfish/Stockfish/pull/4545
bench 3476305
Created by retraining the master net with these modifications:
* New filtering methods for existing data from T80 sep+oct2022, T79 apr2022, T78 jun+jul+aug+sep2022, T77 dec2021
* Adding new filtered data from T80 aug2022 and T78 apr+may2022
* Increasing early-fen-skipping from 28 to 30
```
python3 easy_train.py \
--experiment-name leela96-dfrc99-T80novT79mayT60novdec-v2-T80augsepoctT79aprT78aprtosep-v6-T77dec-v3-sk30 \
--training-dataset /data/leela96-dfrc99-T80novT79mayT60novdec-v2-T80augsepoctT79aprT78aprtosep-v6-T77dec-v3.binpack \
--nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes \
--start-from-engine-test-net True \
--early-fen-skipping 30 \
--max_epoch 900 \
--start-lambda 1.0 \
--end-lambda 0.7 \
--lr 4.375e-4 \
--gamma 0.995 \
--tui False \
--gpus "0," \
--seed $RANDOM
```
The v3 filtering used for data from T77dec 2021 differs from v2 filtering in that:
* To improve binpack compression, positions after ply 28 were skipped during training by setting position scores to VALUE_NONE (32002) instead of removing them entirely
* All early-game positions with ply <= 28 were removed to maximize binpack compression
* Only bestmove captures at d6pv2 search were skipped, not 2nd bestmove captures
* Binpack compression was repaired for the remaining positions by effectively replacing bestmoves with "played moves" to maintain contiguous sequences of positions in the training game data
After improving binpack compression, the T77 dec2021 data size was reduced from 95G to 19G.
The v6 filtering used for data from T80augsepoctT79aprT78aprtosep 2022 differs from v2 in that:
* All positions with only one legal move were removed
* Tighter score differences at d6pv2 search were used to remove more positions with only one good move than before
* d6pv2 search was not used to remove positions where the best 2 moves were captures
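A rough sketch of the two removal rules in the v6 list above, assuming each
position comes with its legal move count and the scores of the two best moves
from the d6pv2 search. The threshold of 100 and the function name are
placeholders chosen for the example; the actual thresholds are in the filtering
code and are not reproduced here.

```python
def keep_position(legal_moves, best_score, second_score, max_diff=100):
    """Return False for positions this sketch would filter out."""
    if legal_moves <= 1:
        return False  # only one legal move
    # A large gap between the two best moves suggests only one good move.
    return best_score - second_score <= max_diff


print(keep_position(30, 40, 25), keep_position(30, 300, -50), keep_position(1, 0, 0))
```

Lowering `max_diff` corresponds to the "tighter score differences" mentioned
above: more positions are treated as having only one good move.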
```
python3 interleave_binpacks.py \
nn-547-dataset/leela96-eval-filt-v2.binpack \
nn-547-dataset/dfrc99-eval-filt-v2.binpack \
nn-547-dataset/test80-nov2022-12tb7p-eval-filt-v2-d6.binpack \
nn-547-dataset/T79-may2022-12tb7p-eval-filt-v2.binpack \
nn-547-dataset/T60-nov2021-12tb7p-eval-filt-v2.binpack \
nn-547-dataset/T60-dec2021-12tb7p-eval-filt-v2.binpack \
filt-v6/test80-aug2022-16tb7p-filter-v6.binpack \
filt-v6/test80-sep2022-16tb7p-filter-v6.binpack \
filt-v6/test80-oct2022-16tb7p-filter-v6.binpack \
filt-v6/test79-apr2022-16tb7p-filter-v6.binpack \
filt-v6/test78-aprmay2022-16tb7p-filter-v6.binpack \
filt-v6/test78-junjulaug2022-16tb7p-filter-v6.binpack \
filt-v6/test78-sep2022-16tb7p-filter-v6.binpack \
filt-v3/test77-dec2021-16tb7p-filt-v3.binpack \
/data/leela96-dfrc99-T80novT79mayT60novdec-v2-T80augsepoctT79aprT78aprtosep-v6-T77dec-v3.binpack
```
The code for the new data filtering methods is available at:
https://github.com/linrock/Stockfish/tree/nnue-data-v3/nnue-data
The code for giving hexword names to .nnue files is at:
https://github.com/linrock/nnue-namer
Links for downloading the training data components can be found at:
https://robotmoon.com/nnue-training-data/
Local elo at 25k nodes per move:
nn-epoch779.nnue : 0.6 +/- 3.1
Passed STC:
https://tests.stockfishchess.org/tests/view/64212412db43ab2ba6f8efb0
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 82256 W: 22185 L: 21809 D: 38262
Ptnml(0-2): 286, 9065, 22067, 9407, 303
Passed LTC:
https://tests.stockfishchess.org/tests/view/64223726db43ab2ba6f91d6c
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 30840 W: 8437 L: 8149 D: 14254
Ptnml(0-2): 14, 2891, 9323, 3177, 15
closes https://github.com/official-stockfish/Stockfish/pull/4465
bench 5101970
This patch introduces `hint_common_parent_position()` to signal that several child nodes will potentially require an NNUE eval. By explicitly populating the accumulator, these subsequent evaluations can be performed more efficiently.
This was based on the observation that calculating the evaluation in an excluded move position yielded a significant Elo gain, even though the evaluation itself was already available (work by pb00067).
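A toy model of why populating the parent accumulator helps, assuming a
one-dimensional accumulator that is the sum of per-feature weights and can be
updated incrementally from a parent. Every name here (`Node`, `refresh`,
`accumulator`, `hint_common_parent`) is illustrative and none of it mirrors the
engine's data structures.

```python
WEIGHTS = {feature: 10 * feature for feature in range(8)}  # toy 1-D weights
counts = {"full": 0, "incremental": 0}


class Node:
    def __init__(self, features, parent=None):
        self.features = frozenset(features)
        self.parent = parent
        self.acc = None  # accumulator, filled in lazily


def refresh(node):
    """Recompute the accumulator from all active features (expensive)."""
    node.acc = sum(WEIGHTS[f] for f in node.features)
    counts["full"] += 1


def accumulator(node):
    """Return the node's accumulator, updating from the parent when possible."""
    if node.acc is None:
        parent = node.parent
        if parent is not None and parent.acc is not None:
            added = node.features - parent.features
            removed = parent.features - node.features
            node.acc = (parent.acc + sum(WEIGHTS[f] for f in added)
                        - sum(WEIGHTS[f] for f in removed))
            counts["incremental"] += 1
        else:
            refresh(node)
    return node.acc


def hint_common_parent(node):
    """Populate the parent's accumulator before its children are evaluated."""
    if node.acc is None:
        refresh(node)


parent = Node({0, 1, 2})
children = [Node({0, 1, 3}, parent), Node({0, 2, 4}, parent)]
hint_common_parent(parent)
values = [accumulator(child) for child in children]
print(values, counts)
```

Without the hint, each child in this model would need its own full refresh; with
it, one full refresh of the parent is followed by cheap incremental updates.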
Sopel wrote the code to perform just the accumulator update. This PR is based on cleaned up code that
passed STC:
https://tests.stockfishchess.org/tests/view/63f62f9be74a12625bcd4aa0
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 110368 W: 29607 L: 29167 D: 51594
Ptnml(0-2): 41, 10551, 33572, 10967, 53
and in an earlier (equivalent) version
passed STC:
https://tests.stockfishchess.org/tests/view/63f3c3fee74a12625bcce2a6
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 47552 W: 12786 L: 12467 D: 22299
Ptnml(0-2): 120, 5107, 12997, 5438, 114
passed LTC:
https://tests.stockfishchess.org/tests/view/63f45cc2e74a12625bccfa63
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 110368 W: 29607 L: 29167 D: 51594
Ptnml(0-2): 41, 10551, 33572, 10967, 53
closes https://github.com/official-stockfish/Stockfish/pull/4402
Bench: 3726250
Created by retraining the master net on a dataset composed of:
* Most of the previous best dataset filtered to remove positions likely having only one good move
* Adding training data from Leela T77 dec2021 rescored with 16tb of 7-piece tablebases
Trained with end lambda 0.7 and max epoch 900. Positions with ply <= 28 were removed from most of the previous best dataset before training began. A new nnue-pytorch trainer param for skipping early plies was used to skip plies <= 24 in the unfiltered and additional Leela T77 parts of the dataset.
```
python easy_train.py \
--experiment-name leela96-dfrc99-T80octnovT79aprmayT60novdec-eval-filt-v2-T78augsep-12tb-T77dec-16tb-lambda7-sk24 \
--training-dataset /data/leela96-dfrc99-T80octnovT79aprmayT60novdec-eval-filt-v2-T78augsep-12tb-T77dec-16tb.binpack \
--nnue-pytorch-branch linrock/nnue-pytorch/easy-train-early-fen-skipping \
--early-fen-skipping 24 \
--gpus "0," \
--start-from-engine-test-net True \
--start-lambda 1.0 \
--end-lambda 0.7 \
--gamma 0.995 \
--lr 4.375e-4 \
--tui False \
--seed $RANDOM \
--max_epoch 900
```
The depth6 multipv2 search filtering method is the same as the one used for filtering recent best datasets, with a lower eval difference threshold to remove slightly more positions than before. These parts of the dataset were filtered:
* 96% of T60T70wIsRightFarseerT60T74T75T76.binpack
* 99% of dfrc_n5000.binpack
* T80 oct + nov 2022 data, no positions with castling flags, rescored with ~600gb 7p tablebases
* T79 apr + may 2022 data, rescored with 12tb 7p tablebases
* T60 nov + dec 2021 data, rescored with 12tb 7p tablebases
These parts of the dataset were not filtered. Positions with ply <= 24 were skipped during training:
* T78 aug + sep 2022 data, rescored with 12tb 7p tablebases
* 84% of T77 dec 2021 data, rescored with 16tb 7p tablebases
The code and exact evaluation thresholds used for data filtering can be found at:
https://github.com/linrock/Stockfish/tree/tools-filter-multipv2-eval-diff-t2/src/filter
The exact training data used can be found at:
https://robotmoon.com/nnue-training-data/
Local elo at 25k nodes per move:
nn-epoch859.nnue : 3.5 +/- 1.2
Passed STC:
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
https://tests.stockfishchess.org/tests/view/63dfeefc73223e7f52ad769f
Total: 219744 W: 58572 L: 58002 D: 103170
Ptnml(0-2): 609, 24446, 59284, 24832, 701
Passed LTC:
https://tests.stockfishchess.org/tests/view/63e268fc73223e7f52ade7b6
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 91256 W: 24528 L: 24121 D: 42607
Ptnml(0-2): 48, 8863, 27390, 9288, 39
closes https://github.com/official-stockfish/Stockfish/pull/4387
bench 3841998
Created by retraining the master net with Leela T78 data from Aug+Sep 2022 added to the previous best dataset. Trained with end lambda 0.7 and started with max epoch 800. All positions with ply <= 28 were skipped:
```
python easy_train.py \
--experiment-name leela95-dfrc96-filt-only-T80octnov-T60novdecT78augsepT79aprmay-12tb7p-sk28-lambda7 \
--training-dataset /data/leela95-dfrc96-filt-only-T80octnov-T60novdecT78augsepT79aprmay-12tb7p.binpack \
--nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes-skip-ply-lteq-28 \
--start-from-engine-test-net True \
--gpus "0," \
--start-lambda 1.0 \
--end-lambda 0.7 \
--gamma 0.995 \
--lr 4.375e-4 \
--tui False \
--seed $RANDOM \
--max_epoch 800
```
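As a rough illustration of what `--start-lambda` and `--end-lambda` control, the sketch below assumes that lambda moves linearly from the start value to the end value over the epochs, and that it blends the search-eval target with the game-result target. Both the linear schedule and the function names are assumptions made for this example, not a statement of the trainer's exact code.

```python
def lambda_at_epoch(epoch, max_epoch, start_lambda=1.0, end_lambda=0.7):
    """Assumed linear interpolation from start_lambda to end_lambda."""
    fraction = epoch / max_epoch
    return start_lambda + (end_lambda - start_lambda) * fraction


def blended_target(eval_target, result_target, lam):
    """lam = 1.0 uses only the eval target, lam = 0.0 only the game result."""
    return lam * eval_target + (1.0 - lam) * result_target


for epoch in (0, 400, 800):
    print(epoch, round(lambda_at_epoch(epoch, 800), 4))
```

Under this assumption, an end lambda of 0.7 means the game result contributes 30% of the target by the final epoch.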
Around epoch 750, training was manually paused and max epoch increased to 950 before resuming. The additional Leela training data from T78 was prepared in the same way as the previous best dataset.
The exact training data used can be found at:
https://robotmoon.com/nnue-training-data/
While the local elo ratings during this experiment were much lower than in recent master nets, several later epochs had a consistent elo above zero, and this was hypothesized to represent potential strength at slower time controls.
Local elo at 25k nodes per move
leela95-dfrc96-filt-only-T80octnov-T60novdecT78augsepT79aprmay-12tb7p-sk28-lambda7
nn-epoch819.nnue : 0.4 +/- 1.1 (nn-bc24c101ada0.nnue)
nn-epoch799.nnue : 0.3 +/- 1.2
nn-epoch759.nnue : 0.3 +/- 1.1
nn-epoch839.nnue : 0.2 +/- 1.4
Passed STC
https://tests.stockfishchess.org/tests/view/63cabf6f0eefe8694a0c6013
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 41608 W: 11161 L: 10848 D: 19599
Ptnml(0-2): 116, 4496, 11281, 4781, 130
Passed LTC
https://tests.stockfishchess.org/tests/view/63cb1856344bb01c191af263
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 76760 W: 20517 L: 20137 D: 36106
Ptnml(0-2): 34, 7435, 23070, 7799, 42
closes https://github.com/official-stockfish/Stockfish/pull/4351
bench 3941848
Created by retraining the master net on a dataset composed of:
* The Leela-dfrc_n5000.binpack dataset filtered with depth6 multipv2 search to remove positions with only one good move, in addition to removing positions where either of the two best moves are captures
* The same Leela T80 oct+nov 2022 training data used in recent best datasets
* Additional Leela training data from T60 nov+dec 2021 and T79 apr+may 2022
Trained with end lambda 0.7 and started with max epoch 800. All positions with ply <= 28 were skipped:
```
python easy_train.py \
--experiment-name leela95-dfrc96-mpv-eval-fonly-T80octnov-T79aprmayT60novdec-12tb7p-sk28-lambda7 \
--training-dataset /data/leela95-dfrc96-mpv-eval-fonly-T80octnov-T79aprmayT60novdec-12tb7p.binpack \
--nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes-skip-ply-lteq-28 \
--start-from-engine-test-net True \
--gpus "0," \
--start-lambda 1.0 \
--end-lambda 0.7 \
--gamma 0.995 \
--lr 4.375e-4 \
--tui False \
--seed $RANDOM \
--max_epoch 800
```
Around epoch 780, training was manually paused and max epoch increased to 920 before resuming.
During depth6 multipv2 data filtering, positions were considered to have only one good move if the score of the best move was significantly better than the 2nd best move in a way that changes the outcome of the game:
* the best move leads to a significant advantage while the 2nd best move equalizes or loses
* the best move is about equal while the 2nd best move loses
The modified stockfish branch and exact score thresholds used for filtering are at:
https://github.com/linrock/Stockfish/tree/tools-filter-multipv2-eval-diff/src/filter
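The two filtering rules above can be sketched as a simple predicate. The centipawn thresholds below are hypothetical placeholders chosen only to make the example runnable; the real thresholds live in the branch linked above.

```python
# Hypothetical thresholds in centipawns, NOT the values used for the real filter.
ADVANTAGE_CP = 100   # best move counts as a significant advantage at or above this
EQUAL_CP = 50        # scores within +/- this are treated as about equal
LOSING_CP = -100     # scores at or below this are treated as losing


def has_only_one_good_move(best_cp, second_cp):
    """Return True if the multipv2 scores suggest a single good move."""
    best_wins = best_cp >= ADVANTAGE_CP
    best_equal = abs(best_cp) <= EQUAL_CP
    second_equal_or_loses = second_cp <= EQUAL_CP
    second_loses = second_cp <= LOSING_CP
    return (best_wins and second_equal_or_loses) or (best_equal and second_loses)


print(has_only_one_good_move(200, 10))    # winning move vs. an equalizing one
print(has_only_one_good_move(200, 180))   # two winning moves
print(has_only_one_good_move(0, -300))    # holding move vs. a losing one
```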
About 95% of the Leela portion and 96% of the DFRC portion of the Leela-dfrc_n5000.binpack dataset was filtered. Unfiltered parts of the dataset were left out.
The additional Leela training data from T60 nov+dec 2021 and T79 apr+may 2022 was WDL-rescored with about 12TB of syzygy 7-piece tablebases where the material difference is less than around 6 pawns. Best moves were exported to .plain data files during data conversion with the lc0 rescorer.
The exact training data can be found at:
https://robotmoon.com/nnue-training-data/
Local elo at 25k nodes per move
experiment_leela95-dfrc96-mpv-eval-fonly-T80octnov-T79aprmayT60novdec-12tb7p-sk28-lambda7
run_0/nn-epoch899.nnue : 3.8 +/- 1.6
Passed STC
https://tests.stockfishchess.org/tests/view/63bed1f540aa064159b9c89b
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 103344 W: 27392 L: 26991 D: 48961
Ptnml(0-2): 333, 11223, 28099, 11744, 273
Passed LTC
https://tests.stockfishchess.org/tests/view/63c010415705810de2deb3ec
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 21712 W: 5891 L: 5619 D: 10202
Ptnml(0-2): 12, 2022, 6511, 2304, 7
closes https://github.com/official-stockfish/Stockfish/pull/4338
bench 4106793
This is a later epoch (epoch 859) from the same experiment run that trained yesterday's master net nn-60fa44e376d9.nnue (epoch 779). The experiment was manually paused around epoch 790 and unpaused with max epoch increased to 900 mainly to get more local elo data without letting the GPU idle.
nn-60fa44e376d9.nnue is from #4314
nn-335a9b2d8a80.nnue is from #4295
Local elo vs. nn-335a9b2d8a80.nnue at 25k nodes per move:
experiment_leela93-dfrc99-filt-only-T80-oct-nov-skip28
run_0/nn-epoch779.nnue (nn-60fa44e376d9.nnue) : 5.0 +/- 1.2
run_0/nn-epoch859.nnue (nn-a3dc078bafc7.nnue) : 5.6 +/- 1.6
Passed STC vs. nn-335a9b2d8a80.nnue
https://tests.stockfishchess.org/tests/view/63ae10495bd1e5f27f13d94f
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 37536 W: 10088 L: 9781 D: 17667
Ptnml(0-2): 110, 4006, 10223, 4325, 104
An LTC test vs. nn-335a9b2d8a80.nnue was paused due to nn-60fa44e376d9.nnue passing LTC first:
https://tests.stockfishchess.org/tests/view/63ae5d34331d5fca5113703b
Passed LTC vs. nn-60fa44e376d9.nnue
https://tests.stockfishchess.org/tests/view/63af1e41465d2b022dbce4e7
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 148704 W: 39672 L: 39155 D: 69877
Ptnml(0-2): 59, 14443, 44843, 14936, 71
closes https://github.com/official-stockfish/Stockfish/pull/4319
bench 3984365
Created by retraining the master net on the previous best dataset with additional filtering. No new data was added.
More of the Leela-dfrc_n5000.binpack part of the dataset was pre-filtered with depth6 multipv2 search to remove bestmove captures. About 93% of the previous Leela/SF data and 99% of the SF dfrc data was filtered. Unfiltered parts of the dataset were left out. The new Leela T80 oct+nov data is the same as before. All early game positions with ply count <= 28 were skipped during training by modifying the training data loader in nnue-pytorch.
Trained in a similar way as recent master nets, with a different nnue-pytorch branch for early ply skipping:
python3 easy_train.py \
--experiment-name=leela93-dfrc99-filt-only-T80-oct-nov-skip28 \
--training-dataset=/data/leela93-dfrc99-filt-only-T80-oct-nov.binpack \
--start-from-engine-test-net True \
--nnue-pytorch-branch=linrock/nnue-pytorch/misc-fixes-skip-ply-lteq-28 \
--gpus="0," \
--start-lambda=1.0 \
--end-lambda=0.75 \
--gamma=0.995 \
--lr=4.375e-4 \
--tui=False \
--seed=$RANDOM \
--max_epoch=800 \
--network-testing-threads 20 \
--num-workers 6
For the exact training data used: https://robotmoon.com/nnue-training-data/
Details about the previous best dataset: #4295
Local testing at a fixed 25k nodes:
experiment_leela93-dfrc99-filt-only-T80-oct-nov-skip28
Local Elo: run_0/nn-epoch779.nnue : 5.1 +/- 1.5
Passed STC
https://tests.stockfishchess.org/tests/view/63adb3acae97a464904fd4e8
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 36504 W: 9847 L: 9538 D: 17119
Ptnml(0-2): 108, 3981, 9784, 4252, 127
Passed LTC
https://tests.stockfishchess.org/tests/view/63ae0ae25bd1e5f27f13d884
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 36592 W: 10017 L: 9717 D: 16858
Ptnml(0-2): 17, 3461, 11037, 3767, 14
closes https://github.com/official-stockfish/Stockfish/pull/4314
bench 4015511
Created by retraining the master net with a combination of:
* the previous best dataset (Leela-dfrc_n5000.binpack), with about half the dataset filtered using depth6 multipv2 search to throw away positions where either of the 2 best moves are captures
* Leela T80 Oct and Nov training data rescored with best moves, adding ~9.5 billion positions
Trained effectively the same way as the previous master net:
python3 easy_train.py \
--experiment-name=leela-dfrc-filtered-T80-oct-nov \
--training-dataset=/data/leela-dfrc-filtered-T80-oct-nov.binpack \
--start-from-engine-test-net True \
--gpus="0," \
--start-lambda=1.0 \
--end-lambda=0.75 \
--gamma=0.995 \
--lr=4.375e-4 \
--tui=False \
--seed=$RANDOM \
--max_epoch=800 \
--auto-exit-timeout-on-training-finished=900 \
--network-testing-threads 20 \
--num-workers 6
Local testing at a fixed 25k nodes:
experiments/experiment_leela-dfrc-filtered-T80-oct-nov/training/run_0/nn-epoch779.nnue
localElo: run_0/nn-epoch779.nnue : 4.7 +/- 3.1
The new Leela T80 part of the dataset was prepared by downloading test80 training data from all of Oct 2022 and Nov 2022, rescoring with syzygy 6-piece tablebases and ~600 GB of 7-piece tablebases, saving best moves to exported .plain files, removing all positions with castling flags, then converting to binpacks and using interleave_binpacks.py to merge them together. Scripts used in this data conversion process are available at:
https://github.com/linrock/lc0-data-converter
Filtering binpack data using depth6 multipv2 search was done by modifying transform.cpp in the tools branch:
https://github.com/linrock/Stockfish/tree/tools-filter-multipv2-no-rescore
Links for downloading the training data (total size: 338 GB) are available at:
https://robotmoon.com/nnue-training-data/
Passed STC:
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 30544 W: 8244 L: 7947 D: 14353
Ptnml(0-2): 93, 3243, 8302, 3542, 92
https://tests.stockfishchess.org/tests/view/63a0d377264a0cf18f86f82b
Passed LTC:
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 32464 W: 8866 L: 8573 D: 15025
Ptnml(0-2): 19, 3054, 9794, 3345, 20
https://tests.stockfishchess.org/tests/view/63a10bc9fb452d3c44b1e016
closes https://github.com/official-stockfish/Stockfish/pull/4295
Bench 3554904
First things first...
This PR is being made from court. Today, Tord and Stéphane, with broad support
of the developer community, are defending their complaint, filed in Munich, against ChessBase.
With their products Houdini 6 and Fat Fritz 2, both Stockfish derivatives,
ChessBase repeatedly violated the Stockfish GPLv3 license. Tord and Stéphane have terminated
their license with ChessBase permanently. Today we have the opportunity to present
our evidence to the judge and enforce that termination. To read up, have a look at our blog posts
https://stockfishchess.org/blog/2022/public-court-hearing-soon/ and
https://stockfishchess.org/blog/2021/our-lawsuit-against-chessbase/
This PR introduces a net trained with an enhanced data set and a modified loss function in the trainer.
A slight adjustment for the scaling was needed to get a pass on standard chess.
passed STC:
https://tests.stockfishchess.org/tests/view/62c0527a49b62510394bd610
LLR: 2.94 (-2.94,2.94) <0.00,2.50>
Total: 135008 W: 36614 L: 36152 D: 62242
Ptnml(0-2): 640, 15184, 35407, 15620, 653
passed LTC:
https://tests.stockfishchess.org/tests/view/62c17e459e7d9997a12d458e
LLR: 2.94 (-2.94,2.94) <0.50,3.00>
Total: 28864 W: 8007 L: 7749 D: 13108
Ptnml(0-2): 47, 2810, 8466, 3056, 53
Local testing at a fixed 25k nodes resulted in
Test run1026/easy_train_data/experiments/experiment_2/training/run_0/nn-epoch799.nnue
localElo: 4.2 +- 1.6
The real strength of the net is in FRC and DFRC chess where it gains significantly.
Tested at STC with slightly different scaling:
FRC:
https://tests.stockfishchess.org/tests/view/62c13a4002ba5d0a774d20d4
Elo: 29.78 +-3.4 (95%) LOS: 100.0%
Total: 10000 W: 2007 L: 1152 D: 6841
Ptnml(0-2): 31, 686, 2804, 1355, 124
nElo: 59.24 +-6.9 (95%) PairsRatio: 2.06
DFRC:
https://tests.stockfishchess.org/tests/view/62c13a5702ba5d0a774d20d9
Elo: 55.25 +-3.9 (95%) LOS: 100.0%
Total: 10000 W: 2984 L: 1407 D: 5609
Ptnml(0-2): 51, 636, 2266, 1779, 268
nElo: 96.95 +-7.2 (95%) PairsRatio: 2.98
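For readers who want to check the figures, the sketch below reproduces the reported Elo and PairsRatio values from the raw counts. It assumes the usual logistic Elo model and takes the pairs ratio as winning game pairs divided by losing game pairs; the function names are made up for this example.

```python
import math


def elo_from_wld(wins, losses, draws):
    """Logistic Elo difference implied by the average score per game."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return -400.0 * math.log10(1.0 / score - 1.0)


def pairs_ratio(ptnml):
    """Pairs scoring above 50% divided by pairs scoring below 50%.

    ptnml lists game-pair counts by pair score: 0, 0.5, 1, 1.5, 2.
    """
    return (ptnml[3] + ptnml[4]) / (ptnml[0] + ptnml[1])


frc_elo = elo_from_wld(2007, 1152, 6841)
frc_ratio = pairs_ratio([31, 686, 2804, 1355, 124])
dfrc_elo = elo_from_wld(2984, 1407, 5609)
dfrc_ratio = pairs_ratio([51, 636, 2266, 1779, 268])

print(f"FRC:  Elo {frc_elo:.2f}, PairsRatio {frc_ratio:.2f}")
print(f"DFRC: Elo {dfrc_elo:.2f}, PairsRatio {dfrc_ratio:.2f}")
```

The results agree with the STC numbers listed above to two decimal places.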
Tested at LTC with identical scaling:
FRC:
https://tests.stockfishchess.org/tests/view/62c26a3c9e7d9997a12d6caf
Elo: 16.20 +-2.5 (95%) LOS: 100.0%
Total: 10000 W: 1192 L: 726 D: 8082
Ptnml(0-2): 10, 403, 3727, 831, 29
nElo: 44.12 +-6.7 (95%) PairsRatio: 2.08
DFRC:
https://tests.stockfishchess.org/tests/view/62c26a539e7d9997a12d6cb2
Elo: 40.94 +-3.0 (95%) LOS: 100.0%
Total: 10000 W: 2215 L: 1042 D: 6743
Ptnml(0-2): 10, 410, 3053, 1451, 76
nElo: 92.77 +-6.9 (95%) PairsRatio: 3.64
This is due to mixing in a significant fraction of DFRC training data in the final training round. The net is
trained using the easy_train.py script in the following way:
```
python easy_train.py \
--training-dataset=../Leela-dfrc_n5000.binpack \
--experiment-name=2 \
--nnue-pytorch-branch=vondele/nnue-pytorch/lossScan4 \
--additional-training-arg=--param-index=2 \
--start-lambda=1.0 \
--end-lambda=0.75 \
--gamma=0.995 \
--lr=4.375e-4 \
--start-from-engine-test-net True \
--tui=False \
--seed=$RANDOM \
--max_epoch=800 \
--auto-exit-timeout-on-training-finished=900 \
--network-testing-threads 8 \
--num-workers 12
```
where the data set used (Leela-dfrc_n5000.binpack) is a combination of our previous best data set (mix of Leela and some SF data) and DFRC data, interleaved to form:
The data is available in https://drive.google.com/drive/folders/1S9-ZiQa_3ApmjBtl2e8SyHxj4zG4V8gG?usp=sharing
Leela mix: https://drive.google.com/file/d/1JUkMhHSfgIYCjfDNKZUMYZt6L5I7Ra6G/view?usp=sharing
DFRC: https://drive.google.com/file/d/17vDaff9LAsVo_1OfsgWAIYqJtqR8aHlm/view?usp=sharing
The training branch used is
https://github.com/vondele/nnue-pytorch/commits/lossScan4
A PR to the main trainer repo will be made later. This contains a revised loss function, now computing the loss from the score based on the win rate model, which is a more accurate representation than what we had before. Scaling constants are tweaked there as well.
closes https://github.com/official-stockfish/Stockfish/pull/4100
Bench: 5186781
Train a net using training data with a
heavier weight on positions having 16 pieces on the board. More specifically,
with a relative weight of `i * (32-i)/(16 * 16)+1` (where i is the number of pieces on the board).
This is done with the trainer branch https://github.com/glinscott/nnue-pytorch/pull/173
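To make the weighting concrete, here is a small sketch that evaluates the formula above for a few piece counts. The function name is chosen for this example only.

```python
def piece_count_weight(i):
    """Relative weight of a position with i pieces on the board."""
    return i * (32 - i) / (16 * 16) + 1


for pieces in (2, 8, 16, 24, 32):
    print(pieces, piece_count_weight(pieces))
```

The weight peaks at 2.0 for 16 pieces, is symmetric around that point, and falls back to 1.0 at 32 pieces.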
The command used is:
```
python train.py $datafile $datafile $restarttype $restartfile --gpus 1 --threads 4 --num-workers 12 --random-fen-skipping=3 --batch-size 16384 --progress_bar_refresh_rate 300 --smart-fen-skipping --features=HalfKAv2_hm^ --lambda=1.00 --max_epochs=$epochs --seed $RANDOM --default_root_dir exp/run_$i
```
The datafile is T60T70wIsRightFarseerT60T74T75T76.binpack, the restart is from the master net.
passed STC:
LLR: 2.94 (-2.94,2.94) <0.00,2.50>
Total: 22728 W: 6197 L: 5945 D: 10586
Ptnml(0-2): 105, 2453, 6001, 2695, 110
https://tests.stockfishchess.org/tests/view/625cf944ff677a888877cd90
passed LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.00>
Total: 35664 W: 9535 L: 9264 D: 16865
Ptnml(0-2): 30, 3524, 10455, 3791, 32
https://tests.stockfishchess.org/tests/view/625d3c32ff677a888877d7ca
closes https://github.com/official-stockfish/Stockfish/pull/3989
Bench: 7269563
Architecture:
The diagram of the "SFNNv4" architecture:
https://user-images.githubusercontent.com/8037982/153455685-cbe3a038-e158-4481-844d-9d5fccf5c33a.png
The most important architectural changes are the following:
* 1024x2 [activated] neurons are pairwise, elementwise multiplied (not quite pairwise due to implementation details, see diagram), which introduces a non-linearity that exhibits similar benefits to previously tested sigmoid activation (quantmoid4), while being slightly faster.
* The following layer therefore has 2x fewer inputs, which we compensate for by having 2x more outputs. It is possible that reducing the number of outputs might be beneficial (as we had it as low as 8 before). The layer is now 1024->16.
* The 16 outputs are split into 15 and 1. The 1-wide output is added to the network output (after some necessary scaling due to quantization differences). The 15-wide is activated and follows the usual path through a set of linear layers. The additional 1-wide output is at least neutral, but has shown a slightly positive trend in training compared to networks without it (all 16 outputs through the usual path), and allows possibly an additional stage of lazy evaluation to be introduced in the future.
Additionally, the inference code was rewritten and no longer uses a recursive implementation. This was necessitated by the splitting of the 16-wide intermediate result into two, which was impossible to do with the old implementation without ugly hacks. This is hopefully overall for the better.
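The data flow described in the bullets above can be sketched in plain floating-point Python. This is only an illustration under several assumptions: the pairing multiplies the first half of each 1024-wide perspective with its second half, clipping to [0, 1] stands in for the activation, the last of the 16 outputs is the one split off, and all weights are random. Quantization, bucketing and the remaining layers are left out.

```python
import random

HALF = 512  # each 1024-wide perspective is treated as two halves of 512


def clipped(x):
    """Stand-in activation: clamp to the range [0, 1]."""
    return max(0.0, min(1.0, x))


def pairwise_multiply(accumulator):
    """1024 pre-activations for one perspective -> 512 elementwise products."""
    act = [clipped(v) for v in accumulator]
    return [act[i] * act[i + HALF] for i in range(HALF)]


def linear(inputs, weights, biases):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


rng = random.Random(0)
us = [rng.uniform(-1.0, 2.0) for _ in range(2 * HALF)]
them = [rng.uniform(-1.0, 2.0) for _ in range(2 * HALF)]

# 1024x2 activated neurons become 1024 inputs for the next layer.
features = pairwise_multiply(us) + pairwise_multiply(them)

weights = [[rng.uniform(-0.05, 0.05) for _ in range(len(features))]
           for _ in range(16)]
biases = [0.0] * 16
out16 = linear(features, weights, biases)

# 15 outputs continue through the usual path; 1 is added to the network output.
main_path, direct_output = out16[:15], out16[15]
print(len(features), len(main_path), direct_output)
```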
First session:
The first session was training a network from scratch (random initialization). The exact trainer used was slightly different (older) from the one used in the second session, but it should not have a measurable effect. The purpose of this session is to establish a strong network base for the second session. Small deviations in strength do not harm the learnability in the second session.
The training was done using the following command:
python3 train.py \
/home/sopel/nnue/nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
/home/sopel/nnue/nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
--gpus "$3," \
--threads 4 \
--num-workers 4 \
--batch-size 16384 \
--progress_bar_refresh_rate 20 \
--random-fen-skipping 3 \
--features=HalfKAv2_hm^ \
--lambda=1.0 \
--gamma=0.992 \
--lr=8.75e-4 \
--max_epochs=400 \
--default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2
Every 20th net was saved and its playing strength measured against some baseline at 25k nodes per move with pure NNUE evaluation (modified binary). The exact setup is not important as long as it's consistent. The purpose is to sift good candidates from bad ones.
The dataset can be found at https://drive.google.com/file/d/1UQdZN_LWQ265spwTBwDKo0t1WjSJKvWY/view
Second session:
The second training session was done starting from the best network (as determined by strength testing) from the first session. It is important that it's resumed from a .pt model and NOT a .ckpt model. The conversion can be performed directly using serialize.py.
The LR schedule was modified to use gamma=0.995 instead of gamma=0.992 and LR=4.375e-4 instead of LR=8.75e-4 to flatten the LR curve and allow for longer training. The training was then running for 800 epochs instead of 400 (though it's possibly mostly noise after around epoch 600).
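To see how the two schedules compare, the sketch below assumes the learning rate is multiplied by gamma once per epoch. That per-epoch exponential decay is an assumption of this example, and the function name is invented for it.

```python
def lr_at_epoch(initial_lr, gamma, epoch):
    """Learning rate after `epoch` epochs of per-epoch exponential decay."""
    return initial_lr * gamma ** epoch


first_session_final = lr_at_epoch(8.75e-4, 0.992, 400)
second_session_final = lr_at_epoch(4.375e-4, 0.995, 800)

for epoch in (0, 200, 400, 600, 800):
    print(epoch, f"{lr_at_epoch(4.375e-4, 0.995, epoch):.3e}")
print(f"first session end:  {first_session_final:.3e}")
print(f"second session end: {second_session_final:.3e}")
```

Under this assumption the second schedule starts at half the learning rate but loses a smaller fraction per epoch, which is what allows the longer 800-epoch run.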
The training was done using the following command:
python3 train.py \
/data/sopel/nnue/nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
/data/sopel/nnue/nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
--gpus "$3," \
--threads 4 \
--num-workers 4 \
--batch-size 16384 \
--progress_bar_refresh_rate 20 \
--random-fen-skipping 3 \
--features=HalfKAv2_hm^ \
--lambda=1.0 \
--gamma=0.995 \
--lr=4.375e-4 \
--max_epochs=800 \
--resume-from-model /data/sopel/nnue/nnue-pytorch-training/data/exp295/nn-epoch399.pt \
--default_root_dir ../nnue-pytorch-training/experiment_$1/run_$run_id
In particular note that we now use lambda=1.0 instead of lambda=0.8 (previous nets), because tests show that WDL-skipping introduced by vondele performs better with lambda=1.0. Nets were being saved every 20th epoch. In total 16 runs were made with these settings and the best nets chosen according to playing strength at 25k nodes per move with pure NNUE evaluation - these are the 4 nets that have been put on fishtest.
The dataset can be found either at ftp://ftp.chessdb.cn/pub/sopel/data_sf/T60T70wIsRightFarseerT60T74T75T76.binpack in its entirety (download might be painfully slow because it is hosted in China) or can be assembled in the following way:
Get the 5640ad48ae/script/interleave_binpacks.py script.
Download T60T70wIsRightFarseer.binpack https://drive.google.com/file/d/1_sQoWBl31WAxNXma2v45004CIVltytP8/view
Download farseerT74.binpack http://trainingdata.farseer.org/T74-May13-End.7z
Download farseerT75.binpack http://trainingdata.farseer.org/T75-June3rd-End.7z
Download farseerT76.binpack http://trainingdata.farseer.org/T76-Nov10th-End.7z
Run python3 interleave_binpacks.py T60T70wIsRightFarseer.binpack farseerT74.binpack farseerT75.binpack farseerT76.binpack T60T70wIsRightFarseerT60T74T75T76.binpack
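The idea behind interleaving can be sketched on in-memory lists. This is not the interleave_binpacks.py script itself: it assumes that the next chunk is drawn from a source with probability proportional to how much of that source is left, and it works on lists of labels rather than binpack files.

```python
import random


def interleave(sources, rng):
    """Merge chunk lists into one, keeping each source's internal order."""
    cursors = [0] * len(sources)
    remaining = [len(s) for s in sources]
    merged = []
    while any(remaining):
        # Only consider sources that still have chunks left.
        live = [i for i, r in enumerate(remaining) if r > 0]
        idx = rng.choices(live, weights=[remaining[i] for i in live])[0]
        merged.append(sources[idx][cursors[idx]])
        cursors[idx] += 1
        remaining[idx] -= 1
    return merged


a = ["a1", "a2", "a3", "a4"]
b = ["b1", "b2"]
merged = interleave([a, b], random.Random(1))
print(merged)
```

Larger sources contribute chunks more often, so the merged stream mixes the datasets roughly evenly from start to finish.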
Tests:
STC: https://tests.stockfishchess.org/tests/view/6203fb85d71106ed12a407b7
LLR: 2.94 (-2.94,2.94) <0.00,2.50>
Total: 16952 W: 4775 L: 4521 D: 7656
Ptnml(0-2): 133, 1818, 4318, 2076, 131
LTC: https://tests.stockfishchess.org/tests/view/62041e68d71106ed12a40e85
LLR: 2.94 (-2.94,2.94) <0.50,3.00>
Total: 14944 W: 4138 L: 3907 D: 6899
Ptnml(0-2): 21, 1499, 4202, 1728, 22
closes https://github.com/official-stockfish/Stockfish/pull/3927
Bench: 4919707