faster permutation of master net weights
Activation data taken from https://drive.google.com/drive/folders/1Ec9YuuRx4N03GPnVPoQOW70eucOKngQe?usp=sharing
Permutation found using 836387a0e5/ftperm.py
See also https://github.com/glinscott/nnue-pytorch/pull/254
The algorithm greedily selects 2- and 3-cycles that can be permuted to increase the number of runs of zeroes. The percent of zero runs from the master net increased from 68.46 to 70.11 from 2-cycles and only increased to 70.32 when considering 3-cycles. Interestingly, allowing both halves of L1 to intermix when creating zero runs can give another 0.5% zero-run density increase with this method.
Measured speedup:
```
CPU: 16 x AMD Ryzen 9 3950X 16-Core Processor
Result of 50 runs
base (./stockfish.master ) = 1561556 +/- 5439
test (./stockfish.patch ) = 1575788 +/- 5427
diff = +14231 +/- 2636
speedup = +0.0091
P(speedup > 0) = 1.0000
```
closes https://github.com/official-stockfish/Stockfish/pull/4640
No functional change
using github actions, create a prerelease for the latest commit to master.
As such a development version will be available on github, in addition to the latest release.
closes https://github.com/official-stockfish/Stockfish/pull/4622
No functional change
Implemented LEB128 (de)compression for the feature transformer.
Reduces embedded network size from 70 MiB to 39 Mib.
The new nn-78bacfcee510.nnue corresponds to the master net compressed.
closes https://github.com/official-stockfish/Stockfish/pull/4617
No functional change
Use block sparse input for the first fully connected layer on architectures with at least SSSE3.
Depending on the CPU architecture, this yields a speedup of up to 10%, e.g.
```
Result of 100 runs of 'bench 16 1 13 default depth NNUE'
base (...ockfish-base) = 959345 +/- 7477
test (...ckfish-patch) = 1054340 +/- 9640
diff = +94995 +/- 3999
speedup = +0.0990
P(speedup > 0) = 1.0000
CPU: 8 x AMD Ryzen 7 5700U with Radeon Graphics
Hyperthreading: on
```
Passed STC:
https://tests.stockfishchess.org/tests/view/6485aa0965ffe077ca12409c
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 8864 W: 2479 L: 2223 D: 4162
Ptnml(0-2): 13, 829, 2504, 1061, 25
This commit includes a net with reordered weights, to increase the likelihood of block sparse inputs,
but otherwise equivalent to the previous master net (nn-ea57bea57e32.nnue).
Activation data collected with https://github.com/AndrovT/Stockfish/tree/log-activations, running bench 16 1 13 varied_1000.epd depth NNUE on this data. Net parameters permuted with https://gist.github.com/AndrovT/9e3fbaebb7082734dc84d27e02094cb3.
closes https://github.com/official-stockfish/Stockfish/pull/4612
No functional change
Created by retraining an earlier epoch (ep659) of the experiment that led to the first SFNNv6 net:
- First retrained on the nn-0dd1cebea573 dataset
- Then retrained with skip 20 on a smaller dataset containing unfiltered Leela data
- And then retrained again with skip 27 on the nn-0dd1cebea573 dataset
The equivalent 7-step training sequence from scratch that led here was:
1. max-epoch 400, lambda 1.0, constant LR 9.75e-4, T79T77-filter-v6-dd.min.binpack
ep379 chosen for retraining in step2
2. max-epoch 800, end-lambda 0.75, T60T70wIsRightFarseerT60T74T75T76.binpack
ep679 chosen for retraining in step3
3. max-epoch 800, end-lambda 0.75, skip 28, nn-e1fb1ade4432 dataset
ep799 chosen for retraining in step4
4. max-epoch 800, end-lambda 0.7, skip 28, nn-e1fb1ade4432 dataset
ep759 became nn-8d69132723e2.nnue (first SFNNv6 net)
ep659 chosen for retraining in step5
5. max-epoch 800, end-lambda 0.7, skip 28, nn-0dd1cebea573 dataset
ep759 chosen for retraining in step6
6. max-epoch 800, end-lambda 0.7, skip 20, leela-dfrc-v2-T77decT78janfebT79aprT80apr.binpack
ep639 chosen for retraining in step7
7. max-epoch 800, end-lambda 0.7, skip 27, nn-0dd1cebea573 dataset
ep619 became nn-ea57bea57e32.nnue
For the last retraining (step7):
python3 easy_train.py
--experiment-name L1-1536-Re6-masterShuffled-ep639-sk27-Re5-leela-dfrc-v2-T77toT80small-Re4-masterShuffled-ep659-Re3-sameAs-Re2-leela96-dfrc99-16t-v2-T60novdecT80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd-Re1-LeelaFarseer-new-T77T79 \
--training-dataset /data/leela96-dfrc99-T60novdec-v2-T80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd-T80apr.binpack \
--nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes-L1-1536 \
--early-fen-skipping 27 \
--start-lambda 1.0 \
--end-lambda 0.7 \
--max_epoch 800 \
--start-from-engine-test-net False \
--start-from-model /data/L1-1536-Re5-leela-dfrc-v2-T77toT80small-epoch639.nnue \
--lr 4.375e-4 \
--gamma 0.995 \
--tui False \
--seed $RANDOM \
--gpus "0,"
For preparing the step6 leela-dfrc-v2-T77decT78janfebT79aprT80apr.binpack dataset:
python3 interleave_binpacks.py \
leela96-filt-v2.binpack \
dfrc99-16tb7p-eval-filt-v2.binpack \
test77-dec2021-16tb7p.no-db.min-mar2023.binpack \
test78-janfeb2022-16tb7p.no-db.min-mar2023.binpack \
test79-apr2022-16tb7p-filter-v6-dd.binpack \
test80-apr2022-16tb7p.no-db.min-mar2023.binpack \
/data/leela-dfrc-v2-T77decT78janfebT79aprT80apr.binpack
The unfiltered Leela data used for the step6 dataset can be found at:
https://robotmoon.com/nnue-training-data
Local elo at 25k nodes per move:
nn-epoch619.nnue : 2.3 +/- 1.9
Passed STC:
https://tests.stockfishchess.org/tests/view/6480d43c6e6ce8d9fc6d7cc8
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 40992 W: 11017 L: 10706 D: 19269
Ptnml(0-2): 113, 4400, 11170, 4689, 124
Passed LTC:
https://tests.stockfishchess.org/tests/view/648119ac6e6ce8d9fc6d8208
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 129174 W: 35059 L: 34579 D: 59536
Ptnml(0-2): 66, 12548, 38868, 13050, 55
closes https://github.com/official-stockfish/Stockfish/pull/4611
bench: 2370027
Created by retraining an earlier epoch of the experiment leading to the first SFNNv6 net
on a more-randomized version of the nn-e1fb1ade4432.nnue dataset mixed with unfiltered
T80 apr2023 data. Trained using early-fen-skipping 28 and max-epoch 960.
The trainer settings and epochs used in the 5-step training sequence leading here were:
1. train from scratch for 400 epochs, lambda 1.0, constant LR 9.75e-4, T79T77-filter-v6-dd.min.binpack
2. retrain ep379, max-epoch 800, end-lambda 0.75, T60T70wIsRightFarseerT60T74T75T76.binpack
3. retrain ep679, max-epoch 800, end-lambda 0.75, skip 28, nn-e1fb1ade4432 dataset
4. retrain ep799, max-epoch 800, end-lambda 0.7, skip 28, nn-e1fb1ade4432 dataset
5. retrain ep439, max-epoch 960, end-lambda 0.7, skip 28, shuffled nn-e1fb1ade4432 + T80 apr2023
This net was epoch 559 of the final (step 5) retraining:
```bash
python3 easy_train.py \
--experiment-name L1-1536-Re4-leela96-dfrc99-T60novdec-v2-T80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd-T80apr-shuffled-sk28 \
--training-dataset /data/leela96-dfrc99-T60novdec-v2-T80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd-T80apr.binpack \
--nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes-L1-1536 \
--early-fen-skipping 28 \
--start-lambda 1.0 \
--end-lambda 0.7 \
--max_epoch 960 \
--start-from-engine-test-net False \
--start-from-model /data/L1-1536-Re3-nn-epoch439.nnue \
--engine-test-branch linrock/Stockfish/L1-1536 \
--lr 4.375e-4 \
--gamma 0.995 \
--tui False \
--seed $RANDOM \
--gpus "0,"
```
During data preparation, most binpacks were unminimized by removing positions with
score 32002 (`VALUE_NONE`). This makes the tradeoff of increasing dataset filesize
on disk to increase the randomness of positions in interleaved datasets.
The code used for unminimizing is at:
https://github.com/linrock/Stockfish/tree/tools-unminify
For preparing the dataset used in this experiment:
```bash
python3 interleave_binpacks.py \
leela96-filt-v2.binpack \
dfrc99-16tb7p-eval-filt-v2.binpack \
filt-v6-dd-min/test60-novdec2021-12tb7p-filter-v6-dd.min-mar2023.unmin.binpack \
filt-v6-dd-min/test80-aug2022-16tb7p-filter-v6-dd.min-mar2023.unmin.binpack \
filt-v6-dd-min/test80-sep2022-16tb7p-filter-v6-dd.min-mar2023.unmin.binpack \
filt-v6-dd-min/test80-jun2022-16tb7p-filter-v6-dd.min-mar2023.unmin.binpack \
filt-v6-dd/test80-jul2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test80-oct2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test80-nov2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd-min/test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.unmin.binpack \
filt-v6-dd-min/test80-feb2023-16tb7p-filter-v6-dd.min-mar2023.unmin.binpack \
filt-v6-dd/test79-apr2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test79-may2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd-min/test78-jantomay2022-16tb7p-filter-v6-dd.min-mar2023.unmin.binpack \
filt-v6-dd/test78-juntosep2022-16tb7p-filter-v6-dd.binpack \
filt-v6-dd/test77-dec2021-16tb7p-filter-v6-dd.binpack \
test80-apr2023-2tb7p.binpack \
/data/leela96-dfrc99-T60novdec-v2-T80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd-T80apr.binpack
```
T80 apr2023 data was converted using lc0-rescorer with ~2tb of tablebases and can be found at:
https://robotmoon.com/nnue-training-data/
Local elo at 25k nodes per move vs. nn-e1fb1ade4432.nnue (L1 size 1024):
nn-epoch559.nnue : 25.7 +/- 1.6
Passed STC:
https://tests.stockfishchess.org/tests/view/647cd3b87cf638f0f53f9cbb
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 59200 W: 16000 L: 15660 D: 27540
Ptnml(0-2): 159, 6488, 15996, 6768, 189
Passed LTC:
https://tests.stockfishchess.org/tests/view/647d58de726f6b400e4085d8
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 58800 W: 16002 L: 15657 D: 27141
Ptnml(0-2): 44, 5607, 17748, 5962, 39
closes https://github.com/official-stockfish/Stockfish/pull/4606
bench 2141197
Created by training a new net from scratch with L1 size increased from 1024 to 1536.
Thanks to Vizvezdenec for the idea of exploring larger net sizes after recent
training data improvements.
A new net was first trained with lambda 1.0 and constant LR 8.75e-4. Then a strong net
from a later epoch in the training run was chosen for retraining with start-lambda 1.0
and initial LR 4.375e-4 decaying with gamma 0.995. Retraining was performed a total of
3 times, for this 4-step process:
1. 400 epochs, lambda 1.0 on filtered T77+T79 v6 deduplicated data
2. 800 epochs, end-lambda 0.75 on T60T70wIsRightFarseerT60T74T75T76.binpack
3. 800 epochs, end-lambda 0.75 and early-fen-skipping 28 on the master dataset
4. 800 epochs, end-lambda 0.7 and early-fen-skipping 28 on the master dataset
In the training sequence that reached the new nn-8d69132723e2.nnue net,
the epochs used for the 3x retraining runs were:
1. epoch 379 trained on T77T79-filter-v6-dd.min.binpack
2. epoch 679 trained on T60T70wIsRightFarseerT60T74T75T76.binpack
3. epoch 799 trained on the master dataset
For training from scratch:
python3 easy_train.py \
--experiment-name new-L1-1536-T77T79-filter-v6dd \
--training-dataset /data/T77T79-filter-v6-dd.min.binpack \
--max_epoch 400 \
--lambda 1.0 \
--start-from-engine-test-net False \
--engine-test-branch linrock/Stockfish/L1-1536 \
--nnue-pytorch-branch linrock/Stockfish/misc-fixes-L1-1536 \
--tui False \
--gpus "0," \
--seed $RANDOM
Retraining commands were similar to each other. For the 3rd retraining run:
python3 easy_train.py \
--experiment-name L1-1536-T77T79-v6dd-Re1-LeelaFarseer-Re2-masterDataset-Re3-sameData \
--training-dataset /data/leela96-dfrc99-v2-T60novdecT80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd.binpack \
--early-fen-skipping 28 \
--max_epoch 800 \
--start-lambda 1.0 \
--end-lambda 0.7 \
--lr 4.375e-4 \
--gamma 0.995 \
--start-from-engine-test-net False \
--start-from-model /data/L1-1536-T77T79-v6dd-Re1-LeelaFarseer-Re2-masterDataset-nn-epoch799.nnue \
--engine-test-branch linrock/Stockfish/L1-1536 \
--nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes-L1-1536 \
--tui False \
--gpus "0," \
--seed $RANDOM
The T77+T79 data used is a subset of the master dataset available at:
https://robotmoon.com/nnue-training-data/
T60T70wIsRightFarseerT60T74T75T76.binpack is available at:
https://drive.google.com/drive/folders/1S9-ZiQa_3ApmjBtl2e8SyHxj4zG4V8gG
Local elo at 25k nodes per move vs. nn-e1fb1ade4432.nnue (L1 size 1024):
nn-epoch759.nnue : 26.9 +/- 1.6
Failed STC
https://tests.stockfishchess.org/tests/view/64742485d29264e4cfa75f97
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 13728 W: 3588 L: 3829 D: 6311
Ptnml(0-2): 71, 1661, 3610, 1482, 40
Failing LTC
https://tests.stockfishchess.org/tests/view/64752d7c4a36543c4c9f3618
LLR: -1.91 (-2.94,2.94) <0.50,2.50>
Total: 35424 W: 9522 L: 9603 D: 16299
Ptnml(0-2): 24, 3579, 10585, 3502, 22
Passed VLTC 180+1.8
https://tests.stockfishchess.org/tests/view/64752df04a36543c4c9f3638
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 47616 W: 13174 L: 12863 D: 21579
Ptnml(0-2): 13, 4261, 14952, 4566, 16
Passed VLTC SMP 60+0.6 th 8
https://tests.stockfishchess.org/tests/view/647446ced29264e4cfa761e5
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 19942 W: 5694 L: 5451 D: 8797
Ptnml(0-2): 6, 1504, 6707, 1749, 5
closes https://github.com/official-stockfish/Stockfish/pull/4593
bench 2222567
This change removes one of the constants in the calculation of optimism. It also changes the 2 constants used with the scale value so that they are independent, instead of applying a constant to the scale and then adjusting it again when it is applied to the optimism. This might make the tuning of these constants cleaner and more reliable in the future.
STC 10+0.1 (accidentally run as an Elo gainer:
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 154080 W: 41119 L: 40651 D: 72310
Ptnml(0-2): 375, 16840, 42190, 17212, 423
https://tests.stockfishchess.org/tests/live_elo/64653eabf3b1a4e86c317f77
LTC 60+0.6:
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 217434 W: 58382 L: 58363 D: 100689
Ptnml(0-2): 66, 21075, 66419, 21088, 69
https://tests.stockfishchess.org/tests/live_elo/6465d077f3b1a4e86c318d6c
closes https://github.com/official-stockfish/Stockfish/pull/4576
bench: 3190961