BadFish

mirror of https://github.com/sockspls/badfish synced 2025-07-13 04:29:15 +00:00

Author	SHA1	Message	Date
Sebastian Buchwald	4f0fecad8a	Use C++17 variable templates for type traits The C++17 variable templates are slightly more readable and allow us to remove the typename keyword in a few cases. closes https://github.com/official-stockfish/Stockfish/pull/4806 No functional change	2023-09-29 22:22:40 +02:00
Linmiao Xu	70ba9de85c	Update NNUE architecture to SFNNv8: L1-2560 nn-ac1dbea57aa3.nnue Creating this net involved: - a 6-stage training process from scratch. The datasets used in stages 1-5 were fully minimized. - permuting L1 weights with https://github.com/official-stockfish/nnue-pytorch/pull/254 A strong epoch after each training stage was chosen for the next. The 6 stages were: ``` 1. 400 epochs, lambda 1.0, default LR and gamma UHOx2-wIsRight-multinet-dfrc-n5000 (135G) nodes5000pv2_UHO.binpack data_pv-2_diff-100_nodes-5000.binpack wrongIsRight_nodes5000pv2.binpack multinet_pv-2_diff-100_nodes-5000.binpack dfrc_n5000.binpack 2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12 LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G) T60T70wIsRightFarseerT60T74T75T76.binpack test78-junjulaug2022-16tb7p.no-db.min.binpack test79-mar2022-16tb7p.no-db.min.binpack test80-dec2022-16tb7p.no-db.min.binpack 3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20 leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack leela93-filt-v1.min.binpack dfrc99-16tb7p-filt-v2.min.binpack test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack test78-janfeb2022-16tb7p.min.binpack test79-apr2022-16tb7p.min.binpack test80-apr2022-16tb7p.min.binpack test80-may2022-16tb7p.min.binpack 4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24 leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack leela96-filt-v2.min.binpack dfrc99-16tb7p-filt-v2.min.binpack test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack test79-may2022-16tb7p.filter-v6-dd.min.binpack test80-jun2022-16tb7p.filter-v6-dd.min.binpack test80-sep2022-16tb7p.filter-v6-dd.min.binpack test80-nov2022-16tb7p.filter-v6-dd.min.binpack test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack test80-mar2023-2tb7p.v6-sk16.min.binpack test60-novdec2021-16tb7p.min.binpack test77-dec2021-16tb7p.min.binpack test78-aprmay2022-16tb7p.min.binpack test79-apr2022-16tb7p.min.binpack test80-may2023-2tb7p.min.binpack 5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28 Increased max-epoch to 960 near the end of the first 800 epochs 5af11540bbfe dataset: https://github.com/official-stockfish/Stockfish/pull/4635 6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28 Increased max-epoch to 1000 near the end of the first 800 epochs 1ee1aba5ed dataset: https://github.com/official-stockfish/Stockfish/pull/4782 ``` L1 weights permuted with: ```bash python3 serialize.py $nnue $nnue_permuted \ --features=HalfKAv2_hm \ --ft_optimize \ --ft_optimize_data=/data/fishpack32.binpack \ --ft_optimize_count=10000 ``` Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2: ``` sf_base = 1329051 +/- 2224 (95%) sf_test = 1163344 +/- 2992 (95%) diff = -165706 +/- 4913 (95%) speedup = -12.46807% +/- 0.370% (95%) ``` Training data can be found at: https://robotmoon.com/nnue-training-data/ Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue) ep959 : 16.2 +/- 2.3 Failed 10+0.1 STC: https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21 LLR: -2.92 (-2.94,2.94) <0.00,2.00> Total: 13184 W: 3285 L: 3535 D: 6364 Ptnml(0-2): 85, 1662, 3334, 1440, 71 Failed 180+1.8 VLTC: https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e LLR: -2.94 (-2.94,2.94) <0.00,2.00> Total: 64248 W: 16224 L: 16374 D: 31650 Ptnml(0-2): 26, 6788, 18640, 6650, 20 Passed 60+0.6 th 8 VLTC SMP (STC bounds): https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc LLR: 2.95 (-2.94,2.94) <0.00,2.00> Total: 90630 W: 23372 L: 23033 D: 44225 Ptnml(0-2): 13, 8490, 27968, 8833, 11 Passed 60+0.6 th 8 VLTC SMP: https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb LLR: 2.95 (-2.94,2.94) <0.50,2.50> Total: 137804 W: 35764 L: 35276 D: 66764 Ptnml(0-2): 31, 13006, 42326, 13522, 17 closes https://github.com/official-stockfish/Stockfish/pull/4795 bench 1246812	2023-09-22 19:26:16 +02:00
mstembera	95fe2b9a9d	Reduce SIMD register count from 32 to 16 in the case of avx512 and vnni512 archs. Up to 17% speedup, depending on the compiler, e.g. ``` AMD pro 7840u (zen4 phoenix apu 4nm) bash bench_parallel.sh ./stockfish_avx512_gcc13 ./stockfish_avx512_pr_gcc13 20 10 sf_base = 1077737 +/- 8446 (95%) sf_test = 1264268 +/- 8543 (95%) diff = 186531 +/- 4280 (95%) speedup = 17.308% +/- 0.397% (95%) ``` Prior to this patch, it appears gcc spills registers. closes https://github.com/official-stockfish/Stockfish/pull/4796 No functional change	2023-09-22 19:15:34 +02:00
cj5716	fce4cc1829	Make casting styles consistent Make casting styles consistent with the rest of the code. closes https://github.com/official-stockfish/Stockfish/pull/4793 No functional change	2023-09-22 19:14:29 +02:00
mstembera	97f706ecc1	Sparse impl of affine_transform_non_ssse3() deal with the general case About a 8.6% speedup (for general arch) Results for 200 tests for each version: Base Test Diff Mean 141741 153998 -12257 StDev 2990 3042 3742 p-value: 0.999 speedup: 0.086 closes https://github.com/official-stockfish/Stockfish/pull/4786 No functional change	2023-09-22 19:03:47 +02:00
Tomasz Sobczyk	1461d861c8	Prevent usage of AVX-512 for the last layer. Add more static checks regarding the SIMD width match. STC: https://tests.stockfishchess.org/tests/view/64f5c568a9bc5a78c669e70e LLR: 2.95 (-2.94,2.94) <-1.75,0.25> Total: 125216 W: 31756 L: 31636 D: 61824 Ptnml(0-2): 327, 13993, 33848, 14113, 327 Fixes a bug introduced in `2f2f45f`, where with AVX-512 the weights and input to the last layer were being read out of bounds. Now AVX-512 is only used for the layers it can be used for. Additional static assertions have been added to prevent more errors like this in the future. closes https://github.com/official-stockfish/Stockfish/pull/4773 No functional change	2023-09-11 22:11:30 +02:00
Disservin	3c0e86a91e	Cleanup includes Reorder a few includes, include "position.h" where it was previously missing and apply include-what-you-use suggestions. Also make the order of the includes consistent, in the following way: 1. Related header (for .cpp files) 2. A blank line 3. C/C++ headers 4. A blank line 5. All other header files closes https://github.com/official-stockfish/Stockfish/pull/4763 fixes https://github.com/official-stockfish/Stockfish/issues/4707 No functional change	2023-09-03 08:24:51 +02:00
Gian-Carlo Pascutto	c6f62363a6	Simplify Square Clipped ReLU code. Squared numbers are never negative, so barring any wraparound there is no need to clamp to 0. From reading the code, there's no obvious way to get wraparound, so the entire operation can be simplified away. Updated original truncated code comments to be sensible. Verified by running ./stockfish bench 128 1 24 and by the following test: STC: https://tests.stockfishchess.org/tests/view/64da4db95b17f7c21c0eabe7 LLR: 2.94 (-2.94,2.94) <-1.75,0.25> Total: 60224 W: 15425 L: 15236 D: 29563 Ptnml(0-2): 195, 6576, 16382, 6763, 196 closes https://github.com/official-stockfish/Stockfish/pull/4751 No functional change	2023-08-22 11:14:19 +02:00
Matthies	9abef246a9	Allow compilation on Raspi (for ARMv8) Current master fails to compile for ARMv8 on Raspi cause gcc (version 10.2.1) does not like to cast between signed and unsigned vector types. This patch fixes it by using unsigned vector pointer for ARM to avoid implicite cast. closes https://github.com/official-stockfish/Stockfish/pull/4752 No functional change	2023-08-22 10:43:51 +02:00
Joost VandeVondele	8192945870	Improve testing coverage, remove unused code a) Add further tests to CI to cover most features. This uncovered a potential race in case setoption was sent between two searches. As the UCI protocol requires this sent to be went the engine is not searching, setoption now ensures that this is the case. b) Remove some unused code closes https://github.com/official-stockfish/Stockfish/pull/4730 No functional change	2023-08-11 19:27:46 +02:00
AndrovT	a6d9a302b8	Implement AffineTransformSparseInput for armv8 Implements AffineTransformSparseInput layer for the NNUE evaluation for the armv8 and armv8-dotprod architectures. We measured some nice speed improvements via 10 runs of our benchmark: armv8, Cortex-X1 : 18.5% speed-up armv8, Cortex-A76 : 13.2% speed-up armv8-dotprod, Cortex-X1 : 27.1% speed-up armv8-dotprod, Cortex-A76 : 12.1% speed-up armv8, Cortex-A72, Raspberry Pi 4 : 8.2% speed-up (thanks Torom!) closes https://github.com/official-stockfish/Stockfish/pull/4719 No functional change	2023-08-06 21:22:37 +02:00
Stéphane Nicolet	f84eb1f3ef	Improve some comments - clarify the examples for the bench command - typo in search.cpp closes https://github.com/official-stockfish/Stockfish/pull/4710 No functional change	2023-07-28 23:38:49 +02:00
mstembera	cb22520a9c	Remove unused return type from propagate() Also make two get_weight_index() static methods constexpr, for consistency with the other static get_hash_value() method right above. Tested for speed by user Torom (thanks). closes https://github.com/official-stockfish/Stockfish/pull/4708 No functional change	2023-07-28 23:24:42 +02:00
mstembera	1444837887	Remove inline assembly closes https://github.com/official-stockfish/Stockfish/pull/4698 No functional change	2023-07-19 21:39:51 +02:00
Joost VandeVondele	e8a5c64988	Consolidate to centipawns conversion avoid doing this calculations in multiple places. closes https://github.com/official-stockfish/Stockfish/pull/4686 No functional change	2023-07-16 15:14:50 +02:00
AndrovT	a42ab95e1f	remove large input specialization Removes unused large input specialization for dense affine transform. It has been obsolete since #4612 was merged. closes https://github.com/official-stockfish/Stockfish/pull/4684 No functional change	2023-07-16 15:12:21 +02:00
mstembera	529d3be8e2	More simplifications and cleanup in affine_transform_sparse_input.h closes https://github.com/official-stockfish/Stockfish/pull/4677 No functional change	2023-07-13 08:20:33 +02:00
Joost VandeVondele	af110e02ec	Remove classical evaluation since the introduction of NNUE (first released with Stockfish 12), we have maintained the classical evaluation as part of SF in frozen form. The idea that this code could lead to further inputs to the NN or search did not materialize. Now, after five releases, this PR removes the classical evaluation from SF. Even though this evaluation is probably the best of its class, it has become unimportant for the engine's strength, and there is little need to maintain this code (roughly 25% of SF) going forward, or to expend resources on trying to improve its integration in the NNUE eval. Indeed, it had still a very limited use in the current SF, namely for the evaluation of positions that are nearly decided based on material difference, where the speed of the classical evaluation outweights its inaccuracies. This impact on strength is small, roughly 2Elo, and probably decreasing in importance as the TC grows. Potentially, removal of this code could lead to the development of techniques to have faster, but less accurate NN evaluation, for certain positions. STC https://tests.stockfishchess.org/tests/view/64a320173ee09aa549c52157 Elo: -2.35 ± 1.1 (95%) LOS: 0.0% Total: 100000 W: 24916 L: 25592 D: 49492 Ptnml(0-2): 287, 12123, 25841, 11477, 272 nElo: -4.62 ± 2.2 (95%) PairsRatio: 0.95 LTC https://tests.stockfishchess.org/tests/view/64a320293ee09aa549c5215b Elo: -1.74 ± 1.0 (95%) LOS: 0.0% Total: 100000 W: 25010 L: 25512 D: 49478 Ptnml(0-2): 44, 11069, 28270, 10579, 38 nElo: -3.72 ± 2.2 (95%) PairsRatio: 0.96 VLTC SMP https://tests.stockfishchess.org/tests/view/64a3207c3ee09aa549c52168 Elo: -1.70 ± 0.9 (95%) LOS: 0.0% Total: 100000 W: 25673 L: 26162 D: 48165 Ptnml(0-2): 8, 9455, 31569, 8954, 14 nElo: -3.95 ± 2.2 (95%) PairsRatio: 0.95 closes https://github.com/official-stockfish/Stockfish/pull/4674 Bench: 1444646	2023-07-11 22:56:49 +02:00
mstembera	f8e65d82eb	Simplify away lookup_count. https://tests.stockfishchess.org/tests/view/64a3c1a93ee09aa549c53167 LLR: 2.94 (-2.94,2.94) <-1.75,0.25> Total: 32832 W: 8497 L: 8280 D: 16055 Ptnml(0-2): 80, 3544, 8967, 3729, 96 closes https://github.com/official-stockfish/Stockfish/pull/4662 No functional change	2023-07-06 23:02:11 +02:00
mstembera	80564bcfcd	Simplify lookup_count and clean up pieces(). https://github.com/official-stockfish/Stockfish/pull/4656 No functional change	2023-07-03 18:20:10 +02:00
Linmiao Xu	915532181f	Update NNUE architecture to SFNNv7 with larger L1 size of 2048 Creating this net involved: - a 5-step training process from scratch - greedy permuting L1 weights with https://github.com/official-stockfish/Stockfish/pull/4620 - leb128 compression with https://github.com/glinscott/nnue-pytorch/pull/251 - greedy 2- and 3- cycle permuting with https://github.com/official-stockfish/Stockfish/pull/4640 The 5 training steps were: 1. 400 epochs, lambda 1.0, lr 9.75e-4 UHOx2-wIsRight-multinet-dfrc-n5000-largeGensfen-d9.binpack (178G) nodes5000pv2_UHO.binpack data_pv-2_diff-100_nodes-5000.binpack wrongIsRight_nodes5000pv2.binpack multinet_pv-2_diff-100_nodes-5000.binpack dfrc_n5000.binpack large_gensfen_multipvdiff_100_d9.binpack ep399 chosen as start model for step2 2. 800 epochs, end-lambda 0.75, skip 16 LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G) T60T70wIsRightFarseerT60T74T75T76.binpack test78-junjulaug2022-16tb7p.no-db.min.binpack test79-mar2022-16tb7p.no-db.min.binpack test80-dec2022-16tb7p.no-db.min.binpack ep559 chosen as start model for step3 3. 800 epochs, end-lambda 0.725, skip 20 leela96-dfrc99-v2-T80dectofeb-sk20-mar-v6-T77decT78janfebT79apr.binpack (223G) leela96-filt-v2.min.binpack dfrc99-16tb7p-eval-filt-v2.min.binpack test80-dec2022-16tb7p-filter-v6-sk20.min-mar2023.binpack test80-jan2023-16tb7p-filter-v6-sk20.min-mar2023.binpack test80-feb2023-16tb7p-filter-v6-sk20.min-mar2023.binpack test80-mar2023-2tb7p-filter-v6.min.binpack test77-dec2021-16tb7p.no-db.min.binpack test78-janfeb2022-16tb7p.no-db.min.binpack test79-apr2022-16tb7p.no-db.min.binpack ep499 chosen as start model for step4 4. 800 epochs, end-lambda 0.7, skip 24 0dd1cebea57 dataset https://github.com/official-stockfish/Stockfish/pull/4606 ep599 chosen as start model for step5 5. 800 epochs, end-lambda 0.7, skip 28 same dataset as step4 ep619 became nn-1b951f8b449d.nnue For the final step5 training: python3 easy_train.py \ --experiment-name L1-2048-S5-sameData-sk28-S4-0dd1cebea57-shuffled-S3-leela96-dfrc99-v2-T80dectofeb-sk20-mar-v6-T77decT78janfebT79apr-sk20-S2-LeelaFarseerT78T79T80-ep399-S1-UHOx2-wIsRight-multinet-dfrc-n5000-largeGensfen-d9 \ --training-dataset /data/leela96-dfrc99-T60novdec-v2-T80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd-T80apr.binpack \ --early-fen-skipping 28 \ --nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes-L1-2048 \ --engine-test-branch linrock/Stockfish/L1-2048 \ --start-from-engine-test-net False \ --start-from-model /data/experiments/experiment_L1-2048-S4-0dd1cebea57-shuffled-S3-leela96-dfrc99-v2-T80dectofeb-sk20-mar-v6-T77decT78janfebT79apr-sk20-S2-LeelaFarseerT78T79T80-ep399-S1-UHOx2-wIsRight-multinet-dfrc-n5000-largeGensfen-d9/training/run_0/nn-epoch599.nnue --max_epoch 800 \ --lr 4.375e-4 \ --gamma 0.995 \ --start-lambda 1.0 \ --end-lambda 0.7 \ --tui False \ --seed $RANDOM \ --gpus 0 SF training data components for the step1 dataset: https://drive.google.com/drive/folders/1yLCEmioC3Xx9KQr4T7uB6GnLm5icAYGU Leela training data for steps 2-5 can be found at: https://robotmoon.com/nnue-training-data/ Due to larger L1 size and slower inference, the speed penalty loses elo at STC. Measurements from 100 bench runs at depth 13 with x86-64-modern on Intel Core i5-1038NG7 2.00GHz: sf_base = 1240730 +/- 3443 (95%) sf_test = 1153341 +/- 2832 (95%) diff = -87388 +/- 1616 (95%) speedup = -7.04330% +/- 0.130% (95%) Local elo at 25k nodes per move (vs. L1-1536 nn-fdc1d0fe6455.nnue): nn-epoch619.nnue : 21.1 +/- 3.2 Failed STC: https://tests.stockfishchess.org/tests/view/6498ee93dc7002ce609cf979 LLR: -2.95 (-2.94,2.94) <0.00,2.00> Total: 11680 W: 3058 L: 3299 D: 5323 Ptnml(0-2): 44, 1422, 3149, 1181, 44 LTC: https://tests.stockfishchess.org/tests/view/649b32f5dc7002ce609d20cf Elo: 0.68 ± 1.5 (95%) LOS: 80.5% Total: 40000 W: 10887 L: 10809 D: 18304 Ptnml(0-2): 36, 3938, 11958, 4048, 20 nElo: 1.50 ± 3.4 (95%) PairsRatio: 1.02 Passed VLTC 180+1.8: https://tests.stockfishchess.org/tests/view/64992b43dc7002ce609cfd20 LLR: 3.06 (-2.94,2.94) <0.00,2.00> Total: 38086 W: 10612 L: 10338 D: 17136 Ptnml(0-2): 9, 3316, 12115, 3598, 5 Passed VLTC SMP 60+0.6 th 8: https://tests.stockfishchess.org/tests/view/649a21fedc7002ce609d0c7d LLR: 2.95 (-2.94,2.94) <0.50,2.50> Total: 38936 W: 11091 L: 10820 D: 17025 Ptnml(0-2): 1, 2948, 13305, 3207, 7 closes https://github.com/official-stockfish/Stockfish/pull/4646 Bench: 2505168	2023-07-01 13:34:30 +02:00
Stéphane Nicolet	e355c70594	Document the LEB128 patch Add some comments and harmonize style for the LEB128 patch. closes https://github.com/official-stockfish/Stockfish/pull/4642 No functional change	2023-07-01 13:01:28 +02:00
maxim	a46087ee30	Compressed network parameters Implemented LEB128 (de)compression for the feature transformer. Reduces embedded network size from 70 MiB to 39 Mib. The new nn-78bacfcee510.nnue corresponds to the master net compressed. closes https://github.com/official-stockfish/Stockfish/pull/4617 No functional change	2023-06-19 21:37:23 +02:00
Andreas Matthies	7922e07af8	Fix for MSVC compilation. MSVC needs two more explicit casts to compile new affine_transform_sparse_input. See https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_castsi256_ps and https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_castsi128_ps closes https://github.com/official-stockfish/Stockfish/pull/4616 No functional change	2023-06-13 08:45:25 +02:00
AndrovT	38e61663d8	Use block sparse input for the first layer. Use block sparse input for the first fully connected layer on architectures with at least SSSE3. Depending on the CPU architecture, this yields a speedup of up to 10%, e.g. ``` Result of 100 runs of 'bench 16 1 13 default depth NNUE' base (...ockfish-base) = 959345 +/- 7477 test (...ckfish-patch) = 1054340 +/- 9640 diff = +94995 +/- 3999 speedup = +0.0990 P(speedup > 0) = 1.0000 CPU: 8 x AMD Ryzen 7 5700U with Radeon Graphics Hyperthreading: on ``` Passed STC: https://tests.stockfishchess.org/tests/view/6485aa0965ffe077ca12409c LLR: 2.93 (-2.94,2.94) <0.00,2.00> Total: 8864 W: 2479 L: 2223 D: 4162 Ptnml(0-2): 13, 829, 2504, 1061, 25 This commit includes a net with reordered weights, to increase the likelihood of block sparse inputs, but otherwise equivalent to the previous master net (nn-ea57bea57e32.nnue). Activation data collected with https://github.com/AndrovT/Stockfish/tree/log-activations, running bench 16 1 13 varied_1000.epd depth NNUE on this data. Net parameters permuted with https://gist.github.com/AndrovT/9e3fbaebb7082734dc84d27e02094cb3. closes https://github.com/official-stockfish/Stockfish/pull/4612 No functional change	2023-06-12 20:41:27 +02:00
Linmiao Xu	c1fff71650	Update NNUE architecture to SFNNv6 with larger L1 size of 1536 Created by training a new net from scratch with L1 size increased from 1024 to 1536. Thanks to Vizvezdenec for the idea of exploring larger net sizes after recent training data improvements. A new net was first trained with lambda 1.0 and constant LR 8.75e-4. Then a strong net from a later epoch in the training run was chosen for retraining with start-lambda 1.0 and initial LR 4.375e-4 decaying with gamma 0.995. Retraining was performed a total of 3 times, for this 4-step process: 1. 400 epochs, lambda 1.0 on filtered T77+T79 v6 deduplicated data 2. 800 epochs, end-lambda 0.75 on T60T70wIsRightFarseerT60T74T75T76.binpack 3. 800 epochs, end-lambda 0.75 and early-fen-skipping 28 on the master dataset 4. 800 epochs, end-lambda 0.7 and early-fen-skipping 28 on the master dataset In the training sequence that reached the new nn-8d69132723e2.nnue net, the epochs used for the 3x retraining runs were: 1. epoch 379 trained on T77T79-filter-v6-dd.min.binpack 2. epoch 679 trained on T60T70wIsRightFarseerT60T74T75T76.binpack 3. epoch 799 trained on the master dataset For training from scratch: python3 easy_train.py \ --experiment-name new-L1-1536-T77T79-filter-v6dd \ --training-dataset /data/T77T79-filter-v6-dd.min.binpack \ --max_epoch 400 \ --lambda 1.0 \ --start-from-engine-test-net False \ --engine-test-branch linrock/Stockfish/L1-1536 \ --nnue-pytorch-branch linrock/Stockfish/misc-fixes-L1-1536 \ --tui False \ --gpus "0," \ --seed $RANDOM Retraining commands were similar to each other. For the 3rd retraining run: python3 easy_train.py \ --experiment-name L1-1536-T77T79-v6dd-Re1-LeelaFarseer-Re2-masterDataset-Re3-sameData \ --training-dataset /data/leela96-dfrc99-v2-T60novdecT80juntonovjanfebT79aprmayT78jantosepT77dec-v6dd.binpack \ --early-fen-skipping 28 \ --max_epoch 800 \ --start-lambda 1.0 \ --end-lambda 0.7 \ --lr 4.375e-4 \ --gamma 0.995 \ --start-from-engine-test-net False \ --start-from-model /data/L1-1536-T77T79-v6dd-Re1-LeelaFarseer-Re2-masterDataset-nn-epoch799.nnue \ --engine-test-branch linrock/Stockfish/L1-1536 \ --nnue-pytorch-branch linrock/nnue-pytorch/misc-fixes-L1-1536 \ --tui False \ --gpus "0," \ --seed $RANDOM The T77+T79 data used is a subset of the master dataset available at: https://robotmoon.com/nnue-training-data/ T60T70wIsRightFarseerT60T74T75T76.binpack is available at: https://drive.google.com/drive/folders/1S9-ZiQa_3ApmjBtl2e8SyHxj4zG4V8gG Local elo at 25k nodes per move vs. nn-e1fb1ade4432.nnue (L1 size 1024): nn-epoch759.nnue : 26.9 +/- 1.6 Failed STC https://tests.stockfishchess.org/tests/view/64742485d29264e4cfa75f97 LLR: -2.94 (-2.94,2.94) <0.00,2.00> Total: 13728 W: 3588 L: 3829 D: 6311 Ptnml(0-2): 71, 1661, 3610, 1482, 40 Failing LTC https://tests.stockfishchess.org/tests/view/64752d7c4a36543c4c9f3618 LLR: -1.91 (-2.94,2.94) <0.50,2.50> Total: 35424 W: 9522 L: 9603 D: 16299 Ptnml(0-2): 24, 3579, 10585, 3502, 22 Passed VLTC 180+1.8 https://tests.stockfishchess.org/tests/view/64752df04a36543c4c9f3638 LLR: 2.95 (-2.94,2.94) <0.50,2.50> Total: 47616 W: 13174 L: 12863 D: 21579 Ptnml(0-2): 13, 4261, 14952, 4566, 16 Passed VLTC SMP 60+0.6 th 8 https://tests.stockfishchess.org/tests/view/647446ced29264e4cfa761e5 LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 19942 W: 5694 L: 5451 D: 8797 Ptnml(0-2): 6, 1504, 6707, 1749, 5 closes https://github.com/official-stockfish/Stockfish/pull/4593 bench 2222567	2023-05-31 08:51:22 +02:00
Maxim Masiutin	2f2f45f9f4	Made two specializations for affine transform easier to understand. Added AVX-512 for the specialization for small inputs closes https://github.com/official-stockfish/Stockfish/pull/4502 No functional change	2023-04-10 09:29:52 +02:00
Sebastian Buchwald	43108a6198	Reuse existing functions to read/write array of network parameters closes https://github.com/official-stockfish/Stockfish/pull/4463 No functional change	2023-03-29 21:36:27 +02:00
MinetaS	587bc647d7	Remove non_pawn_material in NNUE::evaluate After "Use NNUE complexity in search, retune related parameters" commit, the effect of non-pawn material adjustment has been nearly diminished. This patch removes pos.non_pawn_material as a simplification, which passed non-regression tests with both STC and LTC. Passed non-regression STC: LLR: 2.95 (-2.94,2.94) <-1.75,0.25> Total: 75152 W: 20030 L: 19856 D: 35266 Ptnml(0-2): 215, 8281, 20459, 8357, 264 https://tests.stockfishchess.org/tests/view/641ab471db43ab2ba6f7bc58 Passed non-regression LTC: LLR: 2.94 (-2.94,2.94) <-1.75,0.25> Total: 193864 W: 51870 L: 51829 D: 90165 Ptnml(0-2): 86, 18968, 58794, 18987, 97 https://tests.stockfishchess.org/tests/view/641b4fe6db43ab2ba6f7db96 closes https://github.com/official-stockfish/Stockfish/pull/4461 Bench: 5020718	2023-03-25 09:25:49 +01:00
disservin	af4b62a593	NNUE namespace cleanup This patch moves the nnue namespace in the appropiate header that correspondes with the definition. It also makes navigation a bit easier. closes https://github.com/official-stockfish/Stockfish/pull/4445 No functional change	2023-03-19 11:27:15 +01:00
pb00067	f0556dcbe3	Small cleanups remove some unneeded assignments, typos, incorrect comments, add authors entry. closes https://github.com/official-stockfish/Stockfish/pull/4417 no functional change	2023-03-14 08:38:02 +01:00
Sebastian Buchwald	564456a6a8	Unify type alias declarations The commit unifies the declaration of type aliases by replacing all typedefs with corresponding using statements. closing https://github.com/official-stockfish/Stockfish/pull/4412 No functional change	2023-02-27 08:29:47 +01:00
Sebastian Buchwald	29b5ad5dea	Fix typo in method name closes https://github.com/official-stockfish/Stockfish/pull/4404 No functional change	2023-02-24 20:12:53 +01:00
Joost VandeVondele	08385527dd	Introduce a function to compute NNUE accumulator This patch introduces `hint_common_parent_position()` to signal that potentially several child nodes will require an NNUE eval. By populating explicitly the accumulator, these subsequent evaluations can be performed more efficiently. This was based on the observation that calculating the evaluation in an excluded move position yielded a significant Elo gain, even though the evaluation itself was already available (work by pb00067). Sopel wrote the code to perform just the accumulator update. This PR is based on cleaned up code that passed STC: https://tests.stockfishchess.org/tests/view/63f62f9be74a12625bcd4aa0 LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 110368 W: 29607 L: 29167 D: 51594 Ptnml(0-2): 41, 10551, 33572, 10967, 53 and in an the earlier (equivalent) version passed STC: https://tests.stockfishchess.org/tests/view/63f3c3fee74a12625bcce2a6 LLR: 2.95 (-2.94,2.94) <0.00,2.00> Total: 47552 W: 12786 L: 12467 D: 22299 Ptnml(0-2): 120, 5107, 12997, 5438, 114 passed LTC: https://tests.stockfishchess.org/tests/view/63f45cc2e74a12625bccfa63 LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 110368 W: 29607 L: 29167 D: 51594 Ptnml(0-2): 41, 10551, 33572, 10967, 53 closes https://github.com/official-stockfish/Stockfish/pull/4402 Bench: 3726250	2023-02-23 13:25:35 +01:00
Sebastian Buchwald	77dfcbedce	Remove unused macros closes https://github.com/official-stockfish/Stockfish/pull/4397 No functional change	2023-02-23 13:24:37 +01:00
Sebastian Buchwald	b4ad3a3c4b	Add support for ARM dot product instructions The sdot instruction computes (and accumulates) a signed dot product, which is quite handy for Stockfish's NNUE code. The instruction is optional for Armv8.2 and Armv8.3, and mandatory for Armv8.4 and above. The commit adds a new 'arm-dotprod' architecture with enabled dot product support. It also enables dot product support for the existing 'apple-silicon' architecture, which is at least Armv8.5. The following local speed test was performed on an Apple M1 with ARCH=apple-silicon. I had to remove CPU pinning from the benchmark script. However, the results were still consistent: Checking both binaries against themselves reported a speedup of +0.0000 and +0.0005, respectively. ``` Result of 100 runs ================== base (...ish.037ef3e1) = 1917997 +/- 7152 test (...fish.dotprod) = 2159682 +/- 9066 diff = +241684 +/- 2923 speedup = +0.1260 P(speedup > 0) = 1.0000 CPU: 10 x arm Hyperthreading: off ``` Fixes #4193 closes https://github.com/official-stockfish/Stockfish/pull/4400 No functional change	2023-02-23 13:22:03 +01:00
MinetaS	2c36d1e7e7	Fix overflow in add_dpbusd_epi32x2 This patch fixes 16bit overflow in _add_dpbusd_epi32x2 functions, that can be triggered in rare cases depending on the NNUE weights. While the code leads to some slowdown on affected architectures (most notably avx2), the fix is simpler than some of the other options discussed in https://github.com/official-stockfish/Stockfish/pull/4394 Code suggested by Sopel97. Result of "bench 4096 1 30 default depth nnue": \| Architecture \| master \| patch (gcc) \| patch (clang) \| \|---------------------\|-----------\|-------------\|---------------\| \| x86-64-vnni512 \| 762122798 \| 762122798 \| 762122798 \| \| x86-64-avx512 \| 769723503 \| 762122798 \| 762122798 \| \| x86-64-bmi2 \| 769723503 \| 762122798 \| 762122798 \| \| x86-64-ssse3 \| 769723503 \| 762122798 \| 762122798 \| \| x86-64 \| 762122798 \| 762122798 \| 762122798 \| Following architectures will experience ~4% slowdown due to an additional instruction in the middle of hot path: x86-64-avx512 * x86-64-bmi2 * x86-64-avx2 * x86-64-sse41-popcnt (x86-64-modern) * x86-64-ssse3 * x86-32-sse41-popcnt This patch clearly loses Elo against master with both STC and LTC. Failed non-regression STC (256bit fix only): LLR: -2.95 (-2.94,2.94) <-1.75,0.25> Total: 33528 W: 8769 L: 9049 D: 15710 Ptnml(0-2): 96, 3616, 9600, 3376, 76 https://tests.stockfishchess.org/tests/view/63e6a5b44299542b1e26a485 60+0.6 @ 30000 games: Elo: -1.67 +-1.7 (95%) LOS: 2.8% Total: 30000 W: 7848 L: 7992 D: 14160 Ptnml(0-2): 12, 2847, 9436, 2683, 22 nElo: -3.84 +-3.9 (95%) PairsRatio: 0.95 https://tests.stockfishchess.org/tests/view/63e7ac716d0e1db55f35a660 However, a test against nn-a3dc078bafc7.nnue, which is the latest "safe" network not causing the bug, passed with regular bounds. Passed STC: LLR: 2.94 (-2.94,2.94) <0.00,2.00> Total: 160456 W: 42658 L: 42175 D: 75623 Ptnml(0-2): 487, 17638, 43469, 18173, 461 https://tests.stockfishchess.org/tests/view/63e89836d62a5d02b0fa82c8 closes https://github.com/official-stockfish/Stockfish/pull/4391 closes https://github.com/official-stockfish/Stockfish/pull/4394 No functional change	2023-02-18 13:23:18 +01:00
Sebastian Buchwald	2f67409506	Remove redundant const qualifiers The const qualifiers are already implied by the constexpr qualifiers. closes https://github.com/official-stockfish/Stockfish/pull/4359 No functional change	2023-01-28 16:49:27 +01:00
Sebastian Buchwald	2167942b6e	Simplify functions to read/write network parameters closes https://github.com/official-stockfish/Stockfish/pull/4358 No functional change	2023-01-28 16:47:52 +01:00
Sebastian Buchwald	da5bcec481	Fix asm modifiers in add_dpbusd_epi32x2 implementations The accumulator should be an earlyclobber because it is written before all input operands are read. Otherwise, the asm code computes a wrong result if the accumulator shares a register with one of the other input operands (which happens if we pass in the same expression for the accumulator and the operand). Closes https://github.com/official-stockfish/Stockfish/pull/4339 No functional change	2023-01-22 10:51:02 +01:00
Sebastian Buchwald	4f4e652eca	Avoid unnecessary string copies closes https://github.com/official-stockfish/Stockfish/pull/4326 also fixes typo, closes https://github.com/official-stockfish/Stockfish/pull/4332 No functional change	2023-01-09 20:32:58 +01:00
Sebastian Buchwald	e9e7a7b83f	Replace some std::string occurrences with std::string_view std::string_view is more lightweight than std::string. Furthermore, std::string_view variables can be declared constexpr. closes https://github.com/official-stockfish/Stockfish/pull/4328 No functional change	2023-01-09 20:28:24 +01:00
Stefano Di Martino	5a88c5bb9b	Modernize code base a little bit Removed sprintf() which generated a warning, because of security reasons. Replace NULL with nullptr Replace typedef with using Do not inherit from std::vector. Use composition instead. optimize mutex-unlocking closes https://github.com/official-stockfish/Stockfish/pull/4327 No functional change	2023-01-09 20:25:13 +01:00
Sebastian Buchwald	31acd6bab7	Warn if a global function has no previous declaration If a global function has no previous declaration, either the declaration is missing in the corresponding header file or the function should be declared static. Static functions are local to the translation unit, which allows the compiler to apply some optimizations earlier (when compiling the translation unit rather than during link-time optimization). The commit enables the warning for gcc, clang, and mingw. It also fixes the reported warnings by declaring the functions static or by adding a header file (benchmark.h). closes https://github.com/official-stockfish/Stockfish/pull/4325 No functional change	2023-01-09 20:18:39 +01:00
Sebastian Buchwald	b60f9cc451	Update copyright years Happy New Year! closes https://github.com/official-stockfish/Stockfish/pull/4315 No functional change	2023-01-02 19:07:38 +01:00
Joost VandeVondele	ad2aa8c06f	Normalize evaluation Normalizes the internal value as reported by evaluate or search to the UCI centipawn result used in output. This value is derived from the win_rate_model() such that Stockfish outputs an advantage of "100 centipawns" for a position if the engine has a 50% probability to win from this position in selfplay at fishtest LTC time control. The reason to introduce this normalization is that our evaluation is, since NNUE, no longer related to the classical parameter PawnValueEg (=208). This leads to the current evaluation changing quite a bit from release to release, for example, the eval needed to have 50% win probability at fishtest LTC (in cp and internal Value): June 2020 : 113cp (237) June 2021 : 115cp (240) April 2022 : 134cp (279) July 2022 : 167cp (348) With this patch, a 100cp advantage will have a fixed interpretation, i.e. a 50% win chance. To keep this value steady, it will be needed to update the win_rate_model() from time to time, based on fishtest data. This analysis can be performed with a set of scripts currently available at https://github.com/vondele/WLD_model fixes https://github.com/official-stockfish/Stockfish/issues/4155 closes https://github.com/official-stockfish/Stockfish/pull/4216 No functional change	2022-11-05 09:15:53 +01:00
Clausable	8333b2a94c	Fix README typos, update AUTHORS closes https://github.com/official-stockfish/Stockfish/pull/4208 No functional change	2022-10-27 08:15:46 +02:00
mstembera	93f71ecfe1	Optimize make_index() using templates and lookup tables. https://tests.stockfishchess.org/tests/view/634517e54bc7650f07542f99 LLR: 2.94 (-2.94,2.94) <0.00,2.00> Total: 642672 W: 171819 L: 170658 D: 300195 Ptnml(0-2): 2278, 68077, 179416, 69336, 2229 this also introduces `-flto-partition=one` as suggested by MinetaS (Syine Mineta) to avoid linking errors due to LTO on 32 bit mingw. This change was tested in isolation as well https://tests.stockfishchess.org/tests/view/634aacf84bc7650f0755188b LLR: 2.94 (-2.94,2.94) <-1.75,0.25> Total: 119352 W: 31986 L: 31862 D: 55504 Ptnml(0-2): 439, 12624, 33400, 12800, 413 closes https://github.com/official-stockfish/Stockfish/pull/4199 No functional change	2022-10-16 11:42:19 +02:00
mstembera	82bb21dc7a	Optimize AVX2 path in NNUE evaluation always selecting AffineTransform specialization for small inputs. A related patch was tested as Initially tested as a simplification STC https://tests.stockfishchess.org/tests/view/6317c3f437f41b13973d6dff LLR: 2.95 (-2.94,2.94) <-1.75,0.25> Total: 58072 W: 15619 L: 15425 D: 27028 Ptnml(0-2): 241, 6191, 15992, 6357, 255 Elo gain speedup test STC https://tests.stockfishchess.org/tests/view/63181c1b37f41b13973d79dc LLR: 2.94 (-2.94,2.94) <0.00,2.00> Total: 184496 W: 49922 L: 49401 D: 85173 Ptnml(0-2): 851, 19397, 51208, 19964, 828 and this patch gained in testing speedup = +0.0071 P(speedup > 0) = 1.0000 on CPU: 16 x AMD Ryzen 9 3950X closes https://github.com/official-stockfish/Stockfish/pull/4158 No functional change	2022-09-11 14:19:57 +02:00
Dubslow	442c40b43d	Use NNUE complexity in search, retune related parameters This builds on ideas of xoto10 and mstembera to use more output from NNUE in the search algorithm. passed STC: https://tests.stockfishchess.org/tests/view/62ae454fe7ee5525ef88a957 LLR: 2.95 (-2.94,2.94) <0.00,2.50> Total: 89208 W: 24127 L: 23753 D: 41328 Ptnml(0-2): 400, 9886, 23642, 10292, 384 passed LTC: https://tests.stockfishchess.org/tests/view/62acc6ddd89eb6cf1e0750a1 LLR: 2.93 (-2.94,2.94) <0.50,3.00> Total: 56352 W: 15430 L: 15115 D: 25807 Ptnml(0-2): 44, 5501, 16782, 5794, 55 closes https://github.com/official-stockfish/Stockfish/pull/4088 bench 5332964	2022-06-20 08:30:57 +02:00

1 2 3

125 commits