BadFish

mirror of https://github.com/sockspls/badfish synced 2025-05-01 17:19:36 +00:00

Author	SHA1	Message	Date
gab8192	49ef4c935a	Implement accumulator refresh table For each thread persist an accumulator cache for the network, where each cache contains multiple entries for each of the possible king squares. When the accumulator needs to be refreshed, the cached entry is used to more efficiently update the accumulator, instead of rebuilding it from scratch. This idea, was first described by Luecx (author of Koivisto) and is commonly referred to as "Finny Tables". When the accumulator needs to be refreshed, instead of filling it with biases and adding every piece from scratch, we... 1. Take the `AccumulatorRefreshEntry` associated with the new king bucket 2. Calculate the features to activate and deactivate (from differences between bitboards in the entry and bitboards of the actual position) 3. Apply the updates on the refresh entry 4. Copy the content of the refresh entry accumulator to the accumulator we were refreshing 5. Copy the bitboards from the position to the refresh entry, to match the newly updated accumulator Results at STC: https://tests.stockfishchess.org/tests/view/662301573fe04ce4cefc1386 (first version) https://tests.stockfishchess.org/tests/view/6627fa063fe04ce4cefc6560 (final) Non-Regression between first and final: https://tests.stockfishchess.org/tests/view/662801e33fe04ce4cefc660a STC SMP: https://tests.stockfishchess.org/tests/view/662808133fe04ce4cefc667c closes https://github.com/official-stockfish/Stockfish/pull/5183 No functional change	2024-04-24 18:38:20 +02:00
Gahtan Nahdi	d0e72c19fa	fix clang compiler warning for avx512 build Initialize variable in constexpr function to get rid of clang compiler warning for avx512 build. closes https://github.com/official-stockfish/Stockfish/pull/5176 Non-functional change	2024-04-21 14:38:16 +02:00
mstembera	94484db6e8	Avoid permuting inputs during transform() Avoid permuting inputs during transform() and instead do it once at load time. Affects AVX2 and newer Intel architectures only. https://tests.stockfishchess.org/tests/view/661306613eb00c8ccc0033c7 LLR: 2.94 (-2.94,2.94) <0.00,2.00> Total: 108480 W: 28319 L: 27898 D: 52263 Ptnml(0-2): 436, 12259, 28438, 12662, 445 speedups measured such as e.g. ``` Result of 100 runs ================== base (./stockfish.master ) = 1241128 +/- 3757 test (./stockfish.patch ) = 1247713 +/- 3689 diff = +6585 +/- 2583 speedup = +0.0053 P(speedup > 0) = 1.0000 ``` closes https://github.com/official-stockfish/Stockfish/pull/5160 No functional change	2024-04-11 22:38:38 +02:00
mstembera	5001d49f42	Update nnue_feature_transformer.h Unroll update_accumulator_refresh to process two active indices simultaneously. The compiler might not unroll effectively because the number of active indices isn't known at compile time. STC https://tests.stockfishchess.org/tests/view/65faa8850ec64f0526c4fca9 LLR: 2.93 (-2.94,2.94) <0.00,2.00> Total: 130464 W: 33882 L: 33431 D: 63151 Ptnml(0-2): 539, 14591, 34501, 15082, 519 closes https://github.com/official-stockfish/Stockfish/pull/5125 No functional change	2024-03-26 18:06:49 +01:00
mstembera	7831131591	Only evaluate the PSQT part of the small net for large evals. Thanks to Viren6 for suggesting to set complexity to 0. STC https://tests.stockfishchess.org/tests/view/65d7d6709b2da0226a5a203f LLR: 2.92 (-2.94,2.94) <0.00,2.00> Total: 328384 W: 85316 L: 84554 D: 158514 Ptnml(0-2): 1414, 39076, 82486, 39766, 1450 LTC https://tests.stockfishchess.org/tests/view/65dce6d290f639b028a54d2e LLR: 2.95 (-2.94,2.94) <0.50,2.50> Total: 165162 W: 41918 L: 41330 D: 81914 Ptnml(0-2): 102, 18332, 45124, 18922, 101 closes https://github.com/official-stockfish/Stockfish/pull/5083 bench: 1504003	2024-03-03 15:29:58 +01:00
FauziAkram	59691d46a1	Assorted trivial cleanups Renaming doubleExtensions variable to multiExtensions, since now we have also triple extensions. Some extra cleanups. Recent tests used to measure the elo worth: https://tests.stockfishchess.org/tests/view/659fd0c379aa8af82b96abc3 https://tests.stockfishchess.org/tests/view/65a8f3da79aa8af82b9751e3 https://tests.stockfishchess.org/tests/view/65b51824c865510db0272740 https://tests.stockfishchess.org/tests/view/65b58fbfc865510db0272f5b closes https://github.com/official-stockfish/Stockfish/pull/5032 No functional change	2024-02-09 19:06:24 +01:00
Linmiao Xu	584d9efedc	Dual NNUE with L1-128 smallnet Credit goes to @mstembera for: - writing the code enabling dual NNUE: https://github.com/official-stockfish/Stockfish/pull/4898 - the idea of trying L1-128 trained exclusively on high simple eval positions The L1-128 smallnet is: - epoch 399 of a single-stage training from scratch - trained only on positions from filtered data with high material difference - defined by abs(simple_eval) > 1000 ```yaml experiment-name: 128--S1-only-hse-v2 training-dataset: - /data/hse/S3/dfrc99-16tb7p-eval-filt-v2.min.high-simple-eval-1k.binpack - /data/hse/S3/leela96-filt-v2.min.high-simple-eval-1k.binpack - /data/hse/S3/test80-apr2022-16tb7p.min.high-simple-eval-1k.binpack - /data/hse/S7/test60-2020-2tb7p.v6-3072.high-simple-eval-1k.binpack - /data/hse/S7/test60-novdec2021-12tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack - /data/hse/S7/test77-nov2021-2tb7p.v6-3072.min.high-simple-eval-1k.binpack - /data/hse/S7/test77-dec2021-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack - /data/hse/S7/test77-jan2022-2tb7p.high-simple-eval-1k.binpack - /data/hse/S7/test78-jantomay2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack - /data/hse/S7/test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack - /data/hse/S7/test79-apr2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack - /data/hse/S7/test79-may2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack # T80 2022 - /data/hse/S7/test80-may2022-16tb7p.high-simple-eval-1k.binpack - /data/hse/S7/test80-jun2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack - /data/hse/S7/test80-jul2022-16tb7p.v6-dd.min.high-simple-eval-1k.binpack - /data/hse/S7/test80-aug2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack - /data/hse/S7/test80-sep2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack - /data/hse/S7/test80-oct2022-16tb7p.v6-dd.high-simple-eval-1k.binpack - /data/hse/S7/test80-nov2022-16tb7p-v6-dd.min.high-simple-eval-1k.binpack # T80 2023 - /data/hse/S7/test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack - /data/hse/S7/test80-feb2023-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack - /data/hse/S7/test80-mar2023-2tb7p.v6-sk16.min.high-simple-eval-1k.binpack - /data/hse/S7/test80-apr2023-2tb7p-filter-v6-sk16.min.high-simple-eval-1k.binpack - /data/hse/S7/test80-may2023-2tb7p.v6.min.high-simple-eval-1k.binpack - /data/hse/S7/test80-jun2023-2tb7p.v6-3072.min.high-simple-eval-1k.binpack - /data/hse/S7/test80-jul2023-2tb7p.v6-3072.min.high-simple-eval-1k.binpack - /data/hse/S7/test80-aug2023-2tb7p.v6.min.high-simple-eval-1k.binpack - /data/hse/S7/test80-sep2023-2tb7p.high-simple-eval-1k.binpack - /data/hse/S7/test80-oct2023-2tb7p.high-simple-eval-1k.binpack start-from-engine-test-net: False nnue-pytorch-branch: linrock/nnue-pytorch/L1-128 engine-test-branch: linrock/Stockfish/L1-128-nolazy engine-base-branch: linrock/Stockfish/L1-128 num-epochs: 500 lambda: 1.0 ``` Experiment yaml configs converted to easy_train.sh commands with: https://github.com/linrock/nnue-tools/blob/4339954/yaml_easy_train.py Binpacks interleaved at training time with: https://github.com/official-stockfish/nnue-pytorch/pull/259 Data filtered for high simple eval positions with: https://github.com/linrock/nnue-data/blob/32d6a68/filter_high_simple_eval_plain.py https://github.com/linrock/Stockfish/blob/61dbfe/src/tools/transform.cpp#L626-L655 Training data can be found at: https://robotmoon.com/nnue-training-data/ Local elo at 25k nodes per move of L1-128 smallnet (nnue-only eval) vs. L1-128 trained on standard S1 data: nn-epoch399.nnue : -318.1 +/- 2.1 Passed STC: https://tests.stockfishchess.org/tests/view/6574cb9d95ea6ba1fcd49e3b LLR: 2.93 (-2.94,2.94) <0.00,2.00> Total: 62432 W: 15875 L: 15521 D: 31036 Ptnml(0-2): 177, 7331, 15872, 7633, 203 Passed LTC: https://tests.stockfishchess.org/tests/view/6575da2d4d789acf40aaac6e LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 64830 W: 16118 L: 15738 D: 32974 Ptnml(0-2): 43, 7129, 17697, 7497, 49 closes https://github.com/official-stockfish/Stockfish/pulls Bench: 1330050 Co-Authored-By: mstembera <5421953+mstembera@users.noreply.github.com>	2024-01-07 21:15:52 +01:00
Disservin	444f03ee95	Update copyright year closes https://github.com/official-stockfish/Stockfish/pull/4954 No functional change	2024-01-04 15:47:10 +01:00
FauziAkram	833a2e2bc0	Cleanup comments Tests used to derive some Elo worth comments: https://tests.stockfishchess.org/tests/view/656a7f4e136acbc573555a31 https://tests.stockfishchess.org/tests/view/6585fb455457644dc984620f closes https://github.com/official-stockfish/Stockfish/pull/4945 No functional change	2023-12-31 19:54:27 +01:00
Joost VandeVondele	ec02714b62	Cleanup comments and some code reorg. passed STC: https://tests.stockfishchess.org/tests/view/6536dc7dcc309ae83955b04d LLR: 2.93 (-2.94,2.94) <-1.75,0.25> Total: 58048 W: 14693 L: 14501 D: 28854 Ptnml(0-2): 200, 6399, 15595, 6669, 161 closes https://github.com/official-stockfish/Stockfish/pull/4846 No functional change	2023-10-24 17:43:05 +02:00
Disservin	2d0237db3f	add clang-format This introduces clang-format to enforce a consistent code style for Stockfish. Having a documented and consistent style across the code will make contributing easier for new developers, and will make larger changes to the codebase easier to make. To facilitate formatting, this PR includes a Makefile target (`make format`) to format the code, this requires clang-format (version 17 currently) to be installed locally. Installing clang-format is straightforward on most OS and distros (e.g. with https://apt.llvm.org/, brew install clang-format, etc), as this is part of quite commonly used suite of tools and compilers (llvm / clang). Additionally, a CI action is present that will verify if the code requires formatting, and comment on the PR as needed. Initially, correct formatting is not required, it will be done by maintainers as part of the merge or in later commits, but obviously this is encouraged. fixes https://github.com/official-stockfish/Stockfish/issues/3608 closes https://github.com/official-stockfish/Stockfish/pull/4790 Co-Authored-By: Joost VandeVondele <Joost.VandeVondele@gmail.com>	2023-10-22 16:06:27 +02:00
mstembera	d3d0c69dc1	Remove outdated Tile naming. cleanup variable naming after #4816 closes #4833 No functional change	2023-10-21 10:28:55 +02:00
mstembera	c17a657b04	Optimize the most common update accumalator cases w/o tiling In the most common case where we only update a single state it's faster to not use temporary accumulation registers and tiling. (Also includes a couple of small cleanups.) passed STC https://tests.stockfishchess.org/tests/view/651918e3cff46e538ee0023b LLR: 2.95 (-2.94,2.94) <0.00,2.00> Total: 34944 W: 8989 L: 8687 D: 17268 Ptnml(0-2): 88, 3743, 9512, 4037, 92 A simpler version https://tests.stockfishchess.org/tests/view/65190dfacff46e538ee00155 also passed but this version is stronger still https://tests.stockfishchess.org/tests/view/6519b95fcff46e538ee00fa2 closes https://github.com/official-stockfish/Stockfish/pull/4816 No functional change	2023-10-08 07:42:39 +02:00
mstembera	8a912951de	Remove handcrafted MMX code too small a benefit to maintain this old target closes https://github.com/official-stockfish/Stockfish/pull/4804 No functional change	2023-10-08 07:37:01 +02:00
mstembera	95fe2b9a9d	Reduce SIMD register count from 32 to 16 in the case of avx512 and vnni512 archs. Up to 17% speedup, depending on the compiler, e.g. ``` AMD pro 7840u (zen4 phoenix apu 4nm) bash bench_parallel.sh ./stockfish_avx512_gcc13 ./stockfish_avx512_pr_gcc13 20 10 sf_base = 1077737 +/- 8446 (95%) sf_test = 1264268 +/- 8543 (95%) diff = 186531 +/- 4280 (95%) speedup = 17.308% +/- 0.397% (95%) ``` Prior to this patch, it appears gcc spills registers. closes https://github.com/official-stockfish/Stockfish/pull/4796 No functional change	2023-09-22 19:15:34 +02:00
mstembera	97f706ecc1	Sparse impl of affine_transform_non_ssse3() deal with the general case About a 8.6% speedup (for general arch) Results for 200 tests for each version: Base Test Diff Mean 141741 153998 -12257 StDev 2990 3042 3742 p-value: 0.999 speedup: 0.086 closes https://github.com/official-stockfish/Stockfish/pull/4786 No functional change	2023-09-22 19:03:47 +02:00
Disservin	3c0e86a91e	Cleanup includes Reorder a few includes, include "position.h" where it was previously missing and apply include-what-you-use suggestions. Also make the order of the includes consistent, in the following way: 1. Related header (for .cpp files) 2. A blank line 3. C/C++ headers 4. A blank line 5. All other header files closes https://github.com/official-stockfish/Stockfish/pull/4763 fixes https://github.com/official-stockfish/Stockfish/issues/4707 No functional change	2023-09-03 08:24:51 +02:00
maxim	a46087ee30	Compressed network parameters Implemented LEB128 (de)compression for the feature transformer. Reduces embedded network size from 70 MiB to 39 Mib. The new nn-78bacfcee510.nnue corresponds to the master net compressed. closes https://github.com/official-stockfish/Stockfish/pull/4617 No functional change	2023-06-19 21:37:23 +02:00
pb00067	f0556dcbe3	Small cleanups remove some unneeded assignments, typos, incorrect comments, add authors entry. closes https://github.com/official-stockfish/Stockfish/pull/4417 no functional change	2023-03-14 08:38:02 +01:00
Sebastian Buchwald	564456a6a8	Unify type alias declarations The commit unifies the declaration of type aliases by replacing all typedefs with corresponding using statements. closing https://github.com/official-stockfish/Stockfish/pull/4412 No functional change	2023-02-27 08:29:47 +01:00
Sebastian Buchwald	29b5ad5dea	Fix typo in method name closes https://github.com/official-stockfish/Stockfish/pull/4404 No functional change	2023-02-24 20:12:53 +01:00
Joost VandeVondele	08385527dd	Introduce a function to compute NNUE accumulator This patch introduces `hint_common_parent_position()` to signal that potentially several child nodes will require an NNUE eval. By populating explicitly the accumulator, these subsequent evaluations can be performed more efficiently. This was based on the observation that calculating the evaluation in an excluded move position yielded a significant Elo gain, even though the evaluation itself was already available (work by pb00067). Sopel wrote the code to perform just the accumulator update. This PR is based on cleaned up code that passed STC: https://tests.stockfishchess.org/tests/view/63f62f9be74a12625bcd4aa0 LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 110368 W: 29607 L: 29167 D: 51594 Ptnml(0-2): 41, 10551, 33572, 10967, 53 and in an the earlier (equivalent) version passed STC: https://tests.stockfishchess.org/tests/view/63f3c3fee74a12625bcce2a6 LLR: 2.95 (-2.94,2.94) <0.00,2.00> Total: 47552 W: 12786 L: 12467 D: 22299 Ptnml(0-2): 120, 5107, 12997, 5438, 114 passed LTC: https://tests.stockfishchess.org/tests/view/63f45cc2e74a12625bccfa63 LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 110368 W: 29607 L: 29167 D: 51594 Ptnml(0-2): 41, 10551, 33572, 10967, 53 closes https://github.com/official-stockfish/Stockfish/pull/4402 Bench: 3726250	2023-02-23 13:25:35 +01:00
Sebastian Buchwald	b60f9cc451	Update copyright years Happy New Year! closes https://github.com/official-stockfish/Stockfish/pull/4315 No functional change	2023-01-02 19:07:38 +01:00
mstembera	93f71ecfe1	Optimize make_index() using templates and lookup tables. https://tests.stockfishchess.org/tests/view/634517e54bc7650f07542f99 LLR: 2.94 (-2.94,2.94) <0.00,2.00> Total: 642672 W: 171819 L: 170658 D: 300195 Ptnml(0-2): 2278, 68077, 179416, 69336, 2229 this also introduces `-flto-partition=one` as suggested by MinetaS (Syine Mineta) to avoid linking errors due to LTO on 32 bit mingw. This change was tested in isolation as well https://tests.stockfishchess.org/tests/view/634aacf84bc7650f0755188b LLR: 2.94 (-2.94,2.94) <-1.75,0.25> Total: 119352 W: 31986 L: 31862 D: 55504 Ptnml(0-2): 439, 12624, 33400, 12800, 413 closes https://github.com/official-stockfish/Stockfish/pull/4199 No functional change	2022-10-16 11:42:19 +02:00
Giacomo Lorenzetti	f7d1491b3d	Assorted small cleanups closes https://github.com/official-stockfish/Stockfish/pull/3973 No functional change	2022-05-29 18:42:48 +02:00
Ben Chaney	270a0e737f	Generalize the feature transform to use vec_t macros This commit generalizes the feature transform to use vec_t macros that are architecture defined instead of using a seperate code path for each one. It should make some old architectures (MMX, including improvements by Fanael) faster and make further such improvements easier in the future. Includes some corrections to CI for mingw. closes https://github.com/official-stockfish/Stockfish/pull/3955 closes https://github.com/official-stockfish/Stockfish/pull/3928 No functional change	2022-03-02 23:39:08 +01:00
mstembera	5f781d366e	Clean up and simplify some nnue code. Remove some unnecessary code and it's execution during inference. Also the change on line 49 in nnue_architecture.h results in a more efficient SIMD code path through ClippedReLU::propagate(). passed STC: https://tests.stockfishchess.org/tests/view/6217d3bfda649bba32ef25d5 LLR: 2.94 (-2.94,2.94) <-2.25,0.25> Total: 12056 W: 3281 L: 3092 D: 5683 Ptnml(0-2): 55, 1213, 3312, 1384, 64 passed STC SMP: https://tests.stockfishchess.org/tests/view/6217f344da649bba32ef295e LLR: 2.94 (-2.94,2.94) <-2.25,0.25> Total: 27376 W: 7295 L: 7137 D: 12944 Ptnml(0-2): 52, 2859, 7715, 3003, 59 closes https://github.com/official-stockfish/Stockfish/pull/3944 No functional change bench: 6820724	2022-02-25 08:37:57 +01:00
Tomasz Sobczyk	cb9c2594fc	Update architecture to "SFNNv4". Update network to nn-6877cd24400e.nnue. Architecture: The diagram of the "SFNNv4" architecture: https://user-images.githubusercontent.com/8037982/153455685-cbe3a038-e158-4481-844d-9d5fccf5c33a.png The most important architectural changes are the following: * 1024x2 [activated] neurons are pairwise, elementwise multiplied (not quite pairwise due to implementation details, see diagram), which introduces a non-linearity that exhibits similar benefits to previously tested sigmoid activation (quantmoid4), while being slightly faster. * The following layer has therefore 2x less inputs, which we compensate by having 2 more outputs. It is possible that reducing the number of outputs might be beneficial (as we had it as low as 8 before). The layer is now 1024->16. * The 16 outputs are split into 15 and 1. The 1-wide output is added to the network output (after some necessary scaling due to quantization differences). The 15-wide is activated and follows the usual path through a set of linear layers. The additional 1-wide output is at least neutral, but has shown a slightly positive trend in training compared to networks without it (all 16 outputs through the usual path), and allows possibly an additional stage of lazy evaluation to be introduced in the future. Additionally, the inference code was rewritten and no longer uses a recursive implementation. This was necessitated by the splitting of the 16-wide intermediate result into two, which was impossible to do with the old implementation with ugly hacks. This is hopefully overall for the better. First session: The first session was training a network from scratch (random initialization). The exact trainer used was slightly different (older) from the one used in the second session, but it should not have a measurable effect. The purpose of this session is to establish a strong network base for the second session. Small deviations in strength do not harm the learnability in the second session. The training was done using the following command: python3 train.py \ /home/sopel/nnue/nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \ /home/sopel/nnue/nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \ --gpus "$3," \ --threads 4 \ --num-workers 4 \ --batch-size 16384 \ --progress_bar_refresh_rate 20 \ --random-fen-skipping 3 \ --features=HalfKAv2_hm^ \ --lambda=1.0 \ --gamma=0.992 \ --lr=8.75e-4 \ --max_epochs=400 \ --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2 Every 20th net was saved and its playing strength measured against some baseline at 25k nodes per move with pure NNUE evaluation (modified binary). The exact setup is not important as long as it's consistent. The purpose is to sift good candidates from bad ones. The dataset can be found https://drive.google.com/file/d/1UQdZN_LWQ265spwTBwDKo0t1WjSJKvWY/view Second session: The second training session was done starting from the best network (as determined by strength testing) from the first session. It is important that it's resumed from a .pt model and NOT a .ckpt model. The conversion can be performed directly using serialize.py The LR schedule was modified to use gamma=0.995 instead of gamma=0.992 and LR=4.375e-4 instead of LR=8.75e-4 to flatten the LR curve and allow for longer training. The training was then running for 800 epochs instead of 400 (though it's possibly mostly noise after around epoch 600). The training was done using the following command: The training was done using the following command: python3 train.py \ /data/sopel/nnue/nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \ /data/sopel/nnue/nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \ --gpus "$3," \ --threads 4 \ --num-workers 4 \ --batch-size 16384 \ --progress_bar_refresh_rate 20 \ --random-fen-skipping 3 \ --features=HalfKAv2_hm^ \ --lambda=1.0 \ --gamma=0.995 \ --lr=4.375e-4 \ --max_epochs=800 \ --resume-from-model /data/sopel/nnue/nnue-pytorch-training/data/exp295/nn-epoch399.pt \ --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$run_id In particular note that we now use lambda=1.0 instead of lambda=0.8 (previous nets), because tests show that WDL-skipping introduced by vondele performs better with lambda=1.0. Nets were being saved every 20th epoch. In total 16 runs were made with these settings and the best nets chosen according to playing strength at 25k nodes per move with pure NNUE evaluation - these are the 4 nets that have been put on fishtest. The dataset can be found either at ftp://ftp.chessdb.cn/pub/sopel/data_sf/T60T70wIsRightFarseerT60T74T75T76.binpack in its entirety (download might be painfully slow because hosted in China) or can be assembled in the following way: Get the `5640ad48ae/script/interleave_binpacks.py` script. Download T60T70wIsRightFarseer.binpack https://drive.google.com/file/d/1_sQoWBl31WAxNXma2v45004CIVltytP8/view Download farseerT74.binpack http://trainingdata.farseer.org/T74-May13-End.7z Download farseerT75.binpack http://trainingdata.farseer.org/T75-June3rd-End.7z Download farseerT76.binpack http://trainingdata.farseer.org/T76-Nov10th-End.7z Run python3 interleave_binpacks.py T60T70wIsRightFarseer.binpack farseerT74.binpack farseerT75.binpack farseerT76.binpack T60T70wIsRightFarseerT60T74T75T76.binpack Tests: STC: https://tests.stockfishchess.org/tests/view/6203fb85d71106ed12a407b7 LLR: 2.94 (-2.94,2.94) <0.00,2.50> Total: 16952 W: 4775 L: 4521 D: 7656 Ptnml(0-2): 133, 1818, 4318, 2076, 131 LTC: https://tests.stockfishchess.org/tests/view/62041e68d71106ed12a40e85 LLR: 2.94 (-2.94,2.94) <0.50,3.00> Total: 14944 W: 4138 L: 3907 D: 6899 Ptnml(0-2): 21, 1499, 4202, 1728, 22 closes https://github.com/official-stockfish/Stockfish/pull/3927 Bench: 4919707	2022-02-10 19:54:31 +01:00
Brad Knox	ad926d34c0	Update copyright years Happy New Year! closes https://github.com/official-stockfish/Stockfish/pull/3881 No functional change	2022-01-06 15:45:45 +01:00
Tomasz Sobczyk	4766dfc395	Optimize FT activation and affine transform for NEON. This patch optimizes the NEON implementation in two ways. The activation layer after the feature transformer is rewritten to make it easier for the compiler to see through dependencies and unroll. This in itself is a minimal, but a positive improvement. Other architectures could benefit from this too in the future. This is not an algorithmic change. The affine transform for large matrices (first layer after FT) on NEON now utilizes the same optimized code path as >=SSSE3, which makes the memory accesses more sequential and makes better use of the available registers, which allows for code that has longer dependency chains. Benchmarks from Redshift#161, profile-build with apple clang george@Georges-MacBook-Air nets % ./stockfish-b82d93 bench 2>&1 \| tail -4 (current master) =========================== Total time (ms) : 2167 Nodes searched : 4667742 Nodes/second : 2154011 george@Georges-MacBook-Air nets % ./stockfish-7377b8 bench 2>&1 \| tail -4 (this patch) =========================== Total time (ms) : 1842 Nodes searched : 4667742 Nodes/second : 2534061 This is a solid 18% improvement overall, larger in a bench with NNUE-only, not mixed. Improvement is also observed on armv7-neon (Raspberry Pi, and older phones), around 5% speedup. No changes for architectures other than NEON. closes https://github.com/official-stockfish/Stockfish/pull/3837 No functional changes.	2021-12-07 18:08:54 +01:00
mstembera	644f6d4790	Simplify away ValueListInserter plus minor cleanups STC: https://tests.stockfishchess.org/tests/view/616f059b40f619782fd4f73f LLR: 2.94 (-2.94,2.94) <-2.50,0.50> Total: 84992 W: 21244 L: 21197 D: 42551 Ptnml(0-2): 279, 9005, 23868, 9078, 266 closes https://github.com/official-stockfish/Stockfish/pull/3749 No functional change	2021-10-23 12:21:17 +02:00
Tomasz Sobczyk	900f249f59	Reduce the number of accumulator states Reduce from 3 to 2. Make the intent of the states clearer. STC: https://tests.stockfishchess.org/tests/view/60c50111457376eb8bcaad03 LLR: 2.95 (-2.94,2.94) <-2.50,0.50> Total: 61888 W: 5007 L: 4944 D: 51937 Ptnml(0-2): 164, 3947, 22649, 4030, 154 LTC: https://tests.stockfishchess.org/tests/view/60c52b1c457376eb8bcaad2c LLR: 2.94 (-2.94,2.94) <-2.50,0.50> Total: 20248 W: 688 L: 618 D: 18942 Ptnml(0-2): 7, 551, 8946, 605, 15 closes https://github.com/official-stockfish/Stockfish/pull/3548 No functional change.	2021-06-14 11:22:08 +02:00
Tomasz Sobczyk	ce4c523ad3	Register count for feature transformer Compute optimal register count for feature transformer accumulation dynamically. This also introduces a change where AVX512 would only use 8 registers instead of 16 (now possible due to a 2x increase in feature transformer size). closes https://github.com/official-stockfish/Stockfish/pull/3543 No functional change	2021-06-13 13:10:56 +02:00
Tomasz Sobczyk	b84fa04db6	Read NNUE net faster Load feature transformer weights in bulk on little-endian machines. This is in particular useful to test new nets with c-chess-cli, see https://github.com/lucasart/c-chess-cli/issues/44 ``` $ time ./stockfish.exe uci Before : 0m0.914s After : 0m0.483s ``` No functional change	2021-06-13 09:39:03 +02:00
Stéphane Nicolet	8f081c86f7	Clean SIMD code a bit Cleaner vector code structure in feature transformer. This patch just regroups the parts of the inner loop for each SIMD instruction set. Tested for non-regression: LLR: 2.96 (-2.94,2.94) <-2.50,0.50> Total: 115760 W: 9835 L: 9831 D: 96094 Ptnml(0-2): 326, 7776, 41715, 7694, 369 https://tests.stockfishchess.org/tests/view/60b96b39457376eb8bcaa26e It would be nice if a future patch could use some of the macros at the top of the file to unify the code between the distincts SIMD instruction sets (of course, unifying the Relu will be the challenge). closes https://github.com/official-stockfish/Stockfish/pull/3506 No functional change	2021-06-04 14:07:46 +02:00
Tomasz Sobczyk	5448cad29e	Fix export of the feature transformer. PSQT export was missing. fixes #3507 closes https://github.com/official-stockfish/Stockfish/pull/3508 No functional change	2021-05-30 21:31:58 +02:00
Stéphane Nicolet	f193778446	Do not use lazy evaluation inside NNUE This simplification patch implements two changes: 1. it simplifies away the so-called "lazy" path in the NNUE evaluation internals, where we trusted the psqt head alone to avoid the costly "positional" head in some cases; 2. it raises a little bit the NNUEThreshold1 in evaluate.cpp (from 682 to 800), which increases the limit where we switched from NNUE eval to Classical eval. Both effects increase the number of positional evaluations done by our new net architecture, but the results of our tests below seem to indicate that the loss of speed will be compensated by the gain of eval quality. STC: LLR: 2.95 (-2.94,2.94) <-2.50,0.50> Total: 26280 W: 2244 L: 2137 D: 21899 Ptnml(0-2): 72, 1755, 9405, 1810, 98 https://tests.stockfishchess.org/tests/view/60ae73f112066fd299795a51 LTC: LLR: 2.95 (-2.94,2.94) <-2.50,0.50> Total: 20592 W: 750 L: 677 D: 19165 Ptnml(0-2): 9, 614, 8980, 681, 12 https://tests.stockfishchess.org/tests/view/60ae88e812066fd299795a82 closes https://github.com/official-stockfish/Stockfish/pull/3503 Bench: 3817907	2021-05-27 01:21:56 +02:00
Tomasz Sobczyk	9d53129075	Expose the lazy threshold for the feature transformer PSQT as a parameter. Definition of the lazy threshold moved to evaluate.cpp where all others are. Lazy threshold only used for real searches, not used for the "eval" call. This preserves the purity of NNUE evaluation, which is useful to verify consistency between the engine and the NNUE trainer. closes https://github.com/official-stockfish/Stockfish/pull/3499 No functional change	2021-05-25 21:40:51 +02:00
Fanael Linithien	038487f954	Use packed 32-bit MMX operations for updating the PSQT accumulator This improves the speed of NNUE by a bit on old hardware that code path is intended for, like a Pentium III 1.13 GHz: 10 repeats of "./stockfish bench 16 1 13 default depth NNUE": Before: 54 642 504 897 cycles (± 0.12%) 62 301 937 829 instructions (± 0.03%) After: 54 320 821 928 cycles (± 0.13%) 62 084 742 699 instructions (± 0.02%) Speed of go depth 20 from startpos: Before: 53103 nps After: 53856 nps closes https://github.com/official-stockfish/Stockfish/pull/3476 No functional change.	2021-05-19 19:34:44 +02:00
Tomasz Sobczyk	e8d64af123	New NNUE architecture and net Introduces a new NNUE network architecture and associated network parameters, as obtained by a new pytorch trainer. The network is already very strong at short TC, without regression at longer TC, and has potential for further improvements. https://tests.stockfishchess.org/tests/view/60a159c65085663412d0921d TC: 10s+0.1s, 1 thread ELO: 21.74 +-3.4 (95%) LOS: 100.0% Total: 10000 W: 1559 L: 934 D: 7507 Ptnml(0-2): 38, 701, 2972, 1176, 113 https://tests.stockfishchess.org/tests/view/60a187005085663412d0925b TC: 60s+0.6s, 1 thread ELO: 5.85 +-1.7 (95%) LOS: 100.0% Total: 20000 W: 1381 L: 1044 D: 17575 Ptnml(0-2): 27, 885, 7864, 1172, 52 https://tests.stockfishchess.org/tests/view/60a2beede229097940a03806 TC: 20s+0.2s, 8 threads LLR: 2.93 (-2.94,2.94) <0.50,3.50> Total: 34272 W: 1610 L: 1452 D: 31210 Ptnml(0-2): 30, 1285, 14350, 1439, 32 https://tests.stockfishchess.org/tests/view/60a2d687e229097940a03c72 TC: 60s+0.6s, 8 threads LLR: 2.94 (-2.94,2.94) <-2.50,0.50> Total: 45544 W: 1262 L: 1214 D: 43068 Ptnml(0-2): 12, 1129, 20442, 1177, 12 The network has been trained (by vondele) using the https://github.com/glinscott/nnue-pytorch/ trainer (started by glinscott), specifically the branch https://github.com/Sopel97/nnue-pytorch/tree/experiment_56. The data used are in 64 billion positions (193GB total) generated and scored with the current master net d8: https://drive.google.com/file/d/1hOOYSDKgOOp38ZmD0N4DV82TOLHzjUiF/view?usp=sharing d9: https://drive.google.com/file/d/1VlhnHL8f-20AXhGkILujnNXHwy9T-MQw/view?usp=sharing d10: https://drive.google.com/file/d/1ZC5upzBYMmMj1gMYCkt6rCxQG0GnO3Kk/view?usp=sharing fishtest_d9: https://drive.google.com/file/d/1GQHt0oNgKaHazwJFTRbXhlCN3FbUedFq/view?usp=sharing This network also contains a few architectural changes with respect to the current master: Size changed from 256x2-32-32-1 to 512x2-16-32-1 ~15-20% slower ~2x larger adds a special path for 16 valued ClippedReLU fixes affine transform code for 16 inputs/outputs, buy using InputDimensions instead of PaddedInputDimensions this is safe now because the inputs are processed in groups of 4 in the current affine transform code The feature set changed from HalfKP to HalfKAv2 Includes information about the kings like HalfKA Packs king features better, resulting in 8% size reduction compared to HalfKA The board is flipped for the black's perspective, instead of rotated like in the current master PSQT values for each feature the feature transformer now outputs a part that is fowarded directly to the output and allows learning piece values more directly than the previous network architecture. The effect is visible for high imbalance positions, where the current master network outputs evaluations skewed towards zero. 8 PSQT values per feature, chosen based on (popcount(pos.pieces()) - 1) / 4 initialized to classical material values on the start of the training 8 subnetworks (512x2->16->32->1), chosen based on (popcount(pos.pieces()) - 1) / 4 only one subnetwork is evaluated for any position, no or marginal speed loss A diagram of the network is available: https://user-images.githubusercontent.com/8037982/118656988-553a1700-b7eb-11eb-82ef-56a11cbebbf2.png A more complete description: https://github.com/glinscott/nnue-pytorch/blob/master/docs/nnue.md closes https://github.com/official-stockfish/Stockfish/pull/3474 Bench: 3806488	2021-05-18 18:06:23 +02:00
Tomasz Sobczyk	58054fd0fa	Exporting the currently loaded network file This PR adds an ability to export any currently loaded network. The export_net command now takes an optional filename parameter. If the loaded net is not the embedded net the filename parameter is required. Two changes were required to support this: * the "architecture" string, which is really just a some kind of description in the net, is now saved into netDescription on load and correctly saved on export. * the AffineTransform scrambles weights for some architectures and sparsifies them, such that retrieving the index is hard. This is solved by having a temporary scrambled<->unscrambled index lookup table when loading the network, and the actual index is saved for each individual weight that makes it to canSaturate16. This increases the size of the canSaturate16 entries by 6 bytes. closes https://github.com/official-stockfish/Stockfish/pull/3456 No functional change	2021-05-11 19:36:11 +02:00
Tomasz Sobczyk	b748b46714	Cleanup and simplify NNUE code. A lot of optimizations happend since the NNUE was introduced and since then some parts of the code were left unused. This got to the point where asserts were have to be made just to let people know that modifying something will not have any effects or may even break everything due to the assumptions being made. Removing these parts removes those inexisting "false dependencies". Additionally: * append_changed_indices now takes the king pos and stateinfo explicitly, no more misleading pos parameter * IndexList is removed in favor of a generic ValueList. Feature transformer just instantiates the type it needs. * The update cost and refresh requirement is deferred to the feature set once again, but now doesn't go through the whole FeatureSet machinery and just calls HalfKP directly. * accumulator no longer has a singular dimension. * The PS constants and the PieceSquareIndex array are made local to the HalfKP feature set because they are specific to it and DO differ for other feature sets. * A few names are changed to more descriptive Passed STC non-regression: https://tests.stockfishchess.org/tests/view/608421dd95e7f1852abd2790 LLR: 2.95 (-2.94,2.94) <-2.50,0.50> Total: 180008 W: 16186 L: 16258 D: 147564 Ptnml(0-2): 587, 12593, 63725, 12503, 596 closes https://github.com/official-stockfish/Stockfish/pull/3441 No functional change	2021-04-25 13:16:30 +02:00
Tomasz Sobczyk	fbbd4adc3c	Unify naming convention of the NNUE code matches the rest of the stockfish code base closes https://github.com/official-stockfish/Stockfish/pull/3437 No functional change	2021-04-24 12:49:29 +02:00
Dieter Dobbelaere	7ffae17f85	Add Stockfish namespace. fixes #3350 and is a small cleanup that might make it easier to use SF in separate projects, like a NNUE trainer or similar. closes https://github.com/official-stockfish/Stockfish/pull/3370 No functional change.	2021-03-07 14:26:54 +01:00
Joost VandeVondele	c4d67d77c9	Update copyright years No functional change	2021-01-08 17:04:23 +01:00
Stéphane Nicolet	027626db1e	Small cleanups 13 No functional change	2020-11-23 22:20:32 +01:00
Tomasz Sobczyk	ba35c88ab8	AVX-512 for smaller affine and feature transforms. For the feature transformer the code is analogical to AVX2 since there was room for easy adaptation of wider simd registers. For the smaller affine transforms that have 32 byte stride we keep 2 columns in one zmm register. We also unroll more aggressively so that in the end we have to do 16 parallel horizontal additions on ymm slices each consisting of 4 32-bit integers. The slices are embedded in 8 zmm registers. These changes provide about 1.5% speedup for AVX-512 builds. Closes https://github.com/official-stockfish/Stockfish/pull/3218 No functional change.	2020-11-07 16:49:49 +01:00
Tomasz Sobczyk	3f6451eff7	Manually align arrays on the stack as a workaround to issues with overaligned alignas() on stack variables in gcc < 9.3 on windows. closes https://github.com/official-stockfish/Stockfish/pull/3217 fixes #3216 No functional change	2020-11-04 19:52:42 +01:00
syzygy1	2046d5da30	More incremental accumulator updates This patch was inspired by `c065abd` which updates the accumulator, if possible, based on the accumulator of two plies back if the accumulator of the preceding ply is not available. With this patch we look back even further in the position history in an attempt to reduce the number of complete recomputations. When we find a usable accumulator for the position N plies back, we also update the accumulator of the position N-1 plies back because that accumulator is most likely to be helpful later when evaluating positions in sibling branches. By not updating all intermediate accumulators immediately, we avoid doing too much work that is not certain to be useful. Overall, roughly 2-3% speedup. This patch makes the code more specific to the net architecture, changing input features of the net will require additional changes to the incremental update code as discussed in the PR #3193 and #3191. Passed STC: https://tests.stockfishchess.org/tests/view/5f9056712c92c7fe3a8c60d0 LLR: 2.94 (-2.94,2.94) {-0.25,1.25} Total: 10040 W: 1116 L: 968 D: 7956 Ptnml(0-2): 42, 722, 3365, 828, 63 closes https://github.com/official-stockfish/Stockfish/pull/3193 No functional change.	2020-10-22 20:50:16 +02:00
noobpwnftw	c065abdcaf	Use incremental updates more often Use incremental updates for accumulators for up to 2 plies. Do not copy accumulator. About 2% speedup. Passed STC: LLR: 2.95 (-2.94,2.94) {-0.25,1.25} Total: 21752 W: 2583 L: 2403 D: 16766 Ptnml(0-2): 128, 1761, 6923, 1931, 133 https://tests.stockfishchess.org/tests/view/5f7150cf3b22d6afa5069412 closes https://github.com/official-stockfish/Stockfish/pull/3157 No functional change	2020-09-28 16:54:35 +02:00

1 2

60 commits