Cleaner vector code structure in feature transformer. This patch just
regroups the parts of the inner loop for each SIMD instruction set.
Tested for non-regression:
LLR: 2.96 (-2.94,2.94) <-2.50,0.50>
Total: 115760 W: 9835 L: 9831 D: 96094
Ptnml(0-2): 326, 7776, 41715, 7694, 369
https://tests.stockfishchess.org/tests/view/60b96b39457376eb8bcaa26e
It would be nice if a future patch could use some of the macros at
the top of the file to unify the code between the distincts SIMD
instruction sets (of course, unifying the Relu will be the challenge).
closes https://github.com/official-stockfish/Stockfish/pull/3506
No functional change
This simplification patch implements two changes:
1. it simplifies away the so-called "lazy" path in the NNUE evaluation internals,
where we trusted the psqt head alone to avoid the costly "positional" head in
some cases;
2. it raises a little bit the NNUEThreshold1 in evaluate.cpp (from 682 to 800),
which increases the limit where we switched from NNUE eval to Classical eval.
Both effects increase the number of positional evaluations done by our new net
architecture, but the results of our tests below seem to indicate that the loss
of speed will be compensated by the gain of eval quality.
STC:
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 26280 W: 2244 L: 2137 D: 21899
Ptnml(0-2): 72, 1755, 9405, 1810, 98
https://tests.stockfishchess.org/tests/view/60ae73f112066fd299795a51
LTC:
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 20592 W: 750 L: 677 D: 19165
Ptnml(0-2): 9, 614, 8980, 681, 12
https://tests.stockfishchess.org/tests/view/60ae88e812066fd299795a82
closes https://github.com/official-stockfish/Stockfish/pull/3503
Bench: 3817907
Definition of the lazy threshold moved to evaluate.cpp where all others are.
Lazy threshold only used for real searches, not used for the "eval" call.
This preserves the purity of NNUE evaluation, which is useful to verify
consistency between the engine and the NNUE trainer.
closes https://github.com/official-stockfish/Stockfish/pull/3499
No functional change
Our new nets output two values for the side to move in the last layer.
We can interpret the first value as a material evaluation of the
position, and the second one as the dynamic, positional value of the
location of pieces.
This patch changes the balance for the (materialist, positional) parts
of the score from (128, 128) to (121, 135) when the piece material is
equal between the two players, but keeps the standard (128, 128) balance
when one player is at least an exchange up.
Passed STC:
LLR: 2.93 (-2.94,2.94) <-0.50,2.50>
Total: 15936 W: 1421 L: 1266 D: 13249
Ptnml(0-2): 37, 1037, 5694, 1134, 66
https://tests.stockfishchess.org/tests/view/60a82df9ce8ea25a3ef0408f
Passed LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 13904 W: 516 L: 410 D: 12978
Ptnml(0-2): 4, 374, 6088, 484, 2
https://tests.stockfishchess.org/tests/view/60a8bbf9ce8ea25a3ef04101
closes https://github.com/official-stockfish/Stockfish/pull/3492
Bench: 3856635
The Tempo variable was introduced 10 years ago in our search because the
classical evaluation function was antisymmetrical in White and Black by design
to gain speed:
Eval(White to play) = -Eval(Black to play)
Nowadays our neural networks know which side is to play in a position when
they evaluate a position and are trained on real games, so the neural network
encodes the advantage of moving as an output of search. This patch shows that
the Tempo variable is not necessary anymore.
STC:
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 33512 W: 2805 L: 2709 D: 27998
Ptnml(0-2): 80, 2209, 12095, 2279, 93
https://tests.stockfishchess.org/tests/view/60a44ceace8ea25a3ef03d30
LTC:
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 53920 W: 1807 L: 1760 D: 50353
Ptnml(0-2): 16, 1617, 23650, 1658, 19
https://tests.stockfishchess.org/tests/view/60a477f0ce8ea25a3ef03d49
We also tried a match (20000 games) at STC using purely classical, result was neutral:
https://tests.stockfishchess.org/tests/view/60a4eebcce8ea25a3ef03db5
Note: there are two locations left in search.cpp where we assume antisymmetry
of evaluation (in relation with a speed optimization for null moves in lines
770 and 1439), but as the values are just used for heuristic pruning this
approximation should not hurt too much because the order of magnitude is still
true most of the time.
closes https://github.com/official-stockfish/Stockfish/pull/3481
Bench: 4015864
This improves the speed of NNUE by a bit on old hardware that code path
is intended for, like a Pentium III 1.13 GHz:
10 repeats of "./stockfish bench 16 1 13 default depth NNUE":
Before:
54 642 504 897 cycles (± 0.12%)
62 301 937 829 instructions (± 0.03%)
After:
54 320 821 928 cycles (± 0.13%)
62 084 742 699 instructions (± 0.02%)
Speed of go depth 20 from startpos:
Before: 53103 nps
After: 53856 nps
closes https://github.com/official-stockfish/Stockfish/pull/3476
No functional change.
This patch increases lmrDepth threshold for continuation history based pruning in search.
This part of code for a long time was known to be really TC sensitive - decreasing
this threshold easily passed lower time controls but failed badly at LTC,
on the other hand it increase was part of a tuning that resulted
in being negative at STC but was +12 elo at 180+1.8.
After recent simplification of special conditions that sometimes
increase it from 4 to 5 it was logical to overall test at longer
time controls if 5 is better than 4 with deeper searches.
reduces strenght on STC
https://tests.stockfishchess.org/tests/view/60a3a8bbce8ea25a3ef03c74
ELO: -2.57 +-2.0 (95%) LOS: 0.6%
Total: 20000 W: 1820 L: 1968 D: 16212
Ptnml(0-2): 68, 1582, 6836, 1458, 56
Passed LTC with STC bounds
https://tests.stockfishchess.org/tests/view/60a027395085663412d090ce
LLR: 2.93 (-2.94,2.94) <-0.50,2.50>
Total: 175256 W: 6774 L: 6548 D: 161934
Ptnml(0-2): 91, 5808, 75604, 6034, 91
Passed VLTC with LTC bounds
https://tests.stockfishchess.org/tests/view/60a2bccce229097940a037a7
LLR: 2.96 (-2.94,2.94) <0.50,3.50>
Total: 65736 W: 1224 L: 1092 D: 63420
Ptnml(0-2): 5, 1012, 30706, 1136, 9
closes https://github.com/official-stockfish/Stockfish/pull/3473
bench 3689330
- Comment for Countemove pruning -> Continuation history
- Fix comment in input_slice.h
- Shorter lines in Makefile
- Comment for scale factor
- Fix comment for pinners in see_ge()
- Change Thread.id() signature to size_t
- Trailing space in reprosearch.sh
- Add Douglas Matos Gomes to the AUTHORS file
- Introduce comment for undo_null_move()
- Use Stockfish coding style for export_net()
- Change date in AUTHORS file
closes https://github.com/official-stockfish/Stockfish/pull/3416
No functional change
e2k (Elbrus 2000) - this is a VLIW/EPIC architecture,
the like Intel Itanium (IA-64) architecture.
The architecture has half native / half software support
for most Intel/AMD SIMD (e.g. MMX/SSE/SSE2/SSE3/SSSE3/SSE4.1/SSE4.2/AES/AVX/AVX2 & 3DNow!/SSE4a/XOP/FMA4) via intrinsics.
https://en.wikipedia.org/wiki/Elbrus_2000
closes https://github.com/official-stockfish/Stockfish/pull/3425
No functional change
This PR adds an ability to export any currently loaded network.
The export_net command now takes an optional filename parameter.
If the loaded net is not the embedded net the filename parameter is required.
Two changes were required to support this:
* the "architecture" string, which is really just a some kind of description in the net, is now saved into netDescription on load and correctly saved on export.
* the AffineTransform scrambles weights for some architectures and sparsifies them, such that retrieving the index is hard. This is solved by having a temporary scrambled<->unscrambled index lookup table when loading the network, and the actual index is saved for each individual weight that makes it to canSaturate16. This increases the size of the canSaturate16 entries by 6 bytes.
closes https://github.com/official-stockfish/Stockfish/pull/3456
No functional change
This patch broadens and simplifies definition of PvNode that is likely to fail low.
New definition can be described as following "If node was already researched
at depth >= current depth and failed low there" which is more logical than the
previous version and takes less space + allows to not recompute it every time during move loop.
Passed simplification STC
https://tests.stockfishchess.org/tests/view/609148bf95e7f1852abd2e82
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 20128 W: 1865 L: 1751 D: 16512
Ptnml(0-2): 63, 1334, 7165, 1430, 72
Passed simplification LTC
https://tests.stockfishchess.org/tests/view/6091691295e7f1852abd2e8b
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 95128 W: 3498 L: 3481 D: 88149
Ptnml(0-2): 41, 2956, 41549, 2981, 37
closes https://github.com/official-stockfish/Stockfish/pull/3455
Bench: 3933037
Introduce variable tempo for nnue depending on logarithm of estimated
strength, where strength is the product of time and number of threads.
The original idea here was that NNUE is best with a slightly different
tempo value to classical, since its style of play is slightly different.
It turns out that the best tempo for NNUE varies with strength of play,
so a formula is used which gives about 19 for STC and 24 for LTC under
current fishtest settings.
STC 10+0.1:
LLR: 2.94 (-2.94,2.94) {-0.20,1.10}
Total: 120816 W: 11155 L: 10861 D: 98800
Ptnml(0-2): 406, 8728, 41933, 8848, 493
https://tests.stockfishchess.org/tests/view/60735b3a8141753378960534
LTC 60+0.6:
LLR: 2.94 (-2.94,2.94) {0.20,0.90}
Total: 35688 W: 1392 L: 1234 D: 33062
Ptnml(0-2): 23, 1079, 15473, 1255, 14
https://tests.stockfishchess.org/tests/view/6073ffbc814175337896057f
Passed non-regression SMP test at LTC 20+0.2 (8 threads):
LLR: 2.95 (-2.94,2.94) {-0.70,0.20}
Total: 11008 W: 317 L: 267 D: 10424
Ptnml(0-2): 2, 245, 4962, 291, 4
https://tests.stockfishchess.org/tests/view/60749ea881417533789605a4
closes https://github.com/official-stockfish/Stockfish/pull/3426
Bench 4075325