As stockfish nets and search evolve, the existing time control appears
to give too little time at STC, roughly correct at LTC, and too little
at VLTC+.
This change adds an adjustment to the optExtra calculation. This
adjustment is easy to retune and refine, so it should be easier to keep
up-to-date than the more complex calculations used for optConstant and
optScale.
Passed STC 10+0.1:
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 169568 W: 43803 L: 43295 D: 82470
Ptnml(0-2): 485, 19679, 44055, 19973, 592
https://tests.stockfishchess.org/tests/view/66531865a86388d5e27da9fa
Yellow LTC 60+0.6:
LLR: -2.94 (-2.94,2.94) <0.50,2.50>
Total: 209970 W: 53087 L: 52914 D: 103969
Ptnml(0-2): 91, 19652, 65314, 19849, 79
https://tests.stockfishchess.org/tests/view/6653e38ba86388d5e27daaa0
Passed VLTC 180+1.8 :
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 85618 W: 21735 L: 21342 D: 42541
Ptnml(0-2): 15, 8267, 25848, 8668, 11
https://tests.stockfishchess.org/tests/view/6655131da86388d5e27db95f
closes https://github.com/official-stockfish/Stockfish/pull/5297
Bench: 1212167
Allow for NUMA memory replication for NNUE weights. Bind threads to ensure execution on a specific NUMA node.
This patch introduces NUMA memory replication, currently only utilized for the NNUE weights. Along with it comes all machinery required to identify NUMA nodes and bind threads to specific processors/nodes. It also comes with small changes to Thread and ThreadPool to allow easier execution of custom functions on the designated thread. Old thread binding (WinProcGroup) machinery is removed because it's incompatible with this patch. Small changes to unrelated parts of the code were made to ensure correctness, like some classes being made unmovable, raw pointers replaced with unique_ptr. etc.
Windows 7 and Windows 10 is partially supported. Windows 11 is fully supported. Linux is fully supported, with explicit exclusion of Android. No additional dependencies.
-----------------
A new UCI option `NumaPolicy` is introduced. It can take the following values:
```
system - gathers NUMA node information from the system (lscpu or windows api), for each threads binds it to a single NUMA node
none - assumes there is 1 NUMA node, never binds threads
auto - this is the default value, depends on the number of set threads and NUMA nodes, will only enable binding on multinode systems and when the number of threads reaches a threshold (dependent on node size and count)
[[custom]] -
// ':'-separated numa nodes
// ','-separated cpu indices
// supports "first-last" range syntax for cpu indices,
for example '0-15,32-47:16-31,48-63'
```
Setting `NumaPolicy` forces recreation of the threads in the ThreadPool, which in turn forces the recreation of the TT.
The threads are distributed among NUMA nodes in a round-robin fashion based on fill percentage (i.e. it will strive to fill all NUMA nodes evenly). Threads are bound to NUMA nodes, not specific processors, because that's our only requirement and the OS can schedule them better.
Special care is made that maximum memory usage on systems that do not require memory replication stays as previously, that is, unnecessary copies are avoided.
On linux the process' processor affinity is respected. This means that if you for example use taskset to restrict Stockfish to a single NUMA node then the `system` and `auto` settings will only see a single NUMA node (more precisely, the processors included in the current affinity mask) and act accordingly.
-----------------
We can't ensure that a memory allocation takes place on a given NUMA node without using libnuma on linux, or using appropriate custom allocators on windows (https://learn.microsoft.com/en-us/windows/win32/memory/allocating-memory-from-a-numa-node), so to avoid complications the current implementation relies on first-touch policy. Due to this we also rely on the memory allocator to give us a new chunk of untouched memory from the system. This appears to work reliably on linux, but results may vary.
MacOS is not supported, because AFAIK it's not affected, and implementation would be problematic anyway.
Windows is supported since Windows 7 (https://learn.microsoft.com/en-us/windows/win32/api/processtopologyapi/nf-processtopologyapi-setthreadgroupaffinity). Until Windows 11/Server 2022 NUMA nodes are split such that they cannot span processor groups. This is because before Windows 11/Server 2022 it's not possible to set thread affinity spanning processor groups. The splitting is done manually in some cases (required after Windows 10 Build 20348). Since Windows 11/Server 2022 we can set affinites spanning processor group so this splitting is not done, so the behaviour is pretty much like on linux.
Linux is supported, **without** libnuma requirement. `lscpu` is expected.
-----------------
Passed 60+1 @ 256t 16000MB hash: https://tests.stockfishchess.org/tests/view/6654e443a86388d5e27db0d8
```
LLR: 2.95 (-2.94,2.94) <0.00,10.00>
Total: 278 W: 110 L: 29 D: 139
Ptnml(0-2): 0, 1, 56, 82, 0
```
Passed SMP STC: https://tests.stockfishchess.org/tests/view/6654fc74a86388d5e27db1cd
```
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 67152 W: 17354 L: 17177 D: 32621
Ptnml(0-2): 64, 7428, 18408, 7619, 57
```
Passed STC: https://tests.stockfishchess.org/tests/view/6654fb27a86388d5e27db15c
```
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 131648 W: 34155 L: 34045 D: 63448
Ptnml(0-2): 426, 13878, 37096, 14008, 416
```
fixes#5253
closes https://github.com/official-stockfish/Stockfish/pull/5285
No functional change
Also "fix" movepicker to allow depths between CHECKS and NO_CHECKS,
which makes them easier to tweak (not that they get tweaked hardly ever)
(This was more beneficial when there was a third stage to DEPTH_QS, but
it's still an improvement now)
closes https://github.com/official-stockfish/Stockfish/pull/5205
No functional change
This reverts commit 4edd1a3
The unusual result of (combined) +12.0 +- 3.7 in the 2 VVLTC simplification SPRTs ran was the result of base having only 64MB of hash instead of 512MB (Asymmetric hash).
Vizvezdenec was the one to notice this.
closes https://github.com/official-stockfish/Stockfish/pull/5265
bench 1404295
Co-Authored-By: Michael Chaly <26898827+Vizvezdenec@users.noreply.github.com>
Basically the same idea as it is for continuation/main history, but it
has some tweaks.
1) it has * 2 multiplier for bonus instead of full/half bonus - for
whatever reason this seems to work better;
2) attempts with this type of big bonuses scaled somewhat poorly (or
were unlucky at longer time controls), but after measuring the fact
that average value of pawn history in LMR after adding this bonuses
increased by substantial number (for multiplier 1,5 it increased by
smth like 400~ from 8192 cap) attempts were made to make default pawn
history negative to compensate it - and version with multiplier 2 and
initial fill value -900 passed.
Passed STC:
https://tests.stockfishchess.org/tests/view/66424815f9f4e8fc783cba59
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 115008 W: 30001 L: 29564 D: 55443
Ptnml(0-2): 432, 13629, 28903, 14150, 390
Passed LTC:
https://tests.stockfishchess.org/tests/view/6642f5437134c82f3f7a3ffa
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 56448 W: 14432 L: 14067 D: 27949
Ptnml(0-2): 36, 6268, 15254, 6627, 39
Bench: 1857237
Stockfish appears to take too much time on the first move of a game and
then not enough on moves 2,3,4... Probably caused by most of the factors
that increase time usually applying on the first move.
Attempts to give more time to the subsequent moves have not worked so
far, but this change to simply reduce first move time by 5% worked.
STC 10+0.1 :
LLR: 2.96 (-2.94,2.94) <0.00,2.00>
Total: 78496 W: 20516 L: 20135 D: 37845
Ptnml(0-2): 340, 8859, 20456, 9266, 327
https://tests.stockfishchess.org/tests/view/663d47bf507ebe1c0e9200ba
LTC 60+0.6 :
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 94872 W: 24179 L: 23751 D: 46942
Ptnml(0-2): 61, 9743, 27405, 10161, 66
https://tests.stockfishchess.org/tests/view/663e779cbb28828150dd9089
closes https://github.com/official-stockfish/Stockfish/pull/5235
Bench: 1876282
1. The current time management system utilizes limits.inc and
limits.time, which can represent either milliseconds or node count,
depending on whether the nodestime option is active. There have been
several modifications which brought Elo gain for typical uses (i.e.
real-time matches), however some of these changes overlooked such
distinction. This patch adjusts constants and multiplication/division to
more accurately simulate real TC conditions when nodestime is used.
2. The advance_nodes_time function has a bug that can extend the time
limit when availableNodes reaches exact zero. This patch fixes the bug
by initializing the variable to -1 and make sure it does not go below
zero.
3. elapsed_time function is newly introduced to print PV in the UCI
output based on real time. This makes PV output more consistent with the
behavior of trivial use cases.
closes https://github.com/official-stockfish/Stockfish/pull/5186
No functional changes
If there is an upper bound stored in the transposition table, but we still have a ttMove, the upperbound indicates that the last time the ttMove was tried, it failed low. This fail low indicates that the ttMove may not be good, so this patch introduces a depth reduction of one for cutnodes with such ttMoves.
Passed STC:
https://tests.stockfishchess.org/tests/view/663be4d1ca93dad645f7f45f
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 139424 W: 35900 L: 35433 D: 68091
Ptnml(0-2): 425, 16357, 35743, 16700, 487
Passed LTC:
https://tests.stockfishchess.org/tests/view/663bec95ca93dad645f7f5c8
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 129690 W: 32902 L: 32390 D: 64398
Ptnml(0-2): 63, 14304, 35610, 14794, 74
closes https://github.com/official-stockfish/Stockfish/pull/5227
bench 2257437
Make it formula more in line with what we use in search - current formula is more or less the one we used years ago for search but since then it was remade, this patch remakes qsearch formula to almost exactly the same as we use in search - with sum of conthist 0, 1 and pawn structure history.
Passed STC:
https://tests.stockfishchess.org/tests/view/6639c8421343f0cb16716206
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 84992 W: 22414 L: 22019 D: 40559
Ptnml(0-2): 358, 9992, 21440, 10309, 397
Passed LTC:
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 119136 W: 30407 L: 29916 D: 58813
Ptnml(0-2): 46, 13192, 32622, 13641, 67
closes https://github.com/official-stockfish/Stockfish/pull/5224
Bench: 2138659
The idea came to me by checking for trends from the megafauzi tunes, since the values of the divisor for this specific formula were as follows:
stc: 15990
mtc: 16117
ltc: 14805
vltc: 12719
new vltc passed by Muzhen: 12076
This shows a clear trend related to time control, the higher it is, the lower the optimum value for the divisor seems to be.
So I tried a simple formula, using educated guesses based on some calculations, tests show it works pretty fine, and it can still be further tuned at VLTC in the future to scale even better.
Passed STC:
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 431360 W: 110791 L: 109898 D: 210671
Ptnml(0-2): 1182, 50846, 110698, 51805, 1149
https://tests.stockfishchess.org/tests/view/663770409819650825aa269f
Passed LTC:
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 114114 W: 29109 L: 28625 D: 56380
Ptnml(0-2): 105, 12628, 31101, 13124, 99
https://tests.stockfishchess.org/tests/view/66378c099819650825aa73f6https://github.com/official-stockfish/Stockfish/pull/5223
bench: 2273551
This adds the functions `update_refutations` and `update_quiet_histories` to better distinguish the two. `update_quiet_stats` now just calls both of these functions.
The functional side of this patch is two-fold:
1. Stop refutations being updated when we carry out multicut
2. Update pawn history every time we update other quiet histories
Yellow STC:
LLR: -2.95 (-2.94,2.94) <0.00,2.00>
Total: 238976 W: 61506 L: 61415 D: 116055
Ptnml(0-2): 846, 28628, 60456, 28705, 853
https://tests.stockfishchess.org/tests/view/66321b5ed01fb9ac9bcdca83
However, it passed in <-1.75, 0.25> bounds:
$ python3 sprt.py --wins 61506 --losses 61415 --draws 116055 --elo0 -1.75 --elo1 0.25
ELO: 0.132 +- 0.998 [-0.865, 1.13]
LLR: 4.15 [-1.75, 0.25] (-2.94, 2.94)
H1 Accepted
Passed LTC:
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 399126 W: 100730 L: 100896 D: 197500
Ptnml(0-2): 116, 44328, 110843, 44158, 118
https://tests.stockfishchess.org/tests/view/66357b0473559a8aa857ba6fcloses#5215
Bench 2370967