Move the engine options into the engine class, also avoid duplicated
initializations after startup. UCIEngine needs to register an add_listener to
listen to all option changes and print these. Also avoid a double
initialization of the TT, which was the case with the old state.
closes https://github.com/official-stockfish/Stockfish/pull/5356
No functional change
Allow for NUMA memory replication for NNUE weights. Bind threads to ensure execution on a specific NUMA node.
This patch introduces NUMA memory replication, currently only utilized for the NNUE weights. Along with it comes all machinery required to identify NUMA nodes and bind threads to specific processors/nodes. It also comes with small changes to Thread and ThreadPool to allow easier execution of custom functions on the designated thread. Old thread binding (WinProcGroup) machinery is removed because it's incompatible with this patch. Small changes to unrelated parts of the code were made to ensure correctness, like some classes being made unmovable, raw pointers replaced with unique_ptr. etc.
Windows 7 and Windows 10 is partially supported. Windows 11 is fully supported. Linux is fully supported, with explicit exclusion of Android. No additional dependencies.
-----------------
A new UCI option `NumaPolicy` is introduced. It can take the following values:
```
system - gathers NUMA node information from the system (lscpu or windows api), for each threads binds it to a single NUMA node
none - assumes there is 1 NUMA node, never binds threads
auto - this is the default value, depends on the number of set threads and NUMA nodes, will only enable binding on multinode systems and when the number of threads reaches a threshold (dependent on node size and count)
[[custom]] -
// ':'-separated numa nodes
// ','-separated cpu indices
// supports "first-last" range syntax for cpu indices,
for example '0-15,32-47:16-31,48-63'
```
Setting `NumaPolicy` forces recreation of the threads in the ThreadPool, which in turn forces the recreation of the TT.
The threads are distributed among NUMA nodes in a round-robin fashion based on fill percentage (i.e. it will strive to fill all NUMA nodes evenly). Threads are bound to NUMA nodes, not specific processors, because that's our only requirement and the OS can schedule them better.
Special care is made that maximum memory usage on systems that do not require memory replication stays as previously, that is, unnecessary copies are avoided.
On linux the process' processor affinity is respected. This means that if you for example use taskset to restrict Stockfish to a single NUMA node then the `system` and `auto` settings will only see a single NUMA node (more precisely, the processors included in the current affinity mask) and act accordingly.
-----------------
We can't ensure that a memory allocation takes place on a given NUMA node without using libnuma on linux, or using appropriate custom allocators on windows (https://learn.microsoft.com/en-us/windows/win32/memory/allocating-memory-from-a-numa-node), so to avoid complications the current implementation relies on first-touch policy. Due to this we also rely on the memory allocator to give us a new chunk of untouched memory from the system. This appears to work reliably on linux, but results may vary.
MacOS is not supported, because AFAIK it's not affected, and implementation would be problematic anyway.
Windows is supported since Windows 7 (https://learn.microsoft.com/en-us/windows/win32/api/processtopologyapi/nf-processtopologyapi-setthreadgroupaffinity). Until Windows 11/Server 2022 NUMA nodes are split such that they cannot span processor groups. This is because before Windows 11/Server 2022 it's not possible to set thread affinity spanning processor groups. The splitting is done manually in some cases (required after Windows 10 Build 20348). Since Windows 11/Server 2022 we can set affinites spanning processor group so this splitting is not done, so the behaviour is pretty much like on linux.
Linux is supported, **without** libnuma requirement. `lscpu` is expected.
-----------------
Passed 60+1 @ 256t 16000MB hash: https://tests.stockfishchess.org/tests/view/6654e443a86388d5e27db0d8
```
LLR: 2.95 (-2.94,2.94) <0.00,10.00>
Total: 278 W: 110 L: 29 D: 139
Ptnml(0-2): 0, 1, 56, 82, 0
```
Passed SMP STC: https://tests.stockfishchess.org/tests/view/6654fc74a86388d5e27db1cd
```
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 67152 W: 17354 L: 17177 D: 32621
Ptnml(0-2): 64, 7428, 18408, 7619, 57
```
Passed STC: https://tests.stockfishchess.org/tests/view/6654fb27a86388d5e27db15c
```
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 131648 W: 34155 L: 34045 D: 63448
Ptnml(0-2): 426, 13878, 37096, 14008, 416
```
fixes#5253
closes https://github.com/official-stockfish/Stockfish/pull/5285
No functional change
This aims to remove some of the annoying global structure which Stockfish has.
Overall there is no major elo regression to be expected.
Non regression SMP STC (paused, early version):
https://tests.stockfishchess.org/tests/view/65983d7979aa8af82b9608f1
LLR: 0.23 (-2.94,2.94) <-1.75,0.25>
Total: 76232 W: 19035 L: 19096 D: 38101
Ptnml(0-2): 92, 8735, 20515, 8690, 84
Non regression STC (early version):
https://tests.stockfishchess.org/tests/view/6595b3a479aa8af82b95da7f
LLR: 2.93 (-2.94,2.94) <-1.75,0.25>
Total: 185344 W: 47027 L: 46972 D: 91345
Ptnml(0-2): 571, 21285, 48943, 21264, 609
Non regression SMP STC:
https://tests.stockfishchess.org/tests/view/65a0715c79aa8af82b96b7e4
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 142936 W: 35761 L: 35662 D: 71513
Ptnml(0-2): 209, 16400, 38135, 16531, 193
These global structures/variables add hidden dependencies and allow data
to be mutable from where it shouldn't it be (i.e. options). They also
prevent Stockfish from internal selfplay, which would be a nice thing to
be able to do, i.e. instantiate two Stockfish instances and let them
play against each other. It will also allow us to make Stockfish a
library, which can be easier used on other platforms.
For consistency with the old search code, `thisThread` has been kept,
even though it is not strictly necessary anymore. This the first major
refactor of this kind (in recent time), and future changes are required,
to achieve the previously described goals. This includes cleaning up the
dependencies, transforming the network to be self contained and coming
up with a plan to deal with proper tablebase memory management (see
comments for more information on this).
The removal of these global structures has been discussed in parts with
Vondele and Sopel.
closes https://github.com/official-stockfish/Stockfish/pull/4968
No functional change
And #ifdef instead of #if defined
This is more standard form (see for example iostream file).
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Directly use the underlying std::map instead and avoid
a useless inheritance.
As a nice side-effect Options global object has now a
default c'tor avoiding possible issues with globals
initializations.
Suggested by Rein Halbersma.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Return a reference instead of void so to enable
chained assignments like
"p = q = Position(...);"
Suggested by Rein Halbersma.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
When a button fires UCIOption::operator=() is called and from
there the on_change() function. Now it happens that in case of
a button the on_change() function resets option's value to
"false" triggering again UCIOption::operator=() that calls again
on_change() and so on in an endless loop that is experienced
by the user as an application hang.
Rework the button logic to fix the issue and also be more clear
about how button works.
Reported by several people working with Scid and tracked down
to the "Clear Hash" UCI button by Steven Atkinson.
Bug recently introduced by 2ef5b4066e.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
So to be done only once at startup and in the (unlikely)
cases that a relevant UCI parameter is changed, instead
of doing it at the beginning of each search.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Introduce 'on change' actions that are triggered as soon as
an UCI option is changed by the GUI. This allows to set hash
size before to start the game, helpful especially on very fast
TC and big TT size.
As a side effect remove the 'button' type option, that now
is managed as a 'check' type.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Greatly improves the usage. User defined conversions
are a novelity for SF, another amazing C++ facility
at work !
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Make a better use of C++ operators overloading to
streamline the APIs.
Also sync polyglot.ini file while there.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Instead of our home grown function to perform a case
insensitive compare on option names as required by UCI
protocol.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Correctly handle uci option names in a case insensitive way.
Alos fix some indentation while there.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
UCI protocol uses "true" and "false" for check and button types,
so store that values instead of "1" and "0", this simplifies a
bit the code.
Also a bit strictier option's type checking in debug mode.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>