![]() Allow for NUMA memory replication for NNUE weights. Bind threads to ensure execution on a specific NUMA node. This patch introduces NUMA memory replication, currently only utilized for the NNUE weights. Along with it comes all machinery required to identify NUMA nodes and bind threads to specific processors/nodes. It also comes with small changes to Thread and ThreadPool to allow easier execution of custom functions on the designated thread. Old thread binding (WinProcGroup) machinery is removed because it's incompatible with this patch. Small changes to unrelated parts of the code were made to ensure correctness, like some classes being made unmovable, raw pointers replaced with unique_ptr. etc. Windows 7 and Windows 10 is partially supported. Windows 11 is fully supported. Linux is fully supported, with explicit exclusion of Android. No additional dependencies. ----------------- A new UCI option `NumaPolicy` is introduced. It can take the following values: ``` system - gathers NUMA node information from the system (lscpu or windows api), for each threads binds it to a single NUMA node none - assumes there is 1 NUMA node, never binds threads auto - this is the default value, depends on the number of set threads and NUMA nodes, will only enable binding on multinode systems and when the number of threads reaches a threshold (dependent on node size and count) [[custom]] - // ':'-separated numa nodes // ','-separated cpu indices // supports "first-last" range syntax for cpu indices, for example '0-15,32-47:16-31,48-63' ``` Setting `NumaPolicy` forces recreation of the threads in the ThreadPool, which in turn forces the recreation of the TT. The threads are distributed among NUMA nodes in a round-robin fashion based on fill percentage (i.e. it will strive to fill all NUMA nodes evenly). Threads are bound to NUMA nodes, not specific processors, because that's our only requirement and the OS can schedule them better. Special care is made that maximum memory usage on systems that do not require memory replication stays as previously, that is, unnecessary copies are avoided. On linux the process' processor affinity is respected. This means that if you for example use taskset to restrict Stockfish to a single NUMA node then the `system` and `auto` settings will only see a single NUMA node (more precisely, the processors included in the current affinity mask) and act accordingly. ----------------- We can't ensure that a memory allocation takes place on a given NUMA node without using libnuma on linux, or using appropriate custom allocators on windows (https://learn.microsoft.com/en-us/windows/win32/memory/allocating-memory-from-a-numa-node), so to avoid complications the current implementation relies on first-touch policy. Due to this we also rely on the memory allocator to give us a new chunk of untouched memory from the system. This appears to work reliably on linux, but results may vary. MacOS is not supported, because AFAIK it's not affected, and implementation would be problematic anyway. Windows is supported since Windows 7 (https://learn.microsoft.com/en-us/windows/win32/api/processtopologyapi/nf-processtopologyapi-setthreadgroupaffinity). Until Windows 11/Server 2022 NUMA nodes are split such that they cannot span processor groups. This is because before Windows 11/Server 2022 it's not possible to set thread affinity spanning processor groups. The splitting is done manually in some cases (required after Windows 10 Build 20348). Since Windows 11/Server 2022 we can set affinites spanning processor group so this splitting is not done, so the behaviour is pretty much like on linux. Linux is supported, **without** libnuma requirement. `lscpu` is expected. ----------------- Passed 60+1 @ 256t 16000MB hash: https://tests.stockfishchess.org/tests/view/6654e443a86388d5e27db0d8 ``` LLR: 2.95 (-2.94,2.94) <0.00,10.00> Total: 278 W: 110 L: 29 D: 139 Ptnml(0-2): 0, 1, 56, 82, 0 ``` Passed SMP STC: https://tests.stockfishchess.org/tests/view/6654fc74a86388d5e27db1cd ``` LLR: 2.95 (-2.94,2.94) <-1.75,0.25> Total: 67152 W: 17354 L: 17177 D: 32621 Ptnml(0-2): 64, 7428, 18408, 7619, 57 ``` Passed STC: https://tests.stockfishchess.org/tests/view/6654fb27a86388d5e27db15c ``` LLR: 2.94 (-2.94,2.94) <-1.75,0.25> Total: 131648 W: 34155 L: 34045 D: 63448 Ptnml(0-2): 426, 13878, 37096, 14008, 416 ``` fixes #5253 closes https://github.com/official-stockfish/Stockfish/pull/5285 No functional change |
||
---|---|---|
.github | ||
scripts | ||
src | ||
tests | ||
.clang-format | ||
.git-blame-ignore-revs | ||
.gitignore | ||
AUTHORS | ||
CITATION.cff | ||
CONTRIBUTING.md | ||
Copying.txt | ||
README.md | ||
Top CPU Contributors.txt |
Stockfish
A free and strong UCI chess engine.
Explore Stockfish docs »
Report bug
·
Open a discussion
·
Discord
·
Blog
Overview
Stockfish is a free and strong UCI chess engine derived from Glaurung 2.1 that analyzes chess positions and computes the optimal moves.
Stockfish does not include a graphical user interface (GUI) that is required to display a chessboard and to make it easy to input moves. These GUIs are developed independently from Stockfish and are available online. Read the documentation for your GUI of choice for information about how to use Stockfish with it.
See also the Stockfish documentation for further usage help.
Files
This distribution of Stockfish consists of the following files:
-
README.md, the file you are currently reading.
-
Copying.txt, a text file containing the GNU General Public License version 3.
-
AUTHORS, a text file with the list of authors for the project.
-
src, a subdirectory containing the full source code, including a Makefile that can be used to compile Stockfish on Unix-like systems.
-
a file with the .nnue extension, storing the neural network for the NNUE evaluation. Binary distributions will have this file embedded.
Contributing
See Contributing Guide.
Donating hardware
Improving Stockfish requires a massive amount of testing. You can donate your hardware resources by installing the Fishtest Worker and viewing the current tests on Fishtest.
Improving the code
In the chessprogramming wiki, many techniques used in Stockfish are explained with a lot of background information. The section on Stockfish describes many features and techniques used by Stockfish. However, it is generic rather than focused on Stockfish's precise implementation.
The engine testing is done on Fishtest. If you want to help improve Stockfish, please read this guideline first, where the basics of Stockfish development are explained.
Discussions about Stockfish take place these days mainly in the Stockfish Discord server. This is also the best place to ask questions about the codebase and how to improve it.
Compiling Stockfish
Stockfish has support for 32 or 64-bit CPUs, certain hardware instructions, big-endian machines such as Power PC, and other platforms.
On Unix-like systems, it should be easy to compile Stockfish directly from the
source code with the included Makefile in the folder src
. In general, it is
recommended to run make help
to see a list of make targets with corresponding
descriptions. An example suitable for most Intel and AMD chips:
cd src
make -j profile-build ARCH=x86-64-avx2
Detailed compilation instructions for all platforms can be found in our documentation. Our wiki also has information about the UCI commands supported by Stockfish.
Terms of use
Stockfish is free and distributed under the GNU General Public License version 3 (GPL v3). Essentially, this means you are free to do almost exactly what you want with the program, including distributing it among your friends, making it available for download from your website, selling it (either by itself or as part of some bigger software package), or using it as the starting point for a software project of your own.
The only real limitation is that whenever you distribute Stockfish in some way, you MUST always include the license and the full source code (or a pointer to where the source code can be found) to generate the exact binary you are distributing. If you make any changes to the source code, these changes must also be made available under GPL v3.