mirror of https://github.com/sockspls/badfish synced 2025-04-30 00:33:09 +00:00
Commit graph

15 commits

Author SHA1 Message Date
Tomasz Sobczyk
3ac75cd27d Add a standardized benchmark command speedtest.
`speedtest [threads] [hash_MiB] [time_s]`. `threads` defaults to system concurrency. `hash_MiB` defaults to `threads*128`. `time_s` defaults to 150.

Intended to be used with default parameters, as a stable hardware benchmark.

Example:
```
C:\dev\stockfish-master\src>stockfish.exe speedtest
Stockfish dev-20240928-nogit by the Stockfish developers (see AUTHORS file)
info string Using 16 threads
Warmup position 3/3
Position 258/258
===========================
Version                    : Stockfish dev-20240928-nogit
Compiled by                : g++ (GNUC) 13.2.0 on MinGW64
Compilation architecture   : x86-64-vnni256
Compilation settings       : 64bit VNNI BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 13.2.0
Large pages                : yes
User invocation            : speedtest
Filled invocation          : speedtest 16 2048 150
Available processors       : 0-15
Thread count               : 16
Thread binding             : none
TT size [MiB]              : 2048
Hash max, avg [per mille]  :
    single search          : 40, 21
    single game            : 631, 428
Total nodes searched       : 2099917842
Total search time [s]      : 153.937
Nodes/second               : 13641410
```
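
A minimal sketch of how the defaults described above could be filled in. The names `SpeedtestConfig` and `fill_speedtest_defaults` are hypothetical and do not come from the patch; only the default rules (system concurrency, `threads*128`, 150 s) are taken from the description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <thread>

// Hypothetical container for the three speedtest parameters.
struct SpeedtestConfig {
    std::size_t threads;
    std::size_t hashMiB;
    int         timeSeconds;
};

// Hypothetical helper: any argument the user omitted is filled with its default.
SpeedtestConfig fill_speedtest_defaults(std::optional<std::size_t> threads,
                                        std::optional<std::size_t> hashMiB,
                                        std::optional<int>         timeSeconds) {
    // hardware_concurrency() may return 0 when the value is unknown; fall back to 1.
    const std::size_t hw =
      std::max<std::size_t>(1, std::thread::hardware_concurrency());

    SpeedtestConfig cfg{};
    cfg.threads = threads.value_or(hw);
    assert(cfg.threads > 0);
    cfg.hashMiB     = hashMiB.value_or(cfg.threads * 128);  // threads*128 MiB
    cfg.timeSeconds = timeSeconds.value_or(150);            // 150 s
    return cfg;
}
```

With 16 threads and nothing else specified this yields 16, 2048 and 150, which agrees with the `Filled invocation` line in the example output.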

-------------------------------

Small unrelated tweaks:
 - Network verification output is now handled as a callback.
 - TT hashfull queries allow specifying maximum entry age.

closes https://github.com/official-stockfish/Stockfish/pull/5354

No functional change
2024-09-28 18:01:26 +02:00
Disservin
effa246071 Use optional for the engine path
- A small quality of life change: the type of the engine path changes
  from a string to an optional string. This allows skipping the binary
  directory lookup, which is commonly disabled by people who create wasm
  builds or include stockfish as a library.
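
A minimal sketch of the idea, assuming a hypothetical helper `binary_directory` (not the function from the patch): when no path is supplied, the lookup is skipped entirely.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: derive the binary directory from the engine path.
// An empty optional means the caller does not want any lookup at all.
std::string binary_directory(const std::optional<std::string>& enginePath) {
    if (!enginePath)
        return "";  // lookup skipped, e.g. for wasm builds or library use

    const auto pos = enginePath->find_last_of("/\\");
    if (pos == std::string::npos)
        return "./";  // no separator: assume the current directory

    return enginePath->substr(0, pos + 1);
}
```

A caller that embeds the engine as a library would pass `std::nullopt`; a regular executable would pass the path it was started with.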

closes https://github.com/official-stockfish/Stockfish/pull/5575

No functional change
2024-09-09 18:02:32 +02:00
Tomasz Sobczyk
8e560c4fd3 Replicate network weights only to used NUMA nodes
On a system with multiple NUMA nodes, this patch avoids unneeded replication
(e.g. 8x for a single-threaded run), reducing memory use in that case.

Lazy initialization is forced before search.

Passed STC:
https://tests.stockfishchess.org/tests/view/66a28c524ff211be9d4ecdd4
LLR: 2.96 (-2.94,2.94) <-1.75,0.25>
Total: 691776 W: 179429 L: 179927 D: 332420
Ptnml(0-2): 2573, 79370, 182547, 78778, 2620

closes https://github.com/official-stockfish/Stockfish/pull/5515

No functional change
2024-08-03 09:41:37 +02:00
FauziAkram
986173264f Adding LowestElo and HighestElo constants
These values represent the lowest Elo rating in the skill level calculation,
and the highest one, but it's not clear from the code where these values come
from other than the comment.  This should improve code readability and
maintainability. It makes the purpose of the values clear and allows for easy
modification if the Elo range for skill level calculation changes in the
future.  Moved the Skill struct definition from search.cpp to search.h header
file to define the Search::Skill struct, making it accessible from other files.
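
A rough sketch of what naming the bounds buys. The struct name `SkillSketch`, the helper `normalized` and the two numeric values are assumptions for illustration; the commit message itself does not state the values.

```cpp
// Hypothetical stand-in for the Skill struct described above.
struct SkillSketch {
    static constexpr int LowestElo  = 1320;  // assumed value, not from the message
    static constexpr int HighestElo = 3190;  // assumed value, not from the message

    // Map a UCI Elo setting to [0, 1] using the named bounds.
    static double normalized(int uciElo) {
        if (uciElo < LowestElo)
            uciElo = LowestElo;
        if (uciElo > HighestElo)
            uciElo = HighestElo;
        return double(uciElo - LowestElo) / double(HighestElo - LowestElo);
    }
};
```

If the Elo range ever changes, only the two constants need to be touched, and other files can refer to them by name.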

closes https://github.com/official-stockfish/Stockfish/pull/5508

No functional change
2024-07-23 19:23:57 +02:00
Andyson007
42aae5fe8b Fixed non UCI compliance
Print `<empty>` and accept `<empty>` for UCI string options; plain empty
strings are accepted as well. Internally use empty strings (`""`).
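
The mapping can be sketched as two small conversions. The function names are hypothetical and only illustrate the rule stated above.

```cpp
#include <string>

// Hypothetical helper: value received from the GUI -> internal representation.
// Both "<empty>" and a literally empty string end up as "".
std::string parse_uci_string_value(const std::string& received) {
    if (received == "<empty>")
        return std::string();
    return received;
}

// Hypothetical helper: internal representation -> text printed to the GUI.
std::string format_uci_string_value(const std::string& internal) {
    if (internal.empty())
        return std::string("<empty>");
    return internal;
}
```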

closes https://github.com/official-stockfish/Stockfish/pull/5474

No functional change
2024-07-15 13:14:57 +02:00
Joost VandeVondele
ad0f1fecda Move info strings once more
Follow up from #5404 ... the current location causes trouble with the Aquarium GUI

Fixes #5430

Now prints the information on threads and available processors at the beginning
of search, where info about the networks is already printed (and is known to
work)

closes https://github.com/official-stockfish/Stockfish/pull/5433

No functional change.
2024-07-03 13:39:31 +02:00
Disservin
7013a22b74 Move options into the engine
Move the engine options into the engine class, also avoid duplicated
initializations after startup.  UCIEngine needs to register a listener via
add_listener to be notified of all option changes and print them.  Also avoid
a double initialization of the TT, which was the case with the old state.
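
A minimal sketch of the listener idea. Only the name `add_listener` comes from the message; the class `OptionsSketch`, the listener signature and the storage are assumptions.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical options container owned by the engine.
class OptionsSketch {
   public:
    using Listener = std::function<void(const std::string& name, const std::string& value)>;

    // The UCI layer registers one callback that is told about every change.
    void add_listener(Listener listener) { listener_ = std::move(listener); }

    void set(const std::string& name, const std::string& value) {
        values_[name] = value;
        if (listener_)
            listener_(name, value);  // e.g. the UCI layer prints an info string
    }

    std::string get(const std::string& name) const {
        const auto it = values_.find(name);
        return it == values_.end() ? std::string() : it->second;
    }

   private:
    std::map<std::string, std::string> values_;
    Listener                           listener_;
};
```

The engine stays free of any printing code; whoever wraps it decides what to do with a change notification.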

closes https://github.com/official-stockfish/Stockfish/pull/5356

No functional change
2024-06-12 09:17:04 +02:00
Tomasz Sobczyk
02ff76630b Add NumaPolicy "hardware" option that bypasses current processor affinity.
Can be used in case a GUI (e.g. ChessBase 17 see #5307) sets affinity to a
single processor group, but the user would like to use the full capabilities of
the hardware.  Improves affinity handling on Windows in case of multiple
available APIs and existing affinities.

closes https://github.com/official-stockfish/Stockfish/pull/5353

No functional change
2024-06-05 21:01:45 +02:00
Tomasz Sobczyk
a169c78b6d Improve performance on NUMA systems
Allow for NUMA memory replication for NNUE weights.  Bind threads to ensure execution on a specific NUMA node.

This patch introduces NUMA memory replication, currently only utilized for the NNUE weights. Along with it comes all machinery required to identify NUMA nodes and bind threads to specific processors/nodes. It also comes with small changes to Thread and ThreadPool to allow easier execution of custom functions on the designated thread. Old thread binding (WinProcGroup) machinery is removed because it's incompatible with this patch. Small changes to unrelated parts of the code were made to ensure correctness, like some classes being made unmovable, raw pointers replaced with unique_ptr, etc.

Windows 7 and Windows 10 are partially supported. Windows 11 is fully supported. Linux is fully supported, with explicit exclusion of Android. No additional dependencies.

-----------------

A new UCI option `NumaPolicy` is introduced. It can take the following values:
```
system - gathers NUMA node information from the system (lscpu or windows api), binds each thread to a single NUMA node
none - assumes there is 1 NUMA node, never binds threads
auto - this is the default value, depends on the number of set threads and NUMA nodes, will only enable binding on multinode systems and when the number of threads reaches a threshold (dependent on node size and count)
[[custom]] -
  // ':'-separated numa nodes
  // ','-separated cpu indices
  // supports "first-last" range syntax for cpu indices,
  for example '0-15,32-47:16-31,48-63'
```
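
The custom syntax above can be parsed along these lines. This is a sketch under assumptions: `parse_numa_config` is a hypothetical name, the input is assumed to be well formed, and no validation (overlaps, reversed ranges) is attempted.

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parser: one set of cpu indices per NUMA node.
std::vector<std::set<std::size_t>> parse_numa_config(const std::string& text) {
    std::vector<std::set<std::size_t>> nodes;

    std::istringstream nodeStream(text);
    std::string        nodeText;
    while (std::getline(nodeStream, nodeText, ':')) {  // ':' separates nodes
        std::set<std::size_t> cpus;

        std::istringstream cpuStream(nodeText);
        std::string        item;
        while (std::getline(cpuStream, item, ',')) {  // ',' separates cpu entries
            if (item.empty())
                continue;

            const auto dash = item.find('-');
            if (dash == std::string::npos) {
                cpus.insert(std::stoul(item));  // single index
            } else {
                // "first-last" range, both ends inclusive
                const std::size_t first = std::stoul(item.substr(0, dash));
                const std::size_t last  = std::stoul(item.substr(dash + 1));
                for (std::size_t c = first; c <= last; ++c)
                    cpus.insert(c);
            }
        }
        nodes.push_back(std::move(cpus));
    }
    return nodes;
}
```

For `0-15,32-47:16-31,48-63` this gives two nodes with 32 processors each.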

Setting `NumaPolicy` forces recreation of the threads in the ThreadPool, which in turn forces the recreation of the TT.

The threads are distributed among NUMA nodes in a round-robin fashion based on fill percentage (i.e. it will strive to fill all NUMA nodes evenly). Threads are bound to NUMA nodes, not specific processors, because that's our only requirement and the OS can schedule them better.
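
One way to picture the fill-based distribution: each new thread goes to the node with the lowest current fill ratio. This is an illustrative sketch, not the code from the patch; `distribute_threads` and the tie-breaking rule (lowest node index wins) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns, for each thread, the index of the NUMA node
// it would be bound to. nodeSizes[i] is the number of processors in node i.
std::vector<std::size_t> distribute_threads(const std::vector<std::size_t>& nodeSizes,
                                            std::size_t                     numThreads) {
    assert(!nodeSizes.empty());

    std::vector<std::size_t> occupation(nodeSizes.size(), 0);
    std::vector<std::size_t> assignment;
    assignment.reserve(numThreads);

    for (std::size_t t = 0; t < numThreads; ++t) {
        std::size_t best = 0;
        for (std::size_t n = 1; n < nodeSizes.size(); ++n) {
            // Compare occupation[n]/nodeSizes[n] < occupation[best]/nodeSizes[best]
            // by cross-multiplying, which avoids floating point.
            if (occupation[n] * nodeSizes[best] < occupation[best] * nodeSizes[n])
                best = n;
        }
        ++occupation[best];
        assignment.push_back(best);
    }
    return assignment;
}
```

With two equally sized nodes the threads simply alternate; with nodes of 8 and 16 processors the larger node receives the second and third thread, because its fill ratio stays lower.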

Special care is taken that maximum memory usage on systems that do not require memory replication stays the same as before, that is, unnecessary copies are avoided.

On linux the process' processor affinity is respected. This means that if you for example use taskset to restrict Stockfish to a single NUMA node then the `system` and `auto` settings will only see a single NUMA node (more precisely, the processors included in the current affinity mask) and act accordingly.

-----------------

We can't ensure that a memory allocation takes place on a given NUMA node without using libnuma on linux, or using appropriate custom allocators on windows (https://learn.microsoft.com/en-us/windows/win32/memory/allocating-memory-from-a-numa-node), so to avoid complications the current implementation relies on first-touch policy. Due to this we also rely on the memory allocator to give us a new chunk of untouched memory from the system. This appears to work reliably on linux, but results may vary.

MacOS is not supported, because AFAIK it's not affected, and implementation would be problematic anyway.

Windows is supported since Windows 7 (https://learn.microsoft.com/en-us/windows/win32/api/processtopologyapi/nf-processtopologyapi-setthreadgroupaffinity). Until Windows 11/Server 2022 NUMA nodes are split such that they cannot span processor groups. This is because before Windows 11/Server 2022 it's not possible to set thread affinity spanning processor groups. The splitting is done manually in some cases (required after Windows 10 Build 20348). Since Windows 11/Server 2022 we can set affinities spanning processor groups, so this splitting is not done and the behaviour is pretty much like on linux.

Linux is supported, **without** libnuma requirement. `lscpu` is expected.

-----------------

Passed 60+1 @ 256t 16000MB hash: https://tests.stockfishchess.org/tests/view/6654e443a86388d5e27db0d8
```
LLR: 2.95 (-2.94,2.94) <0.00,10.00>
Total: 278 W: 110 L: 29 D: 139
Ptnml(0-2): 0, 1, 56, 82, 0
```

Passed SMP STC: https://tests.stockfishchess.org/tests/view/6654fc74a86388d5e27db1cd
```
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 67152 W: 17354 L: 17177 D: 32621
Ptnml(0-2): 64, 7428, 18408, 7619, 57
```

Passed STC: https://tests.stockfishchess.org/tests/view/6654fb27a86388d5e27db15c
```
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 131648 W: 34155 L: 34045 D: 63448
Ptnml(0-2): 426, 13878, 37096, 14008, 416
```

fixes #5253
closes https://github.com/official-stockfish/Stockfish/pull/5285

No functional change
2024-05-28 18:34:15 +02:00
Disservin
be026bdcb2 Clear Workers after changing the network
ensures internal state (e.g. accumulator cache) is consistent with network

closes https://github.com/official-stockfish/Stockfish/pull/5204

No functional change
2024-05-05 12:30:28 +02:00
xoto10
886ed90ec3 Use less time on recaptures
Credit for the idea goes to peregrine on discord.

Passed STC 10+0.1:
https://tests.stockfishchess.org/tests/view/662652623fe04ce4cefc48cf
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 75712 W: 19793 L: 19423 D: 36496
Ptnml(0-2): 258, 8487, 20023, 8803, 285

Passed LTC 60+0.6:
https://tests.stockfishchess.org/tests/view/6627495e3fe04ce4cefc59b6
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 49788 W: 12743 L: 12404 D: 24641
Ptnml(0-2): 29, 5141, 14215, 5480, 29

The code was updated slightly and tested for non-regression against the
original code at STC:

LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 41952 W: 10912 L: 10698 D: 20342
Ptnml(0-2): 133, 4825, 10835, 5061, 122
https://tests.stockfishchess.org/tests/view/662d84f56115ff6764c7e438

closes https://github.com/official-stockfish/Stockfish/pull/5189

Bench: 1836777
2024-04-28 21:26:25 +02:00
Disservin
ddd250b9d6 Restore NPS output for Perft
Previously it was possible to also get the node counter after running a bench with perft, i.e.
`./stockfish bench 1 1 5 current perft`. This was lost in a small regression from the uci refactoring.

```
Nodes searched: 4865609

===========================
Total time (ms) : 18
Nodes searched  : 4865609
Nodes/second    : 270311611
```
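
The `Nodes/second` figure in the output above is consistent with integer division of the node count by the elapsed time in milliseconds. A small sketch, with `nodes_per_second` as a hypothetical helper name and the zero guard as an assumption:

```cpp
#include <cstdint>

// Hypothetical helper: nodes per second from a node count and elapsed milliseconds.
std::uint64_t nodes_per_second(std::uint64_t nodes, std::uint64_t elapsedMs) {
    if (elapsedMs == 0)
        elapsedMs = 1;  // guard against division by zero on very fast runs
    return nodes * 1000 / elapsedMs;
}
```

For 4865609 nodes in 18 ms this gives 270311611, matching the printed value.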

closes https://github.com/official-stockfish/Stockfish/pull/5188

No functional change
2024-04-24 18:20:55 +02:00
Disservin
4912f5b0b5 Remove duplicated Position object in UCIEngine
Also fixes searchmoves.

Drop the need of a Position object in uci.cpp.

A side note, it is still required for the static functions,
but these should be moved to a different namespace/class
later on, since sf kinda relies on them.

closes https://github.com/official-stockfish/Stockfish/pull/5169

No functional change
2024-04-12 19:37:39 +02:00
Disservin
9032c6cbe7 Transform search output to engine callbacks
Part 2 of the Split UCI into UCIEngine and Engine refactor.
This creates function callbacks for search to use when an update should occur.
The benching in uci.cpp for example does this to extract the total nodes
searched.
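
A minimal sketch of the callback pattern, using a stand-in search loop. `SearchUpdate`, `OnUpdate` and `run_fake_search` are hypothetical names; the real callback types in the refactor may look different.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical payload passed to the callback on each update.
struct SearchUpdate {
    std::uint64_t nodes;
    int           depth;
};

using OnUpdate = std::function<void(const SearchUpdate&)>;

// Stand-in for the search: it reports progress through the callback
// instead of printing UCI output itself.
void run_fake_search(int maxDepth, const OnUpdate& onUpdate) {
    std::uint64_t nodes = 0;
    for (int depth = 1; depth <= maxDepth; ++depth) {
        nodes += 1000ULL * static_cast<std::uint64_t>(depth);  // pretend work
        if (onUpdate)
            onUpdate(SearchUpdate{nodes, depth});
    }
}
```

A bench-like caller would pass a lambda that stores the last reported node count, while a UCI front end would pass one that prints an `info` line.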

No functional change
2024-04-05 21:03:58 +02:00
Disservin
299707d2c2 Split UCI into UCIEngine and Engine
This is another refactor which aims to decouple uci from stockfish. A new engine
class manages all engine related logic and uci is a "small" wrapper around it.

In the future we should also try to remove the need for the Position object in
the uci and replace the options with an actual options struct instead of using a
map. Also convert the std::string's in the Info structs to string_view.

closes #5147

No functional change
2024-04-04 00:15:17 +02:00