1
0
Fork 0
mirror of https://github.com/sockspls/badfish synced 2025-05-01 01:03:09 +00:00
Commit graph

158 commits

Author SHA1 Message Date
Dubslow
c8213ba0d0 Simplify TT interface and avoid changing TT info
This commit builds on the work and ideas of #5345, #5348, and #5364.

Place as much as possible of the TT implementation in tt.cpp, rather than in the
header.  Some commentary is added to better document the public interface.

Fix the search read-TT races, or at least contain them to within TT methods only.

Passed SMP STC: https://tests.stockfishchess.org/tests/view/666134ab91e372763104b443
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 512552 W: 132387 L: 132676 D: 247489
Ptnml(0-2): 469, 58429, 138771, 58136, 471

The unmerged version has bench identical to the other PR (see also #5348) and
therefore those same-functionality tests:

SMP LTC: https://tests.stockfishchess.org/tests/view/665c7021fd45fb0f907c214a
SMP LTC: https://tests.stockfishchess.org/tests/view/665d28a7fd45fb0f907c5495

closes https://github.com/official-stockfish/Stockfish/pull/5369

bench 1205675
2024-06-12 09:17:04 +02:00
Disservin
4f53560d24 Accumulate nodes over all bench positions not just the last
closes https://github.com/official-stockfish/Stockfish/pull/5352

No functional change
2024-06-04 08:26:35 +02:00
Disservin
7f09d06b83 Properly initialize the TT in a multithreaded way again 2024-06-04 07:53:25 +02:00
Disservin
00a28ae325 Add helpers for managing aligned memory
Previously, we had two type aliases, LargePagePtr and AlignedPtr, which
required manually initializing the aligned memory for the pointer.

The new helpers:

- make_unique_aligned
- make_unique_large_page

are now available for allocating aligned memory (with large pages). They
behave similarly to std::make_unique, ensuring objects allocated with
these functions follow RAII.

The old approach had issues with initializing non-trivial types or
arrays of objects. The evaluation function of the network is now a
unique pointer to an array instead of an array of unique pointers.

Memory related functions have been moved into memory.h

Passed High Hash Pressure Test Non-Regression STC:
https://tests.stockfishchess.org/tests/view/665b2b36586058766677cfd2
LLR: 2.93 (-2.94,2.94) <-1.75,0.25>
Total: 476992 W: 122426 L: 122677 D: 231889
Ptnml(0-2): 1145, 51027, 134419, 50744, 1161

Failed Normal Non-Regression STC:
https://tests.stockfishchess.org/tests/view/665b2997586058766677cfc8
LLR: -2.94 (-2.94,2.94) <-1.75,0.25>
Total: 877312 W: 225233 L: 226395 D: 425684
Ptnml(0-2): 2110, 94642, 246239, 93630, 2035

Probably a fluke since there shouldn't be a real slowndown and it has also
passed the high hash pressure test.

closes https://github.com/official-stockfish/Stockfish/pull/5332

No functional change
2024-06-03 23:11:59 +02:00
Tomasz Sobczyk
a169c78b6d Improve performance on NUMA systems
Allow for NUMA memory replication for NNUE weights.  Bind threads to ensure execution on a specific NUMA node.

This patch introduces NUMA memory replication, currently only utilized for the NNUE weights. Along with it comes all machinery required to identify NUMA nodes and bind threads to specific processors/nodes. It also comes with small changes to Thread and ThreadPool to allow easier execution of custom functions on the designated thread. Old thread binding (WinProcGroup) machinery is removed because it's incompatible with this patch. Small changes to unrelated parts of the code were made to ensure correctness, like some classes being made unmovable, raw pointers replaced with unique_ptr. etc.

Windows 7 and Windows 10 is partially supported. Windows 11 is fully supported. Linux is fully supported, with explicit exclusion of Android. No additional dependencies.

-----------------

A new UCI option `NumaPolicy` is introduced. It can take the following values:
```
system - gathers NUMA node information from the system (lscpu or windows api), for each threads binds it to a single NUMA node
none - assumes there is 1 NUMA node, never binds threads
auto - this is the default value, depends on the number of set threads and NUMA nodes, will only enable binding on multinode systems and when the number of threads reaches a threshold (dependent on node size and count)
[[custom]] -
  // ':'-separated numa nodes
  // ','-separated cpu indices
  // supports "first-last" range syntax for cpu indices,
  for example '0-15,32-47:16-31,48-63'
```

Setting `NumaPolicy` forces recreation of the threads in the ThreadPool, which in turn forces the recreation of the TT.

The threads are distributed among NUMA nodes in a round-robin fashion based on fill percentage (i.e. it will strive to fill all NUMA nodes evenly). Threads are bound to NUMA nodes, not specific processors, because that's our only requirement and the OS can schedule them better.

Special care is made that maximum memory usage on systems that do not require memory replication stays as previously, that is, unnecessary copies are avoided.

On linux the process' processor affinity is respected. This means that if you for example use taskset to restrict Stockfish to a single NUMA node then the `system` and `auto` settings will only see a single NUMA node (more precisely, the processors included in the current affinity mask) and act accordingly.

-----------------

We can't ensure that a memory allocation takes place on a given NUMA node without using libnuma on linux, or using appropriate custom allocators on windows (https://learn.microsoft.com/en-us/windows/win32/memory/allocating-memory-from-a-numa-node), so to avoid complications the current implementation relies on first-touch policy. Due to this we also rely on the memory allocator to give us a new chunk of untouched memory from the system. This appears to work reliably on linux, but results may vary.

MacOS is not supported, because AFAIK it's not affected, and implementation would be problematic anyway.

Windows is supported since Windows 7 (https://learn.microsoft.com/en-us/windows/win32/api/processtopologyapi/nf-processtopologyapi-setthreadgroupaffinity). Until Windows 11/Server 2022 NUMA nodes are split such that they cannot span processor groups. This is because before Windows 11/Server 2022 it's not possible to set thread affinity spanning processor groups. The splitting is done manually in some cases (required after Windows 10 Build 20348). Since Windows 11/Server 2022 we can set affinites spanning processor group so this splitting is not done, so the behaviour is pretty much like on linux.

Linux is supported, **without** libnuma requirement. `lscpu` is expected.

-----------------

Passed 60+1 @ 256t 16000MB hash: https://tests.stockfishchess.org/tests/view/6654e443a86388d5e27db0d8
```
LLR: 2.95 (-2.94,2.94) <0.00,10.00>
Total: 278 W: 110 L: 29 D: 139
Ptnml(0-2): 0, 1, 56, 82, 0
```

Passed SMP STC: https://tests.stockfishchess.org/tests/view/6654fc74a86388d5e27db1cd
```
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 67152 W: 17354 L: 17177 D: 32621
Ptnml(0-2): 64, 7428, 18408, 7619, 57
```

Passed STC: https://tests.stockfishchess.org/tests/view/6654fb27a86388d5e27db15c
```
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 131648 W: 34155 L: 34045 D: 63448
Ptnml(0-2): 426, 13878, 37096, 14008, 416
```

fixes #5253
closes https://github.com/official-stockfish/Stockfish/pull/5285

No functional change
2024-05-28 18:34:15 +02:00
Dubslow
ed79745bb9 Improve comments about DEPTH constants
Also "fix" movepicker to allow depths between CHECKS and NO_CHECKS,
which makes them easier to tweak (not that they get tweaked hardly ever)
(This was more beneficial when there was a third stage to DEPTH_QS, but
it's still an improvement now)

closes https://github.com/official-stockfish/Stockfish/pull/5205

No functional change
2024-05-23 21:29:11 +02:00
Disservin
0a3eb1d8fa Document TT code more
Slight refactor of the TT code with the goal to make it easier to understand / tweak.

Passed Non-Regression STC:
https://tests.stockfishchess.org/tests/view/65d51e401d8e83c78bfdc427
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 56416 W: 14750 L: 14550 D: 27116
Ptnml(0-2): 227, 6386, 14796, 6558, 241

closes https://github.com/official-stockfish/Stockfish/pull/5061

No functional change
2024-03-03 15:21:57 +01:00
Disservin
a107910951 Refactor global variables
This aims to remove some of the annoying global structure which Stockfish has.

Overall there is no major elo regression to be expected.

Non regression SMP STC (paused, early version):
https://tests.stockfishchess.org/tests/view/65983d7979aa8af82b9608f1
LLR: 0.23 (-2.94,2.94) <-1.75,0.25>
Total: 76232 W: 19035 L: 19096 D: 38101
Ptnml(0-2): 92, 8735, 20515, 8690, 84

Non regression STC (early version):
https://tests.stockfishchess.org/tests/view/6595b3a479aa8af82b95da7f
LLR: 2.93 (-2.94,2.94) <-1.75,0.25>
Total: 185344 W: 47027 L: 46972 D: 91345
Ptnml(0-2): 571, 21285, 48943, 21264, 609

Non regression SMP STC:
https://tests.stockfishchess.org/tests/view/65a0715c79aa8af82b96b7e4
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 142936 W: 35761 L: 35662 D: 71513
Ptnml(0-2): 209, 16400, 38135, 16531, 193

These global structures/variables add hidden dependencies and allow data
to be mutable from where it shouldn't it be (i.e. options). They also
prevent Stockfish from internal selfplay, which would be a nice thing to
be able to do, i.e. instantiate two Stockfish instances and let them
play against each other. It will also allow us to make Stockfish a
library, which can be easier used on other platforms.

For consistency with the old search code, `thisThread` has been kept,
even though it is not strictly necessary anymore. This the first major
refactor of this kind (in recent time), and future changes are required,
to achieve the previously described goals. This includes cleaning up the
dependencies, transforming the network to be self contained and coming
up with a plan to deal with proper tablebase memory management (see
comments for more information on this).

The removal of these global structures has been discussed in parts with
Vondele and Sopel.

closes https://github.com/official-stockfish/Stockfish/pull/4968

No functional change
2024-01-13 19:40:53 +01:00
Disservin
cafbe8e8e8 Change the Move enum to a class
This changes the Move enum to a class, this way
all move related functions can be moved into the class
and be more self contained.

closes https://github.com/official-stockfish/Stockfish/pull/4958

No functional change
2024-01-04 15:51:04 +01:00
Disservin
444f03ee95 Update copyright year
closes https://github.com/official-stockfish/Stockfish/pull/4954

No functional change
2024-01-04 15:47:10 +01:00
Disservin
a105978bbd remove blank line between function and it's description
- remove the blank line between the declaration of the function and it's
  comment, leads to better IDE support when hovering over a function to see it's
  description
- remove the unnecessary duplication of the function name in the functions
  description
- slightly refactored code for lsb, msb in bitboard.h There are still a few
  things we can be improved later on, move the description of a function where
  it was declared (instead of implemented) and add descriptions to functions
  which are behind macros ifdefs

closes https://github.com/official-stockfish/Stockfish/pull/4840

No functional change
2023-10-23 20:39:48 +02:00
Disservin
2d0237db3f add clang-format
This introduces clang-format to enforce a consistent code style for Stockfish.

Having a documented and consistent style across the code will make contributing easier
for new developers, and will make larger changes to the codebase easier to make.

To facilitate formatting, this PR includes a Makefile target (`make format`) to format the code,
this requires clang-format (version 17 currently) to be installed locally.

Installing clang-format is straightforward on most OS and distros
(e.g. with https://apt.llvm.org/, brew install clang-format, etc), as this is part of quite commonly
used suite of tools and compilers (llvm / clang).

Additionally, a CI action is present that will verify if the code requires formatting,
and comment on the PR as needed. Initially, correct formatting is not required, it will be
done by maintainers as part of the merge or in later commits, but obviously this is encouraged.

fixes https://github.com/official-stockfish/Stockfish/issues/3608
closes https://github.com/official-stockfish/Stockfish/pull/4790

Co-Authored-By: Joost VandeVondele <Joost.VandeVondele@gmail.com>
2023-10-22 16:06:27 +02:00
FauziAkram
edb4ab924f Standardize Comments
use double slashes (//) only for comments.

closes #4820

No functional change.
2023-10-21 10:25:03 +02:00
cj5716
fce4cc1829 Make casting styles consistent
Make casting styles consistent with the rest of the code.

closes https://github.com/official-stockfish/Stockfish/pull/4793

No functional change
2023-09-22 19:14:29 +02:00
Disservin
3c0e86a91e Cleanup includes
Reorder a few includes, include "position.h" where it was previously missing
and apply include-what-you-use suggestions. Also make the order of the includes
consistent, in the following way:

1. Related header (for .cpp files)
2. A blank line
3. C/C++ headers
4. A blank line
5. All other header files

closes https://github.com/official-stockfish/Stockfish/pull/4763
fixes https://github.com/official-stockfish/Stockfish/issues/4707

No functional change
2023-09-03 08:24:51 +02:00
Sebastian Buchwald
b60f9cc451 Update copyright years
Happy New Year!

closes https://github.com/official-stockfish/Stockfish/pull/4315

No functional change
2023-01-02 19:07:38 +01:00
Brad Knox
ad926d34c0 Update copyright years
Happy New Year!

closes https://github.com/official-stockfish/Stockfish/pull/3881

No functional change
2022-01-06 15:45:45 +01:00
Dieter Dobbelaere
7ffae17f85 Add Stockfish namespace.
fixes #3350 and is a small cleanup that might make it easier to use SF
in separate projects, like a NNUE trainer or similar.

closes https://github.com/official-stockfish/Stockfish/pull/3370

No functional change.
2021-03-07 14:26:54 +01:00
mattginsberg
573f0e364f Better code for hash table generation
This patch removes some magic numbers in TT bit management and introduce proper
constants in the code, to improve documentation and ease further modifications.

No function change
2021-02-11 22:29:35 +01:00
Joost VandeVondele
c4d67d77c9 Update copyright years
No functional change
2021-01-08 17:04:23 +01:00
Sami Kiminki
485d517c68 Add large page support for NNUE weights and simplify TT mem management
Use TT memory functions to allocate memory for the NNUE weights. This
should provide a small speed-up on systems where large pages are not
automatically used, including Windows and some Linux distributions.

Further, since we now have a wrapper for std::aligned_alloc(), we can
simplify the TT memory management a bit:

- We no longer need to store separate pointers to the hash table and
  its underlying memory allocation.
- We also get to merge the Linux-specific and default implementations
  of aligned_ttmem_alloc().

Finally, we'll enable the VirtualAlloc code path with large page
support also for Win32.

STC: https://tests.stockfishchess.org/tests/view/5f66595823a84a47b9036fba
LLR: 2.94 (-2.94,2.94) {-0.25,1.25}
Total: 14896 W: 1854 L: 1686 D: 11356
Ptnml(0-2): 65, 1224, 4742, 1312, 105

closes https://github.com/official-stockfish/Stockfish/pull/3081

No functional change.
2020-09-21 08:43:48 +02:00
Sami Kiminki
f7b3f0e842 Allow TT entries with key16==0 to be fetched
Fix the issue where a TT entry with key16==0 would always be reported
as a miss. Instead, we'll use depth8 to detect whether the TT entry is
occupied. In order to do that, we'll change DEPTH_OFFSET to -7
(depth8==0) to distinguish between an unoccupied entry and the
otherwise lowest possible depth, i.e., DEPTH_NONE (depth8==1).

To prevent a performance regression, we'll reorder the TT entry fields
by the access order of TranspositionTable::probe(). Memory in general
works fastest when accessed in sequential order. We'll also match the
store order in TTEntry::save() with the entry field order, and
re-order the 'if-or' expressions in TTEntry::save() from the cheapest
to the most expensive.

Finally, as we now have a proper TT entry occupancy test, we'll fix a
minor corner case with hashfull reporting. To reproduce:
- Use a big hash
- Either:
  a. Start 31 very quick searches (this wraparounds generation to 0); or
  b. Force generation of the first search to 0.
- go depth infinite

Before the fix, hashfull would incorrectly report nearly full hash
immediately after the search start, since
TranspositionTable::hashfull() used to consider only the entry
generation and not whether the entry was actually occupied.

STC:
LLR: 2.95 (-2.94,2.94) {-0.25,1.25}
Total: 36848 W: 4091 L: 3898 D: 28859
Ptnml(0-2): 158, 2996, 11972, 3091, 207
https://tests.stockfishchess.org/tests/view/5f3f98d5dc02a01a0c2881f7

LTC:
LLR: 2.95 (-2.94,2.94) {0.25,1.25}
Total: 32280 W: 1828 L: 1653 D: 28799
Ptnml(0-2): 34, 1428, 13051, 1583, 44
https://tests.stockfishchess.org/tests/view/5f3fe77a87a5c3c63d8f5332

closes https://github.com/official-stockfish/Stockfish/pull/3048

Bench: 3760677
2020-08-24 12:03:28 +02:00
nodchip
84f3e86790 Add NNUE evaluation
This patch ports the efficiently updatable neural network (NNUE) evaluation to Stockfish.

Both the NNUE and the classical evaluations are available, and can be used to
assign a value to a position that is later used in alpha-beta (PVS) search to find the
best move. The classical evaluation computes this value as a function of various chess
concepts, handcrafted by experts, tested and tuned using fishtest. The NNUE evaluation
computes this value with a neural network based on basic inputs. The network is optimized
and trained on the evalutions of millions of positions at moderate search depth.

The NNUE evaluation was first introduced in shogi, and ported to Stockfish afterward.
It can be evaluated efficiently on CPUs, and exploits the fact that only parts
of the neural network need to be updated after a typical chess move.
[The nodchip repository](https://github.com/nodchip/Stockfish) provides additional
tools to train and develop the NNUE networks.

This patch is the result of contributions of various authors, from various communities,
including: nodchip, ynasu87, yaneurao (initial port and NNUE authors), domschl, FireFather,
rqs, xXH4CKST3RXx, tttak, zz4032, joergoster, mstembera, nguyenpham, erbsenzaehler,
dorzechowski, and vondele.

This new evaluation needed various changes to fishtest and the corresponding infrastructure,
for which tomtor, ppigazzini, noobpwnftw, daylen, and vondele are gratefully acknowledged.

The first networks have been provided by gekkehenker and sergiovieri, with the latter
net (nn-97f742aaefcd.nnue) being the current default.

The evaluation function can be selected at run time with the `Use NNUE` (true/false) UCI option,
provided the `EvalFile` option points the the network file (depending on the GUI, with full path).

The performance of the NNUE evaluation relative to the classical evaluation depends somewhat on
the hardware, and is expected to improve quickly, but is currently on > 80 Elo on fishtest:

60000 @ 10+0.1 th 1
https://tests.stockfishchess.org/tests/view/5f28fe6ea5abc164f05e4c4c
ELO: 92.77 +-2.1 (95%) LOS: 100.0%
Total: 60000 W: 24193 L: 8543 D: 27264
Ptnml(0-2): 609, 3850, 9708, 10948, 4885

40000 @ 20+0.2 th 8
https://tests.stockfishchess.org/tests/view/5f290229a5abc164f05e4c58
ELO: 89.47 +-2.0 (95%) LOS: 100.0%
Total: 40000 W: 12756 L: 2677 D: 24567
Ptnml(0-2): 74, 1583, 8550, 7776, 2017

At the same time, the impact on the classical evaluation remains minimal, causing no significant
regression:

sprt @ 10+0.1 th 1
https://tests.stockfishchess.org/tests/view/5f2906a2a5abc164f05e4c5b
LLR: 2.94 (-2.94,2.94) {-6.00,-4.00}
Total: 34936 W: 6502 L: 6825 D: 21609
Ptnml(0-2): 571, 4082, 8434, 3861, 520

sprt @ 60+0.6 th 1
https://tests.stockfishchess.org/tests/view/5f2906cfa5abc164f05e4c5d
LLR: 2.93 (-2.94,2.94) {-6.00,-4.00}
Total: 10088 W: 1232 L: 1265 D: 7591
Ptnml(0-2): 49, 914, 3170, 843, 68

The needed networks can be found at https://tests.stockfishchess.org/nns
It is recommended to use the default one as indicated by the `EvalFile` UCI option.

Guidelines for testing new nets can be found at
https://github.com/glinscott/fishtest/wiki/Creating-my-first-test#nnue-net-tests

Integration has been discussed in various issues:
https://github.com/official-stockfish/Stockfish/issues/2823
https://github.com/official-stockfish/Stockfish/issues/2728

The integration branch will be closed after the merge:
https://github.com/official-stockfish/Stockfish/pull/2825
https://github.com/official-stockfish/Stockfish/tree/nnue-player-wip

closes https://github.com/official-stockfish/Stockfish/pull/2912

This will be an exciting time for computer chess, looking forward to seeing the evolution of
this approach.

Bench: 4746616
2020-08-06 16:37:45 +02:00
Joost VandeVondele
ab5cd8340f Small cleanups
closes https://github.com/official-stockfish/Stockfish/pull/2756

No functional change
2020-06-24 22:20:04 +02:00
mstembera
1ea488d34c Use 128 bit multiply for TT index
Remove super cluster stuff from TT and just use a 128 bit multiply.

STC https://tests.stockfishchess.org/tests/view/5ee719b3aae8aec816ab7548
LLR: 2.94 (-2.94,2.94) {-1.50,0.50}
Total: 12736 W: 2502 L: 2333 D: 7901
Ptnml(0-2): 191, 1452, 2944, 1559, 222

LTC https://tests.stockfishchess.org/tests/view/5ee732d1aae8aec816ab7556
LLR: 2.93 (-2.94,2.94) {-1.50,0.50}
Total: 27584 W: 3431 L: 3350 D: 20803
Ptnml(0-2): 173, 2500, 8400, 2511, 208

Scheme back to being derived from https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/

Also the default optimized version of the index calculation now uses fewer instructions.
https://godbolt.org/z/Tktxbv
Might benefit from mulx (requires -mbmi2)

closes https://github.com/official-stockfish/Stockfish/pull/2744

bench: 4320954
2020-06-17 07:32:16 +02:00
Sami Kiminki
4b10578acb Increase the maximum hash size by a factor of 256
Conceptually group hash clusters into super clusters of 256 clusters.
This scheme allows us to use hash sizes up to 32 TB
(= 2^32 super clusters = 2^40 clusters).

Use 48 bits of the Zobrist key to choose the cluster index. We use 8
extra bits to mitigate the quantization error for very large hashes when
scaling the hash key to cluster index.

The hash index computation is organized to be compatible with the existing
scheme for power-of-two hash sizes up to 128 GB.

Fixes https://github.com/official-stockfish/Stockfish/issues/1349

closes https://github.com/official-stockfish/Stockfish/pull/2722

Passed non-regression STC:
LLR: 2.93 (-2.94,2.94) {-1.50,0.50}
Total: 37976 W: 7336 L: 7211 D: 23429
Ptnml(0-2): 578, 4295, 9149, 4356, 610
https://tests.stockfishchess.org/tests/view/5edcbaaef29b40b0fc95abc5

No functional change.
2020-06-09 18:44:07 +02:00
Sami Kiminki
beb327f910 Fix a Windows-only crash on exit without 'quit'
There was a bug in commit d4763424d2
(Add support for Windows large pages) that could result in trying to
free memory allocated with VirtualAlloc incorrectly with free().

Fix this by reverting the TT.resize(0) logic in the previous commit,
and instead, just call aligned_ttmem_free() in
TranspositionTable::~TranspositionTable().

fixes https://github.com/official-stockfish/Stockfish/issues/2677

closes https://github.com/official-stockfish/Stockfish/pull/2679

No functional change
2020-05-14 20:35:40 +02:00
Joost VandeVondele
ddcbacd04d Small cleanups
closes https://github.com/official-stockfish/Stockfish/pull/2567

No functional change.
2020-03-14 17:04:50 +01:00
Sami Kiminki
39437f4e55 Advise the kernel to use huge pages (Linux)
Align the TT allocation by 2M to make it huge page friendly and advise the
kernel to use huge pages.

Benchmarks on my i7-8700K (6C/12T) box: (3 runs per bench per config)

                    vanilla (nps)               hugepages (nps)              avg
==================================================================================
bench             | 3012490  3024364  3036331   3071052  3067544  3071052    +1.5%
bench 16 12 20    | 19237932 19050166 19085315  19266346 19207025 19548758   +1.1%
bench 16384 12 20 | 18182313 18371581 18336838  19381275 19738012 19620225   +7.0%

On my box, huge pages have a significant perf impact when using a big
hash size. They also speed up TT initialization big time:

                                  vanilla (s)  huge pages (s)  speed-up
=======================================================================
time stockfish bench 16384 1 1  | 5.37         1.48            3.6x

In practice, huge pages with auto-defrag may always be enabled in the
system, in which case this patch has no effect. This
depends on the values in /sys/kernel/mm/transparent_hugepage/enabled
and /sys/kernel/mm/transparent_hugepage/defrag.

closes https://github.com/official-stockfish/Stockfish/pull/2463

No functional change
2020-01-27 11:16:10 +01:00
Alain SAVARD
09bef14c76 Update lists of authors and contributors
Preparing for version 11 of Stockfish: update lists of authors,
contributors giving CPU time to the fishtest framework, etc.

No functional change
2020-01-09 01:43:47 +01:00
Brian Sheppard
ca7d4e9ac7 Eliminate ONE_PLY
Simplification that eliminates ONE_PLY, based on a suggestion in the forum that
support for fractional plies has never been used, and @mcostalba's openness to
the idea of eliminating it. We lose a little bit of type safety by making Depth
an integer, but in return we simplify the code in search.cpp quite significantly.

No functional change

------------------------------------------

The argument favoring eliminating ONE_PLY:

* The term “ONE_PLY” comes up in a lot of forum posts (474 to date)
https://groups.google.com/forum/?fromgroups=#!searchin/fishcooking/ONE_PLY%7Csort:relevance

* There is occasionally a commit that breaks invariance of the code
with respect to ONE_PLY
https://groups.google.com/forum/?fromgroups=#!searchin/fishcooking/ONE_PLY%7Csort:date/fishcooking/ZIPdYj6k0fk/KdNGcPWeBgAJ

* To prevent such commits, there is a Travis CI hack that doubles ONE_PLY
and rechecks bench

* Sustaining ONE_PLY has, alas, not resulted in any improvements to the
  engine, despite many individuals testing many experiments over 5 years.

The strongest argument in favor of preserving ONE_PLY comes from @locutus:
“If we use par example ONE_PLY=256 the parameter space is increases by the
factor 256. So it seems very unlikely that the optimal setting is in the
subspace of ONE_PLY=1.”

There is a strong theoretical impediment to fractional depth systems: the
transposition table uses depth to determine when a stored result is good
enough to supply an answer for a current search. If you have fractional
depths, then different pathways to the position can be at fractionally
different depths.

In the end, there are three separate times when a proposal to remove ONE_PLY
was defeated by the suggestion to “give it a few more months.” So… it seems
like time to remove this distraction from the community.

See the pull request here:
https://github.com/official-stockfish/Stockfish/pull/2289
2019-10-06 00:57:00 +02:00
Marco Costalba
d39bc2efa1 Assorted trivial cleanups 5/2019
No functional change.

bench: 4178282
2019-06-09 14:57:08 +02:00
Joost VandeVondele
893a08a8c2 Allow for higher depths. (#2147)
High rootDepths, selDepths and generally searches are increasingly
common with long time control games, analysis, and improving hardware.
In this case, depths of MAX_DEPTH/MAX_PLY (128) can be reached,
and the search tree is truncated.

In principle MAX_PLY can be easily increased, except for a technicality
of storing depths in a signed 8 bit int in the TT. This patch increases
MAX_PLY by storing the depth in an unsigned 8 bit, after shifting by the
most negative depth stored in TT (DEPTH_NONE).

No regression at STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 42235 W: 9565 L: 9484 D: 23186
http://tests.stockfishchess.org/tests/view/5cdb35360ebc5925cf0595e1

Verified to reach high depths on
k1b5/1p1p4/pP1Pp3/K2pPp2/1P1p1P2/3P1P2/5P2/8 w - -
info depth 142 seldepth 154 multipv 1 score cp 537 nodes 26740713110 ...

No bench change.
2019-05-15 09:52:27 +02:00
Marco Costalba
05f7d59a9a Assorted trivial cleanups 1/2019
To address #1862

No functional change.
2019-02-08 10:20:43 +01:00
MJZ1977
70880b8e24 Flag critical search tree in hash table
Introducing new concept, saving principal lines into the transposition table
to generate a "critical search tree" which we can reuse later for intelligent
pruning/extension decisions.

For instance in this patch we just reduce reduction for these lines. But a lot
of other ideas are possible.

To go further : tune some parameters, how to add or remove lines from the
critical search tree, how to use these lines in search choices, etc.

STC :
LLR: 2.94 (-2.94,2.94) [0.50,4.50]
Total: 59761 W: 13321 L: 12863 D: 33577 +2.23 ELO
http://tests.stockfishchess.org/tests/view/5c34da5d0ebc596a450c53d3

LTC :
LLR: 2.96 (-2.94,2.94) [0.00,3.50]
Total: 26826 W: 4439 L: 4191 D: 18196 +2.9 ELO
http://tests.stockfishchess.org/tests/view/5c35ceb00ebc596a450c65b2

Special thanks to Miguel Lahoz for his help in transposition table in/out.

Bench: 3399866
2019-01-09 15:05:33 +01:00
Stéphane Nicolet
cf5d683408 Stockfish 10-beta
Preparation commit for the upcoming Stockfish 10 version, giving a chance to catch last minute feature bugs and evaluation regression during the one-week code freeze period. Also changing the copyright dates to include 2019.

No functional change
2018-11-19 11:18:21 +01:00
Joost VandeVondele
e7cfa5d020 Simplify saving a TT entry.
Avoid passing TT.generation() to TTEntry::save() at every call,
moving the implementation of TTEntry::save from tt.h to tt.cpp.

tested for no regression:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 53787 W: 11948 L: 11890 D: 29949
http://tests.stockfishchess.org/tests/view/5b2ff37f0ebc5902b2e582fe

Closes https://github.com/official-stockfish/Stockfish/pull/1662

No functional change.
2018-07-04 00:59:15 +02:00
Ronald de Man
759b3c79cf Mark all compile-time constants as constexpr.
To more clearly distinguish them from "const" local variables, this patch
defines compile-time local constants as constexpr. This is consistent with
the definition of PvNode as constexpr in search() and qsearch(). It also
makes the code more robust, since the compiler will now check that those
constants are indeed compile-time constants.

We can go even one step further and define all the evaluation and search
compile-time constants as constexpr.

In generate_castling() I replaced "K" with "step", since K was incorrectly
capitalised (in the Chess960 case).

In timeman.cpp I had to make the non-local constants MaxRatio and StealRatio
constepxr, since otherwise gcc would complain when calculating TMaxRatio and
TStealRatio. (Strangely, I did not have to make Is64Bit constexpr even though
it is used in ucioption.cpp in the calculation of constexpr MaxHashMB.)

I have renamed PieceCount to pieceCount in material.h, since the values of
the array are not compile-time constants.

Some compile-time constants in tbprobe.cpp were overlooked. Sides and MaxFile
are not compile-time constants, so were renamed to sides and maxFile.

Non-functional change.
2018-03-18 23:48:16 +01:00
Joost VandeVondele
9afa1d7330 New Year 2018
Adjust copyright headers.

No functional change.
2018-01-01 13:18:10 +01:00
Joost VandeVondele
2198cd0524 Allow for general transposition table sizes. (#1341)
For efficiency reasons current master only allows for transposition table sizes that are N = 2^k in size, the index computation can be done efficiently as (hash % N) can be written instead as (hash & 2^k - 1). On a typical computer (with 4, 8... etc Gb of RAM), this implies roughly half the RAM is left unused in analysis.

This issue was mentioned on fishcooking by Mindbreaker:
http://tests.stockfishchess.org/tests/view/5a3587de0ebc590ccbb8be04

Recently a neat trick was proposed to map a hash into the range [0,N[ more efficiently than (hash % N) for general N, nearly as efficiently as (hash % 2^k):

https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/

namely computing (hash * N / 2^32) for 32 bit hashes. This patch implements this trick and now allows for general hash sizes. Note that for N = 2^k this just amounts to using a different subset of bits from the hash. Master will use the lower k bits, this trick will use the upper k bits (of the 32 bit hash).

There is no slowdown as measured with [-3, 1] test:

http://tests.stockfishchess.org/tests/view/5a3587de0ebc590ccbb8be04
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 128498 W: 23332 L: 23395 D: 81771

There are two (smaller) caveats:

1) the patch is implemented for a 32 bit hash (so that a 64 bit multiply can be used), this effectively limits the number of clusters that can be used to 2^32 or to 128Gb of transpostion table. That's a change in the maximum allowed TT size, which could bother those using 256Gb or more regularly.

2) Already in master, an excluded move is hashed into the position key in rather simple way, essentially only affecting the lower 16 bits of the key. This is OK in master, since bits 0-15 end up in the index, but not in the new scheme, which picks the higher bits. This is 'fixed' by shifting the excluded move a few bits up. Eventually a better hashing scheme seems wise.

Despite these two caveats, I think this is a nice improvement in usability.

Bench: 5346341
2017-12-18 16:32:21 +01:00
Joost VandeVondele
d8f683760c Adjust copyright headers to 2017 (#965)
No functional change.
2017-01-11 08:46:29 +01:00
Marco Costalba
0b944c7186 Silence some warnings with MSVC 2013
No functional change.
2016-08-27 12:16:13 +02:00
Marco Costalba
4c5cbb1b14 Make engine ONE_PLY value independent
This non-functional change patch is a deep work to allow SF to be independent
from the actual value of ONE_PLY (currently set to 1). I have verified SF is
now independent for ONE_PLY values 1, 2, 4, 8, 16, 32 and 256.

This patch gives consistency to search code and enables future work, opening
the door to safely tweaking the ONE_PLY value for any reason.

Verified for no speed regression at STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 95643 W: 17728 L: 17737 D: 60178

No functional change.
2016-08-27 09:12:25 +02:00
Guenther Demetz
12eb345ebd Depth margin parameter-tweak in TT-save
Verified that is improvement with multiple threads:

LLR: 2.95 (-2.94,2.94) [0.00,4.00] sprt @ 30+0.3 th 3
Total: 14817 W: 2103 L: 1915 D: 10799

LLR: 2.96 (-2.94,2.94) [0.00,4.00] sprt @ 15+0.15 th 7
Total: 10264 W: 1498 L: 1321 D: 7445

Verified that is not a significant regression with a single thread:

LLR: 2.96 (-2.94,2.94) [-4.00,0.00] sprt @ 60+0.6 th 1
Total: 23975 W: 3294 L: 3210 D: 17471

Resolves #575
2016-01-18 22:04:38 +00:00
ppigazzini
d4af15f682 Update AUTHORS and copyright notice
No functional change

Resolves #555
2016-01-02 09:43:51 +00:00
Marco Costalba
9742fb10fd Update Copyright year
No functional change.

Resolves #554
2016-01-01 10:17:36 +00:00
lucasart
328098d027 Fix TT comment and static_assert()
Comment is based on a misunderstanding of what unaligned memory access is. Here
is an article that explains it very clearly:
https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt

No matter how we define TTEntry or TTCluster, there will never be any unaligned
memory access. This is because the complier knows the alignment rules, and does
the necessary adjustments to make sure unaligned memory access does not occur.

The issue being adressed here has nothing to do with unaligned memory access. It
is about cache performance. In order to achieve best cache performance:
- we prefetch the cacheline as soon as possible.
- we ensure that TT clusters do not spread across two cachelines. If they did,
  we would need to prefetch 2 cachelines, which could hurt cache performance.

Therefore the true conditions to achieve this are:
1/ start adress of TT is cache line aligned. void TranspositionTable::resize()
enforces this.
2/ TT cluster size should *divide* the cache line size. Currently, we pack 2
clusters per cache lines. It used to be 1 before "TT sardines". Does not matter
what the ratio is, all we want is to fit an integer number of clusters per cache
line.

No functional change.

Resolves #506
2015-11-20 23:23:53 -08:00
Marco Costalba
dc3508d157 Fix a comment in TTEntry::save
Comment was slightly incorrect.

No functional change.
2015-10-05 09:16:16 +02:00
mstembera
01fab4d432 Remove unnecessary generation check in TT save
Checking for generation is unnecessary because if the key matches then the entry was probed and refreshed earlier.

STC 2MB
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 57391 W: 10671 L: 10613 D: 36107
http://tests.stockfishchess.org/tests/view/55ef59fa0ebc5976a2d6da5d

LTC 8MB
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 60732 W: 9260 L: 9199 D: 42273
http://tests.stockfishchess.org/tests/view/55ef8fe60ebc5976a2d6da6b

STC 16MB
LLR: 2.95 (-2.94,2.94) [-4.00,0.00]
Total: 23443 W: 4369 L: 4293 D: 14781
http://tests.stockfishchess.org/tests/view/55ef8fe60ebc5976a2d6da6b

No functional change

Resolves #427
2015-09-17 17:13:45 -07:00
Marco Costalba
ee0371f86e Cleanup work in misc.cpp
Also some code style tidy up of latest patches.

Also renamed checkSq -> checkSquares because it
is a bitboard and not a square.

No functional change.
2015-05-10 09:42:26 +02:00