Introducing new concept, saving principal lines into the transposition table
to generate a "critical search tree" which we can reuse later for intelligent
pruning/extension decisions.
For instance in this patch we just reduce reduction for these lines. But a lot
of other ideas are possible.
To go further : tune some parameters, how to add or remove lines from the
critical search tree, how to use these lines in search choices, etc.
STC :
LLR: 2.94 (-2.94,2.94) [0.50,4.50]
Total: 59761 W: 13321 L: 12863 D: 33577 +2.23 ELO
http://tests.stockfishchess.org/tests/view/5c34da5d0ebc596a450c53d3
LTC :
LLR: 2.96 (-2.94,2.94) [0.00,3.50]
Total: 26826 W: 4439 L: 4191 D: 18196 +2.9 ELO
http://tests.stockfishchess.org/tests/view/5c35ceb00ebc596a450c65b2
Special thanks to Miguel Lahoz for his help in transposition table in/out.
Bench: 3399866
This was inspired after reading about
[Multi-Cut](https://www.chessprogramming.org/Multi-Cut).
We now do non-singular cut node pruning. The idea is to prune when we
have a "backup plan" in case our expected fail high node does not fail
high on the ttMove.
For singular extensions, we do a search on all other moves but the
ttMove. If this fails high on our original beta, this means that both
the ttMove, as well as at least one other move was proven to fail high
on a lower depth search. We then assume that one of these moves will
work on a higher depth and prune.
STC:
LLR: 2.96 (-2.94,2.94) [0.50,4.50]
Total: 72952 W: 16104 L: 15583 D: 41265
http://tests.stockfishchess.org/tests/view/5c3119640ebc596a450c0be5
LTC:
LLR: 2.95 (-2.94,2.94) [0.00,3.50]
Total: 27103 W: 4564 L: 4314 D: 18225
http://tests.stockfishchess.org/tests/view/5c3184c00ebc596a450c1662
Bench: 3145487
Small tweak of parameters, yielding some Elo.
The cluster branch can now be considered to be in good shape. In local testing, it runs stable for >30k games. Performance benefits from an MPI implementation that is able to make asynchronous progress. The code should be run with 1 MPI rank per node, and threaded on the node.
Performance against master has now been measured. Master has been given 1 node with 32 cores/threads in standard SMP, the cluster branch has been given N=2..20 of those nodes, running the corresponding number of MPI processes, each with 32 threads. Time control has been 10s+0.1s, Hash 8MB/core, the book 8moves_v3.pgn, the number of games 400.
```
Score of cluster-2mpix32t vs master-32t: 96 - 27 - 277 [0.586] 400
Elo difference: 60.54 +/- 18.49
Score of cluster-3mpix32t vs master-32t: 101 - 18 - 281 [0.604] 400
Elo difference: 73.16 +/- 17.94
Score of cluster-4mpix32t vs master-32t: 126 - 18 - 256 [0.635] 400
Elo difference: 96.19 +/- 19.68
Score of cluster-5mpix32t vs master-32t: 110 - 5 - 285 [0.631] 400
Elo difference: 93.39 +/- 17.09
Score of cluster-6mpix32t vs master-32t: 117 - 9 - 274 [0.635] 400
Elo difference: 96.19 +/- 18.06
Score of cluster-7mpix32t vs master-32t: 142 - 10 - 248 [0.665] 400
Elo difference: 119.11 +/- 19.89
Score of cluster-8mpix32t vs master-32t: 125 - 14 - 261 [0.639] 400
Elo difference: 99.01 +/- 19.18
Score of cluster-9mpix32t vs master-32t: 137 - 7 - 256 [0.662] 400
Elo difference: 117.16 +/- 19.20
Score of cluster-10mpix32t vs master-32t: 145 - 8 - 247 [0.671] 400
Elo difference: 124.01 +/- 19.86
Score of cluster-16mpix32t vs master-32t: 153 - 6 - 241 [0.684] 400
Elo difference: 133.95 +/- 20.17
Score of cluster-20mpix32t vs master-32t: 134 - 8 - 258 [0.657] 400
Elo difference: 113.29 +/- 19.11
```
As the cluster parallelism is essentially lazyMPI, the nodes per second has been verified to scale perfectly to large node counts. Unfortunately, that is not necessarily indicative of playing strength. In the following 2min search from startPos, we reach about 4.8Gnps (128 nodes).
```
info depth 38 seldepth 51 multipv 1 score cp 53 nodes 576165794092 nps 4801341606 hashfull 1000 tbhits 0 time 120001 pv e2e4 c7c5 g1f3 d7d6 f1b5 c8d7 b5d7 d8d7 c2c4 b8c6 b1c3 g8f6 d2d4 d7g4 d4d5 c6d4 f3d4 g4d1 e1d1 c5d4 c3b5 a8c8 b2b3 a7a6 b5d4 f6e4 d1e2 g7g6 c1e3 f8g7 a1c1 e4c5 f2f3 f7f5 h1d1 e8g8 d4c2 c5d7 a2a4 a6a5 e3d4 f5f4 d4f2 f8f7 h2h3 d7c5
```
This addresses partially issue #1911 in that it documents in our
Readme the command that users can use to verifying the md5sum of
their downloaded tablebase files.
Additionally, a quick check of the file size (the size of each
tablebase file modulo 64 is 16 as pointed out by @syzygy1) has been
implemented at launch time in Stockfish.
Closes https://github.com/official-stockfish/Stockfish/pull/1927
and https://github.com/official-stockfish/Stockfish/issues/1911
No functional change.
Fixes one TODO, by moving the IO related to bestmove to the root, even if this move is found by a different rank.
This is needed to make sure IO from different ranks is ordered properly. If this is not done it is possible that e.g. a bestmove arrives before all info lines have been received, leading to output that confuses tools and humans alike (see e.g. https://github.com/cutechess/cutechess/issues/472)
Delay legality check of castling moves at search time,
just before making the move, as is the standard with all
the other move types.
This should avoid an useless and not trivial legality check
when the castling is then not tried later. For instance due
to a previous cut-off.
The patch is also a big simplification and allows to entirely
remove generate_castling()
Bench changes due to a different move sequence out of MovePicker.
STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 45073 W: 9918 L: 9843 D: 25312
http://tests.stockfishchess.org/tests/view/5c2f176f0ebc596a450bdfb3
LTC:
LLR: 3.15 (-2.94,2.94) [-3.00,1.00]
Total: 10156 W: 1707 L: 1560 D: 6889
http://tests.stockfishchess.org/tests/view/5c2e7dfd0ebc596a450bcdf4
Verified with perft both in standard and Chess960 cases.
Closes https://github.com/official-stockfish/Stockfish/pull/1929
Bench: 3559104
This rewrites in part the message passing part, using in place gather, and collecting, rather than merging, the data of all threads.
neutral with a single thread per rank:
Score of new-2mpi-1t vs old-2mpi-1t: 789 - 787 - 2615 [0.500] 4191
Elo difference: 0.17 +/- 6.44
likely progress with multiple threads per rank:
Score of new-2mpi-36t vs old-2mpi-36t: 76 - 53 - 471 [0.519] 600
Elo difference: 13.32 +/- 12.85
A single popcount in evaluate.cpp replaces all openFiles stuff in pawns. It doesn't seem to affect performance at all.
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 28103 W: 6134 L: 6025 D: 15944
http://tests.stockfishchess.org/tests/view/5b7d70a20ebc5902bdbb1999
No functional change.
This custom predicate filter creates an unnecessary abstraction layer, but doesn't make the code any more readable. The code is clear enough without it.
No functional change.
The extra condition is used as a shortcut to skip the following 3 assignments:
```C++
Bitboard b1 = shift<UpRight>(pawnsOn7) & enemies;
Bitboard b2 = shift<UpLeft >(pawnsOn7) & enemies;
Bitboard b3 = shift<Up >(pawnsOn7) & emptySquares;
```
In case of EVASION with no target on 8th rank (the common case), we end up performing the 3 statements for nothing because b1 = b2 = b3 = 0.
But this is just a small micro-optimization and the condition is quite confusing, so just remove it and prefer a readable code instead.
STC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 78020 W: 16978 L: 16967 D: 44075
http://tests.stockfishchess.org/tests/view/5c27b4fe0ebc5902ba135bb0
No functional change.
This generalizes exchange of signals between the ranks using a non-blocking all-reduce. It is now used for the stop signal and the node count, but should be easily generalizable (TB hits, and ponder still missing). It avoids having long-lived outstanding non-blocking collectives (removes an early posted Ibarrier). A bit too short a test, but not worse than before:
Score of new-r4-1t vs old-r4-1t: 459 - 401 - 1505 [0.512] 2365
Elo difference: 8.52 +/- 8.43
use a counter to track available elements.
Some elo gain, on 4 ranks:
Score of old-r4-1t vs new-r4-1t: 422 - 508 - 1694 [0.484] 2624
Elo difference: -11.39 +/- 7.90
avoid saving to TT the part of the receive buffer that actually originates from the same rank.
Now, on 1 mpi rank, we have the same bench as the non-mpi code on 1 thread.
since the logic for saving moves in the sendbuffer and the associated rehashing is expensive, only do it for TT stores of sufficient depth.
quite some gain in local testing with 4 ranks against the previous version.
Elo difference: 288.84 +/- 21.98
This starts to make the branch useful, but for on-node runs, difference remains to the standard threading.
In the original code, the position key stored in the TT is used to probe&store TT entries after message passing. Since we only store part of the bits in the TT, this leads to incorrect rehashing. This is fixed in this patch storing also the full key in the send buffers, and using that for hashing after message arrival.
Short testing with 4 ranks (old vs new) shows this is effective:
Score of mpiold vs mpinew: 84 - 275 - 265 [0.347] 624
Elo difference: -109.87 +/- 20.88
This reverts my earlier change that only the root node gets to output best move after fixing problem with MPI_Allreduce by our custom operator(BestMoveOp). This function is not commutable and we must ensure that its output is consistent among all nodes.
Previous behavior was to wait on all nodes to finish their search on their own TM and aggregate to root node via a blocking MPI_Allreduce call. This seems to be problematic.
In this commit a proper non-blocking signalling barrier was implemented to use TM from root node to control the cluster search, and disable TM on all non-root nodes.
Also includes some cosmetic fix to the nodes/NPS display.
Based on Peter Österlund's "Lazy Cluster" algorithm,
but with some simplifications.
To compile, point COMPCXX to the MPI C++ compiler wrapper (mpicxx).
I tried to improve the Readme because many people in my local
chess club do not understand some of the UCO options properly.
Starting point of this was Cfish's Readme by Ronald de Man,
some internet resources and the Stockfish code itself.
Closes https://github.com/official-stockfish/Stockfish/pull/1898
Initial commit by user @erbsenzaehler, with help from users
Adrian Petrescu, @alayan-stk-2 and Elvin Liu.
No functional change
Co-Authored-By: Alayan-stk-2 <alayan-stk-2@users.noreply.github.com>
Co-Authored-By: Adrian Petrescu <apetresc@gmail.com>
Co-Authored-By: Elvin Liu <solarlight2@users.noreply.github.com>
Recent tests by @xoto10, @Vizvezdenec, and myself seemed to hint that Elo could
be gained by expanding the number of cases where king safety is applied. Several
users (@Spliffjiffer, @Vizvezdenec) have anticipated benefits specifically in
evaluation of tactics. It appears that we actually do not need to restrict the
cases in which we initialize and evaluate king safety at all: initializing and
evaluating it in every position appears roughly Elo-neutral at STC and possibly
a substantial Elo gain at LTC.
Any explanation for this scaling is, at this point, conjecture. Assuming it is
not due to chance, my hypothesis is that initialization of king safety in all
positions is a mild slowdown, offset by an Elo gain of evaluating king safety
in all positions. At STC this produces Elo gains and losses that offset each
other, while at longer time control the slowdown is much less important, leaving
only the Elo gain. It probably helps SF to explore king attacks much earlier in
search with high numbers of enemy pieces concentrating but not essentially attacking
king ring.
Thanks to @xoto10 and @Vizvezdenec for helping run my LTC!
Closes https://github.com/official-stockfish/Stockfish/pull/1906
STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 35432 W: 7815 L: 7721 D: 19896
http://tests.stockfishchess.org/tests/view/5c24779d0ebc5902ba131b26
LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 12887 W: 2217 L: 2084 D: 8586
http://tests.stockfishchess.org/tests/view/5c25049a0ebc5902ba132586
Bench: 3163951
------------------
How to continue from there?
* Next step will be to tune all the king danger terms once more after that :-)
When iterating through 'SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX' structure, do not use structure member beyond known size.
API is guaranteed to provide us at lease one element upon successful, and no element in the structure can have a zero size.
No functional change.
This pull request fixes a rare crashing bug on Windows inside our NUMA code, first
reported by Dann Corbit in the following forum thread (thanks!):
https://groups.google.com/forum/?fromgroups=#!topic/fishcooking/gA6aoMEuOwg
The fix is to not use structure member beyond known size when iterating through
'SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX' structure. We note that the Microsoft
API is guaranteed to provide us at least one element upon successful, and no
element in the structure can have a zero size.
No functional change.
The `&& (ss-1)->killers[0] ` conditions are there seemingly to protect
accessing ss-5.
This is unneeded and not so intuitive (as the killer is checked for equality
with currentMove, and that one is non-zero once we're high enough in the stack,
this protects access to ss-5). We can just extend the stack from ss-4 to ss-5,
so we can call update_continuation_histories(ss-1, ..) always in search.
This goes a bit further than #1881 and addresses a comment in #1878.
passed STC:
http://tests.stockfishchess.org/tests/view/5c1aa8d50ebc5902ba127ad0
LLR: 3.12 (-2.94,2.94) [-3.00,1.00]
Total: 53515 W: 11734 L: 11666 D: 30115
passed LTC:
http://tests.stockfishchess.org/tests/view/5c1b272c0ebc5902ba12858d
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 140176 W: 23123 L: 23192 D: 93861
Bench: 3451321
Even when playing without endgame table bases, this particular endgame should
be a win 100% of the time when Stockfish is given a KRBK position, assuming
there are enough moves remaining in the FEN to finish the game without hitting
the 50 move rule.
PROBLEM: The issue with master here is that the PushClose difference per square
is 20, however, the difference in squares for the PushToCorners array is usually
less. Thus, the engine prefers to move the kings closer together rather than pushing
the weak king to the correct corner.
What happens is if the weak king is in a safe corner, SF still prefers pushing the
kings together. Occasionally, the strong king traps the weak king in the safe corner.
It takes a while for SF to figure it out, but often draws the game by the 50 move rule
(on shorter time controls).
This patch increases the PushToCorners values to correct this problem. We also added
an assert to catch any overflow problem if anybody would want to increase the array
values again in the future.
It was tested in a couple of matches starting with random KRBK positions and showed
increased winning rates, see https://github.com/official-stockfish/Stockfish/pull/1877
No functional change
* Update our continuous integration machinery
Ubuntu 16.04 can now be used with travis. Updating all the other stuff
when there.
Invoking the lld linker seems to save 5 minutes with clang on linux.
No functional change.
* fix
* Use a bit less code to calculate hashfull(). Change post increment to preincrement as is more standard
in the rest of stockfish. Scale result so we report 100% instead of 99.9% when completely full.
No functional change.
Although this is a compile-time constant, we stick the castlingSide into a CastlingRight, then pull it out again. This seems unecessarily complex.
No functional change.
We do not need to change the winnerKSq variable, so we can simplify
a little bit the logic of the code by changing only the loserKSq
variable when it is necessary. Also consolidate and clarify comments.
See the pull request thread for a proof that the code is correct:
https://github.com/official-stockfish/Stockfish/pull/1854
No functional change
~stronglyProtected is quite similar to ~attackedBy[Them][PAWN] & ~attackedBy2[Them],
the only difference appears to be that the former includes squares attacked twice
by both sides. The resulting logic is simpler, and the change appears to be at least
Elo-neutral at both STC and LTC.
STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 35924 W: 7978 L: 7885 D: 20061
http://tests.stockfishchess.org/tests/view/5c14a5c00ebc5902ba11ed72
LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 37078 W: 6125 L: 6030 D: 24923
http://tests.stockfishchess.org/tests/view/5c14ae880ebc5902ba11eed8
Bench: 3646542