This is a somewhat different patch. It fixes blindspots for
two knights vs pawn endgame.
With local testing starting from random KNNvKP positions where the
pawn has not advanced beyond the 4th rank (thanks @protonspring !)
at 15+0.15 (4 cores), this went +105=868-27 against master. All except
two losses were won in reverse.
The heuristic is simple but effective - the strategy in these endgames
is to push the opposing king to the corner, then move the knight that's
blocking the pawn in for the checkmate while the pawn is free to move
and prevents stalemate. This patch gives SF the little boost it needs
to search the relevant king-cornering mating lines.
See the discussion in pull request 1939 for some more good results for
this test in independant tests:
https://github.com/official-stockfish/Stockfish/pull/1939
Bench: 3310239
This is a functional simplification of the Outposts array
moving it to a single value. This is a duplicate PR because
I couldn't figure out how to fix the original one.
The idea is from @31m059 with formatting recommendations by @snicolet.
See #1940 for additional information.
STC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 23933 W: 5279 L: 5162 D: 13492
http://tests.stockfishchess.org/tests/view/5c3575800ebc596a450c5ecb
LTC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 41718 W: 6919 L: 6831 D: 27968
http://tests.stockfishchess.org/tests/view/5c358c440ebc596a450c6117
bench 3783543
This is non-functional. These 5 arrays are simple enough to calculate real-time and maintaining an array for them does not help. Decreases the memory footprint.
This seems a tiny bit slower on my machine, but passed STC well enough. Could someone verify speed?
STC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 44745 W: 9780 L: 9704 D: 25261
http://tests.stockfishchess.org/tests/view/5c47aa2d0ebc5902bca13fc4
The slowdown is minimal even in 32 bit case (thanks to @mstembera for testing):
Compiled using make build ARCH=x86-32 CXX=i686-w64-mingw32-c++ and benched
This patch only:
```
Results for 40 tests for each version:
Base Test Diff
Mean 1455204 1450033 5171
StDev 49452 34533 59621
p-value: 0.465
speedup: -0.004
```
No functional change.
A simple idea, but it makes sense: in current master the search is extended
for checks that are considered somewhat safe, and for for this we use the
static exchange evaluation which only considers the `to_sq` of a move.
This is not reliable for discovered checks, where another piece is giving
the check and is arguably a more dangerous type of check. Thus, if the check
is a discovered check, the result of SEE is not relevant and can be ignored.
STC:
LLR: 2.96 (-2.94,2.94) [0.50,4.50]
Total: 29370 W: 6583 L: 6274 D: 16513
http://tests.stockfishchess.org/tests/view/5c5062950ebc593af5d4d9b5
LTC:
LLR: 2.95 (-2.94,2.94) [0.00,3.50]
Total: 227341 W: 37972 L: 37165 D: 152204
http://tests.stockfishchess.org/tests/view/5c5094fb0ebc593af5d4dc2c
Bench: 3611854
There was a simplification attempt last week for the tropism
term in king danger, which passed STC but failed LTC. This
was an indirect sign that maybe the tropism factor was sightly
untuned in current master, so we tried to change it from 1/4
to 5/16.
STC:
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 28098 W: 6264 L: 5990 D: 15844
http://tests.stockfishchess.org/tests/view/5c518db60ebc593af5d4e306
LTC:
LLR: 2.95 (-2.94,2.94) [0.00,3.50]
Total: 103709 W: 17387 L: 16923 D: 69399
http://tests.stockfishchess.org/tests/view/5c52a5510ebc592fc7baea8b
Bench: 4016000
Remove overlapping safe checks from kingdanger:
- rook and queen checks from the same square: rook check is preferred
- bishop and queen checks form the same square: queen check is preferred
Increase bishop and rook check values as a compensation.
STC
LLR: 2.95 (-2.94,2.94) [0.50,4.50]
Total: 27480 W: 6111 L: 5813 D: 15556
http://tests.stockfishchess.org/tests/view/5c521d050ebc593af5d4e66a
LTC
LLR: 2.95 (-2.94,2.94) [0.00,3.50]
Total: 78500 W: 13145 L: 12752 D: 52603
http://tests.stockfishchess.org/tests/view/5c52b9460ebc592fc7baecc5
Closes https://github.com/official-stockfish/Stockfish/pull/1983
------------------------------------------
I have quite a few ideas of how to improve this patch.
- actually rethinking it now it will maybe be useful to discount
queen/bishop checks if there is only one square that they can
give check from and it's "occupied" by more valuable check. Right
now count of this squares does not really matter.
- maybe some small extra bonus can be given for overlapping checks.
- some ideas about using popcount() on safechecks can be retried.
- tune this safecheck values since they were more or less randomly handcrafted in this patch.
Bench: 3216489
This patch removes line 875 of search.cpp, which was updating pvHit after IID.
Bench testing at depth 22 shows that line 875 of search.cpp never changes the
value of pvHit at NonPV nodes, while at PV nodes it often changes the value
from true to false (and never the reverse). This is because the definition of
pvHit at line 642 is :
```
pvHit = (ttHit && tte->pv_hit()) || (PvNode && depth > 4 * ONE_PLY);
```
while the assignment after IID omits the ` (PvNode && depth > 4 * ONE_PLY) `
condition. As such, unlike the other two post-IID tte reads, this line of code
does not make SF's state more consistent, but rather introduces an inconsistency
in the definition of pvHit. Indeed, changing line 875 read
```
pvHit = (ttHit && tte->pv_hit()) || (PvNode && depth > 4 * ONE_PLY);
```
to match line 642 is functionally equivalent to removing the line entirely, as
this patch does.
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 62756 W: 13787 L: 13746 D: 35223
http://tests.stockfishchess.org/tests/view/5c446c850ebc5902bb5d4b75
LTC
LLR: 3.19 (-2.94,2.94) [-3.00,1.00]
Total: 61900 W: 10179 L: 10111 D: 41610
http://tests.stockfishchess.org/tests/view/5c45bf610ebc5902bb5d5d62
Bench: 3796134
This changes 2 parts with regards to static exchange evaluation.
Currently, we do not allow pinned pieces to recapture if *all* opponent
pinners are still in their starting squares. This changes that to having
a less strict requirement, checking if *any* pinners are still in their
starting square. This makes our SEE give more respect to the pinning
side with regards to exchanges, which makes sense because it helps our
search explore more tactical options.
Furthermore, we change the logic for saving pinners into our state
variable when computing slider_blockers. We will include double pinners,
where two sliders may be looking at the same blocker, a similar concept
to our mobility calculation for sliders in our evaluation section.
Interestingly, I think SEE is the only place where the pinners bitboard
is actually used, so as far as I know there are no other side effects
to this change.
An example and some insights:
White Bf2, Kg1
Black Qe3, Bc5
The move Qg3 will be given the correct value of 0. (Previously < 0)
The move Qd4 will be incorrectly given a value of 0. (Previously < 0)
It seems the tradeoff in search is worth it. Qd4 will likely be pruned
soon by something like probcut anyway, while Qg3 could help us spot
tactics at an earlier depth.
STC:
LLR: 2.96 (-2.94,2.94) [0.50,4.50]
Total: 62162 W: 13879 L: 13408 D: 34875
http://tests.stockfishchess.org/tests/view/5c4ba1a70ebc593af5d49c55
LTC: (Thanks to @alayant)
LLR: 3.40 (-2.94,2.94) [0.00,3.50]
Total: 140285 W: 23416 L: 22825 D: 94044
http://tests.stockfishchess.org/tests/view/5c4bcfba0ebc593af5d49ea8
Bench: 3937213
This patch saves (4-1) * 64 * 64 = 12KiB of cache.
STC
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 176120 W: 38944 L: 38087 D: 99089
http://tests.stockfishchess.org/tests/view/5c4c9f840ebc593af5d4a7ce
LTC
As a pure speed up, I've been informed it should not require LTC.
No functional change
Using point to point instead of a collective improves performance, and might be more flexible for future improvements.
Also corrects the condition for the number elements required to fill the send buffer.
The actual Elo gains depends a bit on the setup used for testing.
8mpi x 32t yields 141 - 102 - 957 ~ 11 Elo
8mpi x 1t yields 70 +- 9 Elo.
stopOnPonderhit is used to stop search quickly on a ponderhit. It is set by mainThread as part of its time management. However, master employs it as a signal between mainThread and the UCI thread. This is not necessary, it is sufficient for the UCI thread to signal that pondering finished, and mainThread should do its usual time-keeping job, and in this case stop immediately.
This patch implements this, removing stopOnPonderHit as an atomic variable from the ThreadPool,
and moving it as a normal variable to mainThread, reducing its scope. In MainThread::check_time() the search is stopped immediately if ponder switches to false, and the variable stopOnPonderHit is set.
Furthermore, ponder has been moved to mainThread, as the variable is only used to exchange signals between the UCI thread and mainThread.
The version has been tested locally (as fishtest doesn't support ponder):
Score of ponderSimp vs master: 2616 - 2528 - 8630 [0.503] 13774
Elo difference: 2.22 +/- 3.54
which indicates no regression.
No functional change.
Small changes in initiative(). For Pawn PSQT, endgame values for d6-e6 and d7-e7 are now symmetric. The MG value of d2 is now smaller than e2 (d2=13, e2=21 now compared to d2=19, e2=16 before). The MG values of h5-h6-h7 also increased so this might encourage stockfish for more h-pawn pushes.
STC
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 81141 W: 17933 L: 17777 D: 45431
http://tests.stockfishchess.org/tests/view/5c4017350ebc5902bb5cf237
LTC
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 83078 W: 13883 L: 13466 D: 55729
http://tests.stockfishchess.org/tests/view/5c40763f0ebc5902bb5cff09
Bench: 3266398
This is a non-functional simplification that removes the AdjacentFiles array.
This array is simple enough to calculate that the pre-calculated array provides
no benefit. Reduces the memory footprint.
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 74839 W: 16390 L: 16373 D: 42076
http://tests.stockfishchess.org/tests/view/5c3d75920ebc596a450cfb67
No functionnal change
If we define dcCandidates with & pawnsNotOn7,
we don't have to & it both times.
This seems more clear to me as well.
Tested for no regression.
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 44042 W: 9663 L: 9585 D: 24794
http://tests.stockfishchess.org/tests/view/5c21d9120ebc5902ba12e84d
No functional change.
The new form is likely to trigger a bit more at LTC. Given that LTC
appears to be an improvement, I think that is fine.
The change is not very invasive: it does the same as before, use
potentially less time for moves that are very stable. Most of the
time, the full bonus was given if the bonus was given, so the gradual
part {3, 4, 5} didn't matter much. Whereas previously 'stable' was
expressed as the last 80% of iterations are the same, now I use a
fixed depth (10 iterations). For TCEC style TC, it will presumably
imply some more moves that are played quicker (and thus more time
on the clock when it potentially matters). Note that 10 iterations
of stability means we've been proposing that move for 99.9% of search
time.
passed STC
http://tests.stockfishchess.org/tests/view/5c30d2290ebc596a450c055b
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 70921 W: 15403 L: 15378 D: 40140
passed LTC
http://tests.stockfishchess.org/tests/view/5c31ae240ebc596a450c1881
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 17422 W: 2968 L: 2842 D: 11612
No functional change.
- add cluster info line
- provides basic info on positions received/stored in a cluster run,
useful to judge performance.
- document most cluster functionality in the readme.md
No functional change
Introducing new concept, saving principal lines into the transposition table
to generate a "critical search tree" which we can reuse later for intelligent
pruning/extension decisions.
For instance in this patch we just reduce reduction for these lines. But a lot
of other ideas are possible.
To go further : tune some parameters, how to add or remove lines from the
critical search tree, how to use these lines in search choices, etc.
STC :
LLR: 2.94 (-2.94,2.94) [0.50,4.50]
Total: 59761 W: 13321 L: 12863 D: 33577 +2.23 ELO
http://tests.stockfishchess.org/tests/view/5c34da5d0ebc596a450c53d3
LTC :
LLR: 2.96 (-2.94,2.94) [0.00,3.50]
Total: 26826 W: 4439 L: 4191 D: 18196 +2.9 ELO
http://tests.stockfishchess.org/tests/view/5c35ceb00ebc596a450c65b2
Special thanks to Miguel Lahoz for his help in transposition table in/out.
Bench: 3399866
This was inspired after reading about
[Multi-Cut](https://www.chessprogramming.org/Multi-Cut).
We now do non-singular cut node pruning. The idea is to prune when we
have a "backup plan" in case our expected fail high node does not fail
high on the ttMove.
For singular extensions, we do a search on all other moves but the
ttMove. If this fails high on our original beta, this means that both
the ttMove, as well as at least one other move was proven to fail high
on a lower depth search. We then assume that one of these moves will
work on a higher depth and prune.
STC:
LLR: 2.96 (-2.94,2.94) [0.50,4.50]
Total: 72952 W: 16104 L: 15583 D: 41265
http://tests.stockfishchess.org/tests/view/5c3119640ebc596a450c0be5
LTC:
LLR: 2.95 (-2.94,2.94) [0.00,3.50]
Total: 27103 W: 4564 L: 4314 D: 18225
http://tests.stockfishchess.org/tests/view/5c3184c00ebc596a450c1662
Bench: 3145487
Small tweak of parameters, yielding some Elo.
The cluster branch can now be considered to be in good shape. In local testing, it runs stable for >30k games. Performance benefits from an MPI implementation that is able to make asynchronous progress. The code should be run with 1 MPI rank per node, and threaded on the node.
Performance against master has now been measured. Master has been given 1 node with 32 cores/threads in standard SMP, the cluster branch has been given N=2..20 of those nodes, running the corresponding number of MPI processes, each with 32 threads. Time control has been 10s+0.1s, Hash 8MB/core, the book 8moves_v3.pgn, the number of games 400.
```
Score of cluster-2mpix32t vs master-32t: 96 - 27 - 277 [0.586] 400
Elo difference: 60.54 +/- 18.49
Score of cluster-3mpix32t vs master-32t: 101 - 18 - 281 [0.604] 400
Elo difference: 73.16 +/- 17.94
Score of cluster-4mpix32t vs master-32t: 126 - 18 - 256 [0.635] 400
Elo difference: 96.19 +/- 19.68
Score of cluster-5mpix32t vs master-32t: 110 - 5 - 285 [0.631] 400
Elo difference: 93.39 +/- 17.09
Score of cluster-6mpix32t vs master-32t: 117 - 9 - 274 [0.635] 400
Elo difference: 96.19 +/- 18.06
Score of cluster-7mpix32t vs master-32t: 142 - 10 - 248 [0.665] 400
Elo difference: 119.11 +/- 19.89
Score of cluster-8mpix32t vs master-32t: 125 - 14 - 261 [0.639] 400
Elo difference: 99.01 +/- 19.18
Score of cluster-9mpix32t vs master-32t: 137 - 7 - 256 [0.662] 400
Elo difference: 117.16 +/- 19.20
Score of cluster-10mpix32t vs master-32t: 145 - 8 - 247 [0.671] 400
Elo difference: 124.01 +/- 19.86
Score of cluster-16mpix32t vs master-32t: 153 - 6 - 241 [0.684] 400
Elo difference: 133.95 +/- 20.17
Score of cluster-20mpix32t vs master-32t: 134 - 8 - 258 [0.657] 400
Elo difference: 113.29 +/- 19.11
```
As the cluster parallelism is essentially lazyMPI, the nodes per second has been verified to scale perfectly to large node counts. Unfortunately, that is not necessarily indicative of playing strength. In the following 2min search from startPos, we reach about 4.8Gnps (128 nodes).
```
info depth 38 seldepth 51 multipv 1 score cp 53 nodes 576165794092 nps 4801341606 hashfull 1000 tbhits 0 time 120001 pv e2e4 c7c5 g1f3 d7d6 f1b5 c8d7 b5d7 d8d7 c2c4 b8c6 b1c3 g8f6 d2d4 d7g4 d4d5 c6d4 f3d4 g4d1 e1d1 c5d4 c3b5 a8c8 b2b3 a7a6 b5d4 f6e4 d1e2 g7g6 c1e3 f8g7 a1c1 e4c5 f2f3 f7f5 h1d1 e8g8 d4c2 c5d7 a2a4 a6a5 e3d4 f5f4 d4f2 f8f7 h2h3 d7c5
```
This addresses partially issue #1911 in that it documents in our
Readme the command that users can use to verifying the md5sum of
their downloaded tablebase files.
Additionally, a quick check of the file size (the size of each
tablebase file modulo 64 is 16 as pointed out by @syzygy1) has been
implemented at launch time in Stockfish.
Closes https://github.com/official-stockfish/Stockfish/pull/1927
and https://github.com/official-stockfish/Stockfish/issues/1911
No functional change.
Fixes one TODO, by moving the IO related to bestmove to the root, even if this move is found by a different rank.
This is needed to make sure IO from different ranks is ordered properly. If this is not done it is possible that e.g. a bestmove arrives before all info lines have been received, leading to output that confuses tools and humans alike (see e.g. https://github.com/cutechess/cutechess/issues/472)
Delay legality check of castling moves at search time,
just before making the move, as is the standard with all
the other move types.
This should avoid an useless and not trivial legality check
when the castling is then not tried later. For instance due
to a previous cut-off.
The patch is also a big simplification and allows to entirely
remove generate_castling()
Bench changes due to a different move sequence out of MovePicker.
STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 45073 W: 9918 L: 9843 D: 25312
http://tests.stockfishchess.org/tests/view/5c2f176f0ebc596a450bdfb3
LTC:
LLR: 3.15 (-2.94,2.94) [-3.00,1.00]
Total: 10156 W: 1707 L: 1560 D: 6889
http://tests.stockfishchess.org/tests/view/5c2e7dfd0ebc596a450bcdf4
Verified with perft both in standard and Chess960 cases.
Closes https://github.com/official-stockfish/Stockfish/pull/1929
Bench: 3559104
This rewrites in part the message passing part, using in place gather, and collecting, rather than merging, the data of all threads.
neutral with a single thread per rank:
Score of new-2mpi-1t vs old-2mpi-1t: 789 - 787 - 2615 [0.500] 4191
Elo difference: 0.17 +/- 6.44
likely progress with multiple threads per rank:
Score of new-2mpi-36t vs old-2mpi-36t: 76 - 53 - 471 [0.519] 600
Elo difference: 13.32 +/- 12.85
A single popcount in evaluate.cpp replaces all openFiles stuff in pawns. It doesn't seem to affect performance at all.
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 28103 W: 6134 L: 6025 D: 15944
http://tests.stockfishchess.org/tests/view/5b7d70a20ebc5902bdbb1999
No functional change.
This custom predicate filter creates an unnecessary abstraction layer, but doesn't make the code any more readable. The code is clear enough without it.
No functional change.
The extra condition is used as a shortcut to skip the following 3 assignments:
```C++
Bitboard b1 = shift<UpRight>(pawnsOn7) & enemies;
Bitboard b2 = shift<UpLeft >(pawnsOn7) & enemies;
Bitboard b3 = shift<Up >(pawnsOn7) & emptySquares;
```
In case of EVASION with no target on 8th rank (the common case), we end up performing the 3 statements for nothing because b1 = b2 = b3 = 0.
But this is just a small micro-optimization and the condition is quite confusing, so just remove it and prefer a readable code instead.
STC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 78020 W: 16978 L: 16967 D: 44075
http://tests.stockfishchess.org/tests/view/5c27b4fe0ebc5902ba135bb0
No functional change.
This generalizes exchange of signals between the ranks using a non-blocking all-reduce. It is now used for the stop signal and the node count, but should be easily generalizable (TB hits, and ponder still missing). It avoids having long-lived outstanding non-blocking collectives (removes an early posted Ibarrier). A bit too short a test, but not worse than before:
Score of new-r4-1t vs old-r4-1t: 459 - 401 - 1505 [0.512] 2365
Elo difference: 8.52 +/- 8.43
use a counter to track available elements.
Some elo gain, on 4 ranks:
Score of old-r4-1t vs new-r4-1t: 422 - 508 - 1694 [0.484] 2624
Elo difference: -11.39 +/- 7.90