mirror of https://github.com/sockspls/badfish synced 2025-05-01 17:19:36 +00:00
Commit graph

4755 commits

Author SHA1 Message Date
Miguel Lahoz
242c566c1a Change pinning logic in Static Exchange Evaluation (SEE)
This changes 2 parts with regards to static exchange evaluation.

Currently, we do not allow pinned pieces to recapture if *all* opponent
pinners are still in their starting squares. This changes that to having
a less strict requirement, checking if *any* pinners are still in their
starting square. This makes our SEE give more respect to the pinning
side with regards to exchanges, which makes sense because it helps our
search explore more tactical options.

Furthermore, we change the logic for saving pinners into our state
variable when computing slider_blockers. We will include double pinners,
where two sliders may be looking at the same blocker, a similar concept
to our mobility calculation for sliders in our evaluation section.
Interestingly, I think SEE is the only place where the pinners bitboard
is actually used, so as far as I know there are no other side effects
to this change.
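Below is a minimal sketch of the two pinning conditions. It assumes 64-bit bitboards where `pinners` holds the opponent's pinning sliders, `blockers` holds our pieces pinned to our king, and `occupied` holds the pieces that have not yet moved during the exchange sequence. The function names are hypothetical, and this is not Stockfish's actual SEE routine:

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;

// Old rule: pinned pieces may not recapture while *all* pinners are still
// on their starting squares (none of them has left `occupied` yet).
Bitboard filter_attackers_all(Bitboard attackers, Bitboard pinners,
                              Bitboard blockers, Bitboard occupied) {
    if (!(pinners & ~occupied))   // no pinner has moved or been captured
        attackers &= ~blockers;   // exclude our pinned pieces
    return attackers;
}

// New rule: pinned pieces may not recapture while *any* pinner is still
// on its starting square.
Bitboard filter_attackers_any(Bitboard attackers, Bitboard pinners,
                              Bitboard blockers, Bitboard occupied) {
    if (pinners & occupied)       // at least one pinner still in place
        attackers &= ~blockers;
    return attackers;
}
```

With two pinners on the board and one of them already gone from `occupied`, the old rule lets the pinned piece recapture, while the new rule still excludes it.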

An example and some insights:

White Bf2, Kg1
Black Qe3, Bc5

The move Qg3 will be given the correct value of 0. (Previously < 0)
The move Qd4 will be incorrectly given a value of 0. (Previously < 0)

It seems the tradeoff in search is worth it. Qd4 will likely be pruned
soon by something like probcut anyway, while Qg3 could help us spot
tactics at an earlier depth.

STC:
LLR: 2.96 (-2.94,2.94) [0.50,4.50]
Total: 62162 W: 13879 L: 13408 D: 34875
http://tests.stockfishchess.org/tests/view/5c4ba1a70ebc593af5d49c55

LTC: (Thanks to @alayant)
LLR: 3.40 (-2.94,2.94) [0.00,3.50]
Total: 140285 W: 23416 L: 22825 D: 94044
http://tests.stockfishchess.org/tests/view/5c4bcfba0ebc593af5d49ea8

Bench: 3937213
2019-01-29 17:32:41 +01:00
Maciej Żenczykowski
8df1cd10df Use int8_t instead of int for SquareDistance[]
This patch saves (4-1) * 64 * 64 = 12KiB of cache.
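The 12 KiB figure is the size difference between the two element types over a 64x64 table. The small sketch below assumes a 4-byte `int`, which is typical but not guaranteed, and uses hypothetical table names. The distance itself is the usual Chebyshev (king-move) distance:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical tables: the same 64x64 distance data stored as int and as int8_t.
int         SquareDistanceInt[64][64];
std::int8_t SquareDistance8[64][64];

// Bytes saved by the narrower element type: (sizeof(int) - 1) * 64 * 64.
constexpr std::size_t bytes_saved =
    sizeof(SquareDistanceInt) - sizeof(SquareDistance8);

// Fill both tables with max(|file delta|, |rank delta|), assuming
// square index = rank * 8 + file. Values never exceed 7, so int8_t suffices.
void init_distance() {
    for (int s1 = 0; s1 < 64; ++s1)
        for (int s2 = 0; s2 < 64; ++s2) {
            int df = (s1 & 7) - (s2 & 7);
            int dr = (s1 >> 3) - (s2 >> 3);
            if (df < 0) df = -df;
            if (dr < 0) dr = -dr;
            int d = df > dr ? df : dr;
            SquareDistanceInt[s1][s2] = d;
            SquareDistance8[s1][s2]   = static_cast<std::int8_t>(d);
        }
}
```

With a 4-byte `int`, `bytes_saved` is 3 * 4096 = 12288 bytes, i.e. 12 KiB.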

STC
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 176120 W: 38944 L: 38087 D: 99089
http://tests.stockfishchess.org/tests/view/5c4c9f840ebc593af5d4a7ce

LTC
As this is a pure speedup, I've been informed it should not require an LTC test.

No functional change
2019-01-29 17:26:24 +01:00
Joost VandeVondele
bf17a410ec [Cluster] Use a sendrecv ring instead of allgather
Using point-to-point communication instead of a collective improves performance and might be more flexible for future improvements.
Also corrects the condition for the number of elements required to fill the send buffer.
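A ring of point-to-point exchanges can replace an allgather. With N ranks, each step passes the most recently received block on to the right-hand neighbour, and after N-1 steps every rank holds every block. The sketch below simulates this in plain C++ without MPI. In real code each step would presumably be a single MPI_Sendrecv call. The function name is hypothetical:

```cpp
#include <vector>

// Simulate a ring exchange among blocks.size() ranks (no MPI involved).
// Returns, for every rank, the blocks it holds after n-1 steps, in the
// order they arrived.
std::vector<std::vector<int>> ring_allgather(const std::vector<int>& blocks) {
    const int n = static_cast<int>(blocks.size());
    std::vector<std::vector<int>> have(n);
    std::vector<int> outgoing(blocks);      // block each rank forwards next
    for (int r = 0; r < n; ++r)
        have[r].push_back(blocks[r]);       // every rank starts with its own block

    for (int step = 0; step < n - 1; ++step) {
        std::vector<int> incoming(n);
        for (int r = 0; r < n; ++r) {
            int left = (r - 1 + n) % n;     // receive from the left neighbour
            incoming[r] = outgoing[left];   // which sent to its right neighbour
        }
        for (int r = 0; r < n; ++r) {
            have[r].push_back(incoming[r]);
            outgoing[r] = incoming[r];      // forward it on the next step
        }
    }
    return have;
}
```

Each rank only ever talks to its two neighbours. This is one reason a ring can be cheaper than a general collective on some networks, although the commit itself reports only the measured outcome.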

The actual Elo gain depends a bit on the setup used for testing.

8mpi x 32t yields 141 - 102 - 957 ~ 11 Elo
8mpi x 1t yields 70 +/- 9 Elo.
2019-01-24 10:39:24 +01:00
protonspring
2d0af36753 Simplify TrappedRook
Simplified TrappedRook to a single penalty removing the dependency on mobility.

STC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 106718 W: 23530 L: 23577 D: 59611
http://tests.stockfishchess.org/tests/view/5c43f6bd0ebc5902bb5d4131

LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 54053 W: 8890 L: 8822 D: 36341
http://tests.stockfishchess.org/tests/view/5c44932a0ebc5902bb5d4d59

bench 3665090
2019-01-22 09:54:10 +01:00
Joost VandeVondele
58d3ee6175 Simplify pondering time management (#1899)
stopOnPonderhit is used to stop the search quickly on a ponderhit. It is set by mainThread as part of its time management. However, master also employs it as a signal between mainThread and the UCI thread. This is not necessary: it is sufficient for the UCI thread to signal that pondering has finished, and mainThread should then do its usual time-keeping job, which in this case means stopping immediately.

This patch implements this by removing stopOnPonderhit as an atomic variable from the ThreadPool and moving it, as a normal variable, to mainThread, reducing its scope. In MainThread::check_time() the search is stopped immediately if ponder has switched to false and stopOnPonderhit is set.

Furthermore, ponder has been moved to mainThread, as the variable is only used to exchange signals between the UCI thread and mainThread.
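The sketch below is a heavily simplified model of this signalling, with hypothetical names rather than Stockfish's actual classes. The UCI thread only ever writes `ponder`, while the main thread owns `stopOnPonderhit` and makes every stop decision itself:

```cpp
#include <atomic>

struct MainThreadModel {
    std::atomic<bool> ponder{false};  // cleared by the UCI thread on "ponderhit"
    bool stopOnPonderhit = false;     // read and written by the main thread only
    std::atomic<bool> stop{false};    // tells all search threads to stop

    // Time management wants to stop: do it now, or defer until the ponderhit.
    void request_stop() {
        if (ponder)
            stopOnPonderhit = true;
        else
            stop = true;
    }

    // Called periodically from the search; `timeUp` stands in for the
    // normal elapsed-time test.
    void check_time(bool timeUp) {
        if (ponder)
            return;                   // never stop on time while pondering
        if (timeUp || stopOnPonderhit)
            stop = true;
    }
};
```

Because only the main thread touches `stopOnPonderhit`, it no longer needs to be atomic. That matches the reduced scope described above.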

The version has been tested locally (as fishtest doesn't support ponder):

Score of ponderSimp vs master: 2616 - 2528 - 8630 [0.503] 13774
Elo difference: 2.22 +/- 3.54

which indicates no regression.

No functional change.
2019-01-20 19:14:24 +01:00
marotear
59b2486bc3 Simplify pvHit (#1953)
Remove the unnecessary excludedMove condition (there is no excluded move for PvNodes) and reorder the computation.

No functional change.
2019-01-20 12:24:03 +01:00
protonspring
691a287bfe Clean-up some shifting in space calculation (#1955)
No functional change.
2019-01-20 12:21:16 +01:00
Jonathan D
3acacf8471 Tweak initiative and Pawn PSQT (#1957)
Small changes in initiative(). For the Pawn PSQT, the endgame values for d6-e6 and d7-e7 are now symmetric. The MG value of d2 is now smaller than that of e2 (d2=13, e2=21 now, compared to d2=19, e2=16 before). The MG values of h5-h6-h7 have also increased, which might encourage Stockfish to push its h-pawn more.

STC
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 81141 W: 17933 L: 17777 D: 45431
http://tests.stockfishchess.org/tests/view/5c4017350ebc5902bb5cf237

LTC
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 83078 W: 13883 L: 13466 D: 55729
http://tests.stockfishchess.org/tests/view/5c40763f0ebc5902bb5cff09

Bench: 3266398
2019-01-20 12:20:21 +01:00
protonspring
3300517ecb Remove AdjacentFiles
This is a non-functional simplification that removes the AdjacentFiles array.
This array is simple enough to calculate that the pre-calculated array provides
no benefit. Reduces the memory footprint.
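Computing adjacent files on the fly costs two shifts and two masks. The sketch below assumes the common a1 = 0, h8 = 63 square mapping. The helper names are modelled loosely on Stockfish's style but are hypothetical here:

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;

constexpr Bitboard FileABB = 0x0101010101010101ULL;  // all squares on the a-file
constexpr Bitboard FileHBB = FileABB << 7;           // all squares on the h-file

// Bitboard of file f (0 = a-file ... 7 = h-file).
constexpr Bitboard file_bb(int f) { return FileABB << f; }

// Shift the file one step east and one step west. Mask off the h-file
// before shifting east and the a-file before shifting west so that
// nothing wraps around to the opposite edge of the board.
constexpr Bitboard adjacent_files_bb(int f) {
    return ((file_bb(f) & ~FileHBB) << 1) | ((file_bb(f) & ~FileABB) >> 1);
}
```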

STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 74839 W: 16390 L: 16373 D: 42076
http://tests.stockfishchess.org/tests/view/5c3d75920ebc596a450cfb67

No functional change
2019-01-17 08:11:09 +01:00
Joost VandeVondele
5e7777e9d0 [Cluster] adds missing line
This one-liner fixes a merge error that resulted in a garbage output line. No influence on play.
2019-01-17 08:06:25 +01:00
protonspring
3732c55c18 Simplify pawn moves (#1900)
If we define dcCandidates with & pawnsNotOn7,
we don't have to apply & pawnsNotOn7 at both places where it is used.

This seems clearer to me as well.

Tested for no regression.
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 44042 W: 9663 L: 9585 D: 24794
http://tests.stockfishchess.org/tests/view/5c21d9120ebc5902ba12e84d

No functional change.
2019-01-14 15:03:31 +01:00
Joost VandeVondele
230fb6e9ad Simplify time management a bit
The new form is likely to trigger a bit more at LTC. Given that LTC
appears to be an improvement, I think that is fine.

The change is not very invasive: it does the same as before, using
potentially less time for moves that are very stable. Most of the
time, if the bonus was given at all, it was given in full, so the gradual
part {3, 4, 5} didn't matter much. Whereas previously 'stable' meant
that the last 80% of iterations returned the same move, now I use a
fixed depth (10 iterations). For TCEC-style TC, this will presumably
mean more moves are played quickly (and thus more time remains
on the clock when it potentially matters). Note that 10 iterations
of stability means we've been proposing that move for 99.9% of search
time.
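The 99.9% figure follows from the roughly geometric growth of iteration times. Assume, for illustration only, that each iteration costs about twice as much as the previous one. The commit does not state a growth factor. Under that assumption, the last 10 iterations account for about 1 - 2^-10 of the total time. A quick check:

```cpp
#include <cmath>

// Fraction of total search time spent in the last `k` of `n` iterations,
// assuming iteration i costs growth^i time units (a geometric model).
double fraction_in_last(int k, int n, double growth) {
    double total = 0.0, last = 0.0;
    for (int i = 1; i <= n; ++i) {
        double t = std::pow(growth, i);
        total += t;
        if (i > n - k)
            last += t;   // iteration belongs to the final k
    }
    return last / total;
}
```

For example, `fraction_in_last(10, 30, 2.0)` is about 0.999. A smaller growth factor would lower the figure.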

passed STC
http://tests.stockfishchess.org/tests/view/5c30d2290ebc596a450c055b
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 70921 W: 15403 L: 15378 D: 40140

passed LTC
http://tests.stockfishchess.org/tests/view/5c31ae240ebc596a450c1881
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 17422 W: 2968 L: 2842 D: 11612

No functional change.
2019-01-14 09:25:22 +01:00
Joost VandeVondele
10a920d7d7 [cluster] Improve user documentation
- add cluster info line
- provides basic info on positions received/stored in a cluster run,
  useful to judge performance.
- document most cluster functionality in the readme.md

No functional change
2019-01-14 09:11:33 +01:00
Joost VandeVondele
5446e6f408 Remove pvExact
The variable pvExact now overlaps with the pvHit concept, so we can simplify
the logic, with small code tweaks so that pvHit triggers where pvExact
previously triggered.

passed STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 20558 W: 4497 L: 4373 D: 11688
http://tests.stockfishchess.org/tests/view/5c36e9fd0ebc596a450c7885

passed LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 23482 W: 3888 L: 3772 D: 15822
http://tests.stockfishchess.org/tests/view/5c37072d0ebc596a450c7a52

Bench: 3739723
2019-01-10 16:46:04 +01:00
mstembera
d07e782e22 Minor cleanup to recent 'Flag critical search tree in hash table' patch
No functional change
2019-01-10 16:36:59 +01:00
Joost VandeVondele
21819b7bf8 Merge branch 'master' into clusterMergeMaster3 2019-01-09 21:52:30 +01:00
Joost VandeVondele
d2acdac101 Small improvements to the CI infrastructure
- avoid inlining for the debug testing so that suppressions work
- provide more output for triggered errors

No functional change.
2019-01-09 16:57:24 +01:00
MJZ1977
70880b8e24 Flag critical search tree in hash table
Introduces a new concept: saving principal lines into the transposition table
to generate a "critical search tree" which we can reuse later for intelligent
pruning/extension decisions.

In this patch, for instance, we simply reduce the reduction for these lines, but
many other ideas are possible.

To go further: tune some parameters, decide how to add or remove lines from the
critical search tree, how to use these lines in search choices, etc.
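One cheap way to store such a flag is to spare one bit in an existing byte of the TT entry. The sketch below assumes a hypothetical layout (2-bit bound, 1-bit PV flag, 5-bit generation) and a one-ply reduction bonus. Neither value is taken from the patch:

```cpp
#include <cstdint>

// Hypothetical 8-bit field: bits 0-1 bound, bit 2 PV flag, bits 3-7 generation.
struct TTEntrySketch {
    std::uint8_t genBound8 = 0;

    void save(std::uint8_t generation, bool pv, std::uint8_t bound) {
        genBound8 = static_cast<std::uint8_t>(
            (generation << 3) | (pv ? 4 : 0) | (bound & 3));
    }
    bool is_pv() const              { return (genBound8 & 4) != 0; }
    std::uint8_t bound() const      { return genBound8 & 3; }
    std::uint8_t generation() const { return genBound8 >> 3; }
};

// Search side: lines flagged as part of the critical tree are reduced
// by one ply less (assumed bonus), never going below zero.
int adjusted_reduction(int r, bool ttPv) {
    return (ttPv && r > 0) ? r - 1 : r;
}
```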

STC :
LLR: 2.94 (-2.94,2.94) [0.50,4.50]
Total: 59761 W: 13321 L: 12863 D: 33577 +2.23 Elo
http://tests.stockfishchess.org/tests/view/5c34da5d0ebc596a450c53d3

LTC :
LLR: 2.96 (-2.94,2.94) [0.00,3.50]
Total: 26826 W: 4439 L: 4191 D: 18196 +2.9 Elo
http://tests.stockfishchess.org/tests/view/5c35ceb00ebc596a450c65b2

Special thanks to Miguel Lahoz for his help in transposition table in/out.

Bench: 3399866
2019-01-09 15:05:33 +01:00
Miguel Lahoz
f69106f7bb Introduce Multi-Cut
This was inspired after reading about
[Multi-Cut](https://www.chessprogramming.org/Multi-Cut).

We now do non-singular cut node pruning. The idea is to prune when we
have a "backup plan" in case our expected fail high node does not fail
high on the ttMove.

For singular extensions, we do a search on all other moves but the
ttMove. If this fails high on our original beta, this means that both
the ttMove, as well as at least one other move was proven to fail high
on a lower depth search. We then assume that one of these moves will
work on a higher depth and prune.
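Below is a simplified sketch of the decision described above. It assumes the singular search (all moves except the ttMove, at reduced depth) has already returned `value` against a lowered bound `singularBeta`, and that `cutNode` marks a node expected to fail high. The enum and function names are hypothetical:

```cpp
enum class SingularOutcome { Extend, Prune, None };

SingularOutcome singular_decision(int value, int singularBeta, int beta,
                                  bool cutNode) {
    if (value < singularBeta)
        return SingularOutcome::Extend;  // only the ttMove looks good: extend it
    if (cutNode && value >= beta)
        return SingularOutcome::Prune;   // another move also beats beta: multi-cut
    return SingularOutcome::None;        // neither singular nor a clear cut
}
```

Note that the prune case reuses the search already done for the singular extension, so multi-cut costs no extra nodes at that point.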

STC:
LLR: 2.96 (-2.94,2.94) [0.50,4.50]
Total: 72952 W: 16104 L: 15583 D: 41265
http://tests.stockfishchess.org/tests/view/5c3119640ebc596a450c0be5

LTC:
LLR: 2.95 (-2.94,2.94) [0.00,3.50]
Total: 27103 W: 4564 L: 4314 D: 18225
http://tests.stockfishchess.org/tests/view/5c3184c00ebc596a450c1662

Bench: 3145487
2019-01-06 16:02:31 +01:00
Joost VandeVondele
8c4338ae49 [Cluster] Param tweak.
Small tweak of parameters, yielding some Elo.

The cluster branch can now be considered to be in good shape. In local testing, it runs stable for >30k games. Performance benefits from an MPI implementation that is able to make asynchronous progress. The code should be run with 1 MPI rank per node, and threaded on the node.

Performance against master has now been measured. Master was given 1 node with 32 cores/threads in standard SMP, and the cluster branch was given N=2..20 of those nodes, running the corresponding number of MPI processes, each with 32 threads. Time control was 10s+0.1s, Hash 8MB/core, opening book 8moves_v3.pgn, and 400 games per match.

```
Score of cluster-2mpix32t vs master-32t: 96 - 27 - 277  [0.586] 400
Elo difference: 60.54 +/- 18.49

Score of cluster-3mpix32t vs master-32t: 101 - 18 - 281  [0.604] 400
Elo difference: 73.16 +/- 17.94

Score of cluster-4mpix32t vs master-32t: 126 - 18 - 256  [0.635] 400
Elo difference: 96.19 +/- 19.68

Score of cluster-5mpix32t vs master-32t: 110 - 5 - 285  [0.631] 400
Elo difference: 93.39 +/- 17.09

Score of cluster-6mpix32t vs master-32t: 117 - 9 - 274  [0.635] 400
Elo difference: 96.19 +/- 18.06

Score of cluster-7mpix32t vs master-32t: 142 - 10 - 248  [0.665] 400
Elo difference: 119.11 +/- 19.89

Score of cluster-8mpix32t vs master-32t: 125 - 14 - 261  [0.639] 400
Elo difference: 99.01 +/- 19.18

Score of cluster-9mpix32t vs master-32t: 137 - 7 - 256  [0.662] 400
Elo difference: 117.16 +/- 19.20

Score of cluster-10mpix32t vs master-32t: 145 - 8 - 247  [0.671] 400
Elo difference: 124.01 +/- 19.86

Score of cluster-16mpix32t vs master-32t: 153 - 6 - 241  [0.684] 400
Elo difference: 133.95 +/- 20.17

Score of cluster-20mpix32t vs master-32t: 134 - 8 - 258  [0.657] 400
Elo difference: 113.29 +/- 19.11
```
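The Elo differences in these results are consistent with the standard logistic model, in which a draw counts as half a point. The sketch below reproduces them from the W-L-D counts. The function name is ours, and the +/- error bars require a separate variance estimate that is not shown here:

```cpp
#include <cmath>

// Logistic Elo estimate from a win/loss/draw record (draw = half a point).
double elo_from_wld(int wins, int losses, int draws) {
    double games = wins + losses + draws;
    double score = (wins + 0.5 * draws) / games;     // e.g. 0.58625 for 96-27-277
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

For instance, 96 - 27 - 277 gives about 60.54 and 153 - 6 - 241 gives about 133.95, matching the lines above.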

As the cluster parallelism is essentially lazyMPI, the nodes per second have been verified to scale perfectly to large node counts. Unfortunately, that is not necessarily indicative of playing strength. In the following 2-minute search from startpos, we reach about 4.8 Gnps (128 nodes).

```
info depth 38 seldepth 51 multipv 1 score cp 53 nodes 576165794092 nps 4801341606 hashfull 1000 tbhits 0 time 120001 pv e2e4 c7c5 g1f3 d7d6 f1b5 c8d7 b5d7 d8d7 c2c4 b8c6 b1c3 g8f6 d2d4 d7g4 d4d5 c6d4 f3d4 g4d1 e1d1 c5d4 c3b5 a8c8 b2b3 a7a6 b5d4 f6e4 d1e2 g7g6 c1e3 f8g7 a1c1 e4c5 f2f3 f7f5 h1d1 e8g8 d4c2 c5d7 a2a4 a6a5 e3d4 f5f4 d4f2 f8f7 h2h3 d7c5
```
2019-01-06 15:38:31 +01:00
Joost VandeVondele
bb843a00c1 Check tablebase files
This partially addresses issue #1911 by documenting in our
Readme the command that users can use to verify the md5sum of
their downloaded tablebase files.

Additionally, a quick check of the file size (the size of each
tablebase file modulo 64 is 16 as pointed out by @syzygy1) has been
implemented at launch time in Stockfish.
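Below is a sketch of such a launch-time size check. It uses only the rule quoted above (size modulo 64 equals 16). The helper names are hypothetical, and this is not the code from the patch:

```cpp
#include <fstream>
#include <ios>
#include <string>

// Rule quoted in the commit: a valid tablebase file has size % 64 == 16.
bool plausible_tb_size(unsigned long long size) {
    return size % 64 == 16;
}

// Returns false if the file cannot be opened or fails the size rule.
bool check_tb_file(const std::string& path) {
    std::ifstream f(path, std::ios::binary | std::ios::ate);  // open positioned at end
    if (!f)
        return false;
    std::streamoff size = f.tellg();                          // end position = file size
    return size >= 0 && plausible_tb_size(static_cast<unsigned long long>(size));
}
```

A size check like this catches most truncated or partial downloads cheaply. It cannot replace the md5sum verification documented in the Readme.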

Closes https://github.com/official-stockfish/Stockfish/pull/1927
and https://github.com/official-stockfish/Stockfish/issues/1911

No functional change.
2019-01-04 15:36:39 +01:00
Joost VandeVondele
8a3f8e21ae [Cluster] Move IO to the root.
Fixes one TODO, by moving the IO related to bestmove to the root, even if this move is found by a different rank.

This is needed to make sure IO from different ranks is ordered properly. If this is not done it is possible that e.g. a bestmove arrives before all info lines have been received, leading to output that confuses tools and humans alike (see e.g. https://github.com/cutechess/cutechess/issues/472)
2019-01-04 14:56:04 +01:00
Marco Costalba
3c576efa77 Delay castling legality check
Delay the legality check of castling moves until search time,
just before making the move, as is standard for all
the other move types.

This should avoid a useless and non-trivial legality check
when the castling move is never tried, for instance due
to an earlier cutoff.

The patch is also a big simplification and allows us to
remove generate_castling() entirely.

Bench changes due to a different move sequence out of MovePicker.

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 45073 W: 9918 L: 9843 D: 25312
http://tests.stockfishchess.org/tests/view/5c2f176f0ebc596a450bdfb3

LTC:
LLR: 3.15 (-2.94,2.94) [-3.00,1.00]
Total: 10156 W: 1707 L: 1560 D: 6889
http://tests.stockfishchess.org/tests/view/5c2e7dfd0ebc596a450bcdf4

Verified with perft both in standard and Chess960 cases.

Closes https://github.com/official-stockfish/Stockfish/pull/1929

Bench: 3559104
2019-01-04 14:23:14 +01:00
Joost VandeVondele
267ca781cd Always wait before posting the next call in _sync. 2019-01-02 11:16:24 +01:00
Joost VandeVondele
ac43bef5c5 [Cluster] Improve message passing part.
This rewrites in part the message passing part, using in place gather, and collecting, rather than merging, the data of all threads.

neutral with a single thread per rank:
Score of new-2mpi-1t vs old-2mpi-1t: 789 - 787 - 2615  [0.500] 4191
Elo difference: 0.17 +/- 6.44

likely progress with multiple threads per rank:
Score of new-2mpi-36t vs old-2mpi-36t: 76 - 53 - 471  [0.519] 600
Elo difference: 13.32 +/- 12.85
2019-01-02 11:16:24 +01:00
Marco Costalba
eb6d7f537d Assorted trivial cleanups (#1894)
To address https://github.com/official-stockfish/Stockfish/issues/1862

No functional change.
2019-01-01 14:10:26 +01:00
protonspring
79c97625a4 Remove openFiles in pawns. (#1917)
A single popcount in evaluate.cpp replaces all openFiles stuff in pawns. It doesn't seem to affect performance at all.
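The sketch below assumes per-colour 8-bit masks of semi-open files, with one bit per file. Open files are those that are semi-open for both sides, so a single popcount of the intersection counts them. The mask representation and function name are assumptions for illustration:

```cpp
#include <bitset>
#include <cstdint>

// Bit f is set when that side has no pawn on file f (bit 0 = a-file).
int open_files(std::uint8_t semiopenWhite, std::uint8_t semiopenBlack) {
    unsigned long long both =
        static_cast<unsigned long long>(semiopenWhite & semiopenBlack);
    return static_cast<int>(std::bitset<8>(both).count());  // popcount
}
```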

STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 28103 W: 6134 L: 6025 D: 15944
http://tests.stockfishchess.org/tests/view/5b7d70a20ebc5902bdbb1999

No functional change.
2019-01-01 13:38:09 +01:00
protonspring
7accf07c0b Remove "Any" predicate filter (#1914)
This custom predicate filter creates an unnecessary abstraction layer, but doesn't make the code any more readable. The code is clear enough without it.

No functional change.
2019-01-01 13:36:56 +01:00
protonspring
e2d3c163cb Remove a useless micro-optimization in pawns generation (#1915)
The extra condition is used as a shortcut to skip the following 3 assignments:

```C++
Bitboard b1 = shift<UpRight>(pawnsOn7) & enemies;
Bitboard b2 = shift<UpLeft >(pawnsOn7) & enemies;
Bitboard b3 = shift<Up     >(pawnsOn7) & emptySquares;
```

In case of EVASION with no target on 8th rank (the common case), we end up performing the 3 statements for nothing because b1 = b2 = b3 = 0.

But this is just a small micro-optimization and the condition is quite confusing, so just remove it and prefer readable code instead.

STC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 78020 W: 16978 L: 16967 D: 44075
http://tests.stockfishchess.org/tests/view/5c27b4fe0ebc5902ba135bb0

No functional change.
2019-01-01 13:35:53 +01:00
Joost VandeVondele
7a32d26d5f [cluster] keep track of TB hits cluster-wide. 2018-12-29 15:34:57 +01:00
Joost VandeVondele
fb5c1f5bf5 Fix comment 2018-12-29 15:34:57 +01:00
Joost VandeVondele
87f0fa55a0 [cluster] keep track of node counts cluster-wide.
This generalizes the exchange of signals between the ranks using a non-blocking all-reduce. It is now used for the stop signal and the node count, but should be easily generalizable (TB hits and ponder are still missing). It avoids long-lived outstanding non-blocking collectives (removing an early-posted Ibarrier). The test is a bit too short, but the result is not worse than before:

Score of new-r4-1t vs old-r4-1t: 459 - 401 - 1505  [0.512] 2365
Elo difference: 8.52 +/- 8.43
2018-12-29 15:34:57 +01:00
Joost VandeVondele
2f882309d5 fixup 2018-12-29 15:34:57 +01:00
Joost VandeVondele
86953b9392 [cluster] Fix non-mpi compile
Fix compilation of the cluster branch in the non-MPI case.

Add a TODO as a reminder for the new voting scheme.

No functional changes
2018-12-29 15:34:56 +01:00
Joost VandeVondele
ba1c639836 [cluster] fill sendbuffer better
Use a counter to track available elements.

Some Elo gain, on 4 ranks:

Score of old-r4-1t vs new-r4-1t: 422 - 508 - 1694  [0.484] 2624
Elo difference: -11.39 +/- 7.90
2018-12-29 15:34:56 +01:00
Joost VandeVondele
e526c5aa52 [cluster] Make bench compatible
Fix one TODO.

Takes care of output from bench.
Sum nodes over ranks.
2018-12-29 15:34:56 +01:00
Joost VandeVondele
9cd2c817db Add one more TODO 2018-12-29 15:34:56 +01:00
Joost VandeVondele
54a0a228f6 [cluster] Some formatting cleanup
Standardize whitespace a bit.
Also adds two TODOs for follow-up work.

No functional change.
2018-12-29 15:34:56 +01:00
Joost VandeVondele
1cd2c7861a [cluster] avoid creating MPI data type.
There is no need to create an MPI data type for the send buffer; this is simpler and faster.

No functional change
2018-12-29 15:34:56 +01:00
Joost VandeVondele
7af3f4da7a [cluster] Avoid TT saving our own TT entries.
Avoid saving to the TT the part of the receive buffer that actually originates from the same rank.

Now, on 1 mpi rank, we have the same bench as the non-mpi code on 1 thread.
2018-12-29 15:34:56 +01:00
Joost VandeVondele
271181bb31 [cluster] Add depth condition to cluster TT saves.
Since the logic for saving moves in the send buffer and the associated rehashing is expensive, only do it for TT stores of sufficient depth.

Considerable gain in local testing with 4 ranks against the previous version:
Elo difference: 288.84 +/- 21.98

This starts to make the branch useful, but for on-node runs a gap remains compared to standard threading.
2018-12-29 15:34:56 +01:00
noobpwnftw
66b2c6b9f1 Implement best move voting system for cluster
This implements the cluster version of d96c1c32a2
2018-12-29 15:34:56 +01:00
Joost VandeVondele
2559c20c6e [cluster] Fix oversight in TT key reuse
In the original code, the position key stored in the TT is used to probe and store TT entries after message passing. Since the TT stores only part of the key's bits, this leads to incorrect rehashing. This patch fixes it by also storing the full key in the send buffers and using that for hashing after the message arrives.
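Rehashing from the stored key bits goes wrong because the bucket is chosen from one part of the 64-bit key, while the entry keeps only a few other bits. The sketch below assumes a layout similar to Stockfish's at the time: the bucket comes from the low 32 bits and 16 verification bits come from the top. The exact layout and all names are assumptions:

```cpp
#include <cstdint>

using Key = std::uint64_t;

// Bucket chosen from the low 32 bits of the full key (multiply-shift).
std::uint64_t bucket_index(Key key, std::uint64_t clusterCount) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(key))
            * clusterCount) >> 32;
}

// Only the top 16 bits are kept in the entry for verification.
std::uint16_t stored_key16(Key key) {
    return static_cast<std::uint16_t>(key >> 48);
}

// After the fix, the send buffer carries the full key, so the receiver
// can recompute both the bucket and the 16-bit verification key.
struct SendEntrySketch {
    Key fullKey;
    int depth;
    int value;
};
```

A key rebuilt from the 16 stored bits alone lands in a different bucket than the original, which is the incorrect rehashing described above.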

Short testing with 4 ranks (old vs new) shows this is effective:
Score of mpiold vs mpinew: 84 - 275 - 265  [0.347] 624
Elo difference: -109.87 +/- 20.88
2018-12-29 15:34:55 +01:00
Joost VandeVondele
2659c407c4 Fix segfault.
The wrong data type was passed to an MPI call, leading to occasional segfaults. This patch fixes this.

No functional change.
2018-12-29 15:34:55 +01:00
noobpwnftw
3730ae1efb Small simplifications and code cleanup
Non-functional simplifications.
2018-12-29 15:34:55 +01:00
noobpwnftw
0d6cdc0c6d Implement yielding loop while waiting for input
Some MPI implementations use busy-wait polling, which turns MPI_Bcast into a busy-wait loop; work around this with our own yielding loop.
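The general pattern is to poll for completion and give up the CPU between polls. With MPI-3, for example, one could post MPI_Ibcast and poll it with MPI_Test. The sketch below abstracts that poll as a predicate so it runs without MPI. The names are hypothetical:

```cpp
#include <thread>

// Wait until `done()` returns true, yielding the CPU between polls instead
// of spinning inside a blocking call.
template <typename Pred>
void yielding_wait(Pred done) {
    while (!done())
        std::this_thread::yield();
}
```

Yielding lets search threads on the same node use the core while the process waits for input.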
2018-12-29 15:34:55 +01:00
noobpwnftw
80afeb0d3b Fix consistency between PV and bestmove output
If a non-root mainThread on a node is the new best thread in the cluster, it should always output its PV.
2018-12-29 15:34:55 +01:00
noobpwnftw
2405b38165 Fix search result aggregation
This reverts my earlier change that let only the root node output the best move, now that the problem with MPI_Allreduce and our custom operator (BestMoveOp) is fixed. This operator is not commutative, and we must ensure that its output is consistent across all nodes.
2018-12-29 15:34:55 +01:00
noobpwnftw
8a95d269eb Implement proper stop signalling from root node
The previous behavior was to wait for all nodes to finish their search under their own TM and then aggregate the results on the root node via a blocking MPI_Allreduce call. This seems to be problematic.

In this commit a proper non-blocking signalling barrier was implemented so that the root node's TM controls the cluster search, and TM is disabled on all non-root nodes.

Also includes some cosmetic fixes to the nodes/NPS display.
2018-12-29 15:34:55 +01:00
noobpwnftw
3b7b632aa5 Fix a bug of outputting multiple lines of bestmove 2018-12-29 15:34:55 +01:00