BadFish

mirror of https://github.com/sockspls/badfish synced 2025-05-01 17:19:36 +00:00

Author	SHA1	Message	Date
Miguel Lahoz	242c566c1a	Change pinning logic in Static Exchange Evaluation (SEE) This changes 2 parts with regards to static exchange evaluation. Currently, we do not allow pinned pieces to recapture if all opponent pinners are still in their starting squares. This changes that to having a less strict requirement, checking if any pinners are still in their starting square. This makes our SEE give more respect to the pinning side with regards to exchanges, which makes sense because it helps our search explore more tactical options. Furthermore, we change the logic for saving pinners into our state variable when computing slider_blockers. We will include double pinners, where two sliders may be looking at the same blocker, a similar concept to our mobility calculation for sliders in our evaluation section. Interestingly, I think SEE is the only place where the pinners bitboard is actually used, so as far as I know there are no other side effects to this change. An example and some insights: White Bf2, Kg1 Black Qe3, Bc5 The move Qg3 will be given the correct value of 0. (Previously < 0) The move Qd4 will be incorrectly given a value of 0. (Previously < 0) It seems the tradeoff in search is worth it. Qd4 will likely be pruned soon by something like probcut anyway, while Qg3 could help us spot tactics at an earlier depth. STC: LLR: 2.96 (-2.94,2.94) [0.50,4.50] Total: 62162 W: 13879 L: 13408 D: 34875 http://tests.stockfishchess.org/tests/view/5c4ba1a70ebc593af5d49c55 LTC: (Thanks to @alayant) LLR: 3.40 (-2.94,2.94) [0.00,3.50] Total: 140285 W: 23416 L: 22825 D: 94044 http://tests.stockfishchess.org/tests/view/5c4bcfba0ebc593af5d49ea8 Bench: 3937213	2019-01-29 17:32:41 +01:00
Maciej Żenczykowski	8df1cd10df	Use int8_t instead of int for SquareDistance[] This patch saves (4-1) * 64 * 64 = 12KiB of cache. STC LLR: 2.95 (-2.94,2.94) [0.00,4.00] Total: 176120 W: 38944 L: 38087 D: 99089 http://tests.stockfishchess.org/tests/view/5c4c9f840ebc593af5d4a7ce LTC As a pure speed up, I've been informed it should not require LTC. No functional change	2019-01-29 17:26:24 +01:00
Joost VandeVondele	bf17a410ec	[Cluster] Use a sendrecv ring instead of allgather Using point to point instead of a collective improves performance, and might be more flexible for future improvements. Also corrects the condition for the number of elements required to fill the send buffer. The actual Elo gain depends a bit on the setup used for testing. 8mpi x 32t yields 141 - 102 - 957 ~ 11 Elo 8mpi x 1t yields 70 +- 9 Elo.	2019-01-24 10:39:24 +01:00
protonspring	2d0af36753	Simplify TrappedRook Simplified TrappedRook to a single penalty removing the dependency on mobility. STC LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 106718 W: 23530 L: 23577 D: 59611 http://tests.stockfishchess.org/tests/view/5c43f6bd0ebc5902bb5d4131 LTC LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 54053 W: 8890 L: 8822 D: 36341 http://tests.stockfishchess.org/tests/view/5c44932a0ebc5902bb5d4d59 bench 3665090	2019-01-22 09:54:10 +01:00
Joost VandeVondele	58d3ee6175	Simplify pondering time management (#1899 ) stopOnPonderhit is used to stop search quickly on a ponderhit. It is set by mainThread as part of its time management. However, master employs it as a signal between mainThread and the UCI thread. This is not necessary, it is sufficient for the UCI thread to signal that pondering finished, and mainThread should do its usual time-keeping job, and in this case stop immediately. This patch implements this, removing stopOnPonderHit as an atomic variable from the ThreadPool, and moving it as a normal variable to mainThread, reducing its scope. In MainThread::check_time() the search is stopped immediately if ponder switches to false, and the variable stopOnPonderHit is set. Furthermore, ponder has been moved to mainThread, as the variable is only used to exchange signals between the UCI thread and mainThread. The version has been tested locally (as fishtest doesn't support ponder): Score of ponderSimp vs master: 2616 - 2528 - 8630 [0.503] 13774 Elo difference: 2.22 +/- 3.54 which indicates no regression. No functional change.	2019-01-20 19:14:24 +01:00
marotear	59b2486bc3	Simplify pvHit (#1953 ) Removing unnecessary excludedMove condition (there is no excluded move for PvNodes) and re-ordering computation. Non functional change.	2019-01-20 12:24:03 +01:00
protonspring	691a287bfe	Clean-up some shifting in space calculation (#1955 ) No functional change.	2019-01-20 12:21:16 +01:00
Jonathan D	3acacf8471	Tweak initiative and Pawn PSQT (#1957 ) Small changes in initiative(). For Pawn PSQT, endgame values for d6-e6 and d7-e7 are now symmetric. The MG value of d2 is now smaller than e2 (d2=13, e2=21 now compared to d2=19, e2=16 before). The MG values of h5-h6-h7 also increased so this might encourage stockfish for more h-pawn pushes. STC LLR: -2.96 (-2.94,2.94) [0.00,4.00] Total: 81141 W: 17933 L: 17777 D: 45431 http://tests.stockfishchess.org/tests/view/5c4017350ebc5902bb5cf237 LTC LLR: 2.96 (-2.94,2.94) [0.00,4.00] Total: 83078 W: 13883 L: 13466 D: 55729 http://tests.stockfishchess.org/tests/view/5c40763f0ebc5902bb5cff09 Bench: 3266398	2019-01-20 12:20:21 +01:00
protonspring	3300517ecb	Remove AdjacentFiles This is a non-functional simplification that removes the AdjacentFiles array. This array is simple enough to calculate that the pre-calculated array provides no benefit. Reduces the memory footprint. STC LLR: 2.96 (-2.94,2.94) [-3.00,1.00] Total: 74839 W: 16390 L: 16373 D: 42076 http://tests.stockfishchess.org/tests/view/5c3d75920ebc596a450cfb67 No functional change	2019-01-17 08:11:09 +01:00
Joost VandeVondele	5e7777e9d0	[Cluster] adds missing line one-liner fixes a merge error, resulting in a garbage output line. No influence on play.	2019-01-17 08:06:25 +01:00
protonspring	3732c55c18	Simplify pawn moves (#1900 ) If we define dcCandidates with & pawnsNotOn7, we don't have to & it both times. This seems more clear to me as well. Tested for no regression. STC LLR: 2.96 (-2.94,2.94) [-3.00,1.00] Total: 44042 W: 9663 L: 9585 D: 24794 http://tests.stockfishchess.org/tests/view/5c21d9120ebc5902ba12e84d No functional change.	2019-01-14 15:03:31 +01:00
Joost VandeVondele	230fb6e9ad	Simplify time management a bit The new form is likely to trigger a bit more at LTC. Given that LTC appears to be an improvement, I think that is fine. The change is not very invasive: it does the same as before, use potentially less time for moves that are very stable. Most of the time, the full bonus was given if the bonus was given, so the gradual part {3, 4, 5} didn't matter much. Whereas previously 'stable' was expressed as the last 80% of iterations are the same, now I use a fixed depth (10 iterations). For TCEC style TC, it will presumably imply some more moves that are played quicker (and thus more time on the clock when it potentially matters). Note that 10 iterations of stability means we've been proposing that move for 99.9% of search time. passed STC http://tests.stockfishchess.org/tests/view/5c30d2290ebc596a450c055b LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 70921 W: 15403 L: 15378 D: 40140 passed LTC http://tests.stockfishchess.org/tests/view/5c31ae240ebc596a450c1881 LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 17422 W: 2968 L: 2842 D: 11612 No functional change.	2019-01-14 09:25:22 +01:00
Joost VandeVondele	10a920d7d7	[cluster] Improve user documentation - add cluster info line - provides basic info on positions received/stored in a cluster run, useful to judge performance. - document most cluster functionality in the readme.md No functional change	2019-01-14 09:11:33 +01:00
Joost VandeVondele	5446e6f408	Remove pvExact The variable pvExact now overlaps with the pvHit concept. So we simplify the logic with small code tweaks to have pvHit trigger where pvExact previously triggered. passed STC: LLR: 2.96 (-2.94,2.94) [-3.00,1.00] Total: 20558 W: 4497 L: 4373 D: 11688 http://tests.stockfishchess.org/tests/view/5c36e9fd0ebc596a450c7885 passed LTC: LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 23482 W: 3888 L: 3772 D: 15822 http://tests.stockfishchess.org/tests/view/5c37072d0ebc596a450c7a52 Bench: 3739723	2019-01-10 16:46:04 +01:00
mstembera	d07e782e22	Minor cleanup to recent 'Flag critical search tree in hash table' patch No functional change	2019-01-10 16:36:59 +01:00
Joost VandeVondele	21819b7bf8	Merge branch 'master' into clusterMergeMaster3	2019-01-09 21:52:30 +01:00
Joost VandeVondele	d2acdac101	Small improvements to the CI infrastructure - avoid inlining for the debug testing so that suppressions work - provide more output for triggered errors No functional change.	2019-01-09 16:57:24 +01:00
MJZ1977	70880b8e24	Flag critical search tree in hash table Introducing new concept, saving principal lines into the transposition table to generate a "critical search tree" which we can reuse later for intelligent pruning/extension decisions. For instance in this patch we just reduce reduction for these lines. But a lot of other ideas are possible. To go further : tune some parameters, how to add or remove lines from the critical search tree, how to use these lines in search choices, etc. STC : LLR: 2.94 (-2.94,2.94) [0.50,4.50] Total: 59761 W: 13321 L: 12863 D: 33577 +2.23 ELO http://tests.stockfishchess.org/tests/view/5c34da5d0ebc596a450c53d3 LTC : LLR: 2.96 (-2.94,2.94) [0.00,3.50] Total: 26826 W: 4439 L: 4191 D: 18196 +2.9 ELO http://tests.stockfishchess.org/tests/view/5c35ceb00ebc596a450c65b2 Special thanks to Miguel Lahoz for his help in transposition table in/out. Bench: 3399866	2019-01-09 15:05:33 +01:00
Miguel Lahoz	f69106f7bb	Introduce Multi-Cut This was inspired after reading about [Multi-Cut](https://www.chessprogramming.org/Multi-Cut). We now do non-singular cut node pruning. The idea is to prune when we have a "backup plan" in case our expected fail high node does not fail high on the ttMove. For singular extensions, we do a search on all other moves but the ttMove. If this fails high on our original beta, this means that both the ttMove, as well as at least one other move was proven to fail high on a lower depth search. We then assume that one of these moves will work on a higher depth and prune. STC: LLR: 2.96 (-2.94,2.94) [0.50,4.50] Total: 72952 W: 16104 L: 15583 D: 41265 http://tests.stockfishchess.org/tests/view/5c3119640ebc596a450c0be5 LTC: LLR: 2.95 (-2.94,2.94) [0.00,3.50] Total: 27103 W: 4564 L: 4314 D: 18225 http://tests.stockfishchess.org/tests/view/5c3184c00ebc596a450c1662 Bench: 3145487	2019-01-06 16:02:31 +01:00
Joost VandeVondele	8c4338ae49	[Cluster] Param tweak. Small tweak of parameters, yielding some Elo. The cluster branch can now be considered to be in good shape. In local testing, it runs stable for >30k games. Performance benefits from an MPI implementation that is able to make asynchronous progress. The code should be run with 1 MPI rank per node, and threaded on the node. Performance against master has now been measured. Master has been given 1 node with 32 cores/threads in standard SMP, the cluster branch has been given N=2..20 of those nodes, running the corresponding number of MPI processes, each with 32 threads. Time control has been 10s+0.1s, Hash 8MB/core, the book 8moves_v3.pgn, the number of games 400. ``` Score of cluster-2mpix32t vs master-32t: 96 - 27 - 277 [0.586] 400 Elo difference: 60.54 +/- 18.49 Score of cluster-3mpix32t vs master-32t: 101 - 18 - 281 [0.604] 400 Elo difference: 73.16 +/- 17.94 Score of cluster-4mpix32t vs master-32t: 126 - 18 - 256 [0.635] 400 Elo difference: 96.19 +/- 19.68 Score of cluster-5mpix32t vs master-32t: 110 - 5 - 285 [0.631] 400 Elo difference: 93.39 +/- 17.09 Score of cluster-6mpix32t vs master-32t: 117 - 9 - 274 [0.635] 400 Elo difference: 96.19 +/- 18.06 Score of cluster-7mpix32t vs master-32t: 142 - 10 - 248 [0.665] 400 Elo difference: 119.11 +/- 19.89 Score of cluster-8mpix32t vs master-32t: 125 - 14 - 261 [0.639] 400 Elo difference: 99.01 +/- 19.18 Score of cluster-9mpix32t vs master-32t: 137 - 7 - 256 [0.662] 400 Elo difference: 117.16 +/- 19.20 Score of cluster-10mpix32t vs master-32t: 145 - 8 - 247 [0.671] 400 Elo difference: 124.01 +/- 19.86 Score of cluster-16mpix32t vs master-32t: 153 - 6 - 241 [0.684] 400 Elo difference: 133.95 +/- 20.17 Score of cluster-20mpix32t vs master-32t: 134 - 8 - 258 [0.657] 400 Elo difference: 113.29 +/- 19.11 ``` As the cluster parallelism is essentially lazyMPI, the nodes per second has been verified to scale perfectly to large node counts. 
Unfortunately, that is not necessarily indicative of playing strength. In the following 2min search from startPos, we reach about 4.8Gnps (128 nodes). ``` info depth 38 seldepth 51 multipv 1 score cp 53 nodes 576165794092 nps 4801341606 hashfull 1000 tbhits 0 time 120001 pv e2e4 c7c5 g1f3 d7d6 f1b5 c8d7 b5d7 d8d7 c2c4 b8c6 b1c3 g8f6 d2d4 d7g4 d4d5 c6d4 f3d4 g4d1 e1d1 c5d4 c3b5 a8c8 b2b3 a7a6 b5d4 f6e4 d1e2 g7g6 c1e3 f8g7 a1c1 e4c5 f2f3 f7f5 h1d1 e8g8 d4c2 c5d7 a2a4 a6a5 e3d4 f5f4 d4f2 f8f7 h2h3 d7c5 ```	2019-01-06 15:38:31 +01:00
Joost VandeVondele	bb843a00c1	Check tablebase files This addresses partially issue #1911 in that it documents in our Readme the command that users can use to verify the md5sum of their downloaded tablebase files. Additionally, a quick check of the file size (the size of each tablebase file modulo 64 is 16 as pointed out by @syzygy1) has been implemented at launch time in Stockfish. Closes https://github.com/official-stockfish/Stockfish/pull/1927 and https://github.com/official-stockfish/Stockfish/issues/1911 No functional change.	2019-01-04 15:36:39 +01:00
Joost VandeVondele	8a3f8e21ae	[Cluster] Move IO to the root. Fixes one TODO, by moving the IO related to bestmove to the root, even if this move is found by a different rank. This is needed to make sure IO from different ranks is ordered properly. If this is not done it is possible that e.g. a bestmove arrives before all info lines have been received, leading to output that confuses tools and humans alike (see e.g. https://github.com/cutechess/cutechess/issues/472)	2019-01-04 14:56:04 +01:00
Marco Costalba	3c576efa77	Delay castling legality check Delay legality check of castling moves at search time, just before making the move, as is the standard with all the other move types. This should avoid a useless and non-trivial legality check when the castling move is then not tried later, for instance due to a previous cut-off. The patch is also a big simplification and allows generate_castling() to be removed entirely. Bench changes due to a different move sequence out of MovePicker. STC: LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 45073 W: 9918 L: 9843 D: 25312 http://tests.stockfishchess.org/tests/view/5c2f176f0ebc596a450bdfb3 LTC: LLR: 3.15 (-2.94,2.94) [-3.00,1.00] Total: 10156 W: 1707 L: 1560 D: 6889 http://tests.stockfishchess.org/tests/view/5c2e7dfd0ebc596a450bcdf4 Verified with perft both in standard and Chess960 cases. Closes https://github.com/official-stockfish/Stockfish/pull/1929 Bench: 3559104	2019-01-04 14:23:14 +01:00
Joost VandeVondele	267ca781cd	Always wait before posting the next call in _sync.	2019-01-02 11:16:24 +01:00
Joost VandeVondele	ac43bef5c5	[Cluster] Improve message passing part. This rewrites in part the message passing part, using in place gather, and collecting, rather than merging, the data of all threads. neutral with a single thread per rank: Score of new-2mpi-1t vs old-2mpi-1t: 789 - 787 - 2615 [0.500] 4191 Elo difference: 0.17 +/- 6.44 likely progress with multiple threads per rank: Score of new-2mpi-36t vs old-2mpi-36t: 76 - 53 - 471 [0.519] 600 Elo difference: 13.32 +/- 12.85	2019-01-02 11:16:24 +01:00
Marco Costalba	eb6d7f537d	Assorted trivial cleanups (#1894 ) To address https://github.com/official-stockfish/Stockfish/issues/1862 No functional change.	2019-01-01 14:10:26 +01:00
protonspring	79c97625a4	Remove openFiles in pawns. (#1917 ) A single popcount in evaluate.cpp replaces all openFiles stuff in pawns. It doesn't seem to affect performance at all. STC LLR: 2.96 (-2.94,2.94) [-3.00,1.00] Total: 28103 W: 6134 L: 6025 D: 15944 http://tests.stockfishchess.org/tests/view/5b7d70a20ebc5902bdbb1999 No functional change.	2019-01-01 13:38:09 +01:00
protonspring	7accf07c0b	Remove "Any" predicate filter (#1914 ) This custom predicate filter creates an unnecessary abstraction layer, but doesn't make the code any more readable. The code is clear enough without it. No functional change.	2019-01-01 13:36:56 +01:00
protonspring	e2d3c163cb	Remove a useless micro-optimization in pawns generation (#1915 ) The extra condition is used as a shortcut to skip the following 3 assignments: ```C++ Bitboard b1 = shift<UpRight>(pawnsOn7) & enemies; Bitboard b2 = shift<UpLeft >(pawnsOn7) & enemies; Bitboard b3 = shift<Up >(pawnsOn7) & emptySquares; ``` In case of EVASION with no target on 8th rank (the common case), we end up performing the 3 statements for nothing because b1 = b2 = b3 = 0. But this is just a small micro-optimization and the condition is quite confusing, so just remove it and prefer readable code instead. STC LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 78020 W: 16978 L: 16967 D: 44075 http://tests.stockfishchess.org/tests/view/5c27b4fe0ebc5902ba135bb0 No functional change.	2019-01-01 13:35:53 +01:00
Joost VandeVondele	7a32d26d5f	[cluster] keep track of TB hits cluster-wide.	2018-12-29 15:34:57 +01:00
Joost VandeVondele	fb5c1f5bf5	Fix comment	2018-12-29 15:34:57 +01:00
Joost VandeVondele	87f0fa55a0	[cluster] keep track of node counts cluster-wide. This generalizes exchange of signals between the ranks using a non-blocking all-reduce. It is now used for the stop signal and the node count, but should be easily generalizable (TB hits, and ponder still missing). It avoids having long-lived outstanding non-blocking collectives (removes an early posted Ibarrier). A bit too short a test, but not worse than before: Score of new-r4-1t vs old-r4-1t: 459 - 401 - 1505 [0.512] 2365 Elo difference: 8.52 +/- 8.43	2018-12-29 15:34:57 +01:00
Joost VandeVondele	2f882309d5	fixup	2018-12-29 15:34:57 +01:00
Joost VandeVondele	86953b9392	[cluster] Fix non-mpi compile fix compile of the cluster branch in the non-mpi case. Add a TODO as a reminder for the new voting scheme. No functional changes	2018-12-29 15:34:56 +01:00
Joost VandeVondele	ba1c639836	[cluster] fill sendbuffer better use a counter to track available elements. Some elo gain, on 4 ranks: Score of old-r4-1t vs new-r4-1t: 422 - 508 - 1694 [0.484] 2624 Elo difference: -11.39 +/- 7.90	2018-12-29 15:34:56 +01:00
Joost VandeVondele	e526c5aa52	[cluster] Make bench compatible Fix one TODO. Takes care of output from bench. Sum nodes over ranks.	2018-12-29 15:34:56 +01:00
Joost VandeVondele	9cd2c817db	Add one more TODO	2018-12-29 15:34:56 +01:00
Joost VandeVondele	54a0a228f6	[cluster] Some formatting cleanup standardize whitespace a bit. Also adds two TODOs for follow up work. No functional change.	2018-12-29 15:34:56 +01:00
Joost VandeVondele	1cd2c7861a	[cluster] avoid creating MPI data type. there is no need to make an MPI data type for the sendbuffer, simpler and faster. No functional change	2018-12-29 15:34:56 +01:00
Joost VandeVondele	7af3f4da7a	[cluster] Avoid TT saving our own TT entries. avoid saving to TT the part of the receive buffer that actually originates from the same rank. Now, on 1 mpi rank, we have the same bench as the non-mpi code on 1 thread.	2018-12-29 15:34:56 +01:00
Joost VandeVondele	271181bb31	[cluster] Add depth condition to cluster TT saves. since the logic for saving moves in the sendbuffer and the associated rehashing is expensive, only do it for TT stores of sufficient depth. quite some gain in local testing with 4 ranks against the previous version. Elo difference: 288.84 +/- 21.98 This starts to make the branch useful, but for on-node runs, difference remains to the standard threading.	2018-12-29 15:34:56 +01:00
noobpwnftw	66b2c6b9f1	Implement best move voting system for cluster This implements the cluster version of `d96c1c32a2`	2018-12-29 15:34:56 +01:00
Joost VandeVondele	2559c20c6e	[cluster] Fix oversight in TT key reuse In the original code, the position key stored in the TT is used to probe&store TT entries after message passing. Since we only store part of the bits in the TT, this leads to incorrect rehashing. This is fixed in this patch storing also the full key in the send buffers, and using that for hashing after message arrival. Short testing with 4 ranks (old vs new) shows this is effective: Score of mpiold vs mpinew: 84 - 275 - 265 [0.347] 624 Elo difference: -109.87 +/- 20.88	2018-12-29 15:34:55 +01:00
Joost VandeVondele	2659c407c4	Fix segfault. the wrong data type was passed to an MPI call, leading to occasional segfaults. This patch fixes this. No functional change.	2018-12-29 15:34:55 +01:00
noobpwnftw	3730ae1efb	Small simplifications and code cleanup Non-functional simplifications.	2018-12-29 15:34:55 +01:00
noobpwnftw	0d6cdc0c6d	Implement yielding loop while waiting for input Some MPI implementations use busy-wait polling, which will turn MPI_Bcast into a busy-wait loop; work around this with our own yielding loop.	2018-12-29 15:34:55 +01:00
noobpwnftw	80afeb0d3b	Fix consistency between PV and bestmove output In case that a non-root mainThread on a node is the new best thread in the cluster, it should always output its PV.	2018-12-29 15:34:55 +01:00
noobpwnftw	2405b38165	Fix search result aggregation This reverts my earlier change that only the root node gets to output best move after fixing problem with MPI_Allreduce by our custom operator(BestMoveOp). This function is not commutative and we must ensure that its output is consistent among all nodes.	2018-12-29 15:34:55 +01:00
noobpwnftw	8a95d269eb	Implement proper stop signalling from root node Previous behavior was to wait on all nodes to finish their search on their own TM and aggregate to root node via a blocking MPI_Allreduce call. This seems to be problematic. In this commit a proper non-blocking signalling barrier was implemented to use TM from root node to control the cluster search, and disable TM on all non-root nodes. Also includes some cosmetic fix to the nodes/NPS display.	2018-12-29 15:34:55 +01:00
noobpwnftw	3b7b632aa5	Fix a bug of outputting multiple lines of bestmove	2018-12-29 15:34:55 +01:00

4755 commits