The recently committed Fail-High patch (081af90805)
had a number of changes beyond adjusting the depth of search on fail high, with
some undesirable side effects.
1) Decreasing depth on PV output, confusing GUIs and players alike as described in
issue #1787. The depth printed is anyway a convention, let's consider adjustedDepth
an implementation detail, and continue to print rootDepth. Depth, nodes, time and
move quality all increase as we compute more. (fixing this output has no effect on
play).
2) Fixes go depth output (now based on rootDepth again, no effect on play), also
reported in issue #1787
3) The depth lastBestDepth is used to compute how long a move is stable, a new move
found during fail-high is incorrectly considered stable if based on adjustedDepth
instead of rootDepth (this changes time management). Reverting this passed STC
and LTC:
STC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 82982 W: 17810 L: 17808 D: 47364
http://tests.stockfishchess.org/tests/view/5bd391a80ebc595e0ae1e993
LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 109083 W: 17602 L: 17619 D: 73862
http://tests.stockfishchess.org/tests/view/5bd40c820ebc595e0ae1f1fb
4) In the thread voting scheme, the rank of the fail-high thread is now artificially
low, incorrectly since the quality of the move is much better than what adjustedDepth
suggests (e.g. if it takes 10 iterations to find VALUE_KNOWN_WIN, it has very low
depth). Further evidence comes from a test that showed that the move of highest
depth is not better than that of the last PV (which is potentially of much lower
adjustedDepth).
I.e. this test http://tests.stockfishchess.org/tests/view/5bd37a120ebc595e0ae1e7c3
failed SPRT[0, 5]:
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 10609 W: 2266 L: 2345 D: 5998
In a running 5+0.05 th 8 test (more than 10000 games) a positive Elo estimate is
shown (strong enough for a [-3,1], possibly not [0,4]):
http://tests.stockfishchess.org/tests/view/5bd421be0ebc595e0ae1f315
LLR: -0.13 (-2.94,2.94) [0.00,4.00]
Total: 13644 W: 2573 L: 2532 D: 8539
Elo 1.04 [-2.52,4.61] / LOS 71%
Thus, restore old behavior as a bugfix, keeping the core of the fail-high patch
idea as resolving scheme. This is non-functional for bench, but changes searches
via time management and in the threaded case.
Bench: 3556672
This helps resolving consecutive FH's during aspiration more efficiently
STC:
http://tests.stockfishchess.org/tests/view/5bc857920ebc592439f85765
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 4992 W: 1134 L: 980 D: 2878 Elo +10.72
LTC:
http://tests.stockfishchess.org/tests/view/5bc868050ebc592439f857ef
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 8123 W: 1363 L: 1210 D: 5550 Elo +6.54
No-Regression test with 8 threads, tc=15+0.15:
http://tests.stockfishchess.org/tests/view/5bc874ca0ebc592439f85938
LLR: 2.94 (-2.94,2.94) [-3.00,1.00]
Total: 24740 W: 3977 L: 3863 D: 16900 Elo +1.60
This was a cooperation between me and Michael Stembera:
-me recognizing SF having problems with resolving FH's efficiently at
high depths, thus starting some tests based on consecutive FH's.
-mstembera picking up the idea with first success at STC & LTC (so full
credits to him!)
-me suggesting how to resolve the issues pinpointed by S.G on PR #1768
and finally restricting the logic to the main thread so that it don't
regresses at multi-thread.
bench: 3314347
Enable numa machinery only for STRICTLY MORE than 8 threads. Reason for this
change is that nowadays SMP tests are always done with 8 threads. That is a
problem for multi-socket Windows machines running on fishtest.
No functional change
The patch adds a small random component (+-1) to VALUE_DRAW for the evaluation
of draw positions (mostly 3folds). This random component is not static, but
potentially different for each visit of the node (hence derived from the node
counter). The effect is that in positions with many 3fold draw lines, different
lines are followed at each iteration. This keeps the search much more dynamic,
as opposed to being locked to one particular 3fold.
An example of a position where master suffers from 3fold-blindness and this patch
solves quickly is the famous TCEC game 53:
FEN: 3r2k1/pr6/1p3q1p/5R2/3P3p/8/5RP1/3Q2K1 b - - 0 51
master doesn't see that this is a lost position (draw eval up to depth 50) as
Qf6-e6 d4-d5 (found by patch at depth 23) leads to a loss.
The 3fold-blindness is more important at longer TC, the patch was yellow STC and
LTC, but passed VLTC:
STC
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 46328 W: 10048 L: 9953 D: 26327
http://tests.stockfishchess.org/tests/view/5b9c0ca20ebc592cf275f7c7
LTC
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 54663 W: 8938 L: 8846 D: 36879
http://tests.stockfishchess.org/tests/view/5b9ca1610ebc592cf27601d3
VLTC
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 31789 W: 4512 L: 4284 D: 22993
http://tests.stockfishchess.org/tests/view/5b9d1a670ebc592cf276076d
Credit to @crossbr for pointing to this problem repeatedly, and giving the hint
that many draw lines are typical in those situations.
Bench: 4756639
Currently we update (track up) the pv even in the fail high case.
However most times in such cases the pv in the ply below remains unset
because there we have value == alpha and so finally we see truncated
pv's (=just one move) in fail high cases.
Of course tracking down these pv's (+sending them to the gui) comes at a
certian cost, but no-regression tests passed:
STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 16300 W: 3556 L: 3424 D: 9320
http://tests.stockfishchess.org/tests/view/5b9b73500ebc592cf275ea92
LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 202411 W: 32734 L: 32897 D: 136780
http://tests.stockfishchess.org/tests/view/5b9baed10ebc592cf275ef6d
N.B.: Digging also into qsearch was tried in another version but seemed
not to pass the tests. This means that we don't always will get a pv
until the very tips.
No functional change
This PR is a combination of two unrelated [0, 4] patches that appeared promising
but not quite strong enough to pass on their own. The combination initially failed
STC with a positive score after a long run, and the subsequent speculative LTC test
passed.
* tweak_threatOnQueen4 :
Increase the middlegame components of ThreatByMinor[QUEEN]
and ThreatByRook[QUEEN] by 15 each. Bryan's (@crossbr) analysis of CCC Bonus Game 10
inspired several tests on penalizing a queen with limited safe mobility. While
attempting to implement this idea, I noticed that when I did not include the queen's
current square in the calculations, the Elo gains seemed to vanish--and only then did
I have the idea to revisit ThreatByMinor[QUEEN] and ThreatByRook[QUEEN], adding a
corresponding value to each. Without Bryan's work, this test would never have been
submitted. I would also like to recognize the efforts and contributions of @SFisGOD,
who also vigorously worked on this idea.
* Use pure static eval for null move pruning :
This idea was directly re-purposed from a promising test by Jerry Donald Watson
(@jerrydonaldwatson) in August. It was also independently developed and tested by
Stefan Geschwentner (@locutus2) previously.
Thank you all!
STC (failed yellow):
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 83913 W: 17986 L: 17825 D: 48102
http://tests.stockfishchess.org/tests/view/5bbc59300ebc592439f76aa5
LTC:
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 137198 W: 22351 L: 21772 D: 93075
http://tests.stockfishchess.org/tests/view/5bbce35f0ebc592439f77639
Bench: 4312846
Note by snicolet: I use this non-functional change patch
as a pretext to correct the wrong bench number I introduced
in the message of the previous commit.
Bench: 4059356
Storing unconditionally the current generation and bound is equivalent to master.
Part of the condition was added as a speed optimization in #429.
Here the branch is fully eliminated.
passed STC single-threaded:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 73515 W: 16378 L: 16359 D: 40778
http://tests.stockfishchess.org/tests/view/5b2fc38c0ebc5902b2e57fd5
passed STC multi-threaded:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 63725 W: 12916 L: 12874 D: 37935
http://tests.stockfishchess.org/tests/view/5b307b8f0ebc5902b2e5895f
The multithreaded test was run after a plausible suggestion by @mstembera that the effect of this could be larger with many cores. The result seems to indicate this doesn't really matter on the 8core architecture abundantly available on fishtest.
No functional change
a) Reduce PSQT values along the long diagonals on non-central squares
and increase the LongDiagonal bonus accordingly. The effect is to penalise
bishops on the long diagonal which can not "see" the 2 central squares.
The "good" bishops still have more or less the same bonus as current master.
b) For a bishop on a central square, because of the "| s" term in the code,
the LongDiagonalBonus was always given. So while being there, remove the "| s"
and compensate the central Bishop PSQT accordingly.
Passed STC
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 44498 W: 9658 L: 9323 D: 25517
http://tests.stockfishchess.org/tests/view/5b8992770ebc592cf2748942
Passed LTC
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 63092 W: 10324 L: 9975 D: 42793
http://tests.stockfishchess.org/tests/view/5b89a17a0ebc592cf2748b59
Closes https://github.com/official-stockfish/Stockfish/pull/1760
bench: 4693901
It looks like PawnsOnBothFlanks can be removed from initiative().
A barrage of tests seem to confirm that the adjustment to -110
does not gain elo to offset any potential loss by removing
PawnsOnBothFlanks.
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 22014 W: 4760 L: 4639 D: 12615
http://tests.stockfishchess.org/tests/view/5b7f50cc0ebc5902bdbb3a3e
LTC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 40561 W: 6667 L: 6577 D: 27317
http://tests.stockfishchess.org/tests/view/5b801f9f0ebc5902bdbb4467
The barrage of 0,4 tests on the -136 value are in my ps_tunetests branch.
http://tests.stockfishchess.org/tests/user/protonspring
Closes https://github.com/official-stockfish/Stockfish/pull/1751
Bench: 4413173
-------------
How to continue from there?
The fact that endgames with all the pawns on only one flank are
drawish is a well-known chess idea, so it seems quite strange that
this can be removed so easily without losing Elo.
In the past there had been attempts to improve on PawnsOnBothFlanks
with similar concepts (for instance using the pawn span value), but
the tests were at best neutral. Maybe Stockfish is now mature enough
that these refined ideas would work to replace PawnsOnBothFlanks?
There is no need to make this as large as 65536 just for the sake of the
single 7-man tablebase that happens to have the key 0xf9247fff. Idea for the
fix by Ronald de Man, who suggested simply to allow more buckets past the end.
We also implement Robin Hood hashing for the hash table, which takes the worst
-case search for full 7-man tablebases down from 68 to 11 probes (Also takes
the average probe length from 2.06 to 2.05). For a table with 8K entries, the
corresponding numbers would be worst-case from 9 to 4, with average from 1.30
to 1.29.
https://github.com/official-stockfish/Stockfish/pull/1747
No functional change
This commit tries to make the new pure static eval code more readable by
splitting up the nested assignments into separate lines and making a few
more cosmetic tweaks.
No functional change.
This is a non-functional change. By pre-incrementing minKingPawnDistance
instead of post-incrementing, we can remove this -1.
This also makes DistanceRing more consistent with the rest of stockfish
since it now holds an actual "distance" instead of a less natural distance-1.
In current master, PseudoAttacks[KING][ksq] == DistanceRingBB[ksq][0]
With this patch, it will be PseudoAttacks[KING][ksq] == DistanceRingBB[ksq][1]
ie squares at distance 1 from the king. This is more natural use of distance.
The current array size DistanceRingBB[SQUARE_NB][8] is still OK with the new
definition, because maximum distance between two squares on a chess board is
seven (for example Kh1 and a8).
No functional change.
Follow-up for the previous patch: we use an affine formula to mix stats
and evaluation in search. The idea is to give a bonus if the previous
move of the opponent was historically bad, and a malus if the previous
move of the opponent was historically good.
More precisely, if x is the stat score of the previous move by the opponent,
we implement the following formulas to tweak the evaluation at an internal
node of the tree for our pruning decisions at this node:
if x = 0, use v' = eval(P)
if x > 0, use v' = eval(P) - 5 - x/1024
if x < 0, use v' = eval(P) + 5 - x/1024
For reference, the previous master had this simpler rule:
if x > 0, use v' = eval(P) - 10
if x <= 0, use v' = eval(P)
STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 29322 W: 6359 L: 6088 D: 16875
http://tests.stockfishchess.org/tests/view/5b76a5980ebc5902bdba957f
LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 30893 W: 5154 L: 4910 D: 20829
http://tests.stockfishchess.org/tests/view/5b76ca6d0ebc5902bdba9914
Closes https://github.com/official-stockfish/Stockfish/pull/1740
Bench: 4592766
Mix search stats with evaluation: if the opponent's move has a good historyStat,
then decrease the evaluation of the internal node a bit for the pruning decisions
during search.
STC;
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 72083 W: 15683 L: 15203 D: 41197
http://tests.stockfishchess.org/tests/view/5b74c3ea0ebc5902bdba7d41
LTC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 29104 W: 4867 L: 4630 D: 19607
http://tests.stockfishchess.org/tests/view/5b7565000ebc5902bdba851b
Closes https://github.com/official-stockfish/Stockfish/pull/1738
Bench: 4514101
-----------
How to continue from there?
• the use of the previous stat score can probably be simplified in lines 587 and 716
• we could try to use a continuous bonus based on the previous stat score, instead
of just a fixed offset of -10 when the opponent previous move was good.
----------
Comments by Stefan Geschwentner:
Interesting idea. Because only the eval in search is tweak this should only
influence the eval and static eval used at inner nodes, and not on the return
search value (which comes in the end from quiescence search), except through
saving in TT followed by a TT cutoff.
So essentialy this effects diverse pruning/reduction parts -- eval and static
eval are lowered for good opponent moves:
• tt cutoff (ttValue)
• improving (static eval)
• more razoring (eval)
• less futility pruning (eval)
• less null move pruning (eval + static eval) (but with little more depth)
• more probcut (static eval)
• more move futility pruning (static eval)
We double in this patch the weight of the capture history table in the
local scoring of captures for move ordering.
The capture history table is indexed by the triplet (capturing piece,
capture square, captured piece) and gets information like "it seems to
have been historically good in that part of the search tree to capture
a pawn with a rook on g3, even if it seems to lose material", and affect
the normaly pure « Most Valuable Victim » ordering of captures.
Finished yellow at STC after 228842 games (posting a +1.36 Elo gain):
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 228842 W: 50894 L: 50152 D: 127796
http://tests.stockfishchess.org/tests/view/5b714bb00ebc5902bdba332d
Passed LTC:
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 43251 W: 7425 L: 7131 D: 28695
http://tests.stockfishchess.org/tests/view/5b71c7d40ebc5902bdba3e51
Thanks to user Vizvezdenec for running the LTC test.
Closes https://github.com/official-stockfish/Stockfish/pull/1736
Bench: 4272361
This patch introduces a non-linear bonus for pawns, along with some
(linear) corrections for the other pieces types.
The original values were obtained by a massive non-linear tuning of both
pawns and other pieces by GuardianRM, while Alain Savard and Chris Cain
later simplified the patch by observing that, apart from the pawn case, the
tuned corrections were in fact almost affine and could be incorporated in
our current code base via the piece values in types.h (offset) and the diagonal
of the quadratic matrix (slope). See discussion in PR#1725 :
https://github.com/official-stockfish/Stockfish/pull/1725
STC:
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 42948 W: 9662 L: 9317 D: 23969
http://tests.stockfishchess.org/tests/view/5b6ff6e60ebc5902bdba1d87
LTC:
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 19683 W: 3409 L: 3206 D: 13068
http://tests.stockfishchess.org/tests/view/5b702dbd0ebc5902bdba216b
How to continue from there?
- Maybe the non-linearity for the pawn value could be somewhat tempered
again and a simpler linear correction for pawns would work?
Closes https://github.com/official-stockfish/Stockfish/pull/1734
Bench: 4681496
We simplify the razoring logic by applying it to all nodes at depth 1 only.
An added advantage is that only one razor margin is needed now, and we treat
PV and Non-PV nodes in the same manner.
How to continue?
- There may be some conditions in which depth 2 razoring is beneficial.
- We can see whether the razor margin can be tuned, perhaps even with a
different value for PV nodes.
- Perhaps we can unify the treatment of PV and Non-PV nodes in other parts
of the search as well.
STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 5474 W: 1281 L: 1127 D: 3066
http://tests.stockfishchess.org/tests/view/5b6de3b20ebc5902bdba0d1e
LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 62670 W: 10749 L: 10697 D: 41224
http://tests.stockfishchess.org/tests/view/5b6dee340ebc5902bdba0eb0
In addition, we ran a fixed LTC test against a similar patch which also
passed SPRT [-3, 1]:
ELO: 0.23 +-2.1 (95%) LOS: 58.6%
Total: 36412 W: 6168 L: 6144 D: 24100
http://tests.stockfishchess.org/tests/view/5b6e83940ebc5902bdba1485
We are opting for this patch as the more logical and simple of the two,
and it appears to be no less strong. Thanks in particular to @DU-jdto
for input into this patch.
Bench: 4476945
Currently, we do not consider pawns passed if there is another pawn of
the same color in front of them. It appears that this condition is not
necessary. The idea is that the doubled pawns are likely to be weak and
one of them will be likely captured anyway. On the other hand, if we do
somehow manage to promote a pawn, then the pawn behind it becomes passed
as well. In any case, the end result is we end up with an extra
potentially passed pawn. The current evaluation for passed pawns already
handles this case by also scaling down this effect.
STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 28291 W: 6287 L: 6178 D: 15826
http://tests.stockfishchess.org/tests/view/5b6c4b960ebc5902bdb9f256
LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 30717 W: 5256 L: 5151 D: 20310
http://tests.stockfishchess.org/tests/view/5b6c82980ebc5902bdb9f863
Bench: 4938285
Unify the "quiet" and "non-quiet" reduction rules for use at any kind of moves.
The idea behind it was that both rules reduce at similiar cases in master:
one directly for late previous moves and the other indirectly by using a
bad stat score which is used for most move sorting and so approximates the
late move condition.
For captures/promotions the old rule was triggered in 25% but the new
rule only for 3% of all cases (so now more reductions are done, whereas
for quiet moves reductions keep the same level).
STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 162327 W: 35976 L: 36134 D: 90217
http://tests.stockfishchess.org/tests/view/5b6a9a430ebc5902bdb9d5c1
LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 29570 W: 5083 L: 4976 D: 19511
http://tests.stockfishchess.org/tests/view/5b6bc5d00ebc5902bdb9e9d6
Bench: 4526980
Currently, we first calculate some bitboards at the top of Evaluation::space()
and then check whether we actually need them. Invert the ordering. Of course this
does not make a difference in current master because the constexpr bitboard
calculations are in fact done at compile time by any decent compiler, but I find
my version a bit healthier since it will always meet or exceed current implementation
even if we eventually change the spaceMask to something not contsexpr.
No functional change.
After a session of tuning for King Psqt I got some new values, which was later
tweaked manually by me Fauzi, to result in an Elo-gain patch which seems to scale
pretty well:
STC: LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 100653 W: 22550 L: 22314 D: 55789 [Yellow patch]
LTC: LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 147079 W: 25584 L: 24947 D: 96548 [Green Patch]
Bench: 4669050
Introduce voting system for best move selction in multi-threads mode.
Joint work with Stefan Geschwentner, based on ideas introduced by
Michael Stembera.
Moves are upvoted by every thread using the margin to the minimum score
across threads and the completed depth.
First thread voting for the winner move is selected as best thread.
Passed STC, LTC. A further LTC test with only 4 threads failed with positive
score. A LTC with 31 threads was stopped with LLR 0.77 after 25k games to
avoid use of excessive resources (equivalent to 1.5M STC games).
Similar ideas were proposed by Michael Stembera 2 years ago #507, #508.
This implementation seems simpler and more understandable, the results
slightly more promising.
Further possible work:
1) Tweak of the formula using for assigning votes.
2) Use a different baseline for the score dependent part: maximum score
or winning probability could make more sense.
3) Assign votes in `Thread::Search` as iterations are completed and use
voting results to stop search.
4) Select best thread as the threads voting for best move with the highest
completed depth or, alternatively, vote on PV moves.
Link to SPRT tests
[stopped LTC, 31 threads 20+0.02](http://tests.stockfishchess.org/tests/view/5b61dc090ebc5902bdb95192)
LLR: 0.77 (-2.94,2.94) [0.00,5.00]
Total: 25602 W: 3977 L: 3850 D: 17775
Elo: 1.70 [-0.68,4.07] (95%)
[passed LTC, 8 threads 20+0.02](http://tests.stockfishchess.org/tests/view/5b5df5180ebc5902bdb9162d)
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 44478 W: 7602 L: 7300 D: 29576
Elo: 1.92 [-0.29,3.94] (95%)
[failed LTC, 4 threads 20+0.02](http://tests.stockfishchess.org/tests/view/5b5f39ef0ebc5902bdb92792)
LLR: -2.94 (-2.94,2.94) [0.00,5.00]
Total: 29922 W: 5286 L: 5285 D: 19351
Elo: 0.48 [-1.98,3.10] (95%)
[passed STC, 4 threads 5+0.05](http://tests.stockfishchess.org/tests/view/5b5dbf0f0ebc5902bdb9131c)
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 9108 W: 2033 L: 1858 D: 5217
Elo: 6.11 [1.26,10.89] (95%)
No functional change (in simple threat mode)