Use a per-thread counter to reduce contention
with many cores and endgame positions.
Measured around 1% speed-up on a 12 core and 8%
on 28 cores with 6-men, searching on:
7R/1p3k2/2p2P2/3nR1P1/8/3b1P2/7K/r7 b - - 3 38
Also retire the unused set_nodes_searched() and fix
a couple of return types and naming conventions.
No functional change.
We reference (ss-1)->currentMove, i.e. we peek
current move of the parent node, so currentMove
should be valid in the main move loop, when we
search() the subtree, but outside of main loop
it is useless.
No functional change.
Also a speedup since we don't need to recalculate SEE
for extensions...as it already determined to be positive.
Results for 12 tests for each version:
Base Test Diff
Mean 2132395 2191002 -58607
StDev 128058 85917 134239
p-value: 0.669
speedup: 0.027
Non functional change.
Stephane's patch removes the only usage of Position::see, where the
returned value isn't immediately compared with a value. So I replaced
this function by its optimised and more specific version see_ge. This
function also supersedes the function Position::see_sign.
bool Position::see_ge(Move m, Value v) const;
This function tests if the SEE of a move is greater or equal than a
given value. We use forward iteration on captures instread of backward
one, therefore we don't need the swapList array. Also we stop as soon
as we have enough information to obtain the result, avoiding unnecessary
calls to the min_attacker function.
Speed tests (Windows 7), 20 runs for each engine:
Test engine: mean 866648, st. dev. 5964
Base engine: mean 846751, st. dev. 22846
Speedup: 1.023
Speed test by Stephane Nicolet
Fishtest STC test:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 26040 W: 4675 L: 4442 D: 16923
http://tests.stockfishchess.org/tests/view/57f648990ebc59038170fa03
No functional change.
This is a bit tricky because we don't want
to prune the only legal evasions, even if
with negative SEE. So add an assert to avoid
this subtle bug to slip in later.
STC:
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 14140 W: 2625 L: 2421 D: 9094
LTC:
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 11558 W: 1555 L: 1379 D: 8624
bench: 5256717
Rename shift_bb() to shift(), and DELTA_S to SOUTH, etc.
to improve code readability, especially in evaluate.cpp
when they are used together:
old b = shift_bb<DELTA_S>(pos.pieces(PAWN))
new b = shift<SOUTH>(pos.pieces(PAWN))
While there fix some small code style issues.
No functional change.
Drop useless condition
abs(ttValue) < VALUE_KNOWN_WIN
And extend singular extension search to cases when ttValue
stores a mate score. This improves mate finding and does
not introduce any regression.
Yery tested this patch against current master on the 6500+
Chest mate suite with 200K fixed nodes:
shortest mates found: master: 1206 patch:1205
any mate found: master: 1903 patch: 2003
with 1 sec time:
shortest mates found: master: 2667 patch: 2628
any mate found: master: 3585 patch: 3646
Verified for no regression:
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 25655 W: 4578 L: 4465 D: 16612
LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 66247 W: 8618 L: 8557 D: 49072
bench: 6335042
Both Tablebases::filter_root_moves() and
extract_ponder_from_tt(9 were unable to handle
a mate/stalemate position.
Spotted and reported by Dann Corbit.
Added some mate/stalemate positions to bench so
to early catch this regression in the future.
No functional change.
Rescales the king danger variables in evaluate_king() to
suppress the KingDanger[] array. This avoids the cost of
the memory accesses to the array and simplifies the non-linear
transformation used.
Full credits to "hxim" for the seminal idea and implementation,
see pull request #786.
https://github.com/official-stockfish/Stockfish/pull/786
Passed STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 9649 W: 1829 L: 1689 D: 6131
Passed LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 53494 W: 7254 L: 7178 D: 39062
Bench: 6116200
Drops a scalability bottleneck due to memory contention
of a single shared table across threads. The effect starts
to be sensible with a high number of threads. Specifically
we have a small regression with 7 threads both at 60 and
180 seconds TC:
10000 @ 60+0.6 th 7
ELO: -2.46 +-3.2 (95%) LOS: 6.5%
Total: 9896 W: 1037 L: 1107 D: 7752
5000 @ 180+0.6 th 7
ELO: -1.95 +-4.1 (95%) LOS: 17.7%
Total: 5000 W: 444 L: 472 D: 4084
We have a regression because counterMoveHistory table is
quite big and it takes time for a single thread to fill it.
Sharing the table yields to a higher fill rate and better
quality of moves and up to 7 threads the benefits of sharing
more then compensate the loss in speed due to contention.
Interestingly even with a 3X longer TC, so with more time
for the single thread to catch up, the improvment is quite
limited and below noise level. It seems we really need much
longer TC to saturate the table.
When we move to high threads number it's another story:
5000 @ 60+0.6 th 22
ELO: 3.49 +-4.3 (95%) LOS: 94.6%
Total: 4880 W: 490 L: 441 D: 3949
2000 @ 60+0.6 th 32
ELO: 8.34 +-6.9 (95%) LOS: 99.1%
Total: 2000 W: 229 L: 181 D: 1590
As expected the speed-up more than compensates the filling
rate, and we expect that with tournament TC, where single
thread is able to saturate the table, the difference will
be even stronger. For instance for TCEC 9 super-final time
control will be 180 minutes + 15 seconds and this scalability
improvement seems definitely the way to go.
So, summarizing:
GOOD:
Measured big improvement in high core scenario
Suitable for TCEC 9 superfinal (big hardware, very long TC)
Consistent and natural patch that extends to counterMoveHistory
what we already do for remaining history tables, that are all per-thread
Non functional change for the common case of a single core
Very simple (just 6 lines modified, no added ones)
BAD:
Small regression (within 2-3 ELO) with few threads and short TC
bench: 5341477
In case we have installed a not complete set of 6-men tables and
there is 6 piece position on board, but no corresponding
tablebase engine is not using any syzygy at all.
Reported by Jouni Uski, fix by Peter Österlund,
confirmed as a bug by Ronald de Man.
bench: 7591630
This greately simplifies usage because hides to the
search the implementation specific CheckInfo.
This is based on the work done by Marco in pull request #716,
implementing on top of it the ideas in the discussion: caching
the calls to slider_blockers() in the CheckInfo structure,
and simplifying the slider_blockers() function by removing its
first parameter.
Compared to master, bench is identical but the number of calls
to slider_blockers() during bench goes down from 22461515 to 18853422,
hopefully being a little bit faster overall.
archlinux, gcc-6
make profile-build ARCH=x86-64-bmi2
50 runs each
bench:
base = 2356320 +/- 981
test = 2403811 +/- 981
diff = 47490 +/- 1828
speedup = 0.0202
P(speedup > 0) = 1.0000
perft 6:
base = 175498484 +/- 429925
test = 183997959 +/- 429925
diff = 8499474 +/- 469401
speedup = 0.0484
P(speedup > 0) = 1.0000
perft 7 (but only 10 runs):
base = 185403228 +/- 468705
test = 188777591 +/- 468705
diff = 3374363 +/- 476687
speedup = 0.0182
P(speedup > 0) = 1.0000
$ ./pyshbench ../Stockfish/master ../Stockfish/test 20
run base test diff
...
base = 2501728 +/- 182034
test = 2532997 +/- 182034
diff = 31268 +/- 5116
speedup = 0.0125
P(speedup > 0) = 1.0000
No functional change.
This non-functional change patch is a deep work to allow SF to be independent
from the actual value of ONE_PLY (currently set to 1). I have verified SF is
now independent for ONE_PLY values 1, 2, 4, 8, 16, 32 and 256.
This patch gives consistency to search code and enables future work, opening
the door to safely tweaking the ONE_PLY value for any reason.
Verified for no speed regression at STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 95643 W: 17728 L: 17737 D: 60178
No functional change.
Rewritten in a way to have explicit in the search
the bonus/penalty we apply: hopefully this will lead
to further simplification/fix of current rather messy
stats update code.
No functional change.
STC: (Yellow)
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 69115 W: 12880 L: 12797 D: 43438
LTC:
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 124163 W: 16923 L: 16442 D: 90798
Note: Note based off past experiments / patches... history pruning
is quite TC sensitive. I believe the reason for this TC dependency
is that the CMH/FMH is a very large table that takes time to fill
up with. In addition having more time for will increase the accuracy
of the stats' value.
Bench: 7351698
Checking for legality of a possible ponder move
must be done before we undo the first pv move,
of course. (spotted by mohammed li.)
This obviously only has any effect when playing in ponder mode.
No functional change.
Take advantage that VALUE_NONE = 32002 to remove
the condition.
Commented out and not removed becuase it is tricky
to rely on the hidden value of VALUE_NONE and code
can break in case we change VALUE_NONE in the future.
No functional change.
This code was added before the accurate pv patch, when
we retrieved PV directly from TT.
It's not required for correct (and long) PVs any more and
should be safe to remove it.
Also, allowing helper threads to repeatedly over-write
TT doesn't seem to make sense(that was probably an un-intended
side-effect of lazy smp). Before Lazy SMP only Main Thread used
to run ID loop and insert PV into TT.
STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 74346 W: 13946 L: 13918 D: 46482
LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 47265 W: 6531 L: 6447 D: 34287
bench: 8819179
Currently root moves are copied to all teh threads
but are DTZ filtered only in main thread at the
beginning of teh search.
This patch moves the TB filtering before the
copy of root moves fixing issue #679https://github.com/official-stockfish/Stockfish/issues/679
No bench change.
There are two concepts with this patch:
Limit check extensions by using move count.
The idea is to limit search explosion.
Always extend check if the first move gives check.
The idea is to save expensive SEE calls, since the vast
majority of first move will have SEE value >= 0, also
first move may still be strong even if the SEE is negative.
STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 16503 W: 3068 L: 2873 D: 10562
LTC:
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 37202 W: 5261 L: 5014 D: 26927
bench: 8543366