make skipEarlyPruning a search argument instead of managing this by hand.
Verified for no regression at STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 96754 W: 17089 L: 17095 D: 62570
No functional change.
* Refactor bonus and penalty calculation
Compute common terms in a helper function.
No functional change.
* Further refactoring
Remove some parenthesis that are now useless.
Define prevSq once, use repeatedly.
No functional change.
bench: 5884767 (bench of previous patch is wrong)
Makes the actual number of nodes searched match closely
the number of nodes requested, by increasing the frequency
of checking the number of nodes searched at low node count.
All other searches retain the default checking frequency of
once per 4096 nodes, and are thus unaffected.
Passed STC as non-regression
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 26643 W: 4766 L: 4655 D: 17222
No functional change.
In 10 of 12 calls total to Position::do_move()the givesCheck argument is
simply gives_check(m). So it's reasonable to make an overload without
this parameter, which wraps the existing version.
No functional change.
Non functional change, tests under sanitizers OK.
Rationales for change
- Offset in code is in range -4 ... 2
- There was an error by (pathological) corner case MAX_PLY=0
No functional change.
Rewrite the code in SF style, simplify and
document it.
Code is now much clear and bug free (no mem-leaks and
other small issues) and is also smaller (more than
600 lines of code removed).
All the code has been rewritten but root_probe() and
root_probe_wdl() that are completely misplaced and should
be retired altogheter. For now just leave them in the
original version.
Code is fully and deeply tested for equivalency both in
functionality and in speed with hundreds of games and
test positions and is guaranteed to be 100% equivalent
to the original.
Tested with tb_dbg branch for functional equivalency on
more than 12M positions.
stockfish.exe bench 128 1 16 syzygy.epd
Position: 2016/2016
Total 12121156 Hits 0 hit rate (%) 0
Total time (ms) : 4417851
Nodes searched : 1100151204
Nodes/second : 249024
Tested with 5,000 games match against master, 1 Thread,
128 MB Hash each, tc 40+0.4, which is almost equivalent
to LTC in Fishtest on this machine. 3-, 4- and 5-men syzygy
bases on SSD, 12-moves opening book to emphasize mid- and endgame.
Score of SF-SyzygyC++ vs SF-Master: 633 - 617 - 3750 [0.502] 5000
ELO difference: 1
No functional change.
Instead of outputting "info nodes ... time ..." when the last
iteration is interrupted, simply call UCI::pv() to output the PV.
I thought about calling UCI:pv() with bounds -VALUE_INFINITE, VALUE_INFINITE
to avoid "lowerbound" or "upperbound" appearing in it, but I'm not sure that
would be any better.
This patch fixes rare inconsistencies between the first move of
the last PV output and the bestmove played. It also makes sure
that all the latest statistics are sent to the GUI (not only nodes
and time but also nps, tbhits, hashfull).
No functional change.
Use a per-thread counter to reduce contention
with many cores and endgame positions.
Measured around 1% speed-up on a 12 core and 8%
on 28 cores with 6-men, searching on:
7R/1p3k2/2p2P2/3nR1P1/8/3b1P2/7K/r7 b - - 3 38
Also retire the unused set_nodes_searched() and fix
a couple of return types and naming conventions.
No functional change.
We reference (ss-1)->currentMove, i.e. we peek
current move of the parent node, so currentMove
should be valid in the main move loop, when we
search() the subtree, but outside of main loop
it is useless.
No functional change.
Also a speedup since we don't need to recalculate SEE
for extensions...as it already determined to be positive.
Results for 12 tests for each version:
Base Test Diff
Mean 2132395 2191002 -58607
StDev 128058 85917 134239
p-value: 0.669
speedup: 0.027
Non functional change.
Stephane's patch removes the only usage of Position::see, where the
returned value isn't immediately compared with a value. So I replaced
this function by its optimised and more specific version see_ge. This
function also supersedes the function Position::see_sign.
bool Position::see_ge(Move m, Value v) const;
This function tests if the SEE of a move is greater or equal than a
given value. We use forward iteration on captures instread of backward
one, therefore we don't need the swapList array. Also we stop as soon
as we have enough information to obtain the result, avoiding unnecessary
calls to the min_attacker function.
Speed tests (Windows 7), 20 runs for each engine:
Test engine: mean 866648, st. dev. 5964
Base engine: mean 846751, st. dev. 22846
Speedup: 1.023
Speed test by Stephane Nicolet
Fishtest STC test:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 26040 W: 4675 L: 4442 D: 16923
http://tests.stockfishchess.org/tests/view/57f648990ebc59038170fa03
No functional change.
This is a bit tricky because we don't want
to prune the only legal evasions, even if
with negative SEE. So add an assert to avoid
this subtle bug to slip in later.
STC:
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 14140 W: 2625 L: 2421 D: 9094
LTC:
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 11558 W: 1555 L: 1379 D: 8624
bench: 5256717
Rename shift_bb() to shift(), and DELTA_S to SOUTH, etc.
to improve code readability, especially in evaluate.cpp
when they are used together:
old b = shift_bb<DELTA_S>(pos.pieces(PAWN))
new b = shift<SOUTH>(pos.pieces(PAWN))
While there fix some small code style issues.
No functional change.
Drop useless condition
abs(ttValue) < VALUE_KNOWN_WIN
And extend singular extension search to cases when ttValue
stores a mate score. This improves mate finding and does
not introduce any regression.
Yery tested this patch against current master on the 6500+
Chest mate suite with 200K fixed nodes:
shortest mates found: master: 1206 patch:1205
any mate found: master: 1903 patch: 2003
with 1 sec time:
shortest mates found: master: 2667 patch: 2628
any mate found: master: 3585 patch: 3646
Verified for no regression:
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 25655 W: 4578 L: 4465 D: 16612
LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 66247 W: 8618 L: 8557 D: 49072
bench: 6335042
Both Tablebases::filter_root_moves() and
extract_ponder_from_tt(9 were unable to handle
a mate/stalemate position.
Spotted and reported by Dann Corbit.
Added some mate/stalemate positions to bench so
to early catch this regression in the future.
No functional change.
Rescales the king danger variables in evaluate_king() to
suppress the KingDanger[] array. This avoids the cost of
the memory accesses to the array and simplifies the non-linear
transformation used.
Full credits to "hxim" for the seminal idea and implementation,
see pull request #786.
https://github.com/official-stockfish/Stockfish/pull/786
Passed STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 9649 W: 1829 L: 1689 D: 6131
Passed LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 53494 W: 7254 L: 7178 D: 39062
Bench: 6116200
Drops a scalability bottleneck due to memory contention
of a single shared table across threads. The effect starts
to be sensible with a high number of threads. Specifically
we have a small regression with 7 threads both at 60 and
180 seconds TC:
10000 @ 60+0.6 th 7
ELO: -2.46 +-3.2 (95%) LOS: 6.5%
Total: 9896 W: 1037 L: 1107 D: 7752
5000 @ 180+0.6 th 7
ELO: -1.95 +-4.1 (95%) LOS: 17.7%
Total: 5000 W: 444 L: 472 D: 4084
We have a regression because counterMoveHistory table is
quite big and it takes time for a single thread to fill it.
Sharing the table yields to a higher fill rate and better
quality of moves and up to 7 threads the benefits of sharing
more then compensate the loss in speed due to contention.
Interestingly even with a 3X longer TC, so with more time
for the single thread to catch up, the improvment is quite
limited and below noise level. It seems we really need much
longer TC to saturate the table.
When we move to high threads number it's another story:
5000 @ 60+0.6 th 22
ELO: 3.49 +-4.3 (95%) LOS: 94.6%
Total: 4880 W: 490 L: 441 D: 3949
2000 @ 60+0.6 th 32
ELO: 8.34 +-6.9 (95%) LOS: 99.1%
Total: 2000 W: 229 L: 181 D: 1590
As expected the speed-up more than compensates the filling
rate, and we expect that with tournament TC, where single
thread is able to saturate the table, the difference will
be even stronger. For instance for TCEC 9 super-final time
control will be 180 minutes + 15 seconds and this scalability
improvement seems definitely the way to go.
So, summarizing:
GOOD:
Measured big improvement in high core scenario
Suitable for TCEC 9 superfinal (big hardware, very long TC)
Consistent and natural patch that extends to counterMoveHistory
what we already do for remaining history tables, that are all per-thread
Non functional change for the common case of a single core
Very simple (just 6 lines modified, no added ones)
BAD:
Small regression (within 2-3 ELO) with few threads and short TC
bench: 5341477
In case we have installed a not complete set of 6-men tables and
there is 6 piece position on board, but no corresponding
tablebase engine is not using any syzygy at all.
Reported by Jouni Uski, fix by Peter Österlund,
confirmed as a bug by Ronald de Man.
bench: 7591630
This greately simplifies usage because hides to the
search the implementation specific CheckInfo.
This is based on the work done by Marco in pull request #716,
implementing on top of it the ideas in the discussion: caching
the calls to slider_blockers() in the CheckInfo structure,
and simplifying the slider_blockers() function by removing its
first parameter.
Compared to master, bench is identical but the number of calls
to slider_blockers() during bench goes down from 22461515 to 18853422,
hopefully being a little bit faster overall.
archlinux, gcc-6
make profile-build ARCH=x86-64-bmi2
50 runs each
bench:
base = 2356320 +/- 981
test = 2403811 +/- 981
diff = 47490 +/- 1828
speedup = 0.0202
P(speedup > 0) = 1.0000
perft 6:
base = 175498484 +/- 429925
test = 183997959 +/- 429925
diff = 8499474 +/- 469401
speedup = 0.0484
P(speedup > 0) = 1.0000
perft 7 (but only 10 runs):
base = 185403228 +/- 468705
test = 188777591 +/- 468705
diff = 3374363 +/- 476687
speedup = 0.0182
P(speedup > 0) = 1.0000
$ ./pyshbench ../Stockfish/master ../Stockfish/test 20
run base test diff
...
base = 2501728 +/- 182034
test = 2532997 +/- 182034
diff = 31268 +/- 5116
speedup = 0.0125
P(speedup > 0) = 1.0000
No functional change.
This non-functional change patch is a deep work to allow SF to be independent
from the actual value of ONE_PLY (currently set to 1). I have verified SF is
now independent for ONE_PLY values 1, 2, 4, 8, 16, 32 and 256.
This patch gives consistency to search code and enables future work, opening
the door to safely tweaking the ONE_PLY value for any reason.
Verified for no speed regression at STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 95643 W: 17728 L: 17737 D: 60178
No functional change.
Rewritten in a way to have explicit in the search
the bonus/penalty we apply: hopefully this will lead
to further simplification/fix of current rather messy
stats update code.
No functional change.
STC: (Yellow)
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 69115 W: 12880 L: 12797 D: 43438
LTC:
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 124163 W: 16923 L: 16442 D: 90798
Note: Note based off past experiments / patches... history pruning
is quite TC sensitive. I believe the reason for this TC dependency
is that the CMH/FMH is a very large table that takes time to fill
up with. In addition having more time for will increase the accuracy
of the stats' value.
Bench: 7351698