Avoid shifting negative signed integers and use typed
enum to avoids decrementing a variable beyond its defined
range, like:
for (Rank r = RANK_8; r >= RANK_1; --r)
Changes were tested individually and passed SPRT[-3, 1].
With this patch gcc --sanitize builds cleanly.
No functional change.
Instead of outputting "info nodes ... time ..." when the last
iteration is interrupted, simply call UCI::pv() to output the PV.
I thought about calling UCI:pv() with bounds -VALUE_INFINITE, VALUE_INFINITE
to avoid "lowerbound" or "upperbound" appearing in it, but I'm not sure that
would be any better.
This patch fixes rare inconsistencies between the first move of
the last PV output and the bestmove played. It also makes sure
that all the latest statistics are sent to the GUI (not only nodes
and time but also nps, tbhits, hashfull).
No functional change.
Restore original behaviour to reset
the counter before a new move search.
Also fixed some warnings and added const
qualifier to a couple of functions, as
suggested by m_stembera.
Thanks to Werner Bergmans for reporting
the regression.
No functional change.
Use a per-thread counter to reduce contention
with many cores and endgame positions.
Measured around 1% speed-up on a 12 core and 8%
on 28 cores with 6-men, searching on:
7R/1p3k2/2p2P2/3nR1P1/8/3b1P2/7K/r7 b - - 3 38
Also retire the unused set_nodes_searched() and fix
a couple of return types and naming conventions.
No functional change.
For a default bench, this fixes the last valgrind
error (jump on uninitialised value).
Passed STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 187869 W: 33303 L: 33463 D: 121103
No functional change.
We reference (ss-1)->currentMove, i.e. we peek
current move of the parent node, so currentMove
should be valid in the main move loop, when we
search() the subtree, but outside of main loop
it is useless.
No functional change.
Also a speedup since we don't need to recalculate SEE
for extensions...as it already determined to be positive.
Results for 12 tests for each version:
Base Test Diff
Mean 2132395 2191002 -58607
StDev 128058 85917 134239
p-value: 0.669
speedup: 0.027
Non functional change.
The target:
Odroid U3 (http://www.hardkernel.com/main/products/prdt_info.php?g_code=g138745696275)
Debian Jessie
As listed in #550 and #638 three modifications are needed for compilation to work:
float-abi flag for GCC If an FPU is present and supported by the installed os then passed value need to be hard.
I didn't find any better solution than using readelf to check for the availibilty of Tag_ABI_VFP_args which sould indicate support for the FPU. The check is only done if the arch is arm and if readelf is not present
on the system, there will be an error (/bin/sh: 1: readelf: not found) but it will not break and will continue with the default softfp value. Outputing the error is not really acceptable but I wanted some feedback on the
check itself.
-lpthread is needed on armv7 outside of Android
I replaced UNAME with KERNEL and OS to allow to differentiate Android.
m32 flag
My understanding is that outside of Android the flag is generating errors on armv7.
These modifications should introduce change only for non Android armv7 build.
No functional change.
The target:
Odroid U3 (http://www.hardkernel.com/main/products/prdt_info.php?g_code=g138745696275)
Debian Jessie
As listed in #550 and #638 three modifications are needed for compilation to work:
float-abi flag for GCC If an FPU is present and supported by the installed os then passed value need to be hard.
I didn't find any better solution than using readelf to check for the availibilty of Tag_ABI_VFP_args which sould indicate support for the FPU. The check is only done if the arch is arm and if readelf is not present
on the system, there will be an error (/bin/sh: 1: readelf: not found) but it will not break and will continue with the default softfp value. Outputing the error is not really acceptable but I wanted some feedback on the
check itself.
-lpthread is needed on armv7 outside of Android
I replaced UNAME with KERNEL and OS to allow to differentiate Android.
m32 flag
My understanding is that outside of Android the flag is generating errors on armv7.
These modifications should introduce change only for non Android armv7 build.
No functional change.
Stephane's patch removes the only usage of Position::see, where the
returned value isn't immediately compared with a value. So I replaced
this function by its optimised and more specific version see_ge. This
function also supersedes the function Position::see_sign.
bool Position::see_ge(Move m, Value v) const;
This function tests if the SEE of a move is greater or equal than a
given value. We use forward iteration on captures instread of backward
one, therefore we don't need the swapList array. Also we stop as soon
as we have enough information to obtain the result, avoiding unnecessary
calls to the min_attacker function.
Speed tests (Windows 7), 20 runs for each engine:
Test engine: mean 866648, st. dev. 5964
Base engine: mean 846751, st. dev. 22846
Speedup: 1.023
Speed test by Stephane Nicolet
Fishtest STC test:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 26040 W: 4675 L: 4442 D: 16923
http://tests.stockfishchess.org/tests/view/57f648990ebc59038170fa03
No functional change.
This is a bit tricky because we don't want
to prune the only legal evasions, even if
with negative SEE. So add an assert to avoid
this subtle bug to slip in later.
STC:
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 14140 W: 2625 L: 2421 D: 9094
LTC:
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 11558 W: 1555 L: 1379 D: 8624
bench: 5256717
Rename shift_bb() to shift(), and DELTA_S to SOUTH, etc.
to improve code readability, especially in evaluate.cpp
when they are used together:
old b = shift_bb<DELTA_S>(pos.pieces(PAWN))
new b = shift<SOUTH>(pos.pieces(PAWN))
While there fix some small code style issues.
No functional change.
Drop useless condition
abs(ttValue) < VALUE_KNOWN_WIN
And extend singular extension search to cases when ttValue
stores a mate score. This improves mate finding and does
not introduce any regression.
Yery tested this patch against current master on the 6500+
Chest mate suite with 200K fixed nodes:
shortest mates found: master: 1206 patch:1205
any mate found: master: 1903 patch: 2003
with 1 sec time:
shortest mates found: master: 2667 patch: 2628
any mate found: master: 3585 patch: 3646
Verified for no regression:
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 25655 W: 4578 L: 4465 D: 16612
LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 66247 W: 8618 L: 8557 D: 49072
bench: 6335042
Both Tablebases::filter_root_moves() and
extract_ponder_from_tt(9 were unable to handle
a mate/stalemate position.
Spotted and reported by Dann Corbit.
Added some mate/stalemate positions to bench so
to early catch this regression in the future.
No functional change.
Use the following transformations:
- to check that A is included in B, testing "(A & ~B) == 0" is faster
than "(A & B) == A"
- to remove the intersection of A and B from A, doing "A &= ~B;" is as
fast as "if (A & B) A &= ~B;" but is simpler.
Overall, the simpler patch version is 0.3% than current master.
No functional change.
Correct pinners calculation and fix bug with pinned
pieces giving check. With this patch 'pinners' only
returns sliders with exactly one defensive piece between
the slider and the attacked square (in other words, pinners
returns exact pinners).
This was a co-operation between Marco Costalba,
Stphane Nicolet and me.
Special thanks to Ronald de Man for reporting the bug with
pinned pieces giving check, discussed here:
https://groups.google.com/forum/?fromgroups=#!topic/fishcooking/S_4E_Xs5HaE
STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 132118 W: 23578 L: 23645 D: 84895
LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 36424 W: 4770 L: 4670 D: 26984
bench: 6272231
Instrumentation shows that in make_score(mg, eg) calls, the mg value is
zero in 25,9% of the calls while the eg value is zero in 36,8% of the
calls.
Swapping the internal fields of mg and eg in the internal
representation of Score allows the compiler to optimize away the shift
in (eg << 16) + mg in more cases, thus resulting in a 0.3% speed-up
overall.
No functional change
Rescales the king danger variables in evaluate_king() to
suppress the KingDanger[] array. This avoids the cost of
the memory accesses to the array and simplifies the non-linear
transformation used.
Full credits to "hxim" for the seminal idea and implementation,
see pull request #786.
https://github.com/official-stockfish/Stockfish/pull/786
Passed STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 9649 W: 1829 L: 1689 D: 6131
Passed LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 53494 W: 7254 L: 7178 D: 39062
Bench: 6116200
Drops a scalability bottleneck due to memory contention
of a single shared table across threads. The effect starts
to be sensible with a high number of threads. Specifically
we have a small regression with 7 threads both at 60 and
180 seconds TC:
10000 @ 60+0.6 th 7
ELO: -2.46 +-3.2 (95%) LOS: 6.5%
Total: 9896 W: 1037 L: 1107 D: 7752
5000 @ 180+0.6 th 7
ELO: -1.95 +-4.1 (95%) LOS: 17.7%
Total: 5000 W: 444 L: 472 D: 4084
We have a regression because counterMoveHistory table is
quite big and it takes time for a single thread to fill it.
Sharing the table yields to a higher fill rate and better
quality of moves and up to 7 threads the benefits of sharing
more then compensate the loss in speed due to contention.
Interestingly even with a 3X longer TC, so with more time
for the single thread to catch up, the improvment is quite
limited and below noise level. It seems we really need much
longer TC to saturate the table.
When we move to high threads number it's another story:
5000 @ 60+0.6 th 22
ELO: 3.49 +-4.3 (95%) LOS: 94.6%
Total: 4880 W: 490 L: 441 D: 3949
2000 @ 60+0.6 th 32
ELO: 8.34 +-6.9 (95%) LOS: 99.1%
Total: 2000 W: 229 L: 181 D: 1590
As expected the speed-up more than compensates the filling
rate, and we expect that with tournament TC, where single
thread is able to saturate the table, the difference will
be even stronger. For instance for TCEC 9 super-final time
control will be 180 minutes + 15 seconds and this scalability
improvement seems definitely the way to go.
So, summarizing:
GOOD:
Measured big improvement in high core scenario
Suitable for TCEC 9 superfinal (big hardware, very long TC)
Consistent and natural patch that extends to counterMoveHistory
what we already do for remaining history tables, that are all per-thread
Non functional change for the common case of a single core
Very simple (just 6 lines modified, no added ones)
BAD:
Small regression (within 2-3 ELO) with few threads and short TC
bench: 5341477
Measured bench speed up goes from 0,7% to 2%,
given the unreliable measure a reverse simmplification
test was done on fishtest:
master vs patch
LLR: -2.94 (-2.94,2.94) [-3.00,1.00]
Total: 15499 W: 2685 L: 2867 D: 9947
Test result is positive, master is weaker.
No functional change.
This is the most compact and neatest version
is was able to produce.
On normal builds I have a small slowdown:
normal builds base vs. simplification (gcc 4.8.1 Win7-64 i7-3770 @ 3.4GHz x86-64-modern)
Results for 20 tests for each version:
Base Test Diff
Mean 1974744 1969333 5411
StDev 11825 10281 5874
p-value: 0,178
speedup: -0,003
On pgo-builds however I measure a nice 1.1% speedup
pgo-builds base vs. simplification
Results for 20 tests for each version:
Base Test Diff
Mean 1974119 1995444 -21325
StDev 8703 5717 4623
p-value: 1
speedup: 0,011
No functional change.
Don't allow pinned pieces to attack the exchange-square as long all
pinners (this includes also potential ones) are on their original
square.
As soon a pinner moves to the exchange-square or get captured on it, we
fall back to standard SEE behaviour.
This correctly handles the majority of cases with absolute pins.
bench: 6883133
In evaluate, we start by initializing the pos.psq_score
and adding the material imbalance. After that, we check
whether a specialized eval exists and if yes we return
that value and discard whatever we have computed until now.
It sounds more logical to first probe material entry and
return if we have a specialized eval, and only if it is
not the case initialize eval with some values. There is
no measurable speed-difference on my computer.
Non functional change.
In case we have installed a not complete set of 6-men tables and
there is 6 piece position on board, but no corresponding
tablebase engine is not using any syzygy at all.
Reported by Jouni Uski, fix by Peter Österlund,
confirmed as a bug by Ronald de Man.
bench: 7591630
If the opponent has a cramped position, opening a file often
helps him/her to exchange pieces, so it makes sense to reduce
the space bonus if there are open files.
Credits: Leonardo Ljubičić for the strategic idea, Alain Savard for the
implementation of the open files calculation, "CrunchyNYC" for the
compensation of the numerator.
STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 49112 W: 9239 L: 8900 D: 30973
LTC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 89415 W: 12014 L: 11601 D: 65800
Bench: 7591630