There are two concepts with this patch:
Limit check extensions by using move count.
The idea is to limit search explosion.
Always extend check if the first move gives check.
The idea is to save expensive SEE calls, since the vast
majority of first move will have SEE value >= 0, also
first move may still be strong even if the SEE is negative.
STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 16503 W: 3068 L: 2873 D: 10562
LTC:
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 37202 W: 5261 L: 5014 D: 26927
bench: 8543366
Moving a few lines from evaluate_threats to evaluate_pieces allows to
a) Remove a condition pos.count<QUEEN>(Them) == 1
b) use precalculated s instead of pos.square(Them)
c) do not check the condition at all in queenless endings
Passed STC
http://tests.stockfishchess.org/tests/view/5752e0410ebc59029919b1f4
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 67175 W: 12194 L: 12152 D: 42829
and LTC
http://tests.stockfishchess.org/tests/view/57587b140ebc59029919b2f4
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 20276 W: 2774 L: 2653 D: 14849
bench: 7907962
In this position: 3K4/8/3k4/8/4p3/4B3/5P2/8 w - - 0 5
Current DTZ probe returns 1 instead of 15
What happens is that the double push f4 is erroneously detected as a win move.
After the push we have:
[D]3K4/8/3k4/8/4pP2/4B3/8/8 b - f3 0 5
And here the code misses the possible ep capture exf3.
The bug is in probe_dtz_no_ep() where is used probe_ab() that is
blind to ep captures so it returns v == 2 (win) for position
3K4/8/3k4/8/4pP2/4B3/8/8 b - f3 0 5
Note that at the caller site the original position did not have any
possible ep capture, so probe_dtz() returns immediately after calling
probe_dtz_no_ep().
The fix is to call the ep-aware probe_wdl() instead of probe_ab()
I have verified that DTZ is correct now and also there are no more
mistmatches compared to the new 'syzygy' branch. Tested on a set of
more than 600 endgame positions, included some tricky ones.
For people interested to redo the test or doing additional tests
please pull branch tb_dbg from https://github.com/mcostalba/Stockfish repo.
bench: 8450534 (bench unaffected because syzygy is not exercized during bench)
A middle ground patch of two successful tuning patches,
one at STC, the other at LTC, which now passed both.
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 67893 W: 12777 L: 12384 D: 42732
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 30165 W: 4189 L: 3960 D: 22016
bench: 9209507
Introducing a new multi-purpose penalty related to King safety, which
includes all kind of potential checks (from unsafe or unavailable
squares currently occupied by some other piece)
This will indirectly detect and reward some pins, discovered checks, and
motifs such as square vacation, or rook behind its pawn and aligned with
King (example Black Rg8, g7 against Kg1),
and penalize some pawn blockers (if they move, it allows a discovered
check by the pawn).
And since it looks also at protected squares, it detects some potential
defense overloading.
Finally, the rook contact checks had been removed some time ago. This
test will give a small bonus for them, as well as for bishop contact
checks.
Passed STC
http://tests.stockfishchess.org/tests/view/5729ec740ebc59301a354b36
LLR: 2.94 (-2.94,2.94) [0.00,5.00]
Total: 13306 W: 2477 L: 2296 D: 8533
and LTC
http://tests.stockfishchess.org/tests/view/572a5be00ebc59301a354b65
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 20369 W: 2750 L: 2565 D: 15054
bench: 9298175
Just use _mm_popcnt_u64() that is available
both for MSVC abd Intel compiler.
Verified on MSVC that the produced assembly
has the hardware 'popcnt' instruction.
No functional change.
Currently, helper threads will only search up to the
specified depth limit. Now let them search until the
main thread has finished the specified depth.
On the other hand, we don't want to pick a thread with
a higher search depth.
This may be considered cheating. ;-)
No functional change.
It seems that icc used our fallback version of popcount.
Now use intrinsics.
icc version 16.0.2 (gcc version 5.3.0 compatibility)
bmi2 compile
uname -r 4.5.1-1-ARCH
20xbench gives a nice speedup
./stockfish-icc-master 2161515 +- 34462
./stockfish-icc-sse42 2260857 +- 50349
I could not find anything documented that is necessary that prepending -mbmi to -mbmi2 gives some benefit.
Instead at
https://gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.html#x86-Built-in-Functions
The following built-in functions are available when -mbmi is used. All of them generate the machine instruction that is part of the name.
unsigned int __builtin_ia32_bextr_u32(unsigned int, unsigned int);
unsigned long long __builtin_ia32_bextr_u64 (unsigned long long, unsigned long long);
The following built-in functions are available when -mbmi2 is used. All of them generate the machine instruction that is part of the name.
unsigned int _bzhi_u32 (unsigned int, unsigned int)
unsigned int _pdep_u32 (unsigned int, unsigned int)
unsigned int _pext_u32 (unsigned int, unsigned int)
unsigned long long _bzhi_u64 (unsigned long long, unsigned long long)
unsigned long long _pdep_u64 (unsigned long long, unsigned long long)
unsigned long long _pext_u64 (unsigned long long, unsigned long long)
and at
https://gcc.gnu.org/ml/gcc/2014-02/msg00204.html
( "... The real optimization comes from being able to use pext
(parallel bit extract), which can implement several bextr expressions in
parallel.")
Apart from that we don't use all -msse -msse2 -msse3 -msse4.2 etc. but just -msse3 (or -msse4.2) only.
As regards to the speedup within noise level - this pull request is actually reversal of mcostalba#198 wherein prepending -mbmi to -mbmi2 was claimed to be 0.3% faster and here (removing -mbmi) gives 0.4% speed gain.
Seems to give around 1% speed-up for CPUs with popcnt support.
Seems to give a very minor speed-up for CPUs without popcnt.
No functional change
Resolves#646
In this position we should have draw for repetition:
position fen rnbqkbnr/2pppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 moves g1f3 g8f6 f3g1
go infinite
But latest patch broke it.
Actually we had two(!) very subtle bugs, the first is that Position::set()
clears the passed state and in particular 'previous' member, so
that on passing setupStates, 'previous' pointer was reset.
Second bug is even more subtle: SetupStates was based on std::vector
as container, but when vector grows, std::vector copies all its contents
to a new location invalidating all references to its entries. Because
all StateInfo records are linked by 'previous' pointer, this made pointers
go stale upon adding more element to setupStates. So revert to use a
std::deque that ensures references are preserved when pushing back new
elements.
No functional change.
And passed in do_move(), this ensures maximum efficiency and
speed and at the same time unlimited move numbers.
The draw back is that to handle Position init we need to
reserve a StateInfo inside Position itself and use at
init time and when copying from another Position.
After lazy SMP we don't need anymore this gimmick and we can
get rid of this special case and always pass an external
StateInfo to Position object.
Also rewritten and simplified Position constructors.
Verified it does not regress with a 3 threads SMP test:
ELO: -0.00 +-12.7 (95%) LOS: 50.0%
Total: 1000 W: 173 L: 173 D: 654
No functional change.
When starting search in a mate or stalemate position, Stockfish does not
even care to reinitialize and start worker threads. However after search
all threads are checked for the best move.
This can lead to bestmove and info beeing carried over from the last
search.
Example session:
setoption name threads value 7
go movetime 4000
position startpos moves f2f3 e7e5 g2g4 d8h4
go movetime 4000
Actual output is like (almost always):
[...]
bestmove e2e4
info depth 0 score mate 0
info depth 20 seldepth 29 multipv 1 score cp 28 [...] pv e2e4
bestmove e2e4
Expected output / output after fix:
[...]
bestmove e2e4 ponder e7e6
info depth 0 score mate 0
bestmove (none)
Resolves#623
Broken after "32-bit/64-bit Makefile fix" commit.
Ubuntu "Precise" 12.04.5 supports multilib only until
g++ 4.6 that is not enough to compile Stockfish.
So move to Ubuntu 14.04.4 LTS (Trusty Tahr)
No functional change.
There was already a penalty for squares only defended by King (undefended)
This test records a penalty for completely undefended squares in the so called extended king-ring
(so if we exclude squares defended by a Kg8 for example, we only look at h6 g6 and f6)
We also exclude squares occupied by opponent pieces in this computation,
based on the following results
Was yellow at STC
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 112499 W: 20649 L: 20293 D: 71557
and passed LTC
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 36805 W: 5100 L: 4857 D: 26848
Bench: 8430233
Resolves: #619
On top of the usual conditions
a) some opponent in front (but no lever)
b) some neighbours (in front) (but no neighbour behind or same rank)
c) < rank_5
to find out if a pawn is backward we look at the squares in front of this pawn to reach the same rank as the next neighbour.
In current master, a pawn is backward if any of those squares is controlled by an enemy pawn on an adjacent file
In this version, a pawn is ALSO backward if any of those squares is occupied by an enemy pawn.
STC:
http://tests.stockfishchess.org/tests/view/56fe7efd0ebc59301a3541f1
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 19051 W: 3557 L: 3433 D: 12061
LTC:
http://tests.stockfishchess.org/tests/view/56febc2d0ebc59301a354209
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 40810 W: 5619 L: 5526 D: 29665
Bench: 7525245
Resolves#614
Also a speedup(about 1%) on 64-bit w/o hardware popcnt
Retire Max15 and Full template parameters
(Contributed by Marco Costalba)
Now that we have just SW and HW versions, use
template default parameter to get rid of explicit
template parameters.
Retire bitcount.h and move the only defined
function to bitboard.h
No functional change
Resolves#620
Counter intuitively, make build ARCH=x86-32 does NOT produce a 32-bit compile
when running a 64-bit OS. Nor would ARCH=x86-64 produce a 64-bit compile when
running a 32-bit OS (assuming it compiled w/o errors).
No functional change
Resolves#621
lsb(b) and msb(b) are undefined when b == 0. This can lead to subtle bugs, where
the resulting code behaves differently on different configurations:
- It can be the home grown software LSB/MSB
- It can be the compiler generated software LSB/MSB (when using compiler
intrinsics without the right compiler flags to allow compiler to use hardware
LSB/MSB). Which of course depends on the compiler.
- It can be hardware LSB/MSB generated by the compiler.
- Not to mention that hardware LSB/MSB can return different value on different
hardware when b == 0.
No functional change
Resolves#610
Use compiler intrinsics when possible to
avoid writing platform specific asm code.
Tested on Windows 7 with MSVC 2013 and mingw 4.8.3 (32 and 64 bit)
and on Linux Mint with g++ 4.8.4 and clang 3.4 (32 and 64 bit).
No functional change
Resolves#609
Give more weight to the pawns number and
the vertical king distance in evaluate_initiative()
Passed STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 26729 W: 5067 L: 4825 D: 16837
and LTC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 60480 W: 8338 L: 8016 D: 44126
Bench: 8295162
Resolves#594