Easier for tuning psq tables:
TUNE(myParameters, PSQT::init);
Also move the PSQT code into a new *.cpp file, and retire the
old and hacky psqtab.h, which had to be included only
once to work correctly; this is not idiomatic for a header
file.
Give wide visibility to psq tables (previously visible only
in position.cpp); this will ease the use of psq tables outside
Position, for instance in move ordering.
Finally, trivial code style fixes to the latest patches.
Original patch by Lucas Braesch.
No functional change.
Currently Zobrist::castling[] is not explicitly zeroed
and relies on the compiler to do this at startup, but this
makes Position::init() set different values every time
it is called!
This is a bit odd and, although it does not affect normal usage,
it can lead to subtle misbehaviour that is very difficult to track
down if we happen to call it more than once for some
reason. I found this while developing tuning support, and
it took me a while to track it down.
So properly init Zobrist::castling[].
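A minimal sketch of the failure mode, not the actual Stockfish code: the table name `castling`, the helper `next_key()`, and the `zero_first` flag are all hypothetical, and the toy generator stands in for the real PRNG. Composite keys are built by XOR-accumulation, so skipping the reset makes a second call accumulate on top of the first call's values.

```cpp
#include <array>
#include <cstdint>

using Key = std::uint64_t;

// Hypothetical stand-in for Zobrist::castling[]: one key per castling-rights mask.
// As a global it is zero-initialized at startup, which hides the bug on the first call.
std::array<Key, 16> castling;

// Toy deterministic generator, used only to keep the sketch self-contained.
Key next_key(Key& state) {
    state = state * 6364136223846793005ULL + 1442695040888963407ULL;
    return state;
}

// Builds each composite key by XOR-ing the keys of its single-bit rights.
// With zero_first == false the loop relies on the table already being zero.
void init_castling(bool zero_first) {
    Key state = 1070372;  // same seed on every call
    for (unsigned cr = 0; cr < castling.size(); ++cr) {
        if (zero_first)
            castling[cr] = 0;  // the explicit reset
        for (unsigned bit = 1; bit <= 8; bit <<= 1) {
            if (!(cr & bit))
                continue;
            Key k = castling[bit];
            castling[cr] ^= k ? k : next_key(state);
        }
    }
}
```

With the reset, repeated calls reproduce the same table; without it, a second call XORs each single-bit key with itself and the table changes.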
No functional change.
Resolves #329
This micro-optimization only complicates the code and provides no benefit.
Removing it is even a speedup on my machine (i7-3770k, linux, gcc 4.9.1):
stat               test     master    diff
mean          2,403,118  2,390,904  12,214
stdev            12,043     10,620   3,677
speedup           0.51%
P(speedup>0)     100.0%
No functional change.
Also remove the useless StateCopySize64 optimization:
the compiler uses the SSE movups instruction anyhow and
does not need this trick (verified with fishbench).
No functional change.
Funnily enough, gcc's __builtin_prefetch() already
expects a void*, whereas Windows's _mm_prefetch()
requires a char*.
The patch allows us to remove ugly casts from call
sites.
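One way such a wrapper could look, as a sketch only: the function name `prefetch`, the compiler checks, and the `_MM_HINT_T0` hint are assumptions, not necessarily what the patch uses. The single cast lives inside the wrapper, so callers pass any object pointer as-is.

```cpp
#if defined(_MSC_VER) || defined(__INTEL_COMPILER)
#include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_T0
#endif

// Single entry point taking void*, so call sites need no cast.
inline void prefetch(void* addr) {
#if defined(_MSC_VER) || defined(__INTEL_COMPILER)
    // _mm_prefetch() wants a char pointer: cast once, here.
    _mm_prefetch(static_cast<char*>(addr), _MM_HINT_T0);
#else
    // gcc's builtin accepts a void pointer directly.
    __builtin_prefetch(addr);
#endif
}
```

A caller would then write `prefetch(&table[idx]);` with no cast, whatever the element type of `table`.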
No functional change.
Results for 10 tests for each version (gcc 4.8.3 on mingw):
        Base     Test     Diff
Mean    1502447  1507917  -5470
StDev      3119     1364   4153
p-value: 0.906
speedup: 0.004
Results for 10 tests for each version (MSVC 2013):
        Base     Test     Diff
Mean    1400899  1403713  -2814
StDev      1273     2804   2700
p-value: 0.851
speedup: 0.002
No functional change.
It is up to the material (and pawn) table lookup
code to know where the per-thread tables are,
so change the API to reflect this.
Also fix some comments while there.
No functional change.
This patch replaces RKISS by a simpler and faster PRNG, xorshift64* proposed
by S. Vigna (2014). It is extremely simple, has a large enough period for
Stockfish's needs (2^64), requires no warming-up (allowing such code to be
removed), and offers slightly better randomness than MT19937.
Paper: http://xorshift.di.unimi.it/
Reference source code (public domain):
http://xorshift.di.unimi.it/xorshift64star.c
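For orientation, a minimal sketch of the generator, following the shifts and multiplier of the public-domain reference code. The class name `PRNG` and its interface are illustrative and not necessarily those of the patch; the seed must be nonzero, since zero is a fixed point of the xorshift step.

```cpp
#include <cassert>
#include <cstdint>

// xorshift64* (Vigna, 2014): three shift-xor steps on a 64-bit state,
// followed by a multiplication that scrambles the output.
class PRNG {
    std::uint64_t s;

public:
    explicit PRNG(std::uint64_t seed) : s(seed) { assert(seed != 0); }

    std::uint64_t rand64() {
        s ^= s >> 12;
        s ^= s << 25;
        s ^= s >> 27;
        return s * 2685821657736338717ULL;
    }
};
```

Because the state is a single word and every nonzero seed lies on the same full-period cycle, no warm-up loop is needed before the first call to `rand64()`.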
The patch also simplifies how init_magics() searches for magics:
- Old logic: seed the PRNG always with the same seed,
then use optimized bit rotations to tailor the RNG sequence per rank.
- New logic: seed the PRNG with an optimized seed per rank.
This has two advantages:
1. Less code and less computation to perform during magics search (no ROTL).
2. More choices for random sequence tuning. The old logic only let us choose
from 4096 bit rotation pairs. With the new one, we can look for the best seeds
among 2^64 values. Indeed, the set of seeds[][] provided in the patch reduces
the effort needed to find the magics:
64-bit SF:
Old logic -> 5,783,789 rand64() calls needed to find the magics
New logic -> 4,420,086 calls
32-bit SF:
Old logic -> 2,175,518 calls
New logic -> 1,895,955 calls
In the 64-bit case, init_magics() takes 25 ms less to complete (Intel Core i5).
Finally, when playing with strength handicap, non-determinism is achieved
by setting the seed of the static RNG only once. Afterwards, there is no need
to skip output values.
The bench only changes because the Zobrist keys are now different (since they
are random numbers straight out of the PRNG).
The RNG seed has been carefully chosen so that the
resulting Zobrist keys are particularly well-behaved:
1. All triplets of XORed keys are unique, implying that it
would take at least 7 keys to find a 64-bit collision
(test suggested by ceebo)
2. All pairs of XORed keys are unique modulo 2^32
3. The cardinality of { (key1 ^ key2) >> 48 } is as close
as possible to the maximum (65536)
Point 2 aims at ensuring a good distribution among the bits
that determine a TT entry's cluster; likewise point 3
among the bits that form the TT entry's key16 inside a
cluster.
Details:
Bitset   card(key1^key2)
------   ---------------
RKISS
key16     64894 =  99.020% of theoretical maximum
low18    180117 =  99.293%
low32    305362 =  99.997%
Xorshift64*, old seed
key16     64918 =  99.057%
low18    179994 =  99.225%
low32    305350 =  99.993%
Xorshift64*, new seed
key16     65027 =  99.223%
low18    181118 =  99.845%
low32    305371 = 100.000%
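A cardinality of this kind could be computed along the following lines. This is a sketch under assumptions, not the script used to produce the numbers above: the function name `xor_pair_cardinality` and its `shift`/`mask` parameters are hypothetical. For key16 one would pass shift 48 and mask 0xFFFF; for low32, shift 0 and mask 0xFFFFFFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Counts the distinct values of ((k1 ^ k2) >> shift) & mask
// over all unordered pairs of keys.
std::size_t xor_pair_cardinality(const std::vector<std::uint64_t>& keys,
                                 unsigned shift, std::uint64_t mask) {
    std::unordered_set<std::uint64_t> seen;
    for (std::size_t i = 0; i < keys.size(); ++i)
        for (std::size_t j = i + 1; j < keys.size(); ++j)
            seen.insert(((keys[i] ^ keys[j]) >> shift) & mask);
    return seen.size();
}
```

The brute-force double loop is quadratic in the number of keys, which is affordable for a table of a few hundred Zobrist keys and keeps the check easy to audit.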
Bench: 9324905
Resolves #148
This is a regression from 428962a.
We have to cast to char here, otherwise the compiler
interprets it as an integer and writes a number.
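The effect can be reproduced in isolation. In the sketch below both helper names are hypothetical and the file-letter use case is only an assumed example: adding an int to a char literal promotes the result to int, so the stream prints digits unless the sum is cast back to char.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: file index 0..7 -> "a".."h".
std::string file_to_string(int file) {
    std::ostringstream ss;
    ss << char('a' + file);  // cast back to char: prints a letter
    return ss.str();
}

// Same expression without the cast: 'a' + file has type int,
// so operator<< writes its decimal value instead.
std::string file_to_string_buggy(int file) {
    std::ostringstream ss;
    ss << 'a' + file;
    return ss.str();
}
```

For `file == 4` the first version yields "e", while the second yields "101".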
No functional change
Resolves #122
Objects that are only accessible at file scope should be put in the anonymous namespace.
This is what the C++ standard recommends, rather than using static, which is really C-style and results in internal linkage.
Stockfish already does this throughout the code. So let's weed out the few exceptions,
because... they have no reason to be exceptional.
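As an illustration of the convention (all names here are made up for the example), file-local helpers and data go inside an unnamed namespace, and only the intended entry point is left visible to other translation units.

```cpp
namespace {

// Visible only inside this translation unit.
int callCount = 0;

int next_id() { return ++callCount; }

}  // namespace

// External linkage: the only symbol other files can refer to.
int allocate_id() { return next_id(); }
```

Unlike `static`, the unnamed namespace also works for type definitions, so one convention covers functions, variables, and classes alike.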
No functional change.
Resolves #84
It is more idiomatic; we didn't use it
in the past because Position::pretty(Move)
had a calling argument, but now we can.
As an added benefit, we avoid a lot of string
copies in the process because now we avoid
std::ostringstream ss.
No functional change.
Where they better belong.
Also, this removes '#include <string>' from types.h, which reduces the amount of code to compile (every
translation unit includes types.h).
No functional change.
The eval already returns zero in KK, KBK, KNK (see material.cpp). The difference is:
- we lose the "TB pruning" benefit of the draw rule (i.e. search goes on even if eval is zero)
- we gain some speed by removing a useless test from the hot path
STC:
LLR: 0.05 (-2.94,2.94) [-3.00,1.00]
Total: 128000 W: 21357 L: 21560 D: 85083
LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 33023 W: 4613 L: 4509 D: 23901
bench 7461881
First, remove some dead code (function never called with a Move argument).
Then, remove printing of legal moves, which does not belong here. Let's keep commands orthogonal and minimal:
- the "d" command should display the board, nothing more, or less.
- "perft 1" will display the list of legal moves.
No functional change.
This apparently silly tweak
speeds up the bench by almost 3%.
It is not clear why: when repeating with perft,
the speedup vanishes.
Suggested by Jonathan Calovski.
No functional change.