This L1/L2 cache optimization gives an incredible +4.7% speedup
in the perft test, where this function is the biggest time consumer.
A speedup was verified also in normal bench, although a smaller one.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
And also store the node counter in Position, not in Thread.
This will allow us to properly count nodes also in sub-trees
when SMP is active.
This requires a surprisingly high number of changes
in a lot of places to make it work properly.
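A minimal sketch of the idea (member name and surrounding code are
illustrative, not the actual patch): the counter lives in the position
being searched, so the sub-tree searched at a split point is counted
where it belongs, regardless of which thread runs it.

    #include <stdint.h>

    class Position {
    public:
        Position() : nodes(0) {}
        void do_move(/* Move m, StateInfo& st */) { nodes++; /* ... */ }
        int64_t node_count() const { return nodes; }  // per-position, not per-thread
    private:
        int64_t nodes;
    };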
No functional change, but the node count changes for obvious reasons.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Fix warning: "Source and destination overlap in memcpy"
This happens when we call do_move() multiple times with the
same state, for instance when we don't need to undo the move.
This is what the valgrind docs say:
You don't want the two blocks to overlap because one of them could
get partially overwritten by the copying.
You might think that Memcheck is being overly pedantic reporting this
in the case where 'dst' is less than 'src'. For example, the obvious way
to implement memcpy() is by copying from the first byte to the last.
However, the optimisation guides of some architectures recommend copying
from the last byte down to the first. Also, some implementations of
memcpy() zero 'dst' before copying, because zeroing the destination's
cache line(s) can improve performance.
In addition, for many of these functions, the POSIX standards have wording
along the lines "If copying takes place between objects that overlap,
the behavior is undefined." Hence overlapping copies violate the standard.
The moral of the story is: if you want to write truly portable code, don't
make any assumptions about the language implementation.
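For the record, a small sketch of one way to avoid the overlapping
copy (not necessarily the exact fix applied here): skip the copy when
source and destination are the same object, or use std::memmove, which
is defined even for overlapping blocks.

    #include <cstring>

    struct StateInfo { /* ... */ };  // stand-in for the real struct

    void copy_state(StateInfo* dst, const StateInfo* src) {
        if (dst != src)  // self-copy: nothing to do
            std::memcpy(dst, src, sizeof(StateInfo));
        // Alternative: std::memmove(dst, src, sizeof(StateInfo)) is always
        // legal, even when the two blocks overlap.
    }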
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Mostly suggested by Justin (UncombedCoconut); the 0ULL -> 0 conversion
is mine.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Get rid of macros and use templates instead; this is safer
and allows us to fix the warning:
ISO C++ forbids braced-groups within expressions
which broke compilation with the -pedantic flag under
gcc with POPCNT enabled.
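An illustrative sketch of the kind of change (not the exact Stockfish
templates): a statement-expression macro relies on a gcc extension that
-pedantic rejects, while an inline function template selected on
popcount support is standard C++ and optimizes just as well.

    #include <stdint.h>

    enum PopCntKind { CNT_SW, CNT_HW };

    template<PopCntKind> inline int count_1s(uint64_t b);

    template<> inline int count_1s<CNT_SW>(uint64_t b) {  // portable fallback
        int n = 0;
        while (b) { b &= b - 1; n++; }
        return n;
    }

    #if defined(__GNUC__)
    template<> inline int count_1s<CNT_HW>(uint64_t b) {  // hardware popcount
        return __builtin_popcountll(b);
    }
    #endif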
No functional and no performance change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Greatly clean up the SEE code; it is now also a bit
faster on gcc, about +0.6%.
Thanks to Mike Whiteley's new SEE code, which gave me
fresh ideas on how to clean up this old stuff.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
We don't need that!
We can infer from the starting FEN string whether we are in
a Chess960 game or not. And note that this is a per-position
property, not an application-wide one.
A nice trick is to use a custom manipulator (actually an
enum) to keep using the handy operator<<() on the
move when sending it to std::cout. Yes, I have indulged a
bit here ;-)
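A sketch of how such an enum manipulator can work (details are
illustrative, not the actual code): inserting the enum into the stream
stores a per-stream flag that operator<<(std::ostream&, Move) can later
read to pick the right castling notation.

    #include <iostream>

    enum set960 { normal_chess = 0, chess960 = 1 };

    static int chess960_slot() {
        static const int idx = std::ios_base::xalloc();  // per-stream storage slot
        return idx;
    }

    inline std::ostream& operator<<(std::ostream& os, set960 f) {
        os.iword(chess960_slot()) = int(f);  // remember the flag on this stream
        return os;
    }

    // operator<<(std::ostream&, Move) would then check os.iword(chess960_slot())
    // before printing castling moves.

Usage would then look like std::cout << set960(pos.is_chess960()) << m,
where is_chess960() is a hypothetical per-position query.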
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Plus a bunch of other minor optimizations.
With this power pack we get a whopping 1.4%
speed increase :-)
...and it took three good hours of profiling + hacking to get it out!
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Where they belong.
Note that the arrays PieceValueMidgame[] and PieceValueEndgame[]
are now declared extern in the header and moved to piece.cpp,
so as to avoid allocating the arrays in every translation unit
that includes the header!
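A sketch of the pattern (type, size and values below are illustrative):
the header only declares the arrays, the single definition lives in
piece.cpp, so including the header no longer creates a copy of the data
in every translation unit.

    // piece.h -- declaration only
    typedef int Value;  // stand-in for the real Value type
    extern const Value PieceValueMidgame[8];
    extern const Value PieceValueEndgame[8];

    // piece.cpp -- the one and only definition
    const Value PieceValueMidgame[8] = { 0, 100, 300, 300, 500, 900, 0, 0 };
    const Value PieceValueEndgame[8] = { 0, 120, 310, 320, 550, 950, 0, 0 };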
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
With this conversion the compiler is able to spot two
possibly ambiguous calls, which we can now easily fix.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
There is a functional change because we now skip
more moves, and because do_move() / undo_move() is
well known to be not reversible, we end up with a
changed node count, even though nothing really
changes apart from a small speedup.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
To be uniform across the sources. As a nice side effect
I quickly spotted a couple of needed renames:
captured_piece() -> captured_piece_type()
st->capture -> st->capturedType
Proposed by Ralph and done with QtCreator.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Broken by a recent patch. Also better document what's
happening there.
Verified to restore the original behaviour.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Less invasive than previous patches, but still a good
enhancement.
Also indulged a bit in STL algorithms :-)
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Completely rewrite the function and extend compatibility
also to the X-FEN notation for Chess960.
We are now able to read standard FEN, Shredder-FEN and X-FEN.
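A sketch of how the castling field can cover all three notations at once
(simplified; the real parser also has to locate the actual rook squares):
'K'/'Q'/'k'/'q' come from standard FEN and X-FEN, while 'A'..'H'/'a'..'h'
are Shredder-FEN rook files.

    #include <cctype>

    struct CastlingRight { char color; char rookFile; };  // illustrative

    // Interpret one character of the castling-availability token.
    bool parse_castling_char(char token, CastlingRight& out) {
        out.color = std::isupper((unsigned char)token) ? 'w' : 'b';
        char c = (char)std::tolower((unsigned char)token);

        if (c == 'k') { out.rookFile = 'h'; return true; }  // king side (X-FEN: outermost rook)
        if (c == 'q') { out.rookFile = 'a'; return true; }  // queen side (X-FEN: outermost rook)
        if (c >= 'a' && c <= 'h') { out.rookFile = c; return true; }  // Shredder-FEN file letter
        return false;  // '-' or anything else: no castling right
    }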
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Using multiple threads and a good opening book is a
much better and more reliable source of randomness than
spoiling the PSQT tables.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This code is platform specific and has nothing to
do with the TT class, so move it to misc.cpp.
This patch is a prerequisite for extending the use of prefetch
also to other hash tables apart from the transposition table.
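A sketch of what such a generic helper in misc.cpp can look like (the
preprocessor guards are illustrative): a single prefetch() that any hash
table can call on the entry it is about to probe.

    #if defined(_MSC_VER)
    #include <xmmintrin.h>
    #endif

    void prefetch(char* addr) {
    #if defined(__GNUC__)
        __builtin_prefetch(addr);          // gcc/clang builtin
    #elif defined(_MSC_VER)
        _mm_prefetch(addr, _MM_HINT_T0);   // fetch into all cache levels
    #else
        (void)addr;                        // no-op on unknown compilers
    #endif
    }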
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Search ply and game ply are two different things!
Revert the bogus commit.
No functional change on bench, but it changes in real games
when the engine sends all the moves up to the current one.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This is the best place because when we split we make a
copy of the position, and also because the threadID, once set
in a given position, never changes again.
Forbid the use of Position's default and copy constructors to
avoid nasty bugs in case a position is created without
explicitly setting the threadID.
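A sketch of the idea in the pre-C++11 idiom of the time (signatures are
illustrative): declare the default and copy constructors private and leave
them undefined, so any accidental use fails at compile or link time, and
the only way to build a Position is with an explicit thread id.

    #include <string>

    class Position {
    public:
        Position(const Position& pos, int threadID);     // copy, thread id mandatory
        Position(const std::string& fen, int threadID);  // from FEN, thread id mandatory

    private:
        Position();                 // forbidden: never defined
        Position(const Position&);  // forbidden: never defined

        int threadID;
        // ...
    };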
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
And avoid a redundant one passed as an argument in
search calls.
Also renamed gamePly to ply to better clarify that this
is used as the search ply and is set to zero at the
beginning of the search.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
And remove it from the main do_move() flow. Just a small speedup,
because we avoid two branches in the common case.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
The index at 0 was reserved for the no-pieces
information, but we don't need that.
This is a prerequisite for the next patch.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
They are both 64-bit unsigned integers, but it
is correct to use the proper type.
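For the record, a tiny sketch of the distinction (names follow the usual
engine convention and are illustrative):

    #include <stdint.h>

    typedef uint64_t Bitboard;  // one bit per square: occupancy, attack sets, ...
    typedef uint64_t Key;       // Zobrist hash key of a position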
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Array castleRightsMask[] is not really static because it can
be different for different positions, so let it be
Position member data. This allows us to remove the tricky
hacks needed to take into account that, although it was defined
static, it could change.
Theoretically copying a position is now a bit slower because
we also need to copy an array of 64 integers, but since in
split() we no longer copy the position and just keep a
pointer, the added burden is not measurable even in the MP case.
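A sketch of how such a per-position mask is typically used (member layout
and names are illustrative): do_move() just ANDs in the masks of the from
and to squares, clearing the rights whenever a king or rook moves or a
rook is captured.

    class Position {
    public:
        void update_castle_rights(int from, int to) {
            castleRights &= castleRightsMask[from] & castleRightsMask[to];
        }
    private:
        int castleRights;          // current rights of this position
        int castleRightsMask[64];  // per-square mask, filled while parsing the FEN
    };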
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It will be used by future patches, and it also rearranges some
half-cooked code that mistakenly ended up in master in the
past.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>