But this time with the guarantee of an always aligned
access so that prefetching is not adversely impacted.
On Joona PC
1+0, 64Mb hash:
Orig - Mod: 174 - 237 - 359
Instead after 1000 games at 1+0 with 128MB hash size
we are at + 1 ELO (just 4 games of difference).
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
After fixing the cpu frequency with RightMark tool I was
able to test speed all the different prefetch combinations.
Here the results:
OS Windows Vista 32bit, MSVC compile
CPU Intecl Core 2 Duo T5220 1.55 GHz
bench on depth 12, 1 thread, 26552844 nodes searched
results in nodes/sec
no-prefetch
402486, 402005, 402767, 401439, 403060
single prefetch (aligned 64)
410145, 409159, 408078, 410443, 409652
double prefetch (aligned 64) 0+32
414739, 411238, 413937, 414641, 413834
double prefetch (aligned 64) 0+64
413537, 414337, 413537, 414842, 414240
And now also some crazy stuff:
single prefetch (aligned 128)
410145, 407395, 406230, 410050, 409949
double prefetch (aligned 64) 0+0
409753, 410044, 409456
single prefetch (aligned 64) +32
408379, 408272, 406809
single prefetch (aligned 64) +64
408279, 409059, 407395
So it seems the best is a double prefetch at the addres + 32 or +64,
I will choose the second one because it seems more natural to me.
It is still a mystery why it doesn't work under Linux :-(
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Prefetch always form a chache line boundary. It seems
that if prefetch address is not cache line aligned then
performance is adversely impacted.
Hopefully we will resuse that 32 bits of padding for something
useful in the future.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This fix a compile error under Linux with gcc when
there aren't the intel dev libraries.
Also simplify the previous patch moving TT definition
from search.cpp to tt.cpp so to avoid using passing a
pointer to TT to the current position.
Finally simplify do_move(), now we miss a prefetch in the
rare case of setting an en-passant square but code is
much cleaner and performance penalty is almost zero.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Move prefetching code inside do_move() so to allow a
very early prefetching and to put as many instructions
as possible between prefetching and following retrieve().
With this patch retrieve() times are cutted of another 25%
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
TT.retrieve() is the most time consuming function
because almost always involves a very slow RAM access.
TT table is so big that is never cached. This patch
prefetches TT data just after a move is done, so that
subsequent TT.retrieve will be very fast.
Profiling with VTune shows that TT:retrieve() times are
almost cutted in half !
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Shrink key to 32 bits instead of 64. To still avoid
collisions use the high 32 bits of position key as the
TT key and the low 32 bits to retrieve the correct
cluster index in the table.
With this patch size og TTentry shrinks to 96 bits instead
of 128 and the cluster of 4 TTEntry sums to 48 bytes instead
of 64.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
previous used move_is_legal to verify that the move from the TT
was legal, and the old version of move_is_legal only works when
the side to move is not in check. Fixed this by adding a separate,
slower version of move_is_legal which works even when the side to
move is in check.
Centralize in a single object all the global resources
management and avoid a bunch of sparse exit() calls.
This is more reliable and clean and more stick to C++ coding
practices.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Instead of a position because the key is all that we
need.
Interface is more clear and also very very little bit faster.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Max hash size is 4096 MB, not 1024 MB, see the corresponding
"Hash" UCI parameter in ucioption.cpp
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Strangely enough it seems that optimization doesn't work.
After 760 games at 1+0: +155 -184 =421 -13 ELO
Probably the overhead, although small, for setting the flag
is not compensated by the saved evaluation call.
This could be due to the fact that after a TT value is stored,
if and when we hit the position again the stored TT value is
actually used as a cut-off so that we don't need to go on
with another search and evaluation is avoided in any case.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
If we want to store a value of type VALUE_TYPE_EVAL for
a given position and we found an already exsisting entry
for the same position then we skip.
We don't want to overwrite a more valuable score with a
lesser one. Note that also in case the exsisting entry is
of VALUE_TYPE_EVAL type the overwrite is unuseful because
we would store the same score again.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
When the stored TT value equals the static value set a
proper flag so to not call evaluation() if we hit the
same position again but use the stored TT value instead.
This is another trick to avoid calling costly evaluation()
in qsearch.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Fix a bug in the way a move is stored and read in a TT entry.
We use a mask of 19 bits insteaad of 17 so that the last
two bits in the TT entry end up to be random data.
This bug will bite us when we will use these two until now
unused bits.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Instead of just drop evaluation score after stand pat
logic save it in TT so to be reused if the same position
occurs again.
Note that we NEVER use the cached value apart to avoid an
evaluation call, in particulary we never return to caller
after a succesful tt hit.
To accomodate this a new value type VALUE_TYPE_EVAL has been
introduced so that ok_to_use_TT() always returns false.
With this patch we cut about 15% of total evaluation calls.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
We implicitly considered the minimum depth stored in TT
to be Depth(0), but because we store values in TT also in
qsearch() where depth is < 0, we need to use a negative
number as minimum depth.
Bug spotted by Joona Kiiski.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Note that some pawns and material info has been switched
to int from int8_t.
This is a waste of space but it is not clear if we have a
faster or slower code (or nothing changed), some test should be
needed.
Few warnings still are alive.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
We don't backup anymore but use the renamed StateInfo
argument passed in do_move() to store the new position
state when doing a move.
Backup is now just revert to previous StateInfo that we know
because we store a pointer to it.
Note that now backing store is up to the caller, Position is
stateless in that regard, state is accessed through a pointer.
This patch will let us remove all the backup/restore copying,
just a pointer switch is now necessary.
Note that do_null_move() still uses StateInfo as backup.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Make the logic work as advertised in the function
description.
Still a fallback from TT cleanup.
This should be less serious then the one in retrieve(),
but it's still a bug.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>