mirror of
https://github.com/sockspls/badfish
synced 2025-07-11 19:49:14 +00:00
Double prefetch on Windows
After fixing the CPU frequency with the RightMark tool I was able to test the speed of all the different prefetch combinations. Here are the results:

OS: Windows Vista 32bit, MSVC compile
CPU: Intel Core 2 Duo T5220 1.55 GHz
bench on depth 12, 1 thread, 26552844 nodes searched
results in nodes/sec

no prefetch:                       402486, 402005, 402767, 401439, 403060
single prefetch (aligned 64):      410145, 409159, 408078, 410443, 409652
double prefetch (aligned 64) 0+32: 414739, 411238, 413937, 414641, 413834
double prefetch (aligned 64) 0+64: 413537, 414337, 413537, 414842, 414240

And now also some crazy stuff:

single prefetch (aligned 128):     410145, 407395, 406230, 410050, 409949
double prefetch (aligned 64) 0+0:  409753, 410044, 409456
single prefetch (aligned 64) +32:  408379, 408272, 406809
single prefetch (aligned 64) +64:  408279, 409059, 407395

So it seems the best is a double prefetch at the address +32 or +64; I will choose the second one because it seems more natural to me. It is still a mystery why it doesn't work under Linux :-(

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
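The idea benchmarked above can be sketched in a few lines. This is a minimal illustration, not the actual Stockfish code: it hints two consecutive 64-byte cache lines into L1 so that a transposition-table cluster spanning a cache-line boundary is fully resident before retrieve() touches it. The helper name `prefetch_two_lines` is hypothetical, and it uses the portable GCC/Clang builtin rather than the MSVC `_mm_prefetch` intrinsic shown in the diff.

```cpp
// Sketch of the double-prefetch idea (assumed 64-byte cache lines).
// __builtin_prefetch is only a hint: it never faults and never changes
// program results, so correctness is unaffected if it is ignored.
inline void prefetch_two_lines(const void* addr)
{
    const char* p = static_cast<const char*>(addr);
    __builtin_prefetch(p);       // first cache line (offset 0)
    __builtin_prefetch(p + 64);  // following cache line (the +64 variant)
}
```

Because a prefetch is purely advisory, this compiles and runs identically on hardware that honors the hint and hardware that drops it; only the memory-latency profile differs.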
This commit is contained in:
parent
f4140ecc0c
commit
8d369600ec
1 changed file with 4 additions and 2 deletions
@@ -174,12 +174,14 @@ TTEntry* TranspositionTable::retrieve(const Key posKey) const {
 /// blocking function and do not stalls the CPU waiting for data
 /// to be loaded from RAM, that can be very slow. When we will
 /// subsequently call retrieve() the TT data will be already
-/// quickly accessible in L1/l2 CPU cache.
+/// quickly accessible in L1/L2 CPU cache.
 
 void TranspositionTable::prefetch(const Key posKey) const {
 
 #if defined(_MSC_VER)
-    _mm_prefetch((char*)first_entry(posKey), _MM_HINT_T0);
+    char* addr = (char*)first_entry(posKey);
+    _mm_prefetch(addr, _MM_HINT_T0);
+    _mm_prefetch(addr+64, _MM_HINT_T0);
 #else
 // We need to force an asm volatile here because gcc builtin
 // is optimized away by Intel compiler.