TCEC season 3, which is due to start in a few weeks, just
had its server upgraded to 64GB RAM and will therefore allow
16GB hash to be used per engine.
This is almost the upper limit without changing the
type of size and hashMask. After this we need to
move to uint64_t instead of uint32_t.
No functional change.
1/ eval margin and gains removed:
16bit are now free on TT entries, due to the removal of eval margin. may be useful
in the future :) gains removed: use instead by Value(128). search() and qsearch()
are now consistent in this regard.
2/ futility_margin()
linear formula instead of complex (log(depth), movecount) formula.
3/ unify pre & post futility pruning
pre futility pruning used depth < 7 plies, while post futility pruning used
depth < 4 plies. Now it's always depth < 7.
Tested with fixed number of games both at short TC:
ELO: 0.82 +-2.1 (95%) LOS: 77.3%
Total: 40000 W: 7939 L: 7845 D: 24216
And long TC
ELO: 0.59 +-2.0 (95%) LOS: 71.9%
Total: 40000 W: 6876 L: 6808 D: 26316
bench 7243575
1/ eval margin and gains removed:
- gains removed by Value(128): search() and qsearch() now behave consistently!
2/ futility_margin()
- testing showed that there is no added value in this weird (log(depth), movecount)
formula, and a much simpler linear formula is just as good. In fact, it is most
likely better, as it is not yet optimally tuned.
- the new simplified formula also means we get rid of FutilityMargins[], its
initialization code, and more importantly ss->futilityMoveCount, and the hacky
code that updates it throughout the search().
- the current formula gives negative futility margins, and there is a hidden interaction
between the move coutn pruning formula and the futility margin one: what happens is
that MCP is supposed to be triggered before we use the non-sensical negative futility
margins.
3/ unify pre & post futility pruning
- pre futility pruning (what SF calls value based pruning) used depth < 7 plies,
while post futility pruning (what SF calls static null move pruning) used depth < 4 plies.
- also the condition depth < 7 in pre futility pruning was not obvious, and it seemd
to be depth < 16 (futility_margin() returns an infinite value when depth >= 7).
Tested with fixed number of games both at short TC:
ELO: 0.82 +-2.1 (95%) LOS: 77.3%
Total: 40000 W: 7939 L: 7845 D: 24216
And long TC
ELO: 0.59 +-2.0 (95%) LOS: 71.9%
Total: 40000 W: 6876 L: 6808 D: 26316
bench: 10206576
And #ifdef instead of #if defined
This is more standard form (see for example iostream file).
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Function calloc() already initializes memory to
zero, so avoid calling clear() afterwards.
Also some renaming while there (inspired by DiscoCheck).
No functional change.
But this time do not play with pointers, in
particular do not assume that size_t is an
unsigned type of the same width as pointers.
This code should be fully portable.
No functional change.
Let TT clusters (16*4=64 bytes) to hold on a singe cache line.
This avoids the need for the double prefetch.
Original patches by Lucas and Jean-Francois that has also tested
on his AMD FX:
BIG HASHTABLE
./stockfish bench 1024 1 18 > /dev/null
Before:
1437642 nps
1426519 nps
1438493 nps
After:
1474482 nps
1476375 nps
1475877 nps
SMALL HASHTABLE
./stockfish bench 128 1 18 > /dev/null
Before:
1435207 nps
1435586 nps
1433741 nps
After:
1479143 nps
1471042 nps
1472286 nps
No functional change.
We can encode the ClusterSize directly in the
hashMask, this allows to skip the left shift.
There is no real change, but bench number is now
different because instead of using the lowest order
bits of the key to index the start of the cluster,
now we don't use the last two lsb bits that are
always set to zero (cluster size is 4). So for
instance, if 10 bits are used to index the cluster,
instead of bits [9..0] now we use bits [11..2].
This changes the positions that end up in the same
cluster affecting TT hits and so bench is different.
Also some renaming while there.
bench: 5383795
Do a 32bit bitwise 'and' instead of a 64bit
subtract and bitwise 'and'.
This is possible because even in the biggest
hash table case (8GB) the number of entries
is 2^29 so storable in an unsigned int.
No functional change.
And return on using TT as backing store for position
evaluations.
Tests (even on single thread) show eval cache was a regression.
In multi thread result should be even worst because eval cache
is a per-thread struct, while TT is shared.
After 4957 games at 15"+0.05 (single thread)
eval cache vs master 969 - 1093 - 2895 -9 ELO
So previous reported result of +18 ELO was probably due to an
issue in the testing framework (a bug in cutechess-cli) that
has been fixed in the meanwhile.
bench: 5386711
Test by Joona prooves the new feature don't value 70 added lines of code.
Grand totals after 10040 games (crashes: 0) for tt_both
master_9edc7 - 6a93488_6a934: 1756 - 1688 - 6596 ELO +2 (+- 2.7)
Confirmed by test of Gary:
After 8680 games:
ELO: 0.80 +- 99%: 9.62 95%: 7.31
LOS: 65.38%
Wins: 1288 Losses: 1268 Draws: 6130
Thanks a lot to both for testing it !!!
bench 5149248
This is more complex than what I'd like but I
was unable to split in small chunks.
Here we add 2 slots to TTEntry (valueUpper and depthUpper)
so that sizeof(TTEntry) returns to the original 16 bytes
and we can pack exactly 4 entries in a 64 bytes cache line.
Now we save an upper bound score alongside a lower (exact)
score. The idea is to increase TT cut-offs rates becuase
there is now an higher probability for a node to use TT info.
This patch is highly experimental and probably needs further
steps as is hinted by an unrealistic bench number:
bench: 2022385
Allows some code semplification and avoids directly
allocation and managing heap memory.
Also the usual renaming while there.
No functional change and no speed regression.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
And add final touches to this long patch series.
All the series has been verified against regression with
20K games at fast TC.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Associate platform OS thread to the Thread class instead of
creating it from ThreadsManager.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This prevent crashing on mobile devices with limited RAM,
currently with MAX_THREADS = 32 we would need 44MB that
could be too much for a poor cellphone.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This change allows to remove some quite a bit of code
and seems the natural thing to do.
Introduced file thread.cpp to move away from search.cpp a lot
of threads related stuff.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
The only interesting thing is that a backward or isolated
pawn cannot be a candidate passer, so code this condition.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
If size_t is defined as a 32 bit quanitity then we have an
overflow in the left term of the while condition if mbSize
is bigger then 2048.
For instance if mbSize is 2049 then when newSize will reach
0x80000000 (2048MB) comparison is still true, 'while' loops
again and we have an overflow in the expression (2*newSize)
so that result is 0 and at that point 'while' keeps looping
forever hanging the application.
This patch fixes the bug and also makes operator new do not
throw an exception upon failure but return a NULL pointer
instead.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Shrink of 1 bit so to fit a move in an uint_16 and
possibly a MoveStack in an uint_32.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
A bit faster on 32 bits machines, more similar to
TranspositionTable::first_entry() and same result.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It seems HP's ANSI C++ doesn't understand very well
standard function-style cast.
Reported by Richard Lloyd.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
When we store this value in TT we cut this to 9 bits,
so we need a smaller variable otherwise comparisons
like:
replace->generation() == generation
Are always false if generation is bigger then the maximum
TT storable value.
This fixes a very nasty and difficult to spot bug (2 weeks for
regression hunting).
Signed-off-by: Marco Costalba <mcostalba@gmail.com>