TCEC season 3, which is due to start in a few weeks, just
had its server upgraded to 64GB RAM and will therefore allow
16GB hash to be used per engine.
This is almost the upper limit without changing the types of 'size'
and 'hashMask'. Beyond this we need to move from uint32_t to uint64_t.
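As a back-of-the-envelope check (a sketch assuming the usual 16-byte
TT entries, not code from this patch):

    #include <cstdint>
    #include <cstdio>

    int main() {
      uint64_t bytes   = 16ULL << 30;  // 16GB of hash
      uint64_t entries = bytes / 16;   // 16-byte entries -> 2^30 entries
      // 2^30 still fits in a uint32_t (max 2^32 - 1); at 64GB we would
      // need 2^32 entries, overflowing a 32-bit size/hashMask.
      std::printf("%llu entries\n", (unsigned long long)entries);
      return 0;
    }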
No functional change.
1/ eval margin and gains removed:
- eval margin removed: 16 bits are now free on TT entries, which may
be useful in the future :)
- gains removed: a constant Value(128) is used instead. search() and
qsearch() are now consistent in this regard.
2/ futility_margin()
- a linear formula instead of the complex (log(depth), movecount) formula.
3/ unify pre & post futility pruning
- pre futility pruning used depth < 7 plies, while post futility
pruning used depth < 4 plies. Now it's always depth < 7.
Tested with a fixed number of games, both at short TC:
ELO: 0.82 +-2.1 (95%) LOS: 77.3%
Total: 40000 W: 7939 L: 7845 D: 24216
And at long TC:
ELO: 0.59 +-2.0 (95%) LOS: 71.9%
Total: 40000 W: 6876 L: 6808 D: 26316
bench: 7243575
1/ eval margin and gains removed:
- gains removed: replaced by a constant Value(128); search() and
qsearch() now behave consistently!
2/ futility_margin()
- testing showed that there is no added value in this weird
(log(depth), movecount) formula, and a much simpler linear formula is
just as good (see the sketch after this list). In fact, it is most
likely better, as it is not yet optimally tuned.
- the new simplified formula also means we get rid of FutilityMargins[], its
initialization code, and more importantly ss->futilityMoveCount, and the hacky
code that updates it throughout the search().
- the old formula gives negative futility margins, and there is a
hidden interaction between the move count pruning formula and the
futility margin one: what happens is that MCP is supposed to trigger
before we ever use the nonsensical negative futility margins.
3/ unify pre & post futility pruning
- pre futility pruning (what SF calls value-based pruning) used depth < 7
plies, while post futility pruning (what SF calls static null move
pruning) used depth < 4 plies.
- also, the depth < 7 condition in pre futility pruning was not obvious:
the guard in the code read depth < 16, but futility_margin() returns an
infinite value when depth >= 7, so the effective limit was depth < 7.
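For illustration, a minimal sketch of such a linear margin. The slope
of 100 and the typedefs are assumptions made for a self-contained
example, not the tuned values from the patch:

    // Stand-ins for the engine's Value/Depth types, just so the
    // example compiles on its own:
    typedef int Value;
    typedef int Depth;

    // Linear in depth: no log(depth) term, no movecount dimension, and
    // no FutilityMargins[] table or ss->futilityMoveCount bookkeeping.
    inline Value futility_margin(Depth d) {
      return Value(100 * d);
    }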
Tested with a fixed number of games, both at short TC:
ELO: 0.82 +-2.1 (95%) LOS: 77.3%
Total: 40000 W: 7939 L: 7845 D: 24216
And at long TC:
ELO: 0.59 +-2.0 (95%) LOS: 71.9%
Total: 40000 W: 6876 L: 6808 D: 26316
bench: 10206576
Now that we use pre-increment on enums, it makes sense, for code style
uniformity, to switch to pre-increment for native types too, although
there is no speed difference.
No functional change.
Function calloc() already initializes memory to
zero, so avoid calling clear() afterwards.
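A minimal sketch of the point (names are illustrative):

    #include <cstdlib>

    // calloc() already returns zero-initialized memory, so a follow-up
    // clear()/memset over the fresh table would be redundant work.
    void* alloc_table(size_t count, size_t entrySize) {
      void* table = std::calloc(count, entrySize);
      // std::memset(table, 0, count * entrySize);  // redundant: already zero
      return table;
    }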
Also some renaming while there (inspired by DiscoCheck).
No functional change.
But this time do not play with pointers, in
particular do not assume that size_t is an
unsigned type of the same width as pointers.
This code should be fully portable.
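To illustrate the pitfall (a sketch, not code from the patch):

    #include <cstdint>

    // size_t is only guaranteed to hold object sizes; uintptr_t (where
    // provided) is the integer type that can round-trip a pointer.
    // On a platform with 32-bit size_t and 64-bit pointers, casting a
    // pointer through size_t would silently truncate it.
    uintptr_t pointer_bits(const void* p) {
      return reinterpret_cast<uintptr_t>(p);
    }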
No functional change.
Let TT clusters (16*4 = 64 bytes) fit on a single cache line.
This avoids the need for the double prefetch.
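A sketch of the packing, with entry fields elided (alignment of the
table itself is assumed to be handled at allocation time):

    struct TTEntry { unsigned char raw[16]; };  // 16 bytes, fields elided
    struct TTCluster { TTEntry entry[4]; };     // 4 * 16 = 64 bytes

    // Compile-time check (C++03 style) that a cluster is exactly one
    // typical 64-byte cache line, so a single prefetch of an aligned
    // cluster covers all four probes:
    typedef char cluster_is_one_cache_line[sizeof(TTCluster) == 64 ? 1 : -1];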
Original patches by Lucas and Jean-Francois, who also tested
on his AMD FX:
BIG HASHTABLE
./stockfish bench 1024 1 18 > /dev/null
Before:
1437642 nps
1426519 nps
1438493 nps
After:
1474482 nps
1476375 nps
1475877 nps
SMALL HASHTABLE
./stockfish bench 128 1 18 > /dev/null
Before:
1435207 nps
1435586 nps
1433741 nps
After:
1479143 nps
1471042 nps
1472286 nps
No functional change.
We can encode ClusterSize directly in the hashMask;
this allows us to skip the left shift.
There is no real change, but the bench number is now different: instead
of using the lowest-order bits of the key to index the start of the
cluster, we now skip the two least significant bits, which are always
zero (cluster size is 4). So, for instance, if 10 bits are used to
index the cluster, instead of bits [9..0] we now use bits [11..2].
This changes which positions end up in the same cluster, affecting TT
hits, and so bench is different.
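A small demonstration of the two indexing schemes, with illustrative
numbers (10 index bits, ClusterSize = 4):

    #include <cstdint>
    #include <cstdio>

    int main() {
      uint32_t key     = 0xDEADBEEF;
      uint32_t oldMask = (1 << 10) - 1;         // key bits [9..0]
      uint32_t newMask = ((1 << 10) - 1) << 2;  // key bits [11..2]
      uint32_t oldIdx  = (key & oldMask) * 4;   // mask, then scale by cluster size
      uint32_t newIdx  =  key & newMask;        // scaling baked into the mask
      // Different key bits pick the cluster, so positions group
      // differently and TT hits (hence bench) change.
      std::printf("old %u, new %u\n", oldIdx, newIdx);
      return 0;
    }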
Also some renaming while there.
bench: 5383795
Do a 32-bit bitwise 'and' instead of a 64-bit
subtract and bitwise 'and'.
This is possible because even in the biggest
hash table case (8GB) the number of entries
is 2^29, which is storable in an unsigned int.
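In sketch form (illustrative names):

    #include <cstdint>

    // With at most 2^29 entries (8GB / 16 bytes), the precomputed mask
    // fits in 32 bits, so indexing needs only a 32-bit 'and' on the low
    // half of the key instead of a 64-bit subtract plus 'and'.
    uint32_t entry_index(uint64_t key, uint32_t hashMask) {
      return uint32_t(key) & hashMask;
    }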
No functional change.
And return to using the TT as a backing store for position evaluations.
Tests (even on a single thread) show the eval cache was a regression.
With multiple threads the result should be even worse, because the eval
cache is a per-thread struct, while the TT is shared.
After 4957 games at 15"+0.05 (single thread)
eval cache vs master 969 - 1093 - 2895 -9 ELO
So the previously reported result of +18 ELO was probably due to an
issue in the testing framework (a bug in cutechess-cli) that has
since been fixed.
bench: 5386711
Tests by Joona prove the new feature isn't worth 70 added lines of code.
Grand totals after 10040 games (crashes: 0) for tt_both
master_9edc7 - 6a93488_6a934: 1756 - 1688 - 6596 ELO +2 (+- 2.7)
Confirmed by test of Gary:
After 8680 games:
ELO: 0.80 +- 99%: 9.62 95%: 7.31
LOS: 65.38%
Wins: 1288 Losses: 1268 Draws: 6130
Thanks a lot to both for testing it !!!
bench: 5149248
This is more complex than I'd like, but I was
unable to split it into small chunks.
Here we add 2 slots to TTEntry (valueUpper and depthUpper)
so that sizeof(TTEntry) returns to the original 16 bytes
and we can pack exactly 4 entries in a 64 bytes cache line.
Now we save an upper bound score alongside the lower (exact)
score. The idea is to increase the TT cut-off rate, because
there is now a higher probability for a node to use TT info.
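For reference, a hypothetical 16-byte layout; field names and widths
are guesses for illustration, not the committed ones:

    #include <cstdint>

    struct TTEntry {
      uint32_t key32;        // upper part of the position key
      uint16_t move16;       // best move
      uint8_t  bound8;       // bound type of the lower score
      uint8_t  generation8;  // aging counter
      int16_t  value16;      // lower (exact) score
      int16_t  depth16;      // depth of the lower score
      int16_t  valueUpper;   // new slot: upper bound score
      int16_t  depthUpper;   // new slot: depth of the upper bound
    };                       // 4+2+1+1+2+2+2+2 = 16 bytes, 4 per cluster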
This patch is highly experimental and probably needs further
steps, as hinted at by an unrealistic bench number:
bench: 2022385
It seems more accurate: lsb is unambiguous, while 'first
bit' depends on where you look at the bitboard from.
Also fix compilation on 64-bit platforms that
do not use the BSFQ intrinsics.
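For reference, what lsb() computes; a naive loop just to show the
semantics, where real code uses the BSFQ intrinsics or a lookup-based
fallback:

    #include <cstdint>

    // Index of the least significant set bit; precondition: b != 0.
    int lsb(uint64_t b) {
      int s = 0;
      while (!(b & 1)) { b >>= 1; ++s; }
      return s;
    }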
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This patch removes a condition that allows a PV entry to remain in the
TT across games for an unlimited time.
Although this produces a nice ELO boost in the long run, it is an
artifact that affects test results between versions with and without
this feature.
So remove it now and re-add it before release, because it actually
seems to be a strong feature.
As an example, a verification tournament against SF 2.0.1 started
around +10 ELO after 4K games and steadily climbed to +21 ELO after
14K games !!!
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
The biggest advantage is being able to analyze positions without "loss
of memory" when going back and forth in a position.
The patch has proven to fix analysis problems and is even worth some
ELO points.
After 5811 games, Mod - Orig:
1037 - 902 - 3872 +8 ELO (+- 3.6) LOS 97%
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
If size_t is defined as a 32-bit quantity then we have an overflow in
the left term of the while condition if mbSize is bigger than 2048.
For instance, if mbSize is 2049, then when newSize reaches 0x80000000
(2048MB) the comparison is still true, the 'while' loops again, and the
expression (2*newSize) overflows to 0; at that point the 'while' keeps
looping forever, hanging the application.
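The hang in sketch form, modeling a 32-bit size_t with uint32_t (names
are illustrative):

    #include <cstdint>

    int main() {
      uint32_t mbSize  = 2049;
      uint32_t newSize = 1024 * 1024;  // table size in bytes
      // Once newSize reaches 0x80000000, (2 * newSize) wraps to 0, the
      // condition stays true, and the loop never terminates:
      while (2 * newSize <= (mbSize << 20))
          newSize *= 2;
      return 0;
    }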
This patch fixes the bug and also makes operator new return a NULL
pointer upon failure instead of throwing an exception.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It is redundant boilerplate; just call initialization and
resource release directly from main().
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
The language guarantees that the c'tor is called, but without any c'tor
it happens to work by accident, because the OS zeroes out freshly
allocated pages. The problem is that if we deallocate and allocate
again, the second time the pages no longer come fresh from the OS and
so could contain stale info.
A practical case would be changing the TT size or the number of
threads on the fly while already running.
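A minimal illustration of the fix direction (illustrative POD type,
not the engine's):

    #include <cstddef>
    #include <new>

    struct Entry { int key; int value; };  // POD without a c'tor

    // 'new Entry[n]' leaves a POD uninitialized; it only *looks* zeroed
    // when the pages come fresh from the OS. 'new Entry[n]()' value-
    // initializes, so reallocation never exposes stale data.
    Entry* make_table(std::size_t n) {
      return new (std::nothrow) Entry[n]();
    }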
Bug spotted by Justin Blanchard.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Update outdated and even misleading documentation.
Also check #include directives.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
These functions have little to do with the TranspositionTable class
and more to do with the search, in particular with PV handling.
So move them to where they belong.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Considering only VALUE_TYPE_EXACT nodes is too
restrictive and has a number of side effects, most
notably the truncation of the PV line after a fail high
at root.
Note that this way we are no longer guaranteed that the
PV line is built from PV nodes only: it can happen that
a side search overwrites a PV node with a cut-off move,
and this cut-off move ends up in the PV.
The change should be almost unmeasurable; perhaps with
ponder on we could see some beneficial effect.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>