The heuristic to avoid thread binding if fewer than 8 threads are requested resulted in the first 7 threads not being bound.
The branch was verified to yield a roughly 13% speedup by @CoffeeOne on the appropriate hardware and OS, and an earlier version of this patch tested well on his machine:
http://tests.stockfishchess.org/tests/view/5a3693480ebc590ccbb8be5a
ELO: 9.24 +-4.6 (95%) LOS: 100.0%
Total: 5000 W: 634 L: 501 D: 3865
To make sure all threads (including mainThread) are bound as soon as the total number exceeds 7, recreate all threads on a change of thread number.
To do this, Threads::init, Threads::exit and Threads::set are unified in a single Threads::set function that goes through the needed steps.
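The recreate-everything idea can be sketched roughly as follows. This is a toy model and not the Stockfish code: `Worker`, `ToyPool` and the `bound` flag are hypothetical names, no real OS threads or processor binding are involved, and only the "more than 7 threads" threshold is taken from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a search thread; 'bound' records whether the worker
// would have been bound when it was created (hypothetical flag).
struct Worker {
    std::size_t index;
    bool bound;
};

class ToyPool {
public:
    // Destroy every existing worker, then create 'requested' new ones.
    // Since all workers are recreated, the binding decision is taken with
    // the final thread count for each of them, the first one included.
    void set(std::size_t requested) {
        workers.clear();                          // role of the former exit step
        const bool bind = requested > 7;          // threshold from the commit text
        for (std::size_t i = 0; i < requested; ++i)
            workers.push_back(Worker{i, bind});   // role of the former init step
    }

    std::size_t size() const { return workers.size(); }

    // True when every worker in a non-empty pool is marked as bound.
    bool all_bound() const {
        assert(!workers.empty());
        for (const Worker& w : workers)
            if (!w.bound)
                return false;
        return true;
    }

private:
    std::vector<Worker> workers;
};
```

Growing such a pool from 4 to 16 workers leaves no unbound survivors from the smaller configuration, which is the property the patch is after.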
The code includes several suggestions from @joergoster.
Fixes issue #1312
No functional change
For efficiency reasons, current master only allows transposition table sizes of N = 2^k, so that the index computation (hash % N) can be written as (hash & (2^k - 1)). On a typical computer (with 4, 8... etc GB of RAM), this implies roughly half the RAM is left unused in analysis.
This issue was mentioned on fishcooking by Mindbreaker:
http://tests.stockfishchess.org/tests/view/5a3587de0ebc590ccbb8be04
Recently a neat trick was proposed to map a hash into the range [0,N[ more efficiently than (hash % N) for general N, nearly as efficiently as (hash % 2^k):
https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
namely computing (hash * N / 2^32) for 32 bit hashes. This patch implements this trick and now allows for general hash sizes. Note that for N = 2^k this just amounts to using a different subset of bits from the hash: master uses the lower k bits, whereas this trick uses the upper k bits (of the 32 bit hash).
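A minimal sketch of the two index computations, for illustration only. The function names `fast_range` and `mask_index` are made up here and are not the names used in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Map a 32-bit hash into [0, n) with one 64-bit multiply and a shift,
// i.e. hash * n / 2^32 as described above.
inline std::uint32_t fast_range(std::uint32_t hash, std::uint32_t n) {
    assert(n > 0);
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(hash) * n) >> 32);
}

// Power-of-two indexing for comparison: keeps the lower k bits of the hash.
inline std::uint32_t mask_index(std::uint32_t hash, unsigned k) {
    assert(k < 32);
    return hash & ((std::uint32_t(1) << k) - 1);
}
```

For n = 2^8 and hash 0xABCD1234, `fast_range` selects the upper byte 0xAB while `mask_index` selects the lower byte 0x34, which matches the remark about different subsets of bits.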
There is no slowdown as measured with [-3, 1] test:
http://tests.stockfishchess.org/tests/view/5a3587de0ebc590ccbb8be04
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 128498 W: 23332 L: 23395 D: 81771
There are two (smaller) caveats:
1) the patch is implemented for a 32 bit hash (so that a 64 bit multiply can be used), which effectively limits the number of clusters that can be used to 2^32, or 128GB of transposition table. That's a change in the maximum allowed TT size, which could bother those regularly using 256GB or more.
2) Already in master, an excluded move is hashed into the position key in a rather simple way, essentially only affecting the lower 16 bits of the key. This is OK in master, since bits 0-15 end up in the index, but not in the new scheme, which picks the higher bits. This is 'fixed' by shifting the excluded move a few bits up. Eventually a better hashing scheme seems wise.
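The second caveat can be illustrated with a small sketch. The helper names and the shift amount of 16 are assumptions chosen for the example; the text above only says "a few bits up".

```cpp
#include <cassert>
#include <cstdint>

using Key = std::uint64_t;

// Index in the new scheme: the low 32 bits of the key, scaled into
// [0, clusterCount) by a multiply and a shift.
inline std::uint64_t tt_index(Key key, std::uint32_t clusterCount) {
    assert(clusterCount > 0);
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(key)) * clusterCount) >> 32;
}

// Simple scheme: the 16-bit move only touches bits 0-15 of the key.
inline Key exclusion_key_low(Key posKey, std::uint16_t excludedMove) {
    return posKey ^ Key(excludedMove);
}

// Shifted variant: the move lands in bits 16-31 (shift of 16 is an assumption).
inline Key exclusion_key_shifted(Key posKey, std::uint16_t excludedMove) {
    return posKey ^ (Key(excludedMove) << 16);
}
```

With 1024 clusters and a key whose low 32 bits are 0x12340000, the unshifted variant maps to the same cluster as the position key itself, while the shifted one moves to a different cluster.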
Despite these two caveats, I think this is a nice improvement in usability.
Bench: 5346341
Current master can yield different staticEvals depending on the path
used to reach the position. The reason for this is that the evaluation after a
null move is always computed subtracting 2 * Eval::Tempo, while this is not
the case for lazy or specialized evals. This patch always adds tempo to evals,
which doesn't affect playing strength:
LTC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 59911 W: 7616 L: 7545 D: 44750
STC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 104947 W: 18897 L: 18919 D: 67131
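Why the null-move shortcut needs tempo to be added everywhere can be seen in a small sketch. The value of `Tempo` and the function names are illustrative assumptions, and `raw` stands for a tempo-free score from the side to move's point of view.

```cpp
#include <cassert>

constexpr int Tempo = 20;  // illustrative value, not taken from the source

// Evaluation as seen by the side to move, with the tempo bonus included.
constexpr int evaluate_with_tempo(int raw) { return raw + Tempo; }

// Shortcut after a null move: the side to move flips, so the raw score
// changes sign and the tempo bonus goes to the other side.
constexpr int eval_after_null(int parentStaticEval) {
    return -parentStaticEval + 2 * Tempo;
}

// Checks that the shortcut agrees with evaluating the flipped position
// directly, provided the parent eval included tempo.
inline void check_path_independence(int raw) {
    assert(eval_after_null(evaluate_with_tempo(raw)) == evaluate_with_tempo(-raw));
}
```

If the parent eval came from a path that skipped the tempo term, the shortcut would be off by exactly `Tempo`, which is the path dependence described above.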
Fixes issue #1335
Bench: 5208264
Simplify the other check penalty computation. Compared to current master,
a) it uses a 143 kingDanger penalty instead of S(10, 10) for the "otherCheck"
(credits to ElbertoOne for finding a suitable kingDanger range to replace the score
and to Guardian for showing this could also be a neutral change at LTC).
This makes our king safety model more consistent and simpler.
b) it might also score more than one "otherCheck" penalty for a given piece type instead of just one
c) it might score many pinned penalties instead of just one.
d) It also removes 3 conditionals and uses simpler expressions.
So it was tested as a SPRT[-3, 1]
Passed STC
http://tests.stockfishchess.org/tests/view/5a2b560b0ebc590ccbb8ba6b
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 11705 W: 2217 L: 2080 D: 7408
And LTC
http://tests.stockfishchess.org/tests/view/5a2bfd0d0ebc590ccbb8bab0
LLR: 2.94 (-2.94,2.94) [-3.00,1.00]
Total: 26812 W: 3575 L: 3463 D: 19774
Trying to improve on b) another attempt was made to score also the
"otherchecks" for piece types which had some safe checks, but this
failed STC http://tests.stockfishchess.org/tests/view/5a2c79e60ebc590ccbb8badd
bench: 5149133
* A better contempt implementation for Stockfish
Round 2 of TCEC season 10 demonstrated the benefit of having a nice contempt implementation: it gives the strongest programs in the tournament the ability to slow down the game when they feel the position is slightly worse, preferring to stay in a complicated (even if slightly risky) middle game rather than simplifying by force into a drawn endgame.
The current contempt implementation of Stockfish is inadequate, and this patch is an attempt to provide a better one.
Passed STC non-regression test against master:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 83360 W: 15089 L: 15075 D: 53196
http://tests.stockfishchess.org/tests/view/5a1bf2de0ebc590ccbb8b370
This contempt implementation is showing promising results in certain situations. For instance, it obtained a nice +30 Elo gain when playing with contempt=40 against Stockfish 7, compared to current master:
• master against SF 7 (20000 games at LTC): +121.2 Elo
• this patch with contempt=40 (20000 games at LTC): +154.11 Elo
This was the result of real cooperative work from the Stockfish team, with key ideas coming from Stefan Geschwentner (locutus2) and Chris Cain (ceebo) while most of the community helped with feedback and computer time.
In this commit the bench is unchanged by default, but you can test at home with the new contempt in the UCI options. The style of play will change a lot when using a contempt different from zero (I repeat: not done in this version by default, however)!
The Stockfish team is still deliberating over the best default contempt value in self-play and the best contempt modeling strategy, to help users choose a contempt value when playing against much weaker programs. This information will be given in future commits when available :-)
Bench: 5051254
* Remove the prefetch
No functional change.
Currently the NORTH/WEST/SOUTH/EAST values are of type Square, but conceptually they are not squares but directions. This patch separates these values into a Direction enum and overloads addition and subtraction to allow adding a Square to a Direction (to get a new Square).
I have also slightly trimmed the possible overloadings to improve type safety. For example, it would normally not make sense to add a Color to a Color or a Piece to a Piece, or to multiply or divide them by an integer. It would also normally not make sense to add a Square to a Square.
This is a non-functional change.
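A reduced sketch of the idea, assuming only a handful of squares and the four basic directions. The deleted overload is one possible way to reject Square + Square in a self-contained example and is not necessarily how the patch restricts the operators; `walk` is a hypothetical helper.

```cpp
#include <cassert>

enum Square : int { SQ_A1 = 0, SQ_B1 = 1, SQ_A2 = 8, SQ_B2 = 9, SQ_NONE = 64 };
enum Direction : int { NORTH = 8, EAST = 1, SOUTH = -8, WEST = -1 };

// A Square plus or minus a Direction gives a new Square.
constexpr Square operator+(Square s, Direction d) { return Square(int(s) + int(d)); }
constexpr Square operator-(Square s, Direction d) { return Square(int(s) - int(d)); }

// Directions can be combined, e.g. NORTH + EAST.
constexpr Direction operator+(Direction a, Direction b) { return Direction(int(a) + int(b)); }

// Adding two squares has no meaning, so reject it at compile time.
Square operator+(Square, Square) = delete;

// Walk 'steps' squares in direction 'd' starting from 's' (no board-edge checks).
inline Square walk(Square s, Direction d, int steps) {
    assert(steps >= 0);
    for (int i = 0; i < steps; ++i)
        s = s + d;
    return s;
}
```

With these definitions `SQ_A1 + NORTH` compiles and yields `SQ_A2`, whereas `SQ_A1 + SQ_B1` fails to compile.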
Add the -fno-exceptions flag to the Makefile to avoid the unnecessary exceptions support in the executable (we do not use any exception in Stockfish at the moment).
This change gives a 7.8% reduction in size for the executable binary.
Before : executable size = 376956 bytes
After: executable size = 347652 bytes
No functional change.
Four very minor edits. Note that tte->save() uses posKey and
not pos.key() in other places.
Originally I also added a futility_move_counts() function to
make things more consistent with the futility_margin() and
reduction() functions. But then razor_margin[] should probably
also be turned into a function, etc. Maybe a good idea, maybe not.
So I did not include it.
Non functional change.
The new "weak" expression helps simplify the safe check calculations for rooks or minors (but the end result for all the safe checks is exactly the same as in current master).
The only functional change is for the "outer king ring" (for example, squares f3 g3 h3 when white king is on g1). In current master, there was a 191 penalty if any of these was not defended at all.
With this pr, there is this 191 penalty if any of these is not defended at all or is only defended by a white queen.
Tested as a simplification
STC
http://tests.stockfishchess.org/tests/view/59fb03d80ebc590ccbb89fee
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 66167 W: 12015 L: 11971 D: 42181
(against master (Update Copyright year in Makefile))
LTC
http://tests.stockfishchess.org/tests/view/5a0106ae0ebc590ccbb8a55f
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 15790 W: 2095 L: 1968 D: 11727
(against master (Handle BxN trade as good capture when history scor))
same as #1296 but rebased on latest master
bench: 5109559
In terms of technical changes this patch eliminates the return
statements from the main loop of pos.see_ge() and replaces two conditional
computations with a single bitwise negation.
No functional change
Stockfish currently relies on the "filter_root_moves" function also
having the side effect of clamping Cardinality against MaxCardinality
(the actual piece count in the tablebases). So if we skip this function,
we will end up probing in the search even without tablebases installed.
We cannot bail out of this function before this check is done, so move
the MultiPV hack a few lines below.
If the PV leads to a draw (3-fold / 50-moves) position
and we're ahead of time, think a little longer, possibly
finding a better way.
As this is most likely effective at higher draw rates,
tried speculative LTC after a yellow STC:
STC:
http://tests.stockfishchess.org/tests/view/59eb173a0ebc590ccbb8975d
LLR: -2.95 (-2.94,2.94) [0.00,5.00]
Total: 56095 W: 10013 L: 9902 D: 36180
elo = 0.688 +- 1.711 LOS: 78.425%
LTC:
http://tests.stockfishchess.org/tests/view/59eba1670ebc590ccbb897b4
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 59579 W: 7577 L: 7273 D: 44729
elo = 1.773 +- 1.391 LOS: 99.381%
bench: 5234652
In x/y time controls there was a theoretical possibility
of using all available time a few moves before the clock is
updated with new time. This patch fixes that issue.
Tested at 60/15 time control:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 113963 W: 20008 L: 20042 D: 73913
The test was done without adjudication rules!
Bench 5234652
A band-aid patch to work around current TB code
limitations with multi PV.
Hopefully this will be removed after committing the
big update of TB implementation, now under discussion.
No functional change.
If the search is quit before skill.pick_best is called,
skill.best_move might be MOVE_NONE.
Ensure skill.best is always assigned anyhow.
Also retire the tricky best_move() and make the underlying
semantics clear and explicit.
No functional change.
The first change (ss->statScore >= 0) does nothing.
The second change ((ss-1)->statScore >= 0) has a massive effect.
(ss-1)->statScore is not set until (ss-1) begins to apply LMR to moves.
So we now increase the reduction for bad quiets when our opponent is
running through the first captures and the hash move.
STC
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 57762 W: 10533 L: 10181 D: 37048
LTC
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 19973 W: 2662 L: 2480 D: 14831
Bench: 5037819
This patch lets ss->ply be equal to 0 at the root of the search.
Currently, the root has ss->ply == 1, which is less intuitive:
- Setting the rootNode bool has to check (ss-1)->ply == 0.
- All mate values are off by one: the code seems to assume that mated-in-0
is -VALUE_MATE, mate-in-1-ply is VALUE_MATE-1, mated-in-2-ply is -VALUE_MATE+2, etc.
But the mate_in() and mated_in() functions are called with ss->ply, which is 1
at the root.
- The is_draw() function currently needs to explain why it has "ply - 1 > i" instead
of simply "ply > i".
- The ss->ply >= MAX_PLY tests in search() and qsearch() already assume that
ss->ply == 0 at the root. If we start at ss->ply == 1, it would make more sense to
go up to and including ss->ply == MAX_PLY, so stop at ss->ply > MAX_PLY. See also
the asserts testing for 0 <= ss->ply && ss->ply < MAX_PLY.
The reason for ss->ply == 1 at the root is the line "ss->ply = (ss-1)->ply + 1" at
the start of search() and qsearch(). By replacing this with "(ss+1)->ply = ss->ply + 1"
we keep ss->ply == 0 at the root. Note that search() already clears killers in (ss+2),
so there is no danger in accessing ss+1.
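The change in ply bookkeeping can be sketched with a toy recursion. This is a simplified illustration, not the actual search: the constant values, the reduced `Stack` struct and the functions `toy_search` and `search_from_root` are assumptions made for the example.

```cpp
#include <cassert>

constexpr int MAX_PLY = 128;       // illustrative value
constexpr int VALUE_MATE = 32000;  // illustrative value

struct Stack { int ply; };

constexpr int mate_in(int ply)  { return  VALUE_MATE - ply; }
constexpr int mated_in(int ply) { return -VALUE_MATE + ply; }

// Toy recursion returning the deepest ply reached. The child's ply is
// written before descending, so the root keeps whatever ply it was given.
inline int toy_search(Stack* ss, int depth) {
    assert(0 <= ss->ply && ss->ply < MAX_PLY);
    (ss + 1)->ply = ss->ply + 1;
    if (depth == 0)
        return ss->ply;
    return toy_search(ss + 1, depth - 1);
}

// Root driver: the root entry explicitly gets ply 0.
inline int search_from_root(int depth) {
    Stack stack[MAX_PLY + 2] = {};
    Stack* ss = stack;
    ss->ply = 0;
    return toy_search(ss, depth);
}
```

With the root at ply 0, a side that is mated at the root scores `mated_in(0) == -VALUE_MATE`, and the ply returned by `search_from_root(d)` equals `d`.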
I have NOT changed pv[MAX_PLY + 1] to pv[MAX_PLY + 2] in search() and qsearch().
It seems to me that MAX_PLY + 1 is exactly right:
- MAX_PLY entries for ss->ply running from 0 to MAX_PLY-1, and 1 entry for the
final MOVE_NONE.
I have verified that mate scores are reported correctly. (They were already reported
correctly due to the extra ply being rounded down when converting to moves.)
The value of seldepth output to the user should probably not change, so I add 1 to it.
(Humans count from 1, computers from 0.)
A small optimisation I did not include: instead of setting ss->ply in every invocation
of search() and qsearch(), it could be set once for all plies at the start of
Thread::search(). This saves a couple of instructions per node.
No functional change (unless the search searches a branch MAX_PLY deep), so bench
does not change.
This should reduce time losses experienced by
users after the new time management code.
Verified for no regression in very short TC (4sec + 0.1)
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 35262 W: 7426 L: 7331 D: 20505
Bench 5322108
Use different penalties for weaknesses in the pawn shelter
depending on whether it is on the king's file or not.
STC
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 71617 W: 13471 L: 13034 D: 45112
LTC
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 48708 W: 6463 L: 6187 D: 36058
Bench: 5322108
Two simplifications:
- Remove the initialisation to 0 of occupied, which is now unnecessary.
- Remove the initial check for nextVictim == KING
If nextVictim == KING, then PieceValue[MG][nextVictim] will be 0, so that
balance >= threshold is true. So see_ge() returns true anyway.
No functional change.
Compile with -Werror flag. To make debugging easier
also show compile output.
This flag is enabled only in Travis CI, not in the shipped
Makefile because we can't test on every possible platform.
In light of issue #1232, a test was performed on the value of '-fno-exceptions' and a second one on the combination '-fno-exceptions -fno-rtti'. It turns out these options can be removed without introducing a slowdown.
STC for removing '-fno-exceptions'
LLR: 2.94 (-2.94,2.94) [-3.00,1.00]
Total: 13678 W: 2572 L: 2439 D: 8667
STC for removing '-fno-exceptions -fno-rtti' (current patch)
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 32557 W: 6074 L: 5973 D: 20510
No functional change.