mirror of https://github.com/sockspls/badfish synced 2025-05-03 18:19:35 +00:00
Commit graph

275 commits

Author SHA1 Message Date
Marco Costalba
66c5eaebd8 Re-apply the fix for Limits::ponder race
But this time correctly set Threads.ponder

We avoid using 'limits' for passing the pondering
flag because we don't want to have two ponder
variables in search scope: Search::Limits.ponder
and Threads.ponder. This would also be confusing
because limits.ponder is set at the beginning of
the search and never changes, whereas Threads.ponder
can change value asynchronously during the search.

No functional change.
2017-08-10 12:47:31 -07:00
Marco Costalba
44236f4ed9 Revert "Fix a race on Limits::ponder"
This reverts commit 5410424e3d.

After the commit, pondering is broken, so revert for now. I will
resubmit with a proper fix.

The issue is mine, Joost original code is correct.

No functional change.
2017-08-10 10:59:38 -07:00
Joost VandeVondele
5410424e3d Fix a race on Limits::ponder
Limits::ponder was used as a signal between uci and search threads,
but is not an atomic variable, leading to the following race as
flagged by a sanitized binary.

Expect input:
```
 spawn  ./stockfish
 send "uci\n"
 expect "uciok"
 send "setoption name Ponder value true\n"
 send "go wtime 4000 btime 4000\n"
 expect "bestmove"
 send "position startpos e2e4 d7d5\n"
 send "go wtime 4000 btime 4000 ponder\n"
 sleep 0.01
 send "ponderhit\n"
 expect "bestmove"
 send "quit\n"
 expect eof
```

Race:
```
WARNING: ThreadSanitizer: data race (pid=7191)
  Read of size 4 at 0x0000005c2260 by thread T1:

  Previous write of size 4 at 0x0000005c2260 by main thread:

  Location is global 'Search::Limits' of size 88 at 0x0000005c2220 (stockfish+0x0000005c2260)
```

The reason for the race is that ponder is not just set in the UCI go()
assignment but is also signaled by an async ponderhit in uci.cpp:

      else if (token == "ponderhit")
          Search::Limits.ponder = 0; // Switch to normal search

The fix is to add an atomic bool to the threads structure to
signal the ponder status, letting Search::Limits reflect just
what was passed to 'go'.
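
A minimal sketch of that idea, with hypothetical names (the real code differs in detail): the thread pool owns an atomic flag that 'go' sets and 'ponderhit' clears asynchronously, while Search::Limits keeps only what was passed with 'go'.

```
// Hedged sketch, not the actual Stockfish code: an atomic flag owned by the
// thread pool is set on 'go ... ponder' and cleared asynchronously on 'ponderhit'.
#include <atomic>

struct ThreadPool {
    std::atomic<bool> ponder{false};   // may change at any time during the search
};

ThreadPool Threads;

void uci_go(bool ponderMode) {
    Threads.ponder = ponderMode;       // set once, together with Limits
}

void uci_ponderhit() {
    Threads.ponder = false;            // async signal: switch to normal search
}

bool still_pondering() {
    return Threads.ponder.load();      // safe concurrent read from the search
}
```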

No functional change.
2017-08-10 10:46:46 -07:00
Joost VandeVondele
b40e45c1cc Remove Stack/thread dependence in movepick
As a lower-level routine, MovePicker should not depend on the
search stack or the thread class, removing a circular dependency.
Instead of copying the search stack into the MovePicker object,
as well as accessing the thread class for one of the histories,
pass the required fields explicitly to the constructor (removing
the need for thread.h, and implicitly search.h, in movepick.cpp).
The signature is thus longer, but more explicit.
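
The new signature itself is not reproduced in this log. Purely as an illustration of the shape, with stub types and hypothetical parameter names, the idea is roughly:

```
// Illustrative sketch only -- stub types, hypothetical parameter names.
// The point: the histories are injected through the constructor instead of
// being fetched from the Thread object or copied from the search stack.
struct Position {};
struct ButterflyHistory {};
struct PieceToHistory {};
using Move  = int;
using Depth = int;

class MovePicker {
public:
    MovePicker(const Position&         pos,
               Move                    ttMove,
               Depth                   depth,
               const ButterflyHistory* mainHistory,   // passed explicitly...
               const PieceToHistory**  contHistory,   // ...rather than via thread/ss
               Move                    counterMove,
               const Move*             killers)
        : pos(pos), mainHistory(mainHistory), contHistory(contHistory),
          ttMove(ttMove), depth(depth), counterMove(counterMove), killers(killers) {}

private:
    const Position&         pos;
    const ButterflyHistory* mainHistory;
    const PieceToHistory**  contHistory;
    Move                    ttMove;
    Depth                   depth;
    Move                    counterMove;
    const Move*             killers;
};
```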

Also some renaming of histories structures while there.

passed STC [-3,1], suggesting a small elo impact:

LLR: 3.13 (-2.94,2.94) [-3.00,1.00]
Total: 381053 W: 68071 L: 68551 D: 244431
elo =   -0.438 +-    0.660 LOS:    9.7%

No functional change.
2017-08-06 01:45:54 -07:00
joergoster
377d77dbe9 Provide selective search depth info for each pv move
No functional change

Closes #1166
2017-07-13 16:30:03 -07:00
Joost VandeVondele
36a93d90f7 Move stop signal to Threads
Instead of having Signals in the search namespace,
make the stop variables part of the Threads structure.
This moves more of the shared (atomic) variables towards
the thread-related structures, making their role more clear.

No functional change

Closes #1149
2017-07-13 16:08:37 -07:00
Marco Costalba
802fca6fdd Don't uselessy share rootDepth
It is not needed because the only case is a really special
one (bench on depth with many threads) and can be easily
rewritten to avoid sharing rootDepth.

Verified with ThreadSanitizer.

No functional change.

Closes #1159
2017-07-02 22:06:47 -07:00
Marco Costalba
05513a6641 Only main thread checks time
The main change of the patch is that now the time check
is done only by the main thread. In the past, before lazy
SMP, we needed all the threads to check for available
time because the main thread could have been blocked on
a split point. Now this is no longer the case and the main
thread can do the job alone, greatly simplifying the logic.
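
A rough sketch of the resulting logic, with simplified, hypothetical names (not the actual implementation):

```
// Hedged sketch: only the main thread compares elapsed time with the budget;
// helper threads simply observe the shared stop flag.
#include <atomic>
#include <chrono>

struct Thread {};

struct ThreadPool {
    Thread* main() { return &mainThread; }
    std::atomic<bool> stop{false};
    Thread mainThread;
};

ThreadPool Threads;

using Clock = std::chrono::steady_clock;
Clock::time_point searchStart;
std::chrono::milliseconds allocated{0};

void check_time() {                          // called from the main thread only
    if (Clock::now() - searchStart >= allocated)
        Threads.stop = true;                 // every thread sees this and returns
}

void poll_in_search(Thread* thisThread) {
    if (thisThread == Threads.main())
        check_time();
}
```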

Verified for regression testing on STC with 7 threads:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 11895 W: 1741 L: 1608 D: 8546

No functional change.

Closes #1152
2017-06-28 17:03:35 -07:00
Joost VandeVondele
3cb0200459 Fix four data races.
The nodes, tbHits, rootDepth and lastInfoTime variables are read by multiple threads, but are not declared atomic, leading to data races as found by -fsanitize=thread. This patch fixes this issue. It is based on top of the CI-threading branch (PR #1129), and should fix the corresponding CI error messages.

The patch passed an STC check for no regression:

http://tests.stockfishchess.org/tests/view/5925d5590ebc59035df34b9f
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 169597 W: 29938 L: 30066 D: 109593

Whereas rootDepth and lastInfoTime are not performance critical, nodes and tbHits are. Indeed, an earlier version using relaxed atomic updates on the latter two variables failed STC testing (http://tests.stockfishchess.org/tests/view/592001700ebc59035df34924), which can be shown to be due to x86-32 (http://tests.stockfishchess.org/tests/view/592330ac0ebc59035df34a89), as that architecture has no instruction to atomically update a 64-bit variable. The proposed solution thus uses a variable in Position that is accessed only by one thread, which is copied every few thousand nodes to the shared variable in Thread.
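
A sketch of that counting scheme, with hypothetical names and a made-up publish interval, just to show the shape:

```
// Hedged sketch: Position keeps a plain counter touched by one thread only;
// every few thousand nodes it is copied into the atomic shared via Thread.
#include <atomic>
#include <cstdint>

struct Thread {
    std::atomic<uint64_t> nodes{0};    // read by other threads (e.g. for UCI info)
};

struct Position {
    uint64_t localNodes = 0;           // private to the searching thread: cheap ++
    Thread*  thisThread = nullptr;     // set when the position is handed to a thread
};

void on_node_searched(Position& pos) {
    ++pos.localNodes;                                  // no synchronization cost
    if ((pos.localNodes & 4095) == 0)                  // every ~4096 nodes...
        pos.thisThread->nodes.store(pos.localNodes,    // ...publish to the shared copy
                                    std::memory_order_relaxed);
}
```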

No functional change.

Closes #1130
Closes #1129
2017-06-21 13:37:58 -07:00
Marco Costalba
ecd3218b6b History code rewrite (#1122)
Rearrange and rename all history heuristic code. Naming
is now based on chessprogramming.wikispaces.com conventions
and the relations among the various heuristics are now more
clear and consistent.

No functional change.
2017-05-26 08:42:50 +02:00
Joost VandeVondele
99d914985f Remove int to int conversion, unused include.
No functional change.

Closes #1112
2017-05-09 18:36:32 -07:00
Joost VandeVondele
d8f683760c Adjust copyright headers to 2017 (#965)
No functional change.
2017-01-11 08:46:29 +01:00
lucasart
34e47ca87d Rename FromTo -> History (#963)
Previously, we had duplicated History:

- one with (piece,to) called History
- one with (from,to) called FromTo

Now that we have only one, rename it to History, which is the generally accepted
name in the chess programming literature for this technique.
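
For reference, the surviving table is the (from,to) one; a hedged sketch of its shape, with hypothetical constants and simplified update logic:

```
// Hedged sketch of a butterfly-style history indexed by [color][from][to]
// of the quiet move (constants and update rule are illustrative only).
#include <cstdint>

constexpr int COLOR_NB  = 2;
constexpr int SQUARE_NB = 64;

struct History {
    int16_t table[COLOR_NB][SQUARE_NB][SQUARE_NB] = {};

    void update(int color, int from, int to, int bonus) {
        table[color][from][to] += bonus;   // the real code also ages/clamps entries
    }

    int get(int color, int from, int to) const {
        return table[color][from][to];
    }
};
```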

Also correct some comments that had not been updated since the introduction of CMH.

No functional change.
2017-01-10 08:47:56 +01:00
lucasart
e0504ab876 Remove HistoryStats
STC:
LLR: 3.44 (-2.94,2.94) [-3.00,1.00]
Total: 120831 W: 21572 L: 21594 D: 77665

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 26565 W: 3519 L: 3406 D: 19640

bench 5920493
2017-01-09 15:50:12 +01:00
Miroslav Fontán
f3cd7002aa Sync variable names in decl vs def 2016-11-05 08:05:22 +01:00
Marco Costalba
e18321f55a Correctly reset TB hit counter
Restore original behaviour to reset
the counter before a new move search.

Also fixed some warnings and added const
qualifier to a couple of functions, as
suggested by m_stembera.

Thanks to Werner Bergmans for reporting
the regression.

No functional change.
2016-10-22 08:22:13 +02:00
syzygy
ca67752645 Per-thread TB hit counters
Use a per-thread counter to reduce contention
with many cores and endgame positions.

Measured around 1% speed-up on a 12 core and 8%
on 28 cores with 6-men, searching on:
 7R/1p3k2/2p2P2/3nR1P1/8/3b1P2/7K/r7 b - - 3 38

Also retire the unused set_nodes_searched() and fix
a couple of return types and naming conventions.
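
A sketch of the aggregation this enables (hypothetical names, not the actual ThreadPool code): each thread bumps only its own counter, and the total is summed on demand.

```
// Hedged sketch: per-thread counters avoid a single contended atomic; the
// total is computed only when needed by summing over the threads.
#include <atomic>
#include <cstdint>
#include <memory>
#include <vector>

struct Thread {
    std::atomic<uint64_t> tbHits{0};
};

struct ThreadPool : public std::vector<std::unique_ptr<Thread>> {
    uint64_t tb_hits() const {
        uint64_t sum = 0;
        for (const auto& th : *this)
            sum += th->tbHits.load(std::memory_order_relaxed);
        return sum;                      // e.g. for the UCI "tbhits" info field
    }
};
```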

No functional change.
2016-10-21 06:15:45 +02:00
Marco Costalba
057d710fc2 Fix indentation in struct FromToStats
And other little trivial stuff.

No functional change.
2016-09-17 09:51:20 +02:00
Marco Costalba
5c58d1f5cb Use per-thread counterMoveHistory
Drops a scalability bottleneck due to memory contention
on a single shared table across threads. The effect starts
to be noticeable with a high number of threads. Specifically
we have a small regression with 7 threads both at 60 and
180 seconds TC:

10000 @ 60+0.6 th 7
ELO: -2.46 +-3.2 (95%) LOS: 6.5%
Total: 9896 W: 1037 L: 1107 D: 7752

5000 @ 180+0.6 th 7
ELO: -1.95 +-4.1 (95%) LOS: 17.7%
Total: 5000 W: 444 L: 472 D: 4084

We have a regression because the counterMoveHistory table is
quite big and it takes time for a single thread to fill it.
Sharing the table yields a higher fill rate and better
quality of moves, and up to 7 threads the benefits of sharing
more than compensate for the loss in speed due to contention.
Interestingly, even with a 3X longer TC, so with more time
for the single thread to catch up, the improvement is quite
limited and below noise level. It seems we really need a much
longer TC to saturate the table.

When we move to a high thread count it's another story:

5000 @ 60+0.6 th 22
ELO: 3.49 +-4.3 (95%) LOS: 94.6%
Total: 4880 W: 490 L: 441 D: 3949

2000 @ 60+0.6 th 32
ELO: 8.34 +-6.9 (95%) LOS: 99.1%
Total: 2000 W: 229 L: 181 D: 1590

As expected the speed-up more than compensates for the lower filling
rate, and we expect that with a tournament TC, where a single
thread is able to saturate the table, the difference will
be even stronger. For instance, the TCEC 9 super-final time
control will be 180 minutes + 15 seconds, and this scalability
improvement seems definitely the way to go.

So, summarizing:

GOOD:

Measured big improvement in high core scenario

Suitable for TCEC 9 superfinal (big hardware, very long TC)

Consistent and natural patch that extends to counterMoveHistory
what we already do for the remaining history tables, which are all per-thread

Non functional change for the common case of a single core

Very simple (just 6 lines modified, no added ones)

BAD:

Small regression (within 2-3 ELO) with few threads and short TC

bench: 5341477
2016-09-16 08:15:07 +02:00
VoyagerOne
b3525fa9ea Use Color-From-To history stats to help sort moves
STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 33502 W: 6498 L: 6223 D: 20781
http://tests.stockfishchess.org/tests/view/578abb940ebc5972faa169e2

LTC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 50782 W: 7124 L: 6832 D: 36826
http://tests.stockfishchess.org/tests/view/578b8e5d0ebc5972faa169fd

LTC: (Sanity test against latest master)
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 32759 W: 4600 L: 4370 D: 23789
http://tests.stockfishchess.org/tests/view/5798b7d30ebc591c761f5b72

bench: 6985912

P.S. Thanks @mstembera for rewriting my code to make it smp compatible. A BIG thank you!
2016-08-02 09:17:14 +02:00
Marco Costalba
ca14345ba2 Filter root moves before copy to threads
Currently root moves are copied to all the threads
but are DTZ filtered only in the main thread at the
beginning of the search.

This patch moves the TB filtering before the
copy of root moves, fixing issue #679

https://github.com/official-stockfish/Stockfish/issues/679

No bench change.
2016-06-11 09:24:40 +02:00
Marco Costalba
7eaea3848c StateInfo is usually allocated on the stack by search()
And passed to do_move(); this ensures maximum efficiency and
speed and, at the same time, an unlimited number of moves.

The drawback is that to handle Position init we need to
reserve a StateInfo inside Position itself and use it at
init time and when copying from another Position.

After lazy SMP we no longer need this gimmick, and we can
get rid of this special case and always pass an external
StateInfo to the Position object.

Also rewritten and simplified Position constructors.
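
The resulting calling convention, as a heavily simplified sketch with stub types:

```
// Hedged sketch of the calling convention: the caller owns the StateInfo on
// its stack frame and lends it to the Position for the lifetime of the move.
struct Move {};
struct StateInfo { /* captured piece, castling rights, hash key, ... */ };

struct Position {
    void do_move(Move m, StateInfo& newSt) { st = &newSt; /* ... */ (void)m; }
    void undo_move(Move m)                 { (void)m; /* restore via st */ }
private:
    StateInfo* st = nullptr;               // points into the caller's stack
};

void search_node(Position& pos, Move m) {
    StateInfo st;                          // lives in this search() frame
    pos.do_move(m, st);                    // no heap allocation, no fixed limit
    // ... recurse ...
    pos.undo_move(m);
}
```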

Verified it does not regress with a 3 threads SMP test:
ELO: -0.00 +-12.7 (95%) LOS: 50.0%
Total: 1000 W: 173 L: 173 D: 654

No functional change.
2016-04-17 08:29:33 +02:00
Marco Costalba
356147d99a Rewrite time formula
Time management is really too complex, our aim is
to simplify it, but for time being at least rewrite
in an understandable way.

No functional change.
2016-01-18 17:12:18 +01:00
Lyudmil Antonov
89723339d9 Assorted English grammar changes
No functional change

Resolves #567
2016-01-16 21:34:29 +00:00
Leonid Pechenik
9eceb894ac Adjust time used for move based on previous score
Use less time if the evaluation is not worse than for the previous move, and even less time if, in addition, no fail low was encountered during the current iteration.
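
A hedged sketch of how such a rule could look; the factors and names here are made up, not the tuned formula from the patch:

```
// Hedged sketch: scale the allocated time down when the iteration went well.
#include <chrono>

using ms = std::chrono::milliseconds;

ms adjust_time(ms optimum, int prevScore, int currScore, bool failedLow) {
    double factor = 1.0;
    if (currScore >= prevScore)   // evaluation not worse than for the previous move
        factor *= 0.8;            // hypothetical reduction
    if (currScore >= prevScore && !failedLow)
        factor *= 0.9;            // even less time if no fail low this iteration
    return ms(static_cast<long long>(optimum.count() * factor));
}
```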

STC: 10+0.1
ELO: 5.37 +-2.9 (95%) LOS: 100.0%
Total: 20000 W: 3832 L: 3523 D: 12645

STC: 10+0.1
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 17527 W: 3334 L: 3132 D: 11061

LTC: 60+0.6
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 28233 W: 3939 L: 3725 D: 20569

LTC: 60+0.6
ELO: 2.43 +-1.4 (95%) LOS: 100.0%
Total: 60000 W: 8266 L: 7847 D: 43887

LTC: 60+0.06
LLR: 2.95 (-2.94,2.94) [-1.00,3.00]
Total: 38932 W: 5408 L: 5207 D: 28317

Resolves #547
2016-01-03 14:01:15 +00:00
ppigazzini
d4af15f682 Update AUTHORS and copyright notice
No functional change

Resolves #555
2016-01-02 09:43:51 +00:00
Marco Costalba
9742fb10fd Update Copyright year
No functional change.

Resolves #554
2016-01-01 10:17:36 +00:00
Marco Costalba
1b5b900a29 Move some globals into main thread scope
Make it explicit that those variables are not globals, but
are used only by the main thread. I think it is a sensible
clarification because easy move is already tricky enough
and the current patch makes the involved actors explicit.

No functional change.

Resolves #537
2015-12-27 19:29:16 +00:00
Marco Costalba
93195555ed Rewrite how threads are spawned
Instead of creating a running std::thread and
returning, wait in the Thread constructor until the native
thread of execution goes to sleep in idle_loop().

In this way we can simplify how the search is started,
because when the main thread is idle we are sure all
other threads will be idle too, in any case, even
at thread creation and startup.

After lazy SMP went in, we can simplify and rewrite
a lot of logic that is no longer needed. This is
hopefully the final big cleanup.
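
A minimal sketch of that start-up handshake, with hypothetical member names (the real Thread class differs in detail):

```
// Hedged sketch: the constructor does not return until idle_loop() has taken
// the lock and is about to sleep, so a newly built Thread is known to be idle.
#include <condition_variable>
#include <mutex>
#include <thread>

class Thread {
public:
    Thread() {
        std::unique_lock<std::mutex> lk(mutex);
        nativeThread = std::thread(&Thread::idle_loop, this);
        sleepCondition.wait(lk, [&] { return sleeping; });  // wait until idle
    }

    ~Thread() {
        {
            std::lock_guard<std::mutex> lk(mutex);
            exit = true;
            sleepCondition.notify_one();
        }
        nativeThread.join();
    }

private:
    void idle_loop() {
        std::unique_lock<std::mutex> lk(mutex);
        sleeping = true;
        sleepCondition.notify_one();                        // unblock the ctor
        sleepCondition.wait(lk, [&] { return exit; });      // sleep until woken
    }

    std::mutex              mutex;
    std::condition_variable sleepCondition;
    std::thread             nativeThread;
    bool                    sleeping = false;
    bool                    exit     = false;
};
```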

Tested for no regression at 5+0.1 with 3 threads:
LLR: 2.95 (-2.94,2.94) [-5.00,0.00]
Total: 17411 W: 3248 L: 3198 D: 10965

No functional change.
2015-11-21 07:48:50 +01:00
Marco Costalba
76ed0ab501 Retire ThreadBase
Now that we no longer have TimerThread, there is
no need for this long class hierarchy.

Also assorted reformatting while there.

To verify no regression, passed at STC with 7 threads:
LLR: 2.97 (-2.94,2.94) [-5.00,0.00]
Total: 30990 W: 4945 L: 4942 D: 21103

No functional change.
2015-11-13 08:22:44 +01:00
Marco Costalba
9c9205860c Get rid of timer thread
Unfortunately std::condition_variable::wait_for()
is not accurate in the general case, and the timer thread
can wake up tens or even hundreds of
milliseconds after the time has elapsed. CPU load, process
priorities, number of concurrent threads, even from
other processes, will all have an effect on it.

Even official documentation says: "This function may
block for longer than timeout_duration due to scheduling
or resource contention delays."

So retire the timer and use a polling scheme based on a
local per-thread counter that counts search() calls and
a small trick to keep the polling frequency constant,
independently of the number of threads.
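
A sketch of such a polling scheme; dividing the reset threshold by the thread count is my reading of the "small trick", so treat the details as assumptions:

```
// Hedged sketch: every thread counts its own search() calls, and the
// countdown threshold is divided by the number of threads so the combined
// polling frequency stays roughly constant however many threads run.
#include <atomic>
#include <chrono>

std::atomic<bool> stopSignal{false};
int threadCount = 1;

using Clock = std::chrono::steady_clock;
Clock::time_point searchStart;
std::chrono::milliseconds budget{0};

struct Thread {
    int callsCnt = 0;

    void on_search_entry() {
        if (--callsCnt > 0)
            return;
        callsCnt = 4096 / threadCount;          // hypothetical base rate
        if (Clock::now() - searchStart >= budget)
            stopSignal = true;                  // observed by every thread
    }
};
```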

Tested for no regression at very fast TC 2+0.05 th 7:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 32969 W: 6720 L: 6620 D: 19629

TC 2+0.05 th 1:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 7765 W: 1917 L: 1765 D: 4083

And at STC TC, both single thread
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 15587 W: 3036 L: 2905 D: 9646

And with 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 8149 W: 1367 L: 1227 D: 5555

bench: 8639247
2015-11-03 11:27:00 +01:00
mbootsector
27c5cb5912 Pick bestmove from the deepest thread.
STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 26930 W: 4441 L: 4214 D: 18275

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 7783 W: 1017 L: 876 D: 5890

No functional change in single thread mode

Resolves #485
2015-11-02 10:05:43 +00:00
Marco Costalba
86f04dbcc0 Assorted trivia in search.cpp
The only interesting change is the moving of
stack[MAX_PLY+4] back to its original position
in id_loop (now renamed Thread::search).

No functional change.
2015-10-31 19:26:35 +01:00
Stéphane Nicolet
80d7556af7 Some code and comment cleanup
- Remove all references to split points
- Some grammar and spelling fixes

No Functional change

Resolves #478
2015-10-29 15:28:59 +00:00
lucasart
00d9e9fd28 Use atomics instead of volatile
Rely on well defined behaviour for message passing, instead of volatile. Three
versions have been tested, to make sure this wouldn't cause a slowdown on any
platform.

v1: Sequentially consistent atomics

No measurable regression, despite the extra memory barriers on x86. Even with 15
threads and extreme time pressure, both acting as a magnifying glass:

threads=15, tc=2+0.02
ELO: 2.59 +-3.4 (95%) LOS: 93.3%
Total: 18132 W: 4113 L: 3978 D: 10041

threads=7, tc=2+0.02
ELO: -1.64 +-3.6 (95%) LOS: 18.8%
Total: 16914 W: 4053 L: 4133 D: 8728

v2: Acquire/Release semantics

This version generates no extra barriers for x86 (on the hot path). As expected,
no regression either, under the same conditions:

threads=15, tc=2+0.02
ELO: 2.85 +-3.3 (95%) LOS: 95.4%
Total: 19661 W: 4640 L: 4479 D: 10542

threads=7, tc=2+0.02
ELO: 0.23 +-3.5 (95%) LOS: 55.1%
Total: 18108 W: 4326 L: 4314 D: 9468

As suggested by Joona, another test at LTC:

threads=15, tc=20+0.05
ELO: 0.64 +-2.6 (95%) LOS: 68.3%
Total: 20000 W: 3053 L: 3016 D: 13931

v3: Final version: SeqCst/Relaxed

threads=15, tc=10+0.1
ELO: 0.87 +-3.9 (95%) LOS: 67.1%
Total: 9541 W: 1478 L: 1454 D: 6609
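
For illustration, the kind of change involved, shown on a generic stop flag (the actual variables and the exact orderings chosen are in the patch itself): stores stay sequentially consistent while the hot-path load is relaxed, which adds no extra barrier on x86.

```
// Hedged sketch: message passing through std::atomic instead of volatile.
// Writers use the (default) sequentially consistent store; the read in the
// search hot path is relaxed.
#include <atomic>

std::atomic<bool> stop{false};

void signal_stop() {
    stop.store(true);                               // seq_cst by default
}

bool should_abort_search() {
    return stop.load(std::memory_order_relaxed);    // polled millions of times
}
```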

Resolves #474
2015-10-25 09:15:45 +00:00
Marco Costalba
307a5a4f63 Cleanup history stats
And other assorted trivia.

No functional change.
2015-10-24 17:29:12 +02:00
mbootsector
ecc5ff6693 Lazy SMP
Start all threads searching on the root position and
use only the shared TT table as the synching scheme.

It seems this scheme scales better than YBWC for
a high number of threads.
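
Conceptually, as a heavily simplified sketch with hypothetical names: every thread runs its own iterative deepening on the same root position, and the transposition table is the only shared state.

```
// Hedged sketch of the lazy SMP scheme: no split points, no work sharing --
// each thread simply searches the root position, coordinating only through
// the shared transposition table.
#include <thread>
#include <vector>

struct Position {};
struct TranspositionTable {} TT;        // shared by every thread

void iterative_deepening(Position rootPos) {
    for (int depth = 1; depth <= 128; ++depth) {
        // search(rootPos, depth) ... probes and stores into the shared TT
        (void)rootPos;
    }
}

void start_thinking(const Position& rootPos, int threadCount) {
    std::vector<std::thread> helpers;
    for (int i = 1; i < threadCount; ++i)
        helpers.emplace_back(iterative_deepening, rootPos);  // own copy of root
    iterative_deepening(rootPos);        // the main thread searches too
    for (auto& t : helpers)
        t.join();
}
```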

Verified for no regression at STC 3 threads
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 40232 W: 6908 L: 7130 D: 26194

Verified for no regression at LTC 3 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 28186 W: 3908 L: 3798 D: 20480

Verified for no regression at STC 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 3607 W: 674 L: 526 D: 2407

Verified for no regression at LTC 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 4235 W: 671 L: 528 D: 3036

Tested with fixed games at LTC with 20 threads
ELO: 44.75 +-7.6 (95%) LOS: 100.0%
Total: 2069 W: 407 L: 142 D: 1520

Tested with fixed games at XLTC (120secs) with 20 threads
ELO: 28.01 +-6.7 (95%) LOS: 100.0%
Total: 2275 W: 349 L: 166 D: 1760

Original patch of mbootsector, with additional work
from Ivan Ivec (log formula), Joerg Oster (id loop
simplification) and Marco Costalba (assorted formatting
and rework).

Bench: 8116244
2015-10-20 06:58:08 +02:00
Joona Kiiski
a7381d5e81 Fully yielding locks, no spinning
7 threads:

ELO: 2.00 +-2.7 (95%) LOS: 92.4%
Total: 20000 W: 3276 L: 3161 D: 13563

There is no functional change in single thread mode

Resolves #304
2015-03-24 21:34:19 +00:00
Marco Costalba
be77406a55 Get rid of nativeThread
No functional change.
2015-03-23 09:02:52 +01:00
Marco Costalba
26dabb1e6b Use only one ConditionVariable to sync UI
To sync the UI with the main thread, a single
condition variable is enough, because here we have a single
producer / single consumer design pattern.

Two condition variables are strictly needed only for
the many producers / many consumers case.

Note that this is possible because we no longer send
idle threads to sleep while searching, so that now
only the UI can wake up the main thread and we can use the
same ConditionVariable for both threads.

The natural consequence is to retire wait_for_think_finished()
and move all the logic under the MainThread class, yielding the
rename of the function to join().
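
A minimal single-producer/single-consumer sketch of that synchronization, with hypothetical member names:

```
// Hedged sketch: one mutex and one condition_variable are enough, since only
// the UI thread ever waits for the main search thread to finish, and only
// the UI thread ever wakes the main thread up.
#include <condition_variable>
#include <mutex>

class MainThread {
public:
    void start_searching() {                   // called by the UI thread
        std::lock_guard<std::mutex> lk(mutex);
        searching = true;
        cv.notify_one();                       // wake the main thread
    }

    void search_finished() {                   // called by the main thread
        std::lock_guard<std::mutex> lk(mutex);
        searching = false;
        cv.notify_one();                       // wake a join()-ing UI thread
    }

    void join() {                              // called by the UI thread
        std::unique_lock<std::mutex> lk(mutex);
        cv.wait(lk, [&] { return !searching; });
    }

private:
    std::mutex              mutex;
    std::condition_variable cv;
    bool                    searching = false;
};
```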

No functional change.
2015-03-21 07:55:33 +01:00
Marco Costalba
13d4df95cd Use acquire() and release() for spinlocks
It is more idiomatic than lock() and unlock().

No functional change.
2015-03-16 08:14:08 +01:00
Joona Kiiski
d71f707040 Introduce yielding spin locks
Idea and original implementation by Stephane Nicolet

7 threads 15+0.05
ELO: 3.54 +-2.9 (95%) LOS: 99.2%
Total: 17971 W: 2976 L: 2793 D: 12202

There is no functional change in single thread mode
2015-03-14 19:14:52 +00:00
Joona Kiiski
81c7975dcd Use thread specific mutexes instead of a global one.
This is necessary to improve the scalability with a high number of cores.

There is no functional change in a single thread mode.

Resolves #281
2015-03-11 21:59:34 +00:00
Marco Costalba
4b59347194 Retire spinlocks
Use Mutex instead.

This is in preparation for merging with the master branch,
where we still don't have spinlocks.

Eventually spinlocks will be re-added in some future
patch, once C++11 has been merged.

No functional change.
2015-03-11 21:20:47 +01:00
Marco Costalba
04372316b3 Disable spinlocks
To allow testing on fishtest.

No functional change.
2015-03-10 12:47:49 +01:00
Marco Costalba
8725494966 Add thread_win32.h header
Work around the slow std::thread implementation in mingw
and gcc for Windows with our own old low-level thread
functions.

No functional change.
2015-03-10 12:42:40 +01:00
Marco Costalba
a590d1d52d Re-enable spinlocks
For the C++11 branch, which does not run on fishtest,
there is no need for this kludge; let only master
have it.

No functional change.
2015-03-07 08:38:26 +01:00
Marco Costalba
6645115377 Allow to disable spinlocks
And use a mutex instead. You may never want to do this.
It is a workaround to run C++11 on fishtest, where many
machines have HT enabled and this can be a problem when the
number of cores set is higher than the number of physical cores.

To disable spinlocks, just compile with -DNO_SPINLOCK flag

No functional change.
2015-03-01 17:16:05 +01:00
Marco Costalba
63a5fc2366 Rename available_to()
Change this API to be more natural and simple.

Inspired by a patch by Joona.

No functional change.
2015-03-01 12:33:05 +01:00
Marco Costalba
d3d26a94b3 Improve spinlock implementation
Calling lock.test_and_set() in a tight loop creates expensive
memory synchronizations among processors and penalizes other
running threads. So synchronize only once at the beginning
with fetch_sub() and then loop on a simple load(), which puts much
less pressure on the system.
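
The resulting acquire loop looks roughly like the sketch below (the yield() comes from the yielding-spinlock patch listed above; treat the names as illustrative):

```
// Hedged sketch: fetch_sub() synchronizes once per acquisition attempt, then
// the waiter spins on a plain load (and yields), touching the cache line
// read-only until the lock is released.
#include <atomic>
#include <thread>

class Spinlock {
public:
    void acquire() {
        while (lock.fetch_sub(1, std::memory_order_acquire) != 1)
            while (lock.load(std::memory_order_relaxed) <= 0)
                std::this_thread::yield();      // be nice to oversubscribed CPUs
    }

    void release() {
        lock.store(1, std::memory_order_release);
    }

private:
    std::atomic<int> lock{1};                   // 1 = free, <= 0 = taken
};
```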

Reported about 2-3% speed up on various systems.

Patch by Ronald de Man.

No functional change.
2015-02-23 19:48:46 +01:00