BadFish

mirror of https://github.com/sockspls/badfish synced 2025-05-03 18:19:35 +00:00

Author	SHA1	Message	Date
Marco Costalba	66c5eaebd8	Re-apply the fix for Limits::ponder race But this time correctly set Threads.ponder We avoid using 'limits' for passing pondering flag because we don't want to have 2 ponder variables in search scope: Search::Limits.ponder and Threads.ponder. This would be confusing also because limits.ponder is set at the beginning of the search and never changes, instead Threads.ponder can change value asynchronously during search. No functional change.	2017-08-10 12:47:31 -07:00
Marco Costalba	44236f4ed9	Revert "Fix a race on Limits::ponder" This reverts commit `5410424e3d`. After the commit pondering is broken, so revert for now. I will resubmit with a proper fix. The issue is mine, Joost original code is correct. No functional change.	2017-08-10 10:59:38 -07:00
Joost VandeVondele	5410424e3d	Fix a race on Limits::ponder Limits::ponder was used as a signal between uci and search threads, but is not an atomic variable, leading to the following race as flagged by a sanitized binary. Expect input: ``` spawn ./stockfish send "uci\n" expect "uciok" send "setoption name Ponder value true\n" send "go wtime 4000 btime 4000\n" expect "bestmove" send "position startpos e2e4 d7d5\n" send "go wtime 4000 btime 4000 ponder\n" sleep 0.01 send "ponderhit\n" expect "bestmove" send "quit\n" expect eof ``` Race: ``` WARNING: ThreadSanitizer: data race (pid=7191) Read of size 4 at 0x0000005c2260 by thread T1: Previous write of size 4 at 0x0000005c2260 by main thread: Location is global 'Search::Limits' of size 88 at 0x0000005c2220 (stockfish+0x0000005c2260) ``` The reason for the race is that ponder is not just set in the UCI go() assignment but is also signaled by an async ponderhit in uci.cpp: else if (token == "ponderhit") Search::Limits.ponder = 0; // Switch to normal search The fix is to add an atomic bool to the threads structure to signal the ponder status, letting Search::Limits reflect just what was passed to 'go'. No functional change.	2017-08-10 10:46:46 -07:00
Joost VandeVondele	b40e45c1cc	Remove Stack/thread dependence in movepick as a lower level routine, movepicker should not depend on the search stack or the thread class, removing a circular dependency. Instead of copying the search stack into the movepicker object, as well as accessing the thread class for one of the histories, pass the required fields explicitly to the constructor (removing the need for thread.h and implicitly search.h in movepick.cpp). The signature is thus longer, but more explicit: Also some renaming of histories structures while there. passed STC [-3,1], suggesting a small elo impact: LLR: 3.13 (-2.94,2.94) [-3.00,1.00] Total: 381053 W: 68071 L: 68551 D: 244431 elo = -0.438 +- 0.660 LOS: 9.7% No functional change.	2017-08-06 01:45:54 -07:00
joergoster	377d77dbe9	Provide selective search depth info for each pv move No functional change Closes #1166	2017-07-13 16:30:03 -07:00
Joost VandeVondele	36a93d90f7	Move stop signal to Threads Instead of having Signals in the search namespace, make the stop variables part of the Threads structure. This moves more of the shared (atomic) variables towards the thread-related structures, making their role more clear. No functional change Closes #1149	2017-07-13 16:08:37 -07:00
Marco Costalba	802fca6fdd	Don't uselessly share rootDepth It is not needed because the only case is a real special one (bench on depth with many threads) and can be easily rewritten to avoid sharing rootDepth. Verified with ThreadSanitizer. No functional change. Closes #1159	2017-07-02 22:06:47 -07:00
Marco Costalba	05513a6641	Only main thread checks time The main change of the patch is that now time check is done only by main thread. In the past, before lazy SMP, we needed all the threads to check for available time because main thread could have been blocked on a split point, now this is no longer the case and main thread can do the job alone, greatly simplifying the logic. Verified for regression testing on STC with 7 threads: LLR: 2.96 (-2.94,2.94) [-3.00,1.00] Total: 11895 W: 1741 L: 1608 D: 8546 No functional change. Closes #1152	2017-06-28 17:03:35 -07:00
Joost VandeVondele	3cb0200459	Fix four data races. the nodes, tbHits, rootDepth and lastInfoTime variables are read by multiple threads, but not declared atomic, leading to data races as found by -fsanitize=thread. This patch fixes this issue. It is based on top of the CI-threading branch (PR #1129), and should fix the corresponding CI error messages. The patch passed an STC check for no regression: http://tests.stockfishchess.org/tests/view/5925d5590ebc59035df34b9f LLR: 2.96 (-2.94,2.94) [-3.00,1.00] Total: 169597 W: 29938 L: 30066 D: 109593 Whereas rootDepth and lastInfoTime are not performance critical, nodes and tbHits are. Indeed, an earlier version using relaxed atomic updates on the latter two variables failed STC testing (http://tests.stockfishchess.org/tests/view/592001700ebc59035df34924), which can be shown to be due to x86-32 (http://tests.stockfishchess.org/tests/view/592330ac0ebc59035df34a89). Indeed, the latter have no instruction to atomically update a 64bit variable. The proposed solution thus uses a variable in Position that is accessed only by one thread, which is copied every few thousand nodes to the shared variable in Thread. No functional change. Closes #1130 Closes #1129	2017-06-21 13:37:58 -07:00
Marco Costalba	ecd3218b6b	History code rewrite (#1122 ) Rearrange and rename all history heuristic code. Naming is now based on chessprogramming.wikispaces.com conventions and the relations among the various heuristics are now more clear and consistent. No functional change.	2017-05-26 08:42:50 +02:00
Joost VandeVondele	99d914985f	Remove int to int conversion, unused include. No functional change. Closes #1112	2017-05-09 18:36:32 -07:00
Joost VandeVondele	d8f683760c	Adjust copyright headers to 2017 (#965 ) No functional change.	2017-01-11 08:46:29 +01:00
lucasart	34e47ca87d	Rename FromTo -> History (#963 ) Previously, we had duplicated History: - one with (piece,to) called History - one with (from,to) called FromTo Now that we have only one, rename it to History, which is the generally accepted name in the chess programming literature for this technique. Also correct some comments that had not been updated since the introduction of CMH. No functional change.	2017-01-10 08:47:56 +01:00
lucasart	e0504ab876	Remove HistoryStats STC: LLR: 3.44 (-2.94,2.94) [-3.00,1.00] Total: 120831 W: 21572 L: 21594 D: 77665 LTC: LLR: 2.96 (-2.94,2.94) [-3.00,1.00] Total: 26565 W: 3519 L: 3406 D: 19640 bench 5920493	2017-01-09 15:50:12 +01:00
Miroslav Fontán	f3cd7002aa	Sync variable names in decl vs def	2016-11-05 08:05:22 +01:00
Marco Costalba	e18321f55a	Correctly reset TB hit counter Restore original behaviour to reset the counter before a new move search. Also fixed some warnings and added const qualifier to a couple of functions, as suggested by m_stembera. Thanks to Werner Bergmans for reporting the regression. No functional change.	2016-10-22 08:22:13 +02:00
syzygy	ca67752645	Per-thread TB hit counters Use a per-thread counter to reduce contention with many cores and endgame positions. Measured around 1% speed-up on a 12 core and 8% on 28 cores with 6-men, searching on: 7R/1p3k2/2p2P2/3nR1P1/8/3b1P2/7K/r7 b - - 3 38 Also retire the unused set_nodes_searched() and fix a couple of return types and naming conventions. No functional change.	2016-10-21 06:15:45 +02:00
Marco Costalba	057d710fc2	Fix indentation in struct FromToStats And other little trivial stuff. No functional change.	2016-09-17 09:51:20 +02:00
Marco Costalba	5c58d1f5cb	Use per-thread counterMoveHistory Drops a scalability bottleneck due to memory contention of a single shared table across threads. The effect starts to be sensible with a high number of threads. Specifically we have a small regression with 7 threads both at 60 and 180 seconds TC: 10000 @ 60+0.6 th 7 ELO: -2.46 +-3.2 (95%) LOS: 6.5% Total: 9896 W: 1037 L: 1107 D: 7752 5000 @ 180+0.6 th 7 ELO: -1.95 +-4.1 (95%) LOS: 17.7% Total: 5000 W: 444 L: 472 D: 4084 We have a regression because counterMoveHistory table is quite big and it takes time for a single thread to fill it. Sharing the table yields a higher fill rate and better quality of moves and up to 7 threads the benefits of sharing more than compensate the loss in speed due to contention. Interestingly even with a 3X longer TC, so with more time for the single thread to catch up, the improvement is quite limited and below noise level. It seems we really need much longer TC to saturate the table. When we move to high threads number it's another story: 5000 @ 60+0.6 th 22 ELO: 3.49 +-4.3 (95%) LOS: 94.6% Total: 4880 W: 490 L: 441 D: 3949 2000 @ 60+0.6 th 32 ELO: 8.34 +-6.9 (95%) LOS: 99.1% Total: 2000 W: 229 L: 181 D: 1590 As expected the speed-up more than compensates the filling rate, and we expect that with tournament TC, where single thread is able to saturate the table, the difference will be even stronger. For instance for TCEC 9 super-final time control will be 180 minutes + 15 seconds and this scalability improvement seems definitely the way to go. 
So, summarizing: GOOD: Measured big improvement in high core scenario Suitable for TCEC 9 superfinal (big hardware, very long TC) Consistent and natural patch that extends to counterMoveHistory what we already do for remaining history tables, that are all per-thread Non functional change for the common case of a single core Very simple (just 6 lines modified, no added ones) BAD: Small regression (within 2-3 ELO) with few threads and short TC bench: 5341477	2016-09-16 08:15:07 +02:00
VoyagerOne	b3525fa9ea	Use Color-From-To history stats to help sort moves STC: LLR: 2.95 (-2.94,2.94) [0.00,5.00] Total: 33502 W: 6498 L: 6223 D: 20781 http://tests.stockfishchess.org/tests/view/578abb940ebc5972faa169e2 LTC: LLR: 2.95 (-2.94,2.94) [0.00,5.00] Total: 50782 W: 7124 L: 6832 D: 36826 http://tests.stockfishchess.org/tests/view/578b8e5d0ebc5972faa169fd LTC: (Sanity test against latest master) LLR: 2.95 (-2.94,2.94) [0.00,5.00] Total: 32759 W: 4600 L: 4370 D: 23789 http://tests.stockfishchess.org/tests/view/5798b7d30ebc591c761f5b72 bench: 6985912 P.S. Thanks @mstembera for rewriting my code to make it smp compatible. A BIG thank you!	2016-08-02 09:17:14 +02:00
Marco Costalba	ca14345ba2	Filter root moves before copy to threads Currently root moves are copied to all the threads but are DTZ filtered only in main thread at the beginning of the search. This patch moves the TB filtering before the copy of root moves fixing issue #679 https://github.com/official-stockfish/Stockfish/issues/679 No bench change.	2016-06-11 09:24:40 +02:00
Marco Costalba	7eaea3848c	StateInfo is usually allocated on the stack by search() And passed in do_move(), this ensures maximum efficiency and speed and at the same time unlimited move numbers. The draw back is that to handle Position init we need to reserve a StateInfo inside Position itself and use at init time and when copying from another Position. After lazy SMP we don't need anymore this gimmick and we can get rid of this special case and always pass an external StateInfo to Position object. Also rewritten and simplified Position constructors. Verified it does not regress with a 3 threads SMP test: ELO: -0.00 +-12.7 (95%) LOS: 50.0% Total: 1000 W: 173 L: 173 D: 654 No functional change.	2016-04-17 08:29:33 +02:00
Marco Costalba	356147d99a	Rewrite time formula Time management is really too complex, our aim is to simplify it, but for time being at least rewrite in an understandable way. No functional change.	2016-01-18 17:12:18 +01:00
Lyudmil Antonov	89723339d9	Assorted English grammar changes No functional change Resolves #567	2016-01-16 21:34:29 +00:00
Leonid Pechenik	9eceb894ac	Adjust time used for move based on previous score Use less time if evaluation is not worse than for previous move and even less time if in addition no fail low encountered for current iteration. STC: 10+0.1 ELO: 5.37 +-2.9 (95%) LOS: 100.0% Total: 20000 W: 3832 L: 3523 D: 12645 STC: 10+0.1 LLR: 2.96 (-2.94,2.94) [0.00,5.00] Total: 17527 W: 3334 L: 3132 D: 11061 LTC: 60+0.6 LLR: 2.95 (-2.94,2.94) [0.00,5.00] Total: 28233 W: 3939 L: 3725 D: 20569 LTC: 60+0.6 ELO: 2.43 +-1.4 (95%) LOS: 100.0% Total: 60000 W: 8266 L: 7847 D: 43887 LTC: 60+0.06 LLR: 2.95 (-2.94,2.94) [-1.00,3.00] Total: 38932 W: 5408 L: 5207 D: 28317 Resolves #547	2016-01-03 14:01:15 +00:00
ppigazzini	d4af15f682	Update AUTHORS and copyright notice No functional change Resolves #555	2016-01-02 09:43:51 +00:00
Marco Costalba	9742fb10fd	Update Copyright year No functional change. Resolves #554	2016-01-01 10:17:36 +00:00
Marco Costalba	1b5b900a29	Move some globals into main thread scope Make it explicit that those variables are not globals, but are used only by main thread. I think it is a sensible clarification because easy move is already tricky enough and current patch makes the involved actors explicit. No functional change. Resolves #537	2015-12-27 19:29:16 +00:00
Marco Costalba	93195555ed	Rewrite how threads are spawned Instead of creating a running std::thread and returning, wait in Thread c'tor that the native thread of execution goes to sleep in idle_loop(). In this way we can simplify how search is started, because when main thread is idle we are sure also all other threads will be idle, in any case, even at thread creation and startup. After lazy smp went in, we can simplify and rewrite a lot of logic that is no longer needed. This is hopefully the final big cleanup. Tested for no regression at 5+0.1 with 3 threads: LLR: 2.95 (-2.94,2.94) [-5.00,0.00] Total: 17411 W: 3248 L: 3198 D: 10965 No functional change.	2015-11-21 07:48:50 +01:00
Marco Costalba	76ed0ab501	Retire ThreadBase Now that we don't have anymore TimerThread, there is no need of this long class hierarchy. Also assorted reformatting while there. To verify no regression, passed at STC with 7 threads: LLR: 2.97 (-2.94,2.94) [-5.00,0.00] Total: 30990 W: 4945 L: 4942 D: 21103 No functional change.	2015-11-13 08:22:44 +01:00
Marco Costalba	9c9205860c	Get rid of timer thread Unfortunately std::condition_variable::wait_for() is not accurate in general case and the timer thread can wake up also after tens or even hundreds of millisecs after time has elapsed. CPU load, process priorities, number of concurrent threads, even from other processes, will have effect upon it. Even official documentation says: "This function may block for longer than timeout_duration due to scheduling or resource contention delays." So retire timer and use a polling scheme based on a local thread counter that counts search() calls and a small trick to keep polling frequency constant, independently from the number of threads. Tested for no regression at very fast TC 2+0.05 th 7: LLR: 2.96 (-2.94,2.94) [-3.00,1.00] Total: 32969 W: 6720 L: 6620 D: 19629 TC 2+0.05 th 1: LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 7765 W: 1917 L: 1765 D: 4083 And at STC TC, both single thread LLR: 2.96 (-2.94,2.94) [-3.00,1.00] Total: 15587 W: 3036 L: 2905 D: 9646 And with 7 threads LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 8149 W: 1367 L: 1227 D: 5555 bench: 8639247	2015-11-03 11:27:00 +01:00
mbootsector	27c5cb5912	Pick bestmove from the deepest thread. STC: LLR: 2.96 (-2.94,2.94) [0.00,5.00] Total: 26930 W: 4441 L: 4214 D: 18275 LTC: LLR: 2.96 (-2.94,2.94) [0.00,5.00] Total: 7783 W: 1017 L: 876 D: 5890 No functional change in single thread mode Resolves #485	2015-11-02 10:05:43 +00:00
Marco Costalba	86f04dbcc0	Assorted trivia in search.cpp The only interesting change is the moving of stack[MAX_PLY+4] back to its original position in id_loop (now renamed Thread::search). No functional change.	2015-10-31 19:26:35 +01:00
Stéphane Nicolet	80d7556af7	Some code and comment cleanup - Remove all references to split points - Some grammar and spelling fixes No Functional change Resolves #478	2015-10-29 15:28:59 +00:00
lucasart	00d9e9fd28	Use atomics instead of volatile Rely on well defined behaviour for message passing, instead of volatile. Three versions have been tested, to make sure this wouldn't cause a slowdown on any platform. v1: Sequentially consistent atomics No measurable regression, despite the extra memory barriers on x86. Even with 15 threads and extreme time pressure, both acting as a magnifying glass: threads=15, tc=2+0.02 ELO: 2.59 +-3.4 (95%) LOS: 93.3% Total: 18132 W: 4113 L: 3978 D: 10041 threads=7, tc=2+0.02 ELO: -1.64 +-3.6 (95%) LOS: 18.8% Total: 16914 W: 4053 L: 4133 D: 8728 v2: Acquire/Release semantics This version generates no extra barriers for x86 (on the hot path). As expected, no regression either, under the same conditions: threads=15, tc=2+0.02 ELO: 2.85 +-3.3 (95%) LOS: 95.4% Total: 19661 W: 4640 L: 4479 D: 10542 threads=7, tc=2+0.02 ELO: 0.23 +-3.5 (95%) LOS: 55.1% Total: 18108 W: 4326 L: 4314 D: 9468 As suggested by Joona, another test at LTC: threads=15, tc=20+0.05 ELO: 0.64 +-2.6 (95%) LOS: 68.3% Total: 20000 W: 3053 L: 3016 D: 13931 v3: Final version: SeqCst/Relaxed threads=15, tc=10+0.1 ELO: 0.87 +-3.9 (95%) LOS: 67.1% Total: 9541 W: 1478 L: 1454 D: 6609 Resolves #474	2015-10-25 09:15:45 +00:00
Marco Costalba	307a5a4f63	Cleanup history stats And other assorted trivia. No functional change.	2015-10-24 17:29:12 +02:00
mbootsector	ecc5ff6693	Lazy SMP Start all threads searching on root position and use only the shared TT table as synching scheme. It seems this scheme scales better than YBWC for high number of threads. Verified for no regression at STC 3 threads LLR: -2.95 (-2.94,2.94) [-3.00,1.00] Total: 40232 W: 6908 L: 7130 D: 26194 Verified for no regression at LTC 3 threads LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 28186 W: 3908 L: 3798 D: 20480 Verified for no regression at STC 7 threads LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 3607 W: 674 L: 526 D: 2407 Verified for no regression at LTC 7 threads LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 4235 W: 671 L: 528 D: 3036 Tested with fixed games at LTC with 20 threads ELO: 44.75 +-7.6 (95%) LOS: 100.0% Total: 2069 W: 407 L: 142 D: 1520 Tested with fixed games at XLTC (120secs) with 20 threads ELO: 28.01 +-6.7 (95%) LOS: 100.0% Total: 2275 W: 349 L: 166 D: 1760 Original patch of mbootsector, with additional work from Ivan Ivec (log formula), Joerg Oster (id loop simplification) and Marco Costalba (assorted formatting and rework). Bench: 8116244	2015-10-20 06:58:08 +02:00
Joona Kiiski	a7381d5e81	Fully yielding locks, no spinning 7 threads: ELO: 2.00 +-2.7 (95%) LOS: 92.4% Total: 20000 W: 3276 L: 3161 D: 13563 There is no functional change in single thread mode Resolves #304	2015-03-24 21:34:19 +00:00
Marco Costalba	be77406a55	Get rid of nativeThread No functional change.	2015-03-23 09:02:52 +01:00
Marco Costalba	26dabb1e6b	Use only one ConditionVariable to sync UI To sync UI with main thread it is enough a single condition variable because here we have a single producer / single consumer design pattern. Two condition variables are strictly needed just for many producers / many consumers case. Note that this is possible because now we don't send to sleep idle threads anymore while searching, so that now only UI can wake up the main thread and we can use the same ConditionVariable for both threads. The natural consequence is to retire wait_for_think_finished() and move all the logic under MainThread class, yielding the rename of the function to join() No functional change.	2015-03-21 07:55:33 +01:00
Marco Costalba	13d4df95cd	Use acquire() and release() for spinlocks It is more idiomatic than lock() and unlock() No functional change.	2015-03-16 08:14:08 +01:00
Joona Kiiski	d71f707040	Introduce yielding spin locks Idea and original implementation by Stephane Nicolet 7 threads 15+0.05 ELO: 3.54 +-2.9 (95%) LOS: 99.2% Total: 17971 W: 2976 L: 2793 D: 12202 There is no functional change in single thread mode	2015-03-14 19:14:52 +00:00
Joona Kiiski	81c7975dcd	Use thread specific mutexes instead of a global one. This is necessary to improve the scalability with high number of cores. There is no functional change in a single thread mode. Resolves #281	2015-03-11 21:59:34 +00:00
Marco Costalba	4b59347194	Retire spinlocks Use Mutex instead. This is in preparation for merging with master branch, where we still don't have spinlocks. Eventually spinlocks will be readded in some future patch, once c++11 has been merged. No functional change.	2015-03-11 21:20:47 +01:00
Marco Costalba	04372316b3	Disable spinlocks To allow testing on fishtest. No functional change.	2015-03-10 12:47:49 +01:00
Marco Costalba	8725494966	Add thread_win32.h header Workaround slow std::thread implementation in mingw and gcc for Windows with our own old low level thread functions. No functional change.	2015-03-10 12:42:40 +01:00
Marco Costalba	a590d1d52d	Re-enable spinlocks For branch C++11, that does not run on fishtest, there is no need of this kludge, let only master have it. No functional change.	2015-03-07 08:38:26 +01:00
Marco Costalba	6645115377	Allow to disable spinlocks And use mutex instead. You may never want to do this. It is a workaround to run c++11 on fishtest where many machines have HT enabled and this can be a problem when number of cores set is higher than number of physical cores. To disable spinlocks, just compile with -DNO_SPINLOCK flag No functional change.	2015-03-01 17:16:05 +01:00
Marco Costalba	63a5fc2366	Rename available_to() Change this API to be more natural and simple. Inspired by a patch by Joona. No functional change.	2015-03-01 12:33:05 +01:00
Marco Costalba	d3d26a94b3	Improve spinlock implementation Calling lock.test_and_set() in a tight loop creates expensive memory synchronizations among processors and penalizes other running threads. So synchronize only once at the beginning with fetch_sub() and then loop on a simple load() that puts much less pressure on the system. Reported about 2-3% speed up on various systems. Patch by Ronald de Man. No functional change.	2015-02-23 19:48:46 +01:00
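
The spinlock idea described in commit d3d26a94b3 (one fetch_sub() attempt, then spinning on a plain load()) can be sketched roughly as below. This is an illustrative reconstruction from the commit message only, not the committed code: the class layout, the `lock_` member, the `count_with_spinlock` helper and the `yield()` call inside the wait loop are all assumptions made for this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical illustration; names are not taken from the repository.
class Spinlock {
    std::atomic<int> lock_{1};  // 1 = free, <= 0 = held

public:
    void acquire() {
        // A single read-modify-write attempt: it wins only if the lock was free (1).
        while (lock_.fetch_sub(1, std::memory_order_acquire) != 1)
            // While contended, wait on a cheap load instead of repeating the RMW.
            while (lock_.load(std::memory_order_relaxed) <= 0)
                std::this_thread::yield();  // addition of this sketch, for oversubscribed machines
    }

    void release() { lock_.store(1, std::memory_order_release); }
};

// Hypothetical helper: several threads increment a plain counter under the lock.
long count_with_spinlock(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    Spinlock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.acquire();
                ++counter;  // protected by the spinlock
                lock.release();
            }
        });
    for (auto& th : pool)
        th.join();
    return counter;
}
```

If the lock is correct, `count_with_spinlock(4, 10000)` returns exactly `threads * iters`, since no increment is lost.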


275 commits