BadFish

mirror of https://github.com/sockspls/badfish synced 2025-07-14 13:09:14 +00:00

Author	SHA1	Message	Date
lucasart	00d9e9fd28	Use atomics instead of volatile Rely on well defined behaviour for message passing, instead of volatile. Three versions have been tested, to make sure this wouldn't cause a slowdown on any platform. v1: Sequentially consistent atomics No mesurable regression, despite the extra memory barriers on x86. Even with 15 threads and extreme time pressure, both acting as a magnifying glass: threads=15, tc=2+0.02 ELO: 2.59 +-3.4 (95%) LOS: 93.3% Total: 18132 W: 4113 L: 3978 D: 10041 threads=7, tc=2+0.02 ELO: -1.64 +-3.6 (95%) LOS: 18.8% Total: 16914 W: 4053 L: 4133 D: 8728 v2: Acquire/Release semantics This version generates no extra barriers for x86 (on the hot path). As expected, no regression either, under the same conditions: threads=15, tc=2+0.02 ELO: 2.85 +-3.3 (95%) LOS: 95.4% Total: 19661 W: 4640 L: 4479 D: 10542 threads=7, tc=2+0.02 ELO: 0.23 +-3.5 (95%) LOS: 55.1% Total: 18108 W: 4326 L: 4314 D: 9468 As suggested by Joona, another test at LTC: threads=15, tc=20+0.05 ELO: 0.64 +-2.6 (95%) LOS: 68.3% Total: 20000 W: 3053 L: 3016 D: 13931 v3: Final version: SeqCst/Relaxed threads=15, tc=10+0.1 ELO: 0.87 +-3.9 (95%) LOS: 67.1% Total: 9541 W: 1478 L: 1454 D: 6609 Resolves #474	2015-10-25 09:15:45 +00:00
Marco Costalba	307a5a4f63	Cleanup history stats And other assorted trivia. No functional change.	2015-10-24 17:29:12 +02:00
mbootsector	ecc5ff6693	Lazy SMP Start all threads searching on root position and use only the shared TT table as synching scheme. It seems this scheme scales better than YBWC for high number of threads. Verified for nor regression at STC 3 threads LLR: -2.95 (-2.94,2.94) [-3.00,1.00] Total: 40232 W: 6908 L: 7130 D: 26194 Verified for nor regression at LTC 3 threads LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 28186 W: 3908 L: 3798 D: 20480 Verified for nor regression at STC 7 threads LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 3607 W: 674 L: 526 D: 2407 Verified for nor regression at LTC 7 threads LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 4235 W: 671 L: 528 D: 3036 Tested with fixed games at LTC with 20 threads ELO: 44.75 +-7.6 (95%) LOS: 100.0% Total: 2069 W: 407 L: 142 D: 1520 Tested with fixed games at XLTC (120secs) with 20 threads ELO: 28.01 +-6.7 (95%) LOS: 100.0% Total: 2275 W: 349 L: 166 D: 1760 Original patch of mbootsector, with additional work from Ivan Ivec (log formula), Joerg Oster (id loop simplification) and Marco Costalba (assorted formatting and rework). Bench: 8116244	2015-10-20 06:58:08 +02:00
Marco Costalba	3c0fe1d9b2	Rework lock protecting When changing 'search' and 'splitPointsSize' we have to use thread locks, not split point ones, because can_join() is called under the formers. Verified succesfully with 24 hours toruture tests with 20 cores machine by Louis Zulli: it does not hangs. Verifyed for no regressions with STC, 7 threads: LLR: 2.94 (-2.94,2.94) [-3.00,1.00] Total: 52804 W: 8159 L: 8087 D: 36558 No functional change.	2015-09-30 10:47:20 +02:00
Joona Kiiski	613dc66c12	Careful SMP locking - Fix very occasional hangs Louis Zulli reported that Stockfish suffers from very occasional hangs with his 20 cores machine. Careful SMP debugging revealed that this was caused by "a ghost split point slave", where thread was marked as a split point slave, but wasn't actually working on it. The only logical explanation for this was double booking, where due to SMP race, the same thread is booked for two different split points simultaneously. Due to very intermittent nature of the problem, we can't say exactly how this happens. The current handling of Thread specific variables is risky though. Volatile variables are in some cases changed without spinlock being hold. In this case standard doesn't give us any kind of guarantees about how the updated values are propagated to other threads. We resolve the situation by enforcing very strict locking rules: - Values for key thread variables (splitPointsSize, activeSplitPoint, searching) can only be changed when the thread specific spinlock is held. - Structural changes for splitPoints[] are only allowed when the thread specific spinlock is held. - Thread booking decisions (per split point) can only be done when the thread specific spinlock is held. With these changes hangs didn't occur anymore during 2 days torture testing on Zulli's machine. We probably have a slight performance penalty in SMP mode due to more locking. STC (7 threads): ELO: -1.00 +-2.2 (95%) LOS: 18.4% Total: 30000 W: 4538 L: 4624 D: 20838 However stability is worth more than 1-2 ELO points in this case. No functional change Resolves #422	2015-09-10 19:15:43 +01:00
Marco Costalba	fb03188fc7	Assorted cleanup of last patches No functional change.	2015-04-11 23:24:43 +02:00
Stéphane Nicolet	2ca142a5b4	Use minimumSplitDepth = 5 Using minimumSplitDepth = 5 seems to be the best compromise in the current SMP implementation STC, 11 threads: ELO: 14.87 +-4.1 (95%) LOS: 100.0% Total: 8509 W: 1497 L: 1133 D: 5879 STC, 4 threads: ELO: 0.30 +-2.8 (95%) LOS: 58.2% Total: 20000 W: 3365 L: 3348 D: 13287 STC, 2 threads: ELO: -1.02 +-2.0 (95%) LOS: 16.4% Total: 40000 W: 7087 L: 7204 D: 25709 Resolves #324	2015-04-09 20:32:36 +01:00
Marco Costalba	5d1b92e8f9	Introduce elapsed_time() And reformat a bit time manager code. Note that now we set starting search time in think() and no more in ThreadPool::start_thinking(), the added delay is less than 1 msec, so below timer resolution (5msec) and should not affect time lossses ratio. No functional change.	2015-04-03 04:19:26 +02:00
Marco Costalba	dc3a5f791e	Allow Bitbases::init() to be called more than once Currently if we call it more than once, we crash. This is not a real problem, because this function is indeed called just once. Nevertheless with this small fix, that gets rid of a hidden 'static' variable, we cleanly resolve the issue. While there, fix also ThreadPool::exit to return in a consistent state. Now all the init() functions but UCI::init() are reentrant and can be called multiple times. No functional change.	2015-03-23 17:14:31 +01:00
Marco Costalba	be77406a55	Get rid of nativeThread No functional change.	2015-03-23 09:02:52 +01:00
Marco Costalba	26dabb1e6b	Use only one ConditionVariable to sync UI To sync UI with main thread it is enough a single condition variable because here we have a single producer / single consumer design pattern. Two condition variables are strictly needed just for many producers / many consumers case. Note that this is possible because now we don't send to sleep idle threads anymore while searching, so that now only UI can wake up the main thread and we can use the same ConditionVariable for both threads. The natural consequence is to retire wait_for_think_finished() and move all the logic under MainThread class, yielding the rename of teh function to join() No functional change.	2015-03-21 07:55:33 +01:00
Marco Costalba	9a6cfee73b	Simplify nosleep logic Avoid redundant 'while' conditions. It is enough to check them in the outer loop. Quick tested for no regression 10K games at 4 threads ELO: -1.32 +-3.9 (95%) LOS: 25.6% Total: 10000 W: 1653 L: 1691 D: 6656 No functional change.	2015-03-18 08:01:50 +01:00
Marco Costalba	13d4df95cd	Use acquire() and release() for spinlocks It is more idiomatick than lock() and unlock() No functional change.	2015-03-16 08:14:08 +01:00
Joona Kiiski	f04f50b368	Do not sleep, but yield During the search, do not block on condition variable, but instead use std::this_thread::yield(). Clear gain with 16 threads. Again results vary highly depending on hardware, but on average it's a clear gain. ELO: 12.17 +-4.3 (95%) LOS: 100.0% Total: 7998 W: 1407 L: 1127 D: 5464 There is no functional change in single thread mode Resolves #294	2015-03-15 19:45:30 +00:00
Joona Kiiski	d71f707040	Introduce yielding spin locks Idea and original implementation by Stephane Nicolet 7 threads 15+0.05 ELO: 3.54 +-2.9 (95%) LOS: 99.2% Total: 17971 W: 2976 L: 2793 D: 12202 There is no functional change in single thread mode	2015-03-14 19:14:52 +00:00
Joona Kiiski	81c7975dcd	Use thread specific mutexes instead of a global one. This is necessary to improve the scalability with high number of cores. There is no functional change in a single thread mode. Resolves #281	2015-03-11 21:59:34 +00:00
Marco Costalba	4b59347194	Retire spinlocks Use Mutex instead. This is in preparaation for merging with master branch, where we stilll don't have spinlocks. Eventually spinlocks will be readded in some future patch, once c++11 has been merged. No functional change.	2015-03-11 21:20:47 +01:00
Marco Costalba	8725494966	Add thread_win32.h header Workaround slow std::thread implementation in mingw and gcc for Windows with our own old low level thread functions. No functional change.	2015-03-10 12:42:40 +01:00
Marco Costalba	63a5fc2366	Rename available_to() Change this API to be more natural and simple. Inspired by a patch by Joona. No functional change.	2015-03-01 12:33:05 +01:00
Marco Costalba	0b36ba74fc	Don't assume the type of Time::point But instead use the proper definition. Also rewrite chrono functions while there. No functional change.	2015-02-24 14:08:14 +01:00
Marco Costalba	38112060dc	Use spinlock instead of mutex for Threads and SplitPoint It is reported to be defenitly faster with increasing number of threads, we go from a +3.5% with 4 threads to a +15% with 16 threads. The only drawback is that now when testing with more threads than physical available cores, the speed slows down to a crawl. This is expected and was similar at what we had setting the old sleepingThreads to false. No functional change.	2015-02-23 13:47:07 +01:00
Marco Costalba	7ff965eebf	Improve comments in SMP code No functional change.	2015-02-20 12:38:54 +01:00
Marco Costalba	40548c9153	Sync with master bench: 7911944	2015-02-20 10:37:29 +01:00
Marco Costalba	950c8436ed	Use size_t consistently across thread code No functional change.	2015-02-19 10:43:28 +01:00
Marco Costalba	8d47caa16e	Retire redundant sp->slavesCount field It should be used slavesMask.count() instead. Verified 100% equivalent when sp->allSlavesSearching: dbg_hit_on(sp->allSlavesSearching, sp->slavesCount != sp->slavesMask.count()); No functional change.	2015-02-19 10:36:15 +01:00
Marco Costalba	dccaa145d2	Compute SplitPoint::spLevel on the fly And retire a redundant field. This is important also from a concept point of view becuase we want to keep SMP structures as simple as possible with the only strictly necessary data. Verified with dbg_hit_on(sp->spLevel != level) that the values are 100% the same out of more 50K samples. No functional change.	2015-02-18 21:50:35 +01:00
Joona Kiiski	d65f75c153	Improve smp performance for high number of threads Balance threads between split points. There are huge differences between different machines and autopurging makes it very difficult to measure the improvement in fishtest, but the following was recorded for 16 threads at 15+0.05: For Bravone (1000 games): 0 ELO For Glinscott (1000 games): +20 ELO For bKingUs (1000 games): +50 ELO For fastGM (1500 games): +50 ELO The change was regression for no one, and a big improvement for some, so it should be fine to commit it. Also for 8 threads at 15+0.05 we measured a statistically significant improvement: ELO: 6.19 +-3.9 (95%) LOS: 99.9% Total: 10325 W: 1824 L: 1640 D: 6861 Finally it was verified that there was no (significant) regression for 4 threads: ELO: 0.09 +-2.8 (95%) LOS: 52.4% Total: 19908 W: 3422 L: 3417 D: 13069 2 threads: ELO: 0.38 +-3.0 (95%) LOS: 60.0% Total: 19044 W: 3480 L: 3459 D: 12105 1 thread: ELO: -1.27 +-2.1 (95%) LOS: 12.3% Total: 40000 W: 7829 L: 7975 D: 24196 Resolves #258	2015-02-16 20:36:13 +00:00
Marco Costalba	65f46794af	Implicit conversion from ExtMove to Move Verified with perft there is no speed regression, and code is simpler. It is also conceptually correct becuase an extended move is just a move that happens to have also a score. No functional change.	2015-01-31 19:22:07 +01:00
Marco Costalba	3c07603dac	Import C++11 branch Import C++11 branch from: https://github.com/mcostalba/Stockfish/tree/c++11 The version imported is teh last one as of today: `6670e93e50` Branch is fully equivalent with master but syzygy tablebases that are missing (but will be added with next commit). bench: 8080602	2015-01-18 08:00:50 +01:00
Marco Costalba	4eb2d8ce09	Assorted headers cleanup Mostly comments fixing and other small things. No functional change.	2015-01-11 22:56:35 +01:00
Marco Costalba	42b48b08e8	Update copyright year No functional change.	2015-01-10 11:46:28 +01:00
Marco Costalba	62f531254e	Fix comments in thread.cpp And reshuffle a bit the functions to place them in a consistent order. To be on the safe side, patch has been validated for no regression/crashes with a small 8K games test with 3 threads: ELO: 3.98 +-4.4 (95%) LOS: 96.3% Total: 8388 W: 1500 L: 1404 D: 5484 No functional change.	2015-01-03 09:34:58 +01:00
hxim	fbb53524ef	Rename some variables for more clarity. No functional change. Resolves #131	2014-12-08 07:53:33 +08:00
Gary Linscott	4739037f96	100% accurate PV display This gives SF accurate PVs, such that the evaluation of the leaf node in the PV matches the score backed up to the root (99% of the time. q-search will use the value stored in the hash table instead of the eval value sometimes). One drawback is that fail-high/low only get a minimal 2 move PV. It doesn't add any additional overhead to the non-PV codepath except an extra eight bytes to the SearchStack structure in multi-threaded searches. A core part of this is not pruning based on TT score in PV nodes. This was measured as not being a regression at multiple TCs, except for one exception, fast TC with huge hash, which is not realistic for longer searches. STC - 1 thread, 128 mb hash ELO: 1.42 +-3.1 (95%) LOS: 81.9% Total: 20000 W: 4078 L: 3996 D: 11926 STC - 3 thread, 128 mb hash ELO: -3.60 +-2.9 (95%) LOS: 0.8% Total: 20000 W: 3575 L: 3782 D: 12643 STC - 3 thread, 8 mb hash ELO: 0.12 +-2.9 (95%) LOS: 53.3% Total: 20000 W: 3654 L: 3647 D: 12699 LTC - 3 thread, 32mb hash ELO: 2.29 +-2.0 (95%) LOS: 98.8% Total: 35740 W: 5618 L: 5382 D: 24740 Bench: 6984058 Resolves #102	2014-11-12 16:16:33 -05:00
Marco Costalba	5cbcff55cc	Rename ucioption.h to uci.h We are going to add all UCI related functions here, so first rename it to a more proper name. No functional change.	2014-10-26 19:39:46 +00:00
Joona Kiiski	eb50793cff	Retire FakeSplit - Currently broken - Never been really useful - Does not work well with new splitting model Verified for no regression at STC with 3 threads: LLR: 2.96 (-2.94,2.94) [-6.00,0.00] Total: 81905 W: 12122 L: 12381 D: 57402 No functional change	2014-07-09 07:19:06 +08:00
kinderchocolate	6f48367094	Add some const qualifier No functional change.	2014-06-03 11:43:52 +02:00
Marco Costalba	9f843adf89	Retire "Idle Threads Sleep" UCI option After last Joona's patch there is no measurable difference between the option set or unset. Tested by Andreas Strangmüller with 16 threads on his Dual Opteron 6376. After 5000 games at 15+0.05 the result is: 1 Stockfish_14050822_T16_on : 3003 5000 (+849,=3396,-755), 50.9 % 2 Stockfish_14050822_T16_off : 2997 5000 (+755,=3396,-849), 49.1 % bench: 880215	2014-05-11 10:29:56 +02:00
Marco Costalba	6ba1d3ead6	Clarify some comments in SMP code Spotted by Joona. No functional change.	2014-05-08 09:09:35 +02:00
Marco Costalba	7e3dba4f4c	Reformat and simplify previous patch No functional change.	2014-05-07 08:56:16 +02:00
Joona Kiiski	f6e98a924a	Allow a slave to 'late join' another splitpoint Instead of waiting to be allocated, actively search for another split point to join when finishes its search. Also modify split conditions. This patch has been tested with 7 threads SMP and passed both STC: LLR: 2.97 (-2.94,2.94) [-1.50,4.50] Total: 2885 W: 519 L: 410 D: 1956 And a reduced-LTC at 25+0.05 LLR: 2.95 (-2.94,2.94) [0.00,6.00] Total: 4401 W: 684 L: 566 D: 3151 Was then retested against regression in 3 thread case at standard LTC of 60+0.05: LLR: 2.96 (-2.94,2.94) [-4.00,0.00] Total: 40809 W: 5446 L: 5406 D: 29957 bench: 8802105	2014-05-07 08:38:56 +02:00
Marco Costalba	aab5863dd4	Increase max threads to 128 Thanks to std::bitset we can easily increase the limit of active threads above 64. Thanks to Lucas Braesch for pointing at the correct solution of using std::bitset. No functional change.	2014-03-18 12:07:26 +01:00
Marco Costalba	a091ae4cc8	Split also if no slaves are found Because we test for available slaves before entering split(), we almost always allocate a slave, only in the rare case of a race (less then 2% of cases) this is not true, but to special case this occurrence is not worth the added complexity. bench: 7451319	2014-03-15 23:43:35 +01:00
Marco Costalba	a1a7bc84da	Remove "Max Threads per Split Point" UCI option Experimental patch to verify if drop of nps in endgames at very long TC is due to this. Suggested by Ronald de Man. bench: 7451319	2014-03-15 21:26:04 +01:00
Marco Costalba	3e5470d88f	Remove limit of minimumSplitDepth There is no reason why an user cannot set it at a value less than 4. No functional change.	2014-03-01 23:22:14 +01:00
Marco Costalba	41641e3b1e	Assorted tweaks from DON Mainly renames and some little code style improvment, inspired by looking at DON sources: https://github.com/erashid/DON No functional change.	2014-02-09 17:31:45 +01:00
Marco Costalba	c9dcda6ac4	Update copyright year No functional change.	2014-01-02 01:49:18 +01:00
Lucas Braesch	f5727deee3	Remove threat move stuff A great simplification that shows no regression and it seems even a bit scalable. Tested with fixed number of games: Short TC ELO: 0.60 +-2.1 (95%) LOS: 71.1% Total: 39554 W: 7477 L: 7409 D: 24668 Long TC ELO: 2.97 +-2.0 (95%) LOS: 99.8% Total: 36424 W: 5894 L: 5583 D: 24947 bench: 8184352	2013-12-15 09:43:29 +01:00
Arjun Temurnikar	431c3ac485	Even more spelling fixes No functional change.	2013-12-06 09:03:24 +01:00
Jerry Donald	a8af78c833	Another round of spelling fixes And also renamed a loop variable while there. No functional change.	2013-12-02 23:51:29 +01:00

1 2 3 4 5 ...

292 commits