mirror of https://github.com/sockspls/badfish synced 2025-04-30 08:43:09 +00:00
Commit graph

192 commits

Author SHA1 Message Date
Marco Costalba
32d2c4e12b Move Thread::idle_loop() where it belongs
No functional change.
2015-10-17 08:07:07 +02:00
Marco Costalba
b01ad9ba18 Reformat lazy smp code
Just a first quick pass. Probably Skill and MultiPV
need some work too.

No functional change.
2015-10-17 08:07:03 +02:00
mbootsector
2d668a3cfc Lazy smp
Start all threads searching on the root position and
use only the shared TT as the synchronization scheme.

It seems this scheme scales better than YBWC for
a high number of threads.

Tested at very LTC (120+0.1) with 23 threads
ELO: 35.52 +-9.6 (95%) LOS: 100.0%
Total: 1109 W: 183 L: 70 D: 856

Tested at LTC with 23 threads
ELO: 34.41 +-9.9 (95%) LOS: 100.0%
Total: 1094 W: 184 L: 76 D: 834

Tested at LTC with 7 threads
ELO: 8.76 +-5.0 (95%) LOS: 100.0%
Total: 5000 W: 735 L: 609 D: 3656

Tested at STC with 7 threads
ELO: 16.76 +-5.4 (95%) LOS: 100.0%
Total: 5000 W: 899 L: 658 D: 3443

Bench: 8397672
2015-10-17 08:06:59 +02:00
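The ELO figures quoted with each W/L/D line in this log appear consistent with the usual logistic rating model. The following is a minimal sketch under that assumption; `elo_from_wld` is a hypothetical helper, not a function from the Stockfish sources or from fishtest.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of Stockfish): estimates the Elo difference
// from win/loss/draw counts, assuming the standard logistic model.
double elo_from_wld(int wins, int losses, int draws) {
    const double games = wins + losses + draws;
    assert(games > 0);

    // Average score per game: a win counts 1, a draw 0.5, a loss 0.
    const double score = (wins + 0.5 * draws) / games;
    assert(score > 0.0 && score < 1.0); // the formula diverges at 0 and 1

    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

For the first result above (W: 183 L: 70 D: 856) the score is 611/1109, which gives roughly 35.5 and agrees with the reported 35.52. The error margin and LOS values need the variance of the results as well and are not covered by this sketch.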
Marco Costalba
3c0fe1d9b2 Rework lock protecting
When changing 'search' and 'splitPointsSize' we have to
use thread locks, not split point ones, because can_join()
is called under the former.

Verified successfully with 24-hour torture tests on a 20-core
machine by Louis Zulli: it does not hang.

Verified for no regressions with STC, 7 threads:
LLR: 2.94 (-2.94,2.94) [-3.00,1.00]
Total: 52804 W: 8159 L: 8087 D: 36558

No functional change.
2015-09-30 10:47:20 +02:00
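The LLR lines in this log report a log-likelihood ratio next to a pair of stopping bounds such as (-2.94,2.94). Those bounds match what a sequential probability ratio test gives when both error rates are 0.05; that choice of error rates is an assumption here, since the log does not state it. A small sketch with hypothetical names:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: stopping bounds of a sequential probability ratio
// test. alpha and beta are the type I and type II error rates.
struct SprtBounds {
    double lower; // stop and reject the patch at or below this LLR
    double upper; // stop and accept the patch at or above this LLR
};

SprtBounds sprt_bounds(double alpha, double beta) {
    assert(alpha > 0.0 && alpha < 1.0 && beta > 0.0 && beta < 1.0);
    return { std::log(beta / (1.0 - alpha)), std::log((1.0 - beta) / alpha) };
}
```

With alpha = beta = 0.05 the upper bound is ln(19), about 2.94, and the lower bound is its negative.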
Joona Kiiski
613dc66c12 Careful SMP locking - Fix very occasional hangs
Louis Zulli reported that Stockfish suffers from very occasional hangs on his 20-core machine.

Careful SMP debugging revealed that this was caused by "a ghost split point slave", where a thread
was marked as a split point slave, but wasn't actually working on it.

The only logical explanation for this was double booking, where due to an SMP race, the same thread
is booked for two different split points simultaneously.

Due to the very intermittent nature of the problem, we can't say exactly how this happens.

The current handling of Thread specific variables is risky though. Volatile variables are in some
cases changed without the spinlock being held. In this case the standard doesn't give us any kind of
guarantees about how the updated values are propagated to other threads.

We resolve the situation by enforcing very strict locking rules:
- Values for key thread variables (splitPointsSize, activeSplitPoint, searching)
can only be changed when the thread specific spinlock is held.
- Structural changes for splitPoints[] are only allowed when the thread specific spinlock is held.
- Thread booking decisions (per split point) can only be done when the thread specific spinlock is held.

With these changes, hangs no longer occurred during 2 days of torture testing on Zulli's machine.

We probably have a slight performance penalty in SMP mode due to more locking.

STC (7 threads):
ELO: -1.00 +-2.2 (95%) LOS: 18.4%
Total: 30000 W: 4538 L: 4624 D: 20838

However stability is worth more than 1-2 ELO points in this case.

No functional change

Resolves #422
2015-09-10 19:15:43 +01:00
Marco Costalba
fb03188fc7 Assorted cleanup of last patches
No functional change.
2015-04-11 23:24:43 +02:00
Stéphane Nicolet
2ca142a5b4 Use minimumSplitDepth = 5
Using minimumSplitDepth = 5 seems to be the best compromise in the
current SMP implementation

STC, 11 threads:

ELO: 14.87 +-4.1 (95%) LOS: 100.0%
Total: 8509 W: 1497 L: 1133 D: 5879

STC, 4 threads:

ELO: 0.30 +-2.8 (95%) LOS: 58.2%
Total: 20000 W: 3365 L: 3348 D: 13287

STC, 2 threads:

ELO: -1.02 +-2.0 (95%) LOS: 16.4%
Total: 40000 W: 7087 L: 7204 D: 25709

Resolves #324
2015-04-09 20:32:36 +01:00
Marco Costalba
5d1b92e8f9 Introduce elapsed_time()
And reformat a bit time manager code.

Note that we now set the search start time in think() and
no longer in ThreadPool::start_thinking(). The added delay
is less than 1 msec, so below the timer resolution (5 msec), and
should not affect the time loss ratio.

No functional change.
2015-04-03 04:19:26 +02:00
Marco Costalba
dc3a5f791e Allow Bitbases::init() to be called more than once
Currently if we call it more than once, we crash.

This is not a real problem, because this function is
indeed called just once. Nevertheless with this small fix,
that gets rid of a hidden 'static' variable, we cleanly
resolve the issue.

While there, fix also ThreadPool::exit to return in a
consistent state. Now all the init() functions but
UCI::init() are reentrant and can be called multiple
times.

No functional change.
2015-03-23 17:14:31 +01:00
Marco Costalba
be77406a55 Get rid of nativeThread
No functional change.
2015-03-23 09:02:52 +01:00
Marco Costalba
26dabb1e6b Use only one ConditionVariable to sync UI
To sync the UI with the main thread a single condition
variable is enough, because here we have a single
producer / single consumer design pattern.

Two condition variables are strictly needed only for
the many producers / many consumers case.

Note that this is possible because we no longer put
idle threads to sleep while searching, so that now
only the UI can wake up the main thread and we can use
the same ConditionVariable for both threads.

The natural consequence is to retire wait_for_think_finished()
and move all the logic under the MainThread class, which leads to
renaming the function to join().

No functional change.
2015-03-21 07:55:33 +01:00
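The single producer / single consumer argument can be illustrated with standard C++11 primitives. This is a hedged sketch, not the code from the commit: `MainThreadSketch`, its members and `run_searches` are hypothetical names, and the search itself is left as a placeholder.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical sketch (not the Stockfish MainThread class): one mutex and ONE
// condition variable are shared by the UI thread and the main search thread.
struct MainThreadSketch {
    std::mutex mutex;
    std::condition_variable sleepCondition;
    bool thinking = false; // set by the UI thread, cleared by the main thread
    bool quit = false;
    int searchesDone = 0;

    // Runs on the main search thread.
    void idle_loop() {
        std::unique_lock<std::mutex> lk(mutex);
        while (true) {
            sleepCondition.wait(lk, [this] { return thinking || quit; });
            if (quit)
                return;
            lk.unlock();
            // ... the actual search would run here, without holding the lock ...
            lk.lock();
            ++searchesDone;
            thinking = false;
            sleepCondition.notify_one(); // wakes the UI thread blocked in join()
        }
    }

    // The functions below run on the UI thread.
    void start_thinking() {
        { std::lock_guard<std::mutex> lk(mutex); thinking = true; }
        sleepCondition.notify_one();
    }
    void join() {
        std::unique_lock<std::mutex> lk(mutex);
        sleepCondition.wait(lk, [this] { return !thinking; });
    }
    void stop() {
        { std::lock_guard<std::mutex> lk(mutex); quit = true; }
        sleepCondition.notify_one();
    }
};

// Starts n searches one after another and returns how many completed.
int run_searches(int n) {
    MainThreadSketch mt;
    std::thread worker(&MainThreadSketch::idle_loop, &mt);
    for (int i = 0; i < n; ++i) {
        mt.start_thinking();
        mt.join();
    }
    mt.stop();
    worker.join();
    return mt.searchesDone;
}
```

The two threads wait on mutually exclusive predicates (`thinking` versus `!thinking`), so at most one of them is ever blocked on the condition variable and `notify_one()` cannot wake the wrong waiter. With several producers or consumers that property no longer holds, which is where a second condition variable becomes necessary.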
Marco Costalba
9a6cfee73b Simplify nosleep logic
Avoid redundant 'while' conditions. It is enough to
check them in the outer loop.

Quick tested for no regression 10K games at 4 threads
ELO: -1.32 +-3.9 (95%) LOS: 25.6%
Total: 10000 W: 1653 L: 1691 D: 6656

No functional change.
2015-03-18 08:01:50 +01:00
Marco Costalba
13d4df95cd Use acquire() and release() for spinlocks
It is more idiomatic than lock() and unlock()

No functional change.
2015-03-16 08:14:08 +01:00
Joona Kiiski
f04f50b368 Do not sleep, but yield
During the search, do not block on condition variable, but instead use std::this_thread::yield().

Clear gain with 16 threads. Again results vary highly depending on hardware, but on average it's a clear gain.

ELO: 12.17 +-4.3 (95%) LOS: 100.0%
Total: 7998 W: 1407 L: 1127 D: 5464

There is no functional change in single thread mode

Resolves #294
2015-03-15 19:45:30 +00:00
Joona Kiiski
d71f707040 Introduce yielding spin locks
Idea and original implementation by Stephane Nicolet

7 threads 15+0.05
ELO: 3.54 +-2.9 (95%) LOS: 99.2%
Total: 17971 W: 2976 L: 2793 D: 12202

There is no functional change in single thread mode
2015-03-14 19:14:52 +00:00
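A yielding spin lock can be sketched with std::atomic and std::this_thread::yield(). The version below is an illustration only and assumes nothing about the actual patch beyond the acquire()/release() naming used elsewhere in this log; `count_with_spinlock` is a hypothetical driver.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical sketch of a yielding spin lock: instead of burning CPU while
// the lock is taken, a waiting thread gives up its time slice.
class Spinlock {
    std::atomic<bool> locked{false};

public:
    void acquire() {
        // exchange() returns the previous value: true means someone else
        // holds the lock, so yield and retry.
        while (locked.exchange(true, std::memory_order_acquire))
            std::this_thread::yield();
    }
    void release() { locked.store(false, std::memory_order_release); }
};

// Increments a shared counter from several threads under the spin lock.
long count_with_spinlock(int numThreads, int iterations) {
    Spinlock lock;
    long counter = 0;
    std::vector<std::thread> pool;

    for (int i = 0; i < numThreads; ++i)
        pool.emplace_back([&] {
            for (int j = 0; j < iterations; ++j) {
                lock.acquire();
                ++counter; // protected by the spin lock
                lock.release();
            }
        });

    for (auto& t : pool)
        t.join();
    return counter;
}
```

Yielding helps most when there are more runnable threads than cores, because the lock holder gets a chance to run and release the lock.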
Joona Kiiski
81c7975dcd Use thread specific mutexes instead of a global one.
This is necessary to improve the scalability with a high number of cores.

There is no functional change in a single thread mode.

Resolves #281
2015-03-11 21:59:34 +00:00
Marco Costalba
4b59347194 Retire spinlocks
Use Mutex instead.

This is in preparation for merging with master branch,
where we still don't have spinlocks.

Eventually spinlocks will be readded in some future
patch, once c++11 has been merged.

No functional change.
2015-03-11 21:20:47 +01:00
Marco Costalba
8725494966 Add thread_win32.h header
Work around the slow std::thread implementation in mingw
and gcc for Windows with our own old low level thread
functions.

No functional change.
2015-03-10 12:42:40 +01:00
Marco Costalba
63a5fc2366 Rename available_to()
Change this API to be more natural and simple.

Inspired by a patch by Joona.

No functional change.
2015-03-01 12:33:05 +01:00
Marco Costalba
0b36ba74fc Don't assume the type of Time::point
But instead use the proper definition. Also
rewrite chrono functions while there.

No functional change.
2015-02-24 14:08:14 +01:00
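One way to avoid assuming the type of a time point is to derive it from the chrono duration being used. The sketch below shows that idea with hypothetical names (`TimePoint`, `now`, `SearchTimerSketch`); it is not the definition from the commit.

```cpp
#include <chrono>

// Hypothetical sketch: take the integer type from std::chrono instead of
// hard-coding something like 'int' or 'long long' for millisecond counts.
typedef std::chrono::milliseconds::rep TimePoint;

// Milliseconds on a monotonic clock, suitable for measuring intervals.
inline TimePoint now() {
    return std::chrono::duration_cast<std::chrono::milliseconds>(
               std::chrono::steady_clock::now().time_since_epoch()).count();
}

// Hypothetical timer that records a start time and reports elapsed msec.
struct SearchTimerSketch {
    TimePoint startTime;

    void start() { startTime = now(); }
    TimePoint elapsed_time() const { return now() - startTime; }
};
```

Because std::chrono::steady_clock never goes backwards, the elapsed value is never negative.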
Marco Costalba
38112060dc Use spinlock instead of mutex for Threads and SplitPoint
It is reported to be definitely faster with increasing
number of threads, we go from a +3.5% with 4 threads
to a +15% with 16 threads.

The only drawback is that now when testing with more
threads than physical available cores, the speed slows
down to a crawl. This is expected and was similar to what
we had setting the old sleepingThreads to false.

No functional change.
2015-02-23 13:47:07 +01:00
Marco Costalba
7ff965eebf Improve comments in SMP code
No functional change.
2015-02-20 12:38:54 +01:00
Marco Costalba
40548c9153 Sync with master
bench: 7911944
2015-02-20 10:37:29 +01:00
Marco Costalba
950c8436ed Use size_t consistently across thread code
No functional change.
2015-02-19 10:43:28 +01:00
Marco Costalba
8d47caa16e Retire redundant sp->slavesCount field
slavesMask.count() should be used instead.

Verified 100% equivalent when sp->allSlavesSearching:

dbg_hit_on(sp->allSlavesSearching, sp->slavesCount != sp->slavesMask.count());

No functional change.
2015-02-19 10:36:15 +01:00
Marco Costalba
dccaa145d2 Compute SplitPoint::spLevel on the fly
And retire a redundant field. This is important also
from a conceptual point of view because we want to keep
SMP structures as simple as possible with only the
strictly necessary data.

Verified with

dbg_hit_on(sp->spLevel != level)

that the values are 100% the same out of more than 50K samples.

No functional change.
2015-02-18 21:50:35 +01:00
Joona Kiiski
d65f75c153 Improve smp performance for high number of threads
Balance threads between split points.

There are huge differences between different machines and autopurging makes it very difficult to measure the improvement in fishtest, but the following was recorded for 16 threads at 15+0.05:

    For Bravone (1000 games): 0 ELO
    For Glinscott (1000 games): +20 ELO
    For bKingUs (1000 games): +50 ELO
    For fastGM (1500 games): +50 ELO

The change was a regression for no one, and a big improvement for some, so it should be fine to commit it.
Also for 8 threads at 15+0.05 we measured a statistically significant improvement:
ELO: 6.19 +-3.9 (95%) LOS: 99.9%
Total: 10325 W: 1824 L: 1640 D: 6861

Finally it was verified that there was no (significant) regression for

4 threads:
ELO: 0.09 +-2.8 (95%) LOS: 52.4%
Total: 19908 W: 3422 L: 3417 D: 13069

2 threads:
ELO: 0.38 +-3.0 (95%) LOS: 60.0%
Total: 19044 W: 3480 L: 3459 D: 12105

1 thread:
ELO: -1.27 +-2.1 (95%) LOS: 12.3%
Total: 40000 W: 7829 L: 7975 D: 24196

Resolves #258
2015-02-16 20:36:13 +00:00
Marco Costalba
65f46794af Implicit conversion from ExtMove to Move
Verified with perft there is no speed regression,
and code is simpler. It is also conceptually correct
because an extended move is just a move that happens
to have also a score.

No functional change.
2015-01-31 19:22:07 +01:00
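The idea that an extended move is just a move that also carries a score can be expressed with a conversion operator. The sketch below uses simplified, hypothetical definitions of `Move` and `ExtMove`, plus two hypothetical helpers; the real types in the sources may differ.

```cpp
// Hypothetical, simplified types for illustration only.
enum Move : int { MOVE_NONE = 0 };

struct ExtMove {
    Move move;
    int value; // ordering score attached to the move

    // Lets an ExtMove be used wherever a plain Move is expected.
    operator Move() const { return move; }
};

// Takes a plain Move; callers may pass an ExtMove directly.
inline bool is_real_move(Move m) { return m != MOVE_NONE; }

// Returns the move with the highest score, or MOVE_NONE for an empty range.
inline Move pick_best(const ExtMove* begin, const ExtMove* end) {
    if (begin == end)
        return MOVE_NONE;

    const ExtMove* best = begin;
    for (const ExtMove* it = begin + 1; it != end; ++it)
        if (it->value > best->value)
            best = it;

    return *best; // implicit ExtMove -> Move conversion
}
```

Call sites no longer need to write `.move` explicitly, which is where the simplification comes from.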
Marco Costalba
3c07603dac Import C++11 branch
Import C++11 branch from:

https://github.com/mcostalba/Stockfish/tree/c++11

The version imported is the last one as of today:
6670e93e50

Branch is fully equivalent with master except for the syzygy
tablebases, which are missing (but will be added with the
next commit).

bench: 8080602
2015-01-18 08:00:50 +01:00
Marco Costalba
4eb2d8ce09 Assorted headers cleanup
Mostly comments fixing and other small things.

No functional change.
2015-01-11 22:56:35 +01:00
Marco Costalba
42b48b08e8 Update copyright year
No functional change.
2015-01-10 11:46:28 +01:00
Marco Costalba
62f531254e Fix comments in thread.cpp
And reshuffle a bit the functions to place
them in a consistent order.

To be on the safe side, patch has been
validated for no regression/crashes with
a small 8K games test with 3 threads:

ELO: 3.98 +-4.4 (95%) LOS: 96.3%
Total: 8388 W: 1500 L: 1404 D: 5484

No functional change.
2015-01-03 09:34:58 +01:00
hxim
fbb53524ef Rename some variables for more clarity.
No functional change.

Resolves #131
2014-12-08 07:53:33 +08:00
Gary Linscott
4739037f96 100% accurate PV display
This gives SF accurate PVs, such that the evaluation of the leaf node in
the PV matches the score backed up to the root (99% of the time.
q-search will use the value stored in the hash table instead of the eval
value sometimes).

One drawback is that fail-high/low only get a minimal 2 move PV.

It doesn't add any additional overhead to the non-PV codepath except an
extra eight bytes to the SearchStack structure in multi-threaded
searches.

A core part of this is not pruning based on TT score in PV nodes. This
was measured as not being a regression at multiple TCs, except for one
exception, fast TC with huge hash, which is not realistic for longer
searches.

STC - 1 thread, 128 mb hash
ELO: 1.42 +-3.1 (95%) LOS: 81.9%
Total: 20000 W: 4078 L: 3996 D: 11926

STC - 3 thread, 128 mb hash
ELO: -3.60 +-2.9 (95%) LOS: 0.8%
Total: 20000 W: 3575 L: 3782 D: 12643

STC - 3 thread, 8 mb hash
ELO: 0.12 +-2.9 (95%) LOS: 53.3%
Total: 20000 W: 3654 L: 3647 D: 12699

LTC - 3 thread, 32mb hash
ELO: 2.29 +-2.0 (95%) LOS: 98.8%
Total: 35740 W: 5618 L: 5382 D: 24740

Bench: 6984058

Resolves #102
2014-11-12 16:16:33 -05:00
Marco Costalba
5cbcff55cc Rename ucioption.h to uci.h
We are going to add all UCI related
functions here, so first rename it
to a more proper name.

No functional change.
2014-10-26 19:39:46 +00:00
Joona Kiiski
eb50793cff Retire FakeSplit
- Currently broken
- Never been really useful
- Does not work well with new splitting model

Verified for no regression at STC with 3 threads:
LLR: 2.96 (-2.94,2.94) [-6.00,0.00]
Total: 81905 W: 12122 L: 12381 D: 57402

No functional change
2014-07-09 07:19:06 +08:00
kinderchocolate
6f48367094 Add some const qualifier
No functional change.
2014-06-03 11:43:52 +02:00
Marco Costalba
9f843adf89 Retire "Idle Threads Sleep" UCI option
After Joona's last patch there is no measurable
difference between the option set or unset.

Tested by Andreas Strangmüller with 16 threads
on his Dual Opteron 6376.

After 5000 games at 15+0.05 the result is:

1 Stockfish_14050822_T16_on   : 3003  5000 (+849,=3396,-755), 50.9 %
2 Stockfish_14050822_T16_off  : 2997  5000 (+755,=3396,-849), 49.1 %

bench: 880215
2014-05-11 10:29:56 +02:00
Marco Costalba
6ba1d3ead6 Clarify some comments in SMP code
Spotted by Joona.

No functional change.
2014-05-08 09:09:35 +02:00
Marco Costalba
7e3dba4f4c Reformat and simplify previous patch
No functional change.
2014-05-07 08:56:16 +02:00
Joona Kiiski
f6e98a924a Allow a slave to 'late join' another splitpoint
Instead of waiting to be allocated, a slave actively searches
for another split point to join when it finishes its
search. Also modify split conditions.

This patch has been tested with 7 threads SMP and
passed both STC:

LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 2885 W: 519 L: 410 D: 1956

And a reduced-LTC at  25+0.05
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 4401 W: 684 L: 566 D: 3151

Was then retested against regression in 3 thread case
at standard LTC of  60+0.05:

LLR: 2.96 (-2.94,2.94) [-4.00,0.00]
Total: 40809 W: 5446 L: 5406 D: 29957

bench: 8802105
2014-05-07 08:38:56 +02:00
Marco Costalba
aab5863dd4 Increase max threads to 128
Thanks to std::bitset we can easily increase
the limit of active threads above 64.

Thanks to Lucas Braesch for pointing at the
correct solution of using std::bitset.

No functional change.
2014-03-18 12:07:26 +01:00
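A 64-bit integer mask cannot address more than 64 threads, while std::bitset has no such limit. The sketch below is hypothetical (`SplitPointSketch` and its member functions are illustrative names); only `slavesMask` and its `count()` call are taken from this log.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

const std::size_t MAX_THREADS = 128;

// Hypothetical sketch: one bit per thread index marks the threads that are
// currently working on a split point.
struct SplitPointSketch {
    std::bitset<MAX_THREADS> slavesMask;

    void add_slave(std::size_t idx) {
        assert(idx < MAX_THREADS);
        slavesMask.set(idx);
    }
    void remove_slave(std::size_t idx) {
        assert(idx < MAX_THREADS);
        slavesMask.reset(idx);
    }
    // Number of attached slaves, derived from the mask itself.
    std::size_t slaves_count() const { return slavesMask.count(); }
};
```

Deriving the count from the mask avoids keeping a second counter in sync with it.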
Marco Costalba
a091ae4cc8 Split also if no slaves are found
Because we test for available slaves before
entering split(), we almost always allocate a
slave. Only in the rare case of a race (less
than 2% of cases) is this not true, but
special-casing this occurrence is not worth
the added complexity.

bench: 7451319
2014-03-15 23:43:35 +01:00
Marco Costalba
a1a7bc84da Remove "Max Threads per Split Point" UCI option
Experimental patch to verify if drop of nps
in endgames at very long TC is due to this.

Suggested by Ronald de Man.

bench: 7451319
2014-03-15 21:26:04 +01:00
Marco Costalba
3e5470d88f Remove limit of minimumSplitDepth
There is no reason why a user cannot set
it to a value less than 4.

No functional change.
2014-03-01 23:22:14 +01:00
Marco Costalba
41641e3b1e Assorted tweaks from DON
Mainly renames and some small code style improvements,
inspired by looking at DON sources:

https://github.com/erashid/DON

No functional change.
2014-02-09 17:31:45 +01:00
Marco Costalba
c9dcda6ac4 Update copyright year
No functional change.
2014-01-02 01:49:18 +01:00
Lucas Braesch
f5727deee3 Remove threat move stuff
A great simplification that shows no regression
and it seems even a bit scalable.

Tested with fixed number of games:

Short TC
ELO: 0.60 +-2.1 (95%) LOS: 71.1%
Total: 39554 W: 7477 L: 7409 D: 24668

Long TC
ELO: 2.97 +-2.0 (95%) LOS: 99.8%
Total: 36424 W: 5894 L: 5583 D: 24947

bench: 8184352
2013-12-15 09:43:29 +01:00
Arjun Temurnikar
431c3ac485 Even more spelling fixes
No functional change.
2013-12-06 09:03:24 +01:00
Jerry Donald
a8af78c833 Another round of spelling fixes
And also renamed a loop variable while there.

No functional change.
2013-12-02 23:51:29 +01:00