mirror of https://github.com/sockspls/badfish synced 2025-05-02 17:49:35 +00:00
Commit graph

308 commits

Author SHA1 Message Date
Marco Costalba
057d710fc2 Fix indentation in struct FromToStats
And other little trivial stuff.

No functional change.
2016-09-17 09:51:20 +02:00
Marco Costalba
5c58d1f5cb Use per-thread counterMoveHistory
Drops a scalability bottleneck due to memory contention
on a single table shared across threads. The effect becomes
noticeable with a high number of threads. Specifically,
we have a small regression with 7 threads at both 60 and
180 seconds TC:

10000 @ 60+0.6 th 7
ELO: -2.46 +-3.2 (95%) LOS: 6.5%
Total: 9896 W: 1037 L: 1107 D: 7752

5000 @ 180+0.6 th 7
ELO: -1.95 +-4.1 (95%) LOS: 17.7%
Total: 5000 W: 444 L: 472 D: 4084

We have a regression because the counterMoveHistory table is
quite big and it takes time for a single thread to fill it.
Sharing the table yields a higher fill rate and better
quality of moves, and up to 7 threads the benefits of sharing
more than compensate for the loss in speed due to contention.
Interestingly, even with a 3X longer TC, so with more time
for the single thread to catch up, the improvement is quite
limited and below noise level. It seems we really need a much
longer TC to saturate the table.

With a high number of threads it's another story:

5000 @ 60+0.6 th 22
ELO: 3.49 +-4.3 (95%) LOS: 94.6%
Total: 4880 W: 490 L: 441 D: 3949

2000 @ 60+0.6 th 32
ELO: 8.34 +-6.9 (95%) LOS: 99.1%
Total: 2000 W: 229 L: 181 D: 1590

As expected, the speed-up more than compensates for the lower
fill rate, and we expect that at tournament TC, where a single
thread is able to saturate the table, the difference will
be even stronger. For instance, the TCEC 9 super-final time
control will be 180 minutes + 15 seconds, and this scalability
improvement definitely seems the way to go.

So, summarizing:

GOOD:

Measured big improvement in high core scenario

Suitable for TCEC 9 superfinal (big hardware, very long TC)

Consistent and natural patch that extends to counterMoveHistory
what we already do for remaining history tables, that are all per-thread

Non functional change for the common case of a single core

Very simple (just 6 lines modified, no added ones)

BAD:

Small regression (within 2-3 ELO) with few threads and short TC

bench: 5341477
2016-09-16 08:15:07 +02:00
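The per-thread table idea in the commit above can be pictured with a minimal sketch. It assumes a hypothetical `Worker` type and a small `StatsTable` stand-in, neither of which is Stockfish's real code; the only point is that each thread writes to a table it owns, so no update contends with another thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a large statistics table such as counterMoveHistory.
using StatsTable = std::array<int, 1024>;

// Hypothetical worker: it owns its table, so no other thread writes to it.
struct Worker {
    StatsTable history{};   // zero-initialized, private to this worker

    void search(int iterations) {
        assert(iterations >= 0);
        for (int i = 0; i < iterations; ++i)
            ++history[static_cast<std::size_t>(i) % history.size()];
    }
};

// Runs `threads` workers in parallel and returns them for inspection.
std::vector<Worker> run_workers(unsigned threads, int iterations) {
    std::vector<Worker> workers(threads);
    std::vector<std::thread> pool;
    for (auto& w : workers)
        pool.emplace_back([&w, iterations] { w.search(iterations); });
    for (auto& t : pool)
        t.join();
    return workers;
}
```

The trade-off described in the commit message is visible here: every worker starts from an empty table and has to fill it alone, which is the cost paid for removing contention.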
VoyagerOne
b3525fa9ea Use Color-From-To history stats to help sort moves
STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 33502 W: 6498 L: 6223 D: 20781
http://tests.stockfishchess.org/tests/view/578abb940ebc5972faa169e2

LTC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 50782 W: 7124 L: 6832 D: 36826
http://tests.stockfishchess.org/tests/view/578b8e5d0ebc5972faa169fd

LTC: (Sanity test against latest master)
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 32759 W: 4600 L: 4370 D: 23789
http://tests.stockfishchess.org/tests/view/5798b7d30ebc591c761f5b72

bench: 6985912

P.S. Thanks @mstembera for rewriting my code to make it smp compatible. A BIG thank you!
2016-08-02 09:17:14 +02:00
Marco Costalba
ca14345ba2 Filter root moves before copy to threads
Currently root moves are copied to all the threads
but are DTZ filtered only in the main thread at the
beginning of the search.

This patch moves the TB filtering before the
copy of root moves fixing issue #679

https://github.com/official-stockfish/Stockfish/issues/679

No bench change.
2016-06-11 09:24:40 +02:00
Marco Costalba
7eaea3848c StateInfo is usually allocated on the stack by search()
And passed to do_move(). This ensures maximum efficiency and
speed and at the same time unlimited move numbers.

The drawback is that to handle Position init we need to
reserve a StateInfo inside Position itself and use it at
init time and when copying from another Position.

After lazy SMP we no longer need this gimmick, so we can
get rid of this special case and always pass an external
StateInfo to the Position object.

Also rewritten and simplified Position constructors.

Verified it does not regress with a 3 threads SMP test:
ELO: -0.00 +-12.7 (95%) LOS: 50.0%
Total: 1000 W: 173 L: 173 D: 654

No functional change.
2016-04-17 08:29:33 +02:00
Marco Costalba
356147d99a Rewrite time formula
Time management is really too complex. Our aim is
to simplify it, but for the time being at least rewrite
it in an understandable way.

No functional change.
2016-01-18 17:12:18 +01:00
Lyudmil Antonov
89723339d9 Assorted English grammar changes
No functional change

Resolves #567
2016-01-16 21:34:29 +00:00
Leonid Pechenik
9eceb894ac Adjust time used for move based on previous score
Use less time if evaluation is not worse than for previous move and even less time if in addition no fail low encountered for current iteration.

STC: 10+0.1
ELO: 5.37 +-2.9 (95%) LOS: 100.0%
Total: 20000 W: 3832 L: 3523 D: 12645

STC: 10+0.1
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 17527 W: 3334 L: 3132 D: 11061

LTC: 60+0.6
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 28233 W: 3939 L: 3725 D: 20569

LTC: 60+0.6
ELO: 2.43 +-1.4 (95%) LOS: 100.0%
Total: 60000 W: 8266 L: 7847 D: 43887

LTC: 60+0.06
LLR: 2.95 (-2.94,2.94) [-1.00,3.00]
Total: 38932 W: 5408 L: 5207 D: 28317

Resolves #547
2016-01-03 14:01:15 +00:00
ppigazzini
d4af15f682 Update AUTHORS and copyright notice
No functional change

Resolves #555
2016-01-02 09:43:51 +00:00
Marco Costalba
9742fb10fd Update Copyright year
No functional change.

Resolves #554
2016-01-01 10:17:36 +00:00
Marco Costalba
1b5b900a29 Move some globals into main thread scope
Make it explicit that those variables are not globals, but
are used only by main thread. I think it is a sensible
clarification because easy move is already tricky enough
and current patch makes the involved actors explicit.

No functional change.

Resolves #537
2015-12-27 19:29:16 +00:00
Marco Costalba
93195555ed Rewrite how threads are spawned
Instead of creating a running std::thread and
returning, wait in the Thread c'tor until the native
thread of execution goes to sleep in idle_loop().

In this way we can simplify how search is started,
because when the main thread is idle we are sure that
all other threads are idle too, in any case, even
at thread creation and startup.

After lazy SMP went in, we can simplify and rewrite
a lot of logic that is no longer needed. This is
hopefully the final big cleanup.

Tested for no regression at 5+0.1 with 3 threads:
LLR: 2.95 (-2.94,2.94) [-5.00,0.00]
Total: 17411 W: 3248 L: 3198 D: 10965

No functional change.
2015-11-21 07:48:50 +01:00
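The constructor handshake described in the commit above might look roughly like the following sketch. The `Worker` class, its member names and the `start_job()` / `wait_idle()` helpers are hypothetical and only illustrate the pattern: the constructor does not return until the native thread is asleep in `idle_loop()`.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical worker, not Stockfish's Thread class.
class Worker {
public:
    Worker() {
        std::unique_lock<std::mutex> lk(mutex);
        searching = true;
        native = std::thread(&Worker::idle_loop, this);
        // Block until idle_loop() has reached its sleep point.
        cv.wait(lk, [this] { return !searching; });
    }

    ~Worker() {
        {
            std::lock_guard<std::mutex> lk(mutex);
            quit = true;
        }
        cv.notify_all();
        native.join();
    }

    Worker(const Worker&) = delete;
    Worker& operator=(const Worker&) = delete;

    // Wake the sleeping thread to run one job.
    void start_job() {
        {
            std::lock_guard<std::mutex> lk(mutex);
            assert(!searching);
            searching = true;
        }
        cv.notify_all();
    }

    // Block until the thread is asleep again.
    void wait_idle() {
        std::unique_lock<std::mutex> lk(mutex);
        cv.wait(lk, [this] { return !searching; });
    }

    bool is_idle() {
        std::lock_guard<std::mutex> lk(mutex);
        return !searching;
    }

    int jobs_done() {
        std::lock_guard<std::mutex> lk(mutex);
        return jobs;
    }

private:
    void idle_loop() {
        std::unique_lock<std::mutex> lk(mutex);
        while (!quit) {
            searching = false;
            cv.notify_all();   // tell the constructor or a waiter we are idle
            cv.wait(lk, [this] { return searching || quit; });
            if (quit)
                break;
            lk.unlock();
            // A real engine would search here, without holding the lock.
            lk.lock();
            ++jobs;
        }
    }

    std::mutex mutex;
    std::condition_variable cv;
    bool searching = false;
    bool quit = false;
    int jobs = 0;
    std::thread native;
};
```

Because the constructor only returns once `searching` is false, code that creates a `Worker` can assume it is idle immediately, without any extra startup synchronization.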
Marco Costalba
76ed0ab501 Retire ThreadBase
Now that we no longer have TimerThread, there is
no need for this long class hierarchy.

Also assorted reformatting while there.

To verify no regression, passed at STC with 7 threads:
LLR: 2.97 (-2.94,2.94) [-5.00,0.00]
Total: 30990 W: 4945 L: 4942 D: 21103

No functional change.
2015-11-13 08:22:44 +01:00
Marco Costalba
9c9205860c Get rid of timer thread
Unfortunately std::condition_variable::wait_for()
is not accurate in the general case, and the timer thread
can wake up tens or even hundreds of milliseconds
after the time has elapsed. CPU load, process
priorities and the number of concurrent threads, even from
other processes, all affect it.

Even official documentation says: "This function may
block for longer than timeout_duration due to scheduling
or resource contention delays."

So retire timer and use a polling scheme based on a
local thread counter that counts search() calls and
a small trick to keep polling frequency constant,
independently from the number of threads.

Tested for no regression at very fast TC 2+0.05 th 7:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 32969 W: 6720 L: 6620 D: 19629

TC 2+0.05 th 1:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 7765 W: 1917 L: 1765 D: 4083

And at STC TC, both single thread
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 15587 W: 3036 L: 2905 D: 9646

And with 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 8149 W: 1367 L: 1227 D: 5555

bench: 8639247
2015-11-03 11:27:00 +01:00
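A counter-based polling scheme of the kind described in the commit above could be sketched as follows. `PollingTimer`, its injected clock and the `callsPerCheck` parameter are all hypothetical names chosen for illustration, and the sketch leaves out the trick that keeps the polling frequency constant across threads, since the commit message does not spell it out.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical polling timer: instead of a dedicated timer thread, the
// searching thread reads the clock once every `callsPerCheck` calls.
class PollingTimer {
public:
    PollingTimer(std::function<std::int64_t()> clockMs,
                 std::int64_t deadlineMs,
                 int callsPerCheck)
        : now(std::move(clockMs)), deadline(deadlineMs), interval(callsPerCheck) {
        assert(interval > 0);
    }

    // Call this once per search() invocation; returns true once time is up.
    bool on_search_call() {
        if (!stop && ++calls >= interval) {
            calls = 0;
            ++reads;
            stop = now() >= deadline;
        }
        return stop;
    }

    bool stopped() const { return stop; }
    int clock_reads() const { return reads; }

private:
    std::function<std::int64_t()> now;  // injected clock, in milliseconds
    std::int64_t deadline;
    int interval;
    int calls = 0;
    int reads = 0;
    bool stop = false;
};
```

Injecting the clock keeps the sketch testable; in real use it would wrap something like `std::chrono::steady_clock`. The interval trades clock-read overhead against how late the deadline can be noticed.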
mbootsector
27c5cb5912 Pick bestmove from the deepest thread.
STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 26930 W: 4441 L: 4214 D: 18275

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 7783 W: 1017 L: 876 D: 5890

No functional change in single thread mode

Resolves #485
2015-11-02 10:05:43 +00:00
Marco Costalba
86f04dbcc0 Assorted trivia in search.cpp
The only interesting change is the moving of
stack[MAX_PLY+4] back to its original position
in id_loop (now renamed Thread::search).

No functional change.
2015-10-31 19:26:35 +01:00
Stéphane Nicolet
80d7556af7 Some code and comment cleanup
- Remove all references to split points
- Some grammar and spelling fixes

No Functional change

Resolves #478
2015-10-29 15:28:59 +00:00
lucasart
00d9e9fd28 Use atomics instead of volatile
Rely on well defined behaviour for message passing, instead of volatile. Three
versions have been tested, to make sure this wouldn't cause a slowdown on any
platform.

v1: Sequentially consistent atomics

No measurable regression, despite the extra memory barriers on x86. Even with 15
threads and extreme time pressure, both acting as a magnifying glass:

threads=15, tc=2+0.02
ELO: 2.59 +-3.4 (95%) LOS: 93.3%
Total: 18132 W: 4113 L: 3978 D: 10041

threads=7, tc=2+0.02
ELO: -1.64 +-3.6 (95%) LOS: 18.8%
Total: 16914 W: 4053 L: 4133 D: 8728

v2: Acquire/Release semantics

This version generates no extra barriers for x86 (on the hot path). As expected,
no regression either, under the same conditions:

threads=15, tc=2+0.02
ELO: 2.85 +-3.3 (95%) LOS: 95.4%
Total: 19661 W: 4640 L: 4479 D: 10542

threads=7, tc=2+0.02
ELO: 0.23 +-3.5 (95%) LOS: 55.1%
Total: 18108 W: 4326 L: 4314 D: 9468

As suggested by Joona, another test at LTC:

threads=15, tc=20+0.05
ELO: 0.64 +-2.6 (95%) LOS: 68.3%
Total: 20000 W: 3053 L: 3016 D: 13931

v3: Final version: SeqCst/Relaxed

threads=15, tc=10+0.1
ELO: 0.87 +-3.9 (95%) LOS: 67.1%
Total: 9541 W: 1478 L: 1454 D: 6609

Resolves #474
2015-10-25 09:15:45 +00:00
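As a rough illustration of the message passing mentioned in the commit above, the sketch below publishes a plain value through a `std::atomic<bool>` flag instead of a `volatile bool`. The `Mailbox` type and `exchange()` function are hypothetical, and the default sequentially consistent ordering is used for simplicity; which operations the final patch relaxed is not stated in the commit message.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical mailbox: `payload` is plain data published through `ready`.
struct Mailbox {
    int payload = 0;
    std::atomic<bool> ready{false};  // would have been `volatile bool` before
};

// Sends `value` to a consumer thread and returns what the consumer read.
int exchange(int value) {
    Mailbox box;
    int received = 0;

    std::thread consumer([&box, &received] {
        // The atomic load synchronizes with the store below, so the read of
        // `payload` that follows is well defined.
        while (!box.ready.load())
            std::this_thread::yield();
        received = box.payload;
    });

    box.payload = value;
    box.ready.store(true);  // publishes `payload` to the consumer

    consumer.join();
    assert(box.ready.load());
    return received;
}
```

With `volatile` the same code would be a data race under the C++11 memory model; the atomic flag is what gives the read of `payload` a defined result.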
Marco Costalba
307a5a4f63 Cleanup history stats
And other assorted trivia.

No functional change.
2015-10-24 17:29:12 +02:00
mbootsector
ecc5ff6693 Lazy SMP
Start all threads searching on root position and
use only the shared TT table as synching scheme.

It seems this scheme scales better than YBWC for
high number of threads.

Verified for no regression at STC 3 threads
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 40232 W: 6908 L: 7130 D: 26194

Verified for no regression at LTC 3 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 28186 W: 3908 L: 3798 D: 20480

Verified for no regression at STC 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 3607 W: 674 L: 526 D: 2407

Verified for no regression at LTC 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 4235 W: 671 L: 528 D: 3036

Tested with fixed games at LTC with 20 threads
ELO: 44.75 +-7.6 (95%) LOS: 100.0%
Total: 2069 W: 407 L: 142 D: 1520

Tested with fixed games at XLTC (120secs) with 20 threads
ELO: 28.01 +-6.7 (95%) LOS: 100.0%
Total: 2275 W: 349 L: 166 D: 1760

Original patch of mbootsector, with additional work
from Ivan Ivec (log formula), Joerg Oster (id loop
simplification) and Marco Costalba (assorted formatting
and rework).

Bench: 8116244
2015-10-20 06:58:08 +02:00
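The Lazy SMP scheme in the commit above can be pictured with a toy sketch: every thread starts the same recursive computation from the same root, and the only thing the threads share is a lock-free table of cached results. The Fibonacci recursion, `SharedTable` and `lazy_smp()` are hypothetical stand-ins for a real search and transposition table, chosen only because they are small.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Toy stand-in for a transposition table: slot n caches f(n) + 1, 0 = empty.
struct SharedTable {
    std::vector<std::atomic<std::uint64_t>> slots;

    explicit SharedTable(std::size_t n) : slots(n) {
        for (auto& s : slots)
            s.store(0, std::memory_order_relaxed);
    }
};

// Toy "search": a recursion that reuses whatever any thread already stored.
std::uint64_t search(SharedTable& tt, unsigned n) {
    if (n < 2)
        return n;
    std::uint64_t cached = tt.slots[n].load(std::memory_order_relaxed);
    if (cached != 0)
        return cached - 1;
    std::uint64_t value = search(tt, n - 1) + search(tt, n - 2);
    tt.slots[n].store(value + 1, std::memory_order_relaxed);
    return value;
}

// All threads start at the same root; the table is the only shared state.
std::vector<std::uint64_t> lazy_smp(unsigned root, unsigned threads) {
    assert(threads > 0);
    SharedTable tt(root + 1);
    std::vector<std::uint64_t> results(threads);
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < threads; ++i)
        pool.emplace_back([&tt, &results, root, i] { results[i] = search(tt, root); });
    for (auto& t : pool)
        t.join();
    return results;
}
```

There are no split points, no work assignment and no locks: threads help each other only indirectly, when one finds an entry another has already written.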
Joona Kiiski
a7381d5e81 Fully yielding locks, no spinning
7 threads:

ELO: 2.00 +-2.7 (95%) LOS: 92.4%
Total: 20000 W: 3276 L: 3161 D: 13563

There is no functional change in single thread mode

Resolves #304
2015-03-24 21:34:19 +00:00
Marco Costalba
be77406a55 Get rid of nativeThread
No functional change.
2015-03-23 09:02:52 +01:00
Marco Costalba
26dabb1e6b Use only one ConditionVariable to sync UI
To sync the UI with the main thread a single
condition variable is enough, because here we have a single
producer / single consumer design pattern.

Two condition variables are strictly needed only for
the many producers / many consumers case.

Note that this is possible because we no longer put idle
threads to sleep while searching, so now
only the UI can wake up the main thread and we can use the
same ConditionVariable for both threads.

The natural consequence is to retire wait_for_think_finished()
and move all the logic under the MainThread class, which leads
to renaming the function to join()

No functional change.
2015-03-21 07:55:33 +01:00
Marco Costalba
13d4df95cd Use acquire() and release() for spinlocks
It is more idiomatic than lock() and unlock()

No functional change.
2015-03-16 08:14:08 +01:00
Joona Kiiski
d71f707040 Introduce yielding spin locks
Idea and original implementation by Stephane Nicolet

7 threads 15+0.05
ELO: 3.54 +-2.9 (95%) LOS: 99.2%
Total: 17971 W: 2976 L: 2793 D: 12202

There is no functional change in single thread mode
2015-03-14 19:14:52 +00:00
Joona Kiiski
81c7975dcd Use thread specific mutexes instead of a global one.
This is necessary to improve the scalability with high number of cores.

There is no functional change in a single thread mode.

Resolves #281
2015-03-11 21:59:34 +00:00
Marco Costalba
4b59347194 Retire spinlocks
Use Mutex instead.

This is in preparation for merging with the master branch,
where we still don't have spinlocks.

Eventually spinlocks will be re-added in some future
patch, once c++11 has been merged.

No functional change.
2015-03-11 21:20:47 +01:00
Marco Costalba
04372316b3 Disable spinlocks
To allow testing on fishtest.

No functional change.
2015-03-10 12:47:49 +01:00
Marco Costalba
8725494966 Add thread_win32.h header
Workaround slow std::thread implementation in mingw
and gcc for Windows with our own old low level thread
functions.

No functional change.
2015-03-10 12:42:40 +01:00
Marco Costalba
a590d1d52d Re-enable spinlocks
For the C++11 branch, which does not run on fishtest,
there is no need for this kludge; let only master
have it.

No functional change.
2015-03-07 08:38:26 +01:00
Marco Costalba
6645115377 Allow to disable spinlocks
And use mutex instead. You may never want to do this.
It is a workaround to run c++11 on fishtest, where many
machines have HT enabled, and this can be a problem when
the number of cores set is higher than the number of physical cores.

To disable spinlocks, just compile with -DNO_SPINLOCK flag

No functional change.
2015-03-01 17:16:05 +01:00
Marco Costalba
63a5fc2366 Rename available_to()
Change this API to be more natural and simple.

Inspired by a patch by Joona.

No functional change.
2015-03-01 12:33:05 +01:00
Marco Costalba
d3d26a94b3 Improve spinlock implementation
Calling lock.test_and_set() in a tight loop creates expensive
memory synchronizations among processors and penalizes other
running threads. So synchronize only once at the beginning
with fetch_sub() and then loop on a simple load(), which puts much
less pressure on the system.

Reported about 2-3% speed up on various systems.

Patch by Ronald de Man.

No functional change.
2015-02-23 19:48:46 +01:00
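The spinlock change described in the commit above could look something like this sketch: one `fetch_sub()` attempt, then spinning on a plain `load()` until the lock looks free. The class layout and the `count_with_lock()` helper are assumptions made for illustration, not the committed code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Sketch of a spinlock that avoids hammering the cache line with
// read-modify-write operations while it waits.
class Spinlock {
    std::atomic<int> lock{1};   // 1 = free, <= 0 = held

public:
    void acquire() {
        // One synchronizing attempt; on failure spin on a cheap relaxed load
        // and retry only once the lock looks free again.
        while (lock.fetch_sub(1, std::memory_order_acquire) != 1)
            while (lock.load(std::memory_order_relaxed) <= 0) {}
    }

    void release() { lock.store(1, std::memory_order_release); }
};

// Hypothetical helper: increments a plain counter from several threads,
// protected by the spinlock, and returns the final value.
long long count_with_lock(unsigned threads, int perThread) {
    assert(perThread >= 0);
    Spinlock sl;
    long long counter = 0;
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < threads; ++i)
        pool.emplace_back([&sl, &counter, perThread] {
            for (int k = 0; k < perThread; ++k) {
                sl.acquire();
                ++counter;
                sl.release();
            }
        });
    for (auto& t : pool)
        t.join();
    return counter;
}
```

Waiters that lose the race push the value below zero, but `release()` resets it to 1, so the next `fetch_sub()` that returns 1 identifies the single new owner.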
Marco Costalba
38112060dc Use spinlock instead of mutex for Threads and SplitPoint
It is reported to be definitely faster with an increasing
number of threads: we go from +3.5% with 4 threads
to +15% with 16 threads.

The only drawback is that now, when testing with more
threads than physical cores available, the speed slows
down to a crawl. This is expected and is similar to what
we had when setting the old sleepingThreads to false.

No functional change.
2015-02-23 13:47:07 +01:00
Marco Costalba
775f8239d3 Introduce Spinlock class
Initialization is more complex than what I'd like due
to MSVC compatibility that for some reason does not like:

std::atomic_flag lock = ATOMIC_FLAG_INIT;

No functional change.
2015-02-23 13:37:46 +01:00
Marco Costalba
7ff965eebf Improve comments in SMP code
No functional change.
2015-02-20 12:38:54 +01:00
Marco Costalba
40548c9153 Sync with master
bench: 7911944
2015-02-20 10:37:29 +01:00
Marco Costalba
950c8436ed Use size_t consistently across thread code
No functional change.
2015-02-19 10:43:28 +01:00
Marco Costalba
8d47caa16e Retire redundant sp->slavesCount field
slavesMask.count() should be used instead.

Verified 100% equivalent when sp->allSlavesSearching:

dbg_hit_on(sp->allSlavesSearching, sp->slavesCount != sp->slavesMask.count());

No functional change.
2015-02-19 10:36:15 +01:00
Marco Costalba
dccaa145d2 Compute SplitPoint::spLevel on the fly
And retire a redundant field. This is also important
from a conceptual point of view because we want to keep
SMP structures as simple as possible, with only the
strictly necessary data.

Verified with

dbg_hit_on(sp->spLevel != level)

that the values are 100% the same across more than 50K samples.

No functional change.
2015-02-18 21:50:35 +01:00
Joona Kiiski
d65f75c153 Improve smp performance for high number of threads
Balance threads between split points.

There are huge differences between different machines and autopurging makes it very difficult to measure the improvement in fishtest, but the following was recorded for 16 threads at 15+0.05:

    For Bravone (1000 games): 0 ELO
    For Glinscott (1000 games): +20 ELO
    For bKingUs (1000 games): +50 ELO
    For fastGM (1500 games): +50 ELO

The change was a regression for no one and a big improvement for some, so it should be fine to commit it.
Also for 8 threads at 15+0.05 we measured a statistically significant improvement:
ELO: 6.19 +-3.9 (95%) LOS: 99.9%
Total: 10325 W: 1824 L: 1640 D: 6861

Finally it was verified that there was no (significant) regression for

4 threads:
ELO: 0.09 +-2.8 (95%) LOS: 52.4%
Total: 19908 W: 3422 L: 3417 D: 13069

2 threads:
ELO: 0.38 +-3.0 (95%) LOS: 60.0%
Total: 19044 W: 3480 L: 3459 D: 12105

1 thread:
ELO: -1.27 +-2.1 (95%) LOS: 12.3%
Total: 40000 W: 7829 L: 7975 D: 24196

Resolves #258
2015-02-16 20:36:13 +00:00
Marco Costalba
96e36a7897 Explicitly defaulted and deleted members
Better than a bit obscure implicit ones.

No functional change.
2015-01-21 13:18:19 +01:00
Marco Costalba
f53aea45e3 Add syzygy support
bench: 8080602
2015-01-18 08:27:46 +01:00
Marco Costalba
3c07603dac Import C++11 branch
Import C++11 branch from:

https://github.com/mcostalba/Stockfish/tree/c++11

The version imported is the last one as of today:
6670e93e50

The branch is fully equivalent to master except for syzygy
tablebases, which are missing (but will be added with
the next commit).

bench: 8080602
2015-01-18 08:00:50 +01:00
Marco Costalba
4eb2d8ce09 Assorted headers cleanup
Mostly comments fixing and other small things.

No functional change.
2015-01-11 22:56:35 +01:00
Marco Costalba
42b48b08e8 Update copyright year
No functional change.
2015-01-10 11:46:28 +01:00
Gary Linscott
4739037f96 100% accurate PV display
This gives SF accurate PVs, such that the evaluation of the leaf node in
the PV matches the score backed up to the root (99% of the time.
q-search will use the value stored in the hash table instead of the eval
value sometimes).

One drawback is that fail-high/low only get a minimal 2 move PV.

It doesn't add any additional overhead to the non-PV codepath except an
extra eight bytes to the SearchStack structure in multi-threaded
searches.

A core part of this is not pruning based on TT score in PV nodes. This
was measured as not being a regression at multiple TCs, except for one
exception, fast TC with huge hash, which is not realistic for longer
searches.

STC - 1 thread, 128 mb hash
ELO: 1.42 +-3.1 (95%) LOS: 81.9%
Total: 20000 W: 4078 L: 3996 D: 11926

STC - 3 thread, 128 mb hash
ELO: -3.60 +-2.9 (95%) LOS: 0.8%
Total: 20000 W: 3575 L: 3782 D: 12643

STC - 3 thread, 8 mb hash
ELO: 0.12 +-2.9 (95%) LOS: 53.3%
Total: 20000 W: 3654 L: 3647 D: 12699

LTC - 3 thread, 32mb hash
ELO: 2.29 +-2.0 (95%) LOS: 98.8%
Total: 35740 W: 5618 L: 5382 D: 24740

Bench: 6984058

Resolves #102
2014-11-12 16:16:33 -05:00
Joona Kiiski
eb50793cff Retire FakeSplit
- Currently broken
- Never been really useful
- Does not work well with new splitting model

Verified for no regression at STC with 3 threads:
LLR: 2.96 (-2.94,2.94) [-6.00,0.00]
Total: 81905 W: 12122 L: 12381 D: 57402

No functional change
2014-07-09 07:19:06 +08:00
Marco Costalba
9f843adf89 Retire "Idle Threads Sleep" UCI option
After Joona's last patch there is no measurable
difference between the option set or unset.

Tested by Andreas Strangmüller with 16 threads
on his Dual Opteron 6376.

After 5000 games at 15+0.05 the result is:

1 Stockfish_14050822_T16_on   : 3003  5000 (+849,=3396,-755), 50.9 %
2 Stockfish_14050822_T16_off  : 2997  5000 (+755,=3396,-849), 49.1 %

bench: 880215
2014-05-11 10:29:56 +02:00
Marco Costalba
7e3dba4f4c Reformat and simplify previous patch
No functional change.
2014-05-07 08:56:16 +02:00