1
0
Fork 0
mirror of https://github.com/sockspls/badfish synced 2025-04-30 08:43:09 +00:00
Commit graph

207 commits

Author SHA1 Message Date
Marco Costalba
e18321f55a Correcty resey TB hit counter
Restore original behaviour to reset
the counter before a new move search.

Also fixed some warnings and added const
qualifier to a couple of functions, as
suggested by m_stembera.

Thanks to Werner Bergmans for reporting
the regression.

No functional change.
2016-10-22 08:22:13 +02:00
syzygy
ca67752645 Per-thread TB hit counters
Use a per-thread counter to reduce contention
with many cores and endgame positions.

Measured around 1% speed-up on a 12 core and 8%
on 28 cores with 6-men, searching on:
 7R/1p3k2/2p2P2/3nR1P1/8/3b1P2/7K/r7 b - - 3 38

Also retire the unused set_nodes_searched() and fix
a couple of return types and naming conventions.

No functional change.
2016-10-21 06:15:45 +02:00
Marco Costalba
8662bdfa12 Fix crash when passing a mate/stalemate position
Both Tablebases::filter_root_moves() and
extract_ponder_from_tt(9 were unable to handle
a mate/stalemate position.

Spotted and reported by Dann Corbit.

Added some mate/stalemate positions to bench so
to early catch this regression in the future.

No functional change.
2016-09-24 07:37:52 +02:00
Marco Costalba
ca14345ba2 Filter root moves filter before copy to threads
Currently root moves are copied to all teh threads
but are DTZ filtered only in main thread at the
beginning of teh search.

This patch moves the TB filtering before the
copy of root moves fixing issue #679

https://github.com/official-stockfish/Stockfish/issues/679

No bench change.
2016-06-11 09:24:40 +02:00
Marco Costalba
94e41274bb Fix incorrect draw detection
In this position we should have draw for repetition:

position fen rnbqkbnr/2pppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 moves g1f3 g8f6 f3g1
go infinite

But latest patch broke it.

Actually we had two(!) very subtle bugs, the first is that Position::set()
clears the passed state and in particular 'previous' member, so
that on passing setupStates, 'previous' pointer was reset.

Second bug is even more subtle: SetupStates was based on std::vector
as container, but when vector grows, std::vector copies all its contents
to a new location invalidating all references to its entries. Because
all StateInfo records are linked by 'previous' pointer, this made pointers
go stale upon adding more element to setupStates. So revert to use a
std::deque that ensures references are preserved when pushing back new
elements.

No functional change.
2016-04-18 00:13:16 +02:00
Marco Costalba
7eaea3848c StateInfo is usually allocated on the stack by search()
And passed in do_move(), this ensures maximum efficiency and
speed and at the same time unlimited move numbers.

The draw back is that to handle Position init we need to
reserve a StateInfo inside Position itself and use at
init time and when copying from another Position.

After lazy SMP we don't need anymore this gimmick and we can
get rid of this special case and always pass an external
StateInfo to Position object.

Also rewritten and simplified Position constructors.

Verified it does not regress with a 3 threads SMP test:
ELO: -0.00 +-12.7 (95%) LOS: 50.0%
Total: 1000 W: 173 L: 173 D: 654

No functional change.
2016-04-17 08:29:33 +02:00
Lyudmil Antonov
89723339d9 Assorted English grammar changes
No functional change

Resolves #567
2016-01-16 21:34:29 +00:00
ppigazzini
d4af15f682 Update AUTHORS and copyright notice
No functional change

Resolves #555
2016-01-02 09:43:51 +00:00
Marco Costalba
9742fb10fd Update Copyright year
No functional change.

Resolves #554
2016-01-01 10:17:36 +00:00
Leonid Pechenik
69240a982d Simplify time management and fix 'ponder on' bug
Simplify time management code by removing hard stops for unchanging first root moves.
Search is now stopped earlier at the end iteration if it did not have fail-lows at root.

This simplification also fixes pondering bug. Ponder flag was true by default
and cutechess-cli doesn't change it to false even though no pondering is possible.
Fix the issue by setting the default value of 'Ponder' flag to false.

10+0.1:
ELO: 3.51 +-3.0 (95%) LOS: 99.0%
Total: 20000 W: 3898 L: 3696 D: 12406

40+0.4:
ELO: 1.39 +-2.7 (95%) LOS: 84.7%
Total: 20000 W: 3104 L: 3024 D: 13872

60+0.06:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 37231 W: 5333 L: 5236 D: 26662

Stopped run at 100+1:
LLR: 1.09 (-2.94,2.94) [-3.00,1.00]
Total: 37253 W: 4862 L: 4856 D: 27535

Resolves #523
Fixes #510
2015-12-14 18:00:52 +00:00
Marco Costalba
93195555ed Rewrite how threads are spawned
Instead of creating a running std::thread and
returning, wait in Thread c'tor that the native
thread of execution goes to sleep in idle_loop().

In this way we can simplify how search is started,
because when main thread is idle we are sure also
all other threads will be idle, in any case, even
at thread creation and startup.

After lazy smp went in, we can simpify and rewrite
a lot of logic that is now no more needed. This is
hopefully the final big cleanup.

Tested for no regression at 5+0.1 with 3 threads:
LLR: 2.95 (-2.94,2.94) [-5.00,0.00]
Total: 17411 W: 3248 L: 3198 D: 10965

No functional change.
2015-11-21 07:48:50 +01:00
Marco Costalba
76ed0ab501 Retire ThreadBase
Now that we don't have anymore TimerThread, there is
no need of this long class hierarchy.

Also assorted reformatting while there.

To verify no regression, passed at STC with 7 threads:
LLR: 2.97 (-2.94,2.94) [-5.00,0.00]
Total: 30990 W: 4945 L: 4942 D: 21103

No functional change.
2015-11-13 08:22:44 +01:00
Marco Costalba
9c9205860c Get rid of timer thread
Unfortunately std::condition_variable::wait_for()
is not accurate in general case and the timer thread
can wake up also after tens or even hundreds of
millisecs after time has elapsded. CPU load, process
priorities, number of concurrent threads, even from
other processes, will have effect upon it.

Even official documentation says: "This function may
block for longer than timeout_duration due to scheduling
or resource contention delays."

So retire timer and use a polling scheme based on a
local thread counter that counts search() calls and
a small trick to keep polling frequency constant,
independently from the number of threads.

Tested for no regression at very fast TC 2+0.05 th 7:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 32969 W: 6720 L: 6620 D: 19629

TC 2+0.05 th 1:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 7765 W: 1917 L: 1765 D: 4083

And at STC TC, both single thread
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 15587 W: 3036 L: 2905 D: 9646

And with 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 8149 W: 1367 L: 1227 D: 5555

bench: 8639247
2015-11-03 11:27:00 +01:00
Marco Costalba
86f04dbcc0 Assorted trivia in search.cpp
The only interesting change is the moving of
stack[MAX_PLY+4] back to its original position
in id_loop (now renamed Thread::search).

No functional change.
2015-10-31 19:26:35 +01:00
Stéphane Nicolet
80d7556af7 Some code and comment cleanup
- Remove all references to split points
- Some grammar and spelling fixes

No Functional change

Resolves #478
2015-10-29 15:28:59 +00:00
lucasart
00d9e9fd28 Use atomics instead of volatile
Rely on well defined behaviour for message passing, instead of volatile. Three
versions have been tested, to make sure this wouldn't cause a slowdown on any
platform.

v1: Sequentially consistent atomics

No mesurable regression, despite the extra memory barriers on x86. Even with 15
threads and extreme time pressure, both acting as a magnifying glass:

threads=15, tc=2+0.02
ELO: 2.59 +-3.4 (95%) LOS: 93.3%
Total: 18132 W: 4113 L: 3978 D: 10041

threads=7, tc=2+0.02
ELO: -1.64 +-3.6 (95%) LOS: 18.8%
Total: 16914 W: 4053 L: 4133 D: 8728

v2: Acquire/Release semantics

This version generates no extra barriers for x86 (on the hot path). As expected,
no regression either, under the same conditions:

threads=15, tc=2+0.02
ELO: 2.85 +-3.3 (95%) LOS: 95.4%
Total: 19661 W: 4640 L: 4479 D: 10542

threads=7, tc=2+0.02
ELO: 0.23 +-3.5 (95%) LOS: 55.1%
Total: 18108 W: 4326 L: 4314 D: 9468

As suggested by Joona, another test at LTC:

threads=15, tc=20+0.05
ELO: 0.64 +-2.6 (95%) LOS: 68.3%
Total: 20000 W: 3053 L: 3016 D: 13931

v3: Final version: SeqCst/Relaxed

threads=15, tc=10+0.1
ELO: 0.87 +-3.9 (95%) LOS: 67.1%
Total: 9541 W: 1478 L: 1454 D: 6609

Resolves #474
2015-10-25 09:15:45 +00:00
Marco Costalba
307a5a4f63 Cleanup history stats
And other assorted trivia.

No functional change.
2015-10-24 17:29:12 +02:00
mbootsector
ecc5ff6693 Lazy SMP
Start all threads searching on root position and
use only the shared TT table as synching scheme.

It seems this scheme scales better than YBWC for
high number of threads.

Verified for nor regression at STC 3 threads
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 40232 W: 6908 L: 7130 D: 26194

Verified for nor regression at LTC 3 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 28186 W: 3908 L: 3798 D: 20480

Verified for nor regression at STC 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 3607 W: 674 L: 526 D: 2407

Verified for nor regression at LTC 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 4235 W: 671 L: 528 D: 3036

Tested with fixed games at LTC with 20 threads
ELO: 44.75 +-7.6 (95%) LOS: 100.0%
Total: 2069 W: 407 L: 142 D: 1520

Tested with fixed games at XLTC (120secs) with 20 threads
ELO: 28.01 +-6.7 (95%) LOS: 100.0%
Total: 2275 W: 349 L: 166 D: 1760

Original patch of mbootsector, with additional work
from Ivan Ivec (log formula), Joerg Oster (id loop
simplification) and Marco Costalba (assorted formatting
and rework).

Bench: 8116244
2015-10-20 06:58:08 +02:00
Marco Costalba
3c0fe1d9b2 Rework lock protecting
When changing 'search' and 'splitPointsSize' we have to
use thread locks, not split point ones, because can_join()
is called under the formers.

Verified succesfully with 24 hours toruture tests with 20
cores machine by Louis Zulli: it does not hangs.

Verifyed for no regressions with STC, 7 threads:
LLR: 2.94 (-2.94,2.94) [-3.00,1.00]
Total: 52804 W: 8159 L: 8087 D: 36558

No functional change.
2015-09-30 10:47:20 +02:00
Joona Kiiski
613dc66c12 Careful SMP locking - Fix very occasional hangs
Louis Zulli reported that Stockfish suffers from very occasional hangs with his 20 cores machine.

Careful SMP debugging revealed that this was caused by "a ghost split point slave", where thread
was marked as a split point slave, but wasn't actually working on it.

The only logical explanation for this was double booking, where due to SMP race, the same thread
is booked for two different split points simultaneously.

Due to very intermittent nature of the problem, we can't say exactly how this happens.

The current handling of Thread specific variables is risky though. Volatile variables are in some
cases changed without spinlock being hold. In this case standard doesn't give us any kind of
guarantees about how the updated values are propagated to other threads.

We resolve the situation by enforcing very strict locking rules:
- Values for key thread variables (splitPointsSize, activeSplitPoint, searching)
can only be changed when the thread specific spinlock is held.
- Structural changes for splitPoints[] are only allowed when the thread specific spinlock is held.
- Thread booking decisions (per split point) can only be done when the thread specific spinlock is held.

With these changes hangs didn't occur anymore during 2 days torture testing on Zulli's machine.

We probably have a slight performance penalty in SMP mode due to more locking.

STC (7 threads):
ELO: -1.00 +-2.2 (95%) LOS: 18.4%
Total: 30000 W: 4538 L: 4624 D: 20838

However stability is worth more than 1-2 ELO points in this case.

No functional change

Resolves #422
2015-09-10 19:15:43 +01:00
Marco Costalba
fb03188fc7 Assorted cleanup of last patches
No functional change.
2015-04-11 23:24:43 +02:00
Stéphane Nicolet
2ca142a5b4 Use minimumSplitDepth = 5
Using minimumSplitDepth = 5 seems to be the best compromise in the
current SMP implementation

STC, 11 threads:

ELO: 14.87 +-4.1 (95%) LOS: 100.0%
Total: 8509 W: 1497 L: 1133 D: 5879

STC, 4 threads:

ELO: 0.30 +-2.8 (95%) LOS: 58.2%
Total: 20000 W: 3365 L: 3348 D: 13287

STC, 2 threads:

ELO: -1.02 +-2.0 (95%) LOS: 16.4%
Total: 40000 W: 7087 L: 7204 D: 25709

Resolves #324
2015-04-09 20:32:36 +01:00
Marco Costalba
5d1b92e8f9 Introduce elapsed_time()
And reformat a bit time manager code.

Note that now we set starting search time in think() and
no more in ThreadPool::start_thinking(), the added delay
is less than 1 msec, so below timer resolution (5msec) and
should not affect time lossses ratio.

No functional change.
2015-04-03 04:19:26 +02:00
Marco Costalba
dc3a5f791e Allow Bitbases::init() to be called more than once
Currently if we call it more than once, we crash.

This is not a real problem, because this function is
indeed called just once. Nevertheless with this small fix,
that gets rid of a hidden 'static' variable, we cleanly
resolve the issue.

While there, fix also ThreadPool::exit to return in a
consistent state. Now all the init() functions but
UCI::init() are reentrant and can be called multiple
times.

No functional change.
2015-03-23 17:14:31 +01:00
Marco Costalba
be77406a55 Get rid of nativeThread
No functional change.
2015-03-23 09:02:52 +01:00
Marco Costalba
26dabb1e6b Use only one ConditionVariable to sync UI
To sync UI with main thread it is enough a single
condition variable because here we have a single
producer / single consumer design pattern.

Two condition variables are strictly needed just for
many producers / many consumers case.

Note that this is possible because now we don't send to
sleep idle threads anymore while searching, so that now
only UI can wake up the main thread and we can use the
same ConditionVariable for both threads.

The natural consequence is to retire wait_for_think_finished()
and move all the logic under MainThread class, yielding the
rename of teh function to join()

No functional change.
2015-03-21 07:55:33 +01:00
Marco Costalba
9a6cfee73b Simplify nosleep logic
Avoid redundant 'while' conditions. It is enough to
check them in the outer loop.

Quick tested for no regression 10K games at 4 threads
ELO: -1.32 +-3.9 (95%) LOS: 25.6%
Total: 10000 W: 1653 L: 1691 D: 6656

No functional change.
2015-03-18 08:01:50 +01:00
Marco Costalba
13d4df95cd Use acquire() and release() for spinlocks
It is more idiomatick than lock() and unlock()

No functional change.
2015-03-16 08:14:08 +01:00
Joona Kiiski
f04f50b368 Do not sleep, but yield
During the search, do not block on condition variable, but instead use std::this_thread::yield().

Clear gain with 16 threads. Again results vary highly depending on hardware, but on average it's a clear gain.

ELO: 12.17 +-4.3 (95%) LOS: 100.0%
Total: 7998 W: 1407 L: 1127 D: 5464

There is no functional change in single thread mode

Resolves #294
2015-03-15 19:45:30 +00:00
Joona Kiiski
d71f707040 Introduce yielding spin locks
Idea and original implementation by Stephane Nicolet

7 threads 15+0.05
ELO: 3.54 +-2.9 (95%) LOS: 99.2%
Total: 17971 W: 2976 L: 2793 D: 12202

There is no functional change in single thread mode
2015-03-14 19:14:52 +00:00
Joona Kiiski
81c7975dcd Use thread specific mutexes instead of a global one.
This is necessary to improve the scalability with high number of cores.

There is no functional change in a single thread mode.

Resolves #281
2015-03-11 21:59:34 +00:00
Marco Costalba
4b59347194 Retire spinlocks
Use Mutex instead.

This is in preparaation for merging with master branch,
where we stilll don't have spinlocks.

Eventually spinlocks will be readded in some future
patch, once c++11 has been merged.

No functional change.
2015-03-11 21:20:47 +01:00
Marco Costalba
8725494966 Add thread_win32.h header
Workaround slow std::thread implementation in mingw
and gcc for Windows with our own old low level thread
functions.

No functional change.
2015-03-10 12:42:40 +01:00
Marco Costalba
63a5fc2366 Rename available_to()
Change this API to be more natural and simple.

Inspired by a patch by Joona.

No functional change.
2015-03-01 12:33:05 +01:00
Marco Costalba
0b36ba74fc Don't assume the type of Time::point
But instead use the proper definition. Also
rewrite chrono functions while there.

No functional change.
2015-02-24 14:08:14 +01:00
Marco Costalba
38112060dc Use spinlock instead of mutex for Threads and SplitPoint
It is reported to be defenitly faster with increasing
number of threads, we go from a +3.5% with 4 threads
to a +15% with 16 threads.

The only drawback is that now when testing with more
threads than physical available cores, the speed slows
down to a crawl. This is expected and was similar at what
we had setting the old sleepingThreads to false.

No functional change.
2015-02-23 13:47:07 +01:00
Marco Costalba
7ff965eebf Improve comments in SMP code
No functional change.
2015-02-20 12:38:54 +01:00
Marco Costalba
40548c9153 Sync with master
bench: 7911944
2015-02-20 10:37:29 +01:00
Marco Costalba
950c8436ed Use size_t consistently across thread code
No functional change.
2015-02-19 10:43:28 +01:00
Marco Costalba
8d47caa16e Retire redundant sp->slavesCount field
It should be used slavesMask.count() instead.

Verified 100% equivalent when sp->allSlavesSearching:

dbg_hit_on(sp->allSlavesSearching, sp->slavesCount != sp->slavesMask.count());

No functional change.
2015-02-19 10:36:15 +01:00
Marco Costalba
dccaa145d2 Compute SplitPoint::spLevel on the fly
And retire a redundant field. This is important also
from a concept point of view becuase we want to keep
SMP structures as simple as possible with the only
strictly necessary data.

Verified with

dbg_hit_on(sp->spLevel != level)

that the values are 100% the same out of more 50K samples.

No functional change.
2015-02-18 21:50:35 +01:00
Joona Kiiski
d65f75c153 Improve smp performance for high number of threads
Balance threads between split points.

There are huge differences between different machines and autopurging makes it very difficult to measure the improvement in fishtest, but the following was recorded for 16 threads at 15+0.05:

    For Bravone (1000 games): 0 ELO
    For Glinscott (1000 games): +20 ELO
    For bKingUs (1000 games): +50 ELO
    For fastGM (1500 games): +50 ELO

The change was regression for no one, and a big improvement for some, so it should be fine to commit it.
Also for 8 threads at 15+0.05 we measured a statistically significant improvement:
ELO: 6.19 +-3.9 (95%) LOS: 99.9%
Total: 10325 W: 1824 L: 1640 D: 6861

Finally it was verified that there was no (significant) regression for

4 threads:
ELO: 0.09 +-2.8 (95%) LOS: 52.4%
Total: 19908 W: 3422 L: 3417 D: 13069

2 threads:
ELO: 0.38 +-3.0 (95%) LOS: 60.0%
Total: 19044 W: 3480 L: 3459 D: 12105

1 thread:
ELO: -1.27 +-2.1 (95%) LOS: 12.3%
Total: 40000 W: 7829 L: 7975 D: 24196

Resolves #258
2015-02-16 20:36:13 +00:00
Marco Costalba
65f46794af Implicit conversion from ExtMove to Move
Verified with perft there is no speed regression,
and code is simpler. It is also conceptually correct
becuase an extended move is just a move that happens
to have also a score.

No functional change.
2015-01-31 19:22:07 +01:00
Marco Costalba
3c07603dac Import C++11 branch
Import C++11 branch from:

https://github.com/mcostalba/Stockfish/tree/c++11

The version imported is teh last one as of today:
6670e93e50

Branch is fully equivalent with master but syzygy
tablebases that are missing (but will be added with
next commit).

bench: 8080602
2015-01-18 08:00:50 +01:00
Marco Costalba
4eb2d8ce09 Assorted headers cleanup
Mostly comments fixing and other small things.

No functional change.
2015-01-11 22:56:35 +01:00
Marco Costalba
42b48b08e8 Update copyright year
No functional change.
2015-01-10 11:46:28 +01:00
Marco Costalba
62f531254e Fix comments in thread.cpp
And reshuffle a bit the functions to place
them in a consistent order.

To be on the safe side, patch has been
validated for no regression/crashes with
a small 8K games test with 3 threads:

ELO: 3.98 +-4.4 (95%) LOS: 96.3%
Total: 8388 W: 1500 L: 1404 D: 5484

No functional change.
2015-01-03 09:34:58 +01:00
hxim
fbb53524ef Rename some variables for more clarity.
No functional change.

Resolves #131
2014-12-08 07:53:33 +08:00
Gary Linscott
4739037f96 100% accurate PV display
This gives SF accurate PVs, such that the evaluation of the leaf node in
the PV matches the score backed up to the root (99% of the time.
q-search will use the value stored in the hash table instead of the eval
value sometimes).

One drawback is that fail-high/low only get a minimal 2 move PV.

It doesn't add any additional overhead to the non-PV codepath except an
extra eight bytes to the SearchStack structure in multi-threaded
searches.

A core part of this is not pruning based on TT score in PV nodes. This
was measured as not being a regression at multiple TCs, except for one
exception, fast TC with huge hash, which is not realistic for longer
searches.

STC - 1 thread, 128 mb hash
ELO: 1.42 +-3.1 (95%) LOS: 81.9%
Total: 20000 W: 4078 L: 3996 D: 11926

STC - 3 thread, 128 mb hash
ELO: -3.60 +-2.9 (95%) LOS: 0.8%
Total: 20000 W: 3575 L: 3782 D: 12643

STC - 3 thread, 8 mb hash
ELO: 0.12 +-2.9 (95%) LOS: 53.3%
Total: 20000 W: 3654 L: 3647 D: 12699

LTC - 3 thread, 32mb hash
ELO: 2.29 +-2.0 (95%) LOS: 98.8%
Total: 35740 W: 5618 L: 5382 D: 24740

Bench: 6984058

Resolves #102
2014-11-12 16:16:33 -05:00
Marco Costalba
5cbcff55cc Rename ucioption.h to uci.h
We are going to add all UCI related
functions here, so first rename it
to a more proper name.

No functional change.
2014-10-26 19:39:46 +00:00