1
0
Fork 0
mirror of https://github.com/sockspls/badfish synced 2025-05-02 17:49:35 +00:00
Commit graph

204 commits

Author SHA1 Message Date
lucasart
a037e20f28 Syzygy pull (#639)
* factorize repetitive code in calc_key() and calc_key_from_pcs()

and do some cleanup while at it.

* sync with master

No functional change.
2016-04-18 14:08:23 +02:00
Marco Costalba
356147d99a Rewrite time formula
Time management is really too complex, our aim is
to simplify it, but for time being at least rewrite
in an understandable way.

No functional change.
2016-01-18 17:12:18 +01:00
Lyudmil Antonov
89723339d9 Assorted English grammar changes
No functional change

Resolves #567
2016-01-16 21:34:29 +00:00
Leonid Pechenik
9eceb894ac Adjust time used for move based on previous score
Use less time if evaluation is not worse than for previous move and even less time if in addition no fail low encountered for current iteration.

STC: 10+0.1
ELO: 5.37 +-2.9 (95%) LOS: 100.0%
Total: 20000 W: 3832 L: 3523 D: 12645

STC: 10+0.1
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 17527 W: 3334 L: 3132 D: 11061

LTC: 60+0.6
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 28233 W: 3939 L: 3725 D: 20569

LTC: 60+0.6
ELO: 2.43 +-1.4 (95%) LOS: 100.0%
Total: 60000 W: 8266 L: 7847 D: 43887

LTC: 60+0.06
LLR: 2.95 (-2.94,2.94) [-1.00,3.00]
Total: 38932 W: 5408 L: 5207 D: 28317

Resolves #547
2016-01-03 14:01:15 +00:00
ppigazzini
d4af15f682 Update AUTHORS and copyright notice
No functional change

Resolves #555
2016-01-02 09:43:51 +00:00
Marco Costalba
9742fb10fd Update Copyright year
No functional change.

Resolves #554
2016-01-01 10:17:36 +00:00
Marco Costalba
1b5b900a29 Move some globals into main thread scope
Make it explicit that those variables are not globals, but
are used only by main thread. I think it is a sensible
clarification because easy move is already tricky enough
and current patch makes the involved actors explicit.

No functional change.

Resolves #537
2015-12-27 19:29:16 +00:00
Marco Costalba
93195555ed Rewrite how threads are spawned
Instead of creating a running std::thread and
returning, wait in Thread c'tor that the native
thread of execution goes to sleep in idle_loop().

In this way we can simplify how search is started,
because when main thread is idle we are sure also
all other threads will be idle, in any case, even
at thread creation and startup.

After lazy smp went in, we can simpify and rewrite
a lot of logic that is now no more needed. This is
hopefully the final big cleanup.

Tested for no regression at 5+0.1 with 3 threads:
LLR: 2.95 (-2.94,2.94) [-5.00,0.00]
Total: 17411 W: 3248 L: 3198 D: 10965

No functional change.
2015-11-21 07:48:50 +01:00
Marco Costalba
76ed0ab501 Retire ThreadBase
Now that we don't have anymore TimerThread, there is
no need of this long class hierarchy.

Also assorted reformatting while there.

To verify no regression, passed at STC with 7 threads:
LLR: 2.97 (-2.94,2.94) [-5.00,0.00]
Total: 30990 W: 4945 L: 4942 D: 21103

No functional change.
2015-11-13 08:22:44 +01:00
Marco Costalba
9c9205860c Get rid of timer thread
Unfortunately std::condition_variable::wait_for()
is not accurate in general case and the timer thread
can wake up also after tens or even hundreds of
millisecs after time has elapsded. CPU load, process
priorities, number of concurrent threads, even from
other processes, will have effect upon it.

Even official documentation says: "This function may
block for longer than timeout_duration due to scheduling
or resource contention delays."

So retire timer and use a polling scheme based on a
local thread counter that counts search() calls and
a small trick to keep polling frequency constant,
independently from the number of threads.

Tested for no regression at very fast TC 2+0.05 th 7:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 32969 W: 6720 L: 6620 D: 19629

TC 2+0.05 th 1:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 7765 W: 1917 L: 1765 D: 4083

And at STC TC, both single thread
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 15587 W: 3036 L: 2905 D: 9646

And with 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 8149 W: 1367 L: 1227 D: 5555

bench: 8639247
2015-11-03 11:27:00 +01:00
mbootsector
27c5cb5912 Pick bestmove from the deepest thread.
STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 26930 W: 4441 L: 4214 D: 18275

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 7783 W: 1017 L: 876 D: 5890

No functional change in single thread mode

Resolves #485
2015-11-02 10:05:43 +00:00
Marco Costalba
86f04dbcc0 Assorted trivia in search.cpp
The only interesting change is the moving of
stack[MAX_PLY+4] back to its original position
in id_loop (now renamed Thread::search).

No functional change.
2015-10-31 19:26:35 +01:00
Stéphane Nicolet
80d7556af7 Some code and comment cleanup
- Remove all references to split points
- Some grammar and spelling fixes

No Functional change

Resolves #478
2015-10-29 15:28:59 +00:00
lucasart
00d9e9fd28 Use atomics instead of volatile
Rely on well defined behaviour for message passing, instead of volatile. Three
versions have been tested, to make sure this wouldn't cause a slowdown on any
platform.

v1: Sequentially consistent atomics

No mesurable regression, despite the extra memory barriers on x86. Even with 15
threads and extreme time pressure, both acting as a magnifying glass:

threads=15, tc=2+0.02
ELO: 2.59 +-3.4 (95%) LOS: 93.3%
Total: 18132 W: 4113 L: 3978 D: 10041

threads=7, tc=2+0.02
ELO: -1.64 +-3.6 (95%) LOS: 18.8%
Total: 16914 W: 4053 L: 4133 D: 8728

v2: Acquire/Release semantics

This version generates no extra barriers for x86 (on the hot path). As expected,
no regression either, under the same conditions:

threads=15, tc=2+0.02
ELO: 2.85 +-3.3 (95%) LOS: 95.4%
Total: 19661 W: 4640 L: 4479 D: 10542

threads=7, tc=2+0.02
ELO: 0.23 +-3.5 (95%) LOS: 55.1%
Total: 18108 W: 4326 L: 4314 D: 9468

As suggested by Joona, another test at LTC:

threads=15, tc=20+0.05
ELO: 0.64 +-2.6 (95%) LOS: 68.3%
Total: 20000 W: 3053 L: 3016 D: 13931

v3: Final version: SeqCst/Relaxed

threads=15, tc=10+0.1
ELO: 0.87 +-3.9 (95%) LOS: 67.1%
Total: 9541 W: 1478 L: 1454 D: 6609

Resolves #474
2015-10-25 09:15:45 +00:00
Marco Costalba
307a5a4f63 Cleanup history stats
And other assorted trivia.

No functional change.
2015-10-24 17:29:12 +02:00
mbootsector
ecc5ff6693 Lazy SMP
Start all threads searching on root position and
use only the shared TT table as synching scheme.

It seems this scheme scales better than YBWC for
high number of threads.

Verified for nor regression at STC 3 threads
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 40232 W: 6908 L: 7130 D: 26194

Verified for nor regression at LTC 3 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 28186 W: 3908 L: 3798 D: 20480

Verified for nor regression at STC 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 3607 W: 674 L: 526 D: 2407

Verified for nor regression at LTC 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 4235 W: 671 L: 528 D: 3036

Tested with fixed games at LTC with 20 threads
ELO: 44.75 +-7.6 (95%) LOS: 100.0%
Total: 2069 W: 407 L: 142 D: 1520

Tested with fixed games at XLTC (120secs) with 20 threads
ELO: 28.01 +-6.7 (95%) LOS: 100.0%
Total: 2275 W: 349 L: 166 D: 1760

Original patch of mbootsector, with additional work
from Ivan Ivec (log formula), Joerg Oster (id loop
simplification) and Marco Costalba (assorted formatting
and rework).

Bench: 8116244
2015-10-20 06:58:08 +02:00
Joona Kiiski
a7381d5e81 Fully yielding locks, no spinning
7 threads:

ELO: 2.00 +-2.7 (95%) LOS: 92.4%
Total: 20000 W: 3276 L: 3161 D: 13563

There is no functional change in single thread mode

Resolves #304
2015-03-24 21:34:19 +00:00
Marco Costalba
be77406a55 Get rid of nativeThread
No functional change.
2015-03-23 09:02:52 +01:00
Marco Costalba
26dabb1e6b Use only one ConditionVariable to sync UI
To sync UI with main thread it is enough a single
condition variable because here we have a single
producer / single consumer design pattern.

Two condition variables are strictly needed just for
many producers / many consumers case.

Note that this is possible because now we don't send to
sleep idle threads anymore while searching, so that now
only UI can wake up the main thread and we can use the
same ConditionVariable for both threads.

The natural consequence is to retire wait_for_think_finished()
and move all the logic under MainThread class, yielding the
rename of teh function to join()

No functional change.
2015-03-21 07:55:33 +01:00
Marco Costalba
13d4df95cd Use acquire() and release() for spinlocks
It is more idiomatick than lock() and unlock()

No functional change.
2015-03-16 08:14:08 +01:00
Joona Kiiski
d71f707040 Introduce yielding spin locks
Idea and original implementation by Stephane Nicolet

7 threads 15+0.05
ELO: 3.54 +-2.9 (95%) LOS: 99.2%
Total: 17971 W: 2976 L: 2793 D: 12202

There is no functional change in single thread mode
2015-03-14 19:14:52 +00:00
Joona Kiiski
81c7975dcd Use thread specific mutexes instead of a global one.
This is necessary to improve the scalability with high number of cores.

There is no functional change in a single thread mode.

Resolves #281
2015-03-11 21:59:34 +00:00
Marco Costalba
4b59347194 Retire spinlocks
Use Mutex instead.

This is in preparaation for merging with master branch,
where we stilll don't have spinlocks.

Eventually spinlocks will be readded in some future
patch, once c++11 has been merged.

No functional change.
2015-03-11 21:20:47 +01:00
Marco Costalba
04372316b3 Disable spinlocks
To allow testing on fishtest.

No functional change.
2015-03-10 12:47:49 +01:00
Marco Costalba
8725494966 Add thread_win32.h header
Workaround slow std::thread implementation in mingw
and gcc for Windows with our own old low level thread
functions.

No functional change.
2015-03-10 12:42:40 +01:00
Marco Costalba
a590d1d52d Re-enable spinlocks
For branch C++11, that doe snot run on fishtest,
there is no need of this kludge, let only master
have it.

No functional change.
2015-03-07 08:38:26 +01:00
Marco Costalba
6645115377 Allow to disable spinlocks
And use mutex instead. You may never want to do this.
It is a workaround to run c++11 on fishtest where many
machiens have HTenabled and this can be a problem when
number of cores set is higher than number of physical cores.

To disable spinlocks, just compile with -DNO_SPINLOCK flag

No functional change.
2015-03-01 17:16:05 +01:00
Marco Costalba
63a5fc2366 Rename available_to()
Change this API to be more natural and simple.

Inspired by a patch by Joona.

No functional change.
2015-03-01 12:33:05 +01:00
Marco Costalba
d3d26a94b3 Improve spinlock implementation
Calling lock.test_and_set() in a tight loop creates expensive
memory synchronizations among processors and penalize other
running threads. So syncronize only only once at the beginning
with fetch_sub() and then loop on a simple load() that puts much
less pressure on the system.

Reported about 2-3% speed up on various systems.

Patch by Ronald de Man.

No functional change.
2015-02-23 19:48:46 +01:00
Marco Costalba
38112060dc Use spinlock instead of mutex for Threads and SplitPoint
It is reported to be defenitly faster with increasing
number of threads, we go from a +3.5% with 4 threads
to a +15% with 16 threads.

The only drawback is that now when testing with more
threads than physical available cores, the speed slows
down to a crawl. This is expected and was similar at what
we had setting the old sleepingThreads to false.

No functional change.
2015-02-23 13:47:07 +01:00
Marco Costalba
775f8239d3 Introduce Spinlock class
Initialization is more complex than what I'd like due
to MSVC compatibility that for some reason does not like:

std::atomic_flag lock = ATOMIC_FLAG_INIT;

No functional change.
2015-02-23 13:37:46 +01:00
Marco Costalba
7ff965eebf Improve comments in SMP code
No functional change.
2015-02-20 12:38:54 +01:00
Marco Costalba
40548c9153 Sync with master
bench: 7911944
2015-02-20 10:37:29 +01:00
Marco Costalba
950c8436ed Use size_t consistently across thread code
No functional change.
2015-02-19 10:43:28 +01:00
Marco Costalba
8d47caa16e Retire redundant sp->slavesCount field
It should be used slavesMask.count() instead.

Verified 100% equivalent when sp->allSlavesSearching:

dbg_hit_on(sp->allSlavesSearching, sp->slavesCount != sp->slavesMask.count());

No functional change.
2015-02-19 10:36:15 +01:00
Marco Costalba
dccaa145d2 Compute SplitPoint::spLevel on the fly
And retire a redundant field. This is important also
from a concept point of view becuase we want to keep
SMP structures as simple as possible with the only
strictly necessary data.

Verified with

dbg_hit_on(sp->spLevel != level)

that the values are 100% the same out of more 50K samples.

No functional change.
2015-02-18 21:50:35 +01:00
Joona Kiiski
d65f75c153 Improve smp performance for high number of threads
Balance threads between split points.

There are huge differences between different machines and autopurging makes it very difficult to measure the improvement in fishtest, but the following was recorded for 16 threads at 15+0.05:

    For Bravone (1000 games): 0 ELO
    For Glinscott (1000 games): +20 ELO
    For bKingUs (1000 games): +50 ELO
    For fastGM (1500 games): +50 ELO

The change was regression for no one, and a big improvement for some, so it should be fine to commit it.
Also for 8 threads at 15+0.05 we measured a statistically significant improvement:
ELO: 6.19 +-3.9 (95%) LOS: 99.9%
Total: 10325 W: 1824 L: 1640 D: 6861

Finally it was verified that there was no (significant) regression for

4 threads:
ELO: 0.09 +-2.8 (95%) LOS: 52.4%
Total: 19908 W: 3422 L: 3417 D: 13069

2 threads:
ELO: 0.38 +-3.0 (95%) LOS: 60.0%
Total: 19044 W: 3480 L: 3459 D: 12105

1 thread:
ELO: -1.27 +-2.1 (95%) LOS: 12.3%
Total: 40000 W: 7829 L: 7975 D: 24196

Resolves #258
2015-02-16 20:36:13 +00:00
Marco Costalba
96e36a7897 Explicitly defaulted and deleted members
Better than a bit obscure implicit ones.

No functional change.
2015-01-21 13:18:19 +01:00
Marco Costalba
f53aea45e3 Add syzygy support
bench: 8080602
2015-01-18 08:27:46 +01:00
Marco Costalba
3c07603dac Import C++11 branch
Import C++11 branch from:

https://github.com/mcostalba/Stockfish/tree/c++11

The version imported is teh last one as of today:
6670e93e50

Branch is fully equivalent with master but syzygy
tablebases that are missing (but will be added with
next commit).

bench: 8080602
2015-01-18 08:00:50 +01:00
Marco Costalba
4eb2d8ce09 Assorted headers cleanup
Mostly comments fixing and other small things.

No functional change.
2015-01-11 22:56:35 +01:00
Marco Costalba
42b48b08e8 Update copyright year
No functional change.
2015-01-10 11:46:28 +01:00
Gary Linscott
4739037f96 100% accurate PV display
This gives SF accurate PVs, such that the evaluation of the leaf node in
the PV matches the score backed up to the root (99% of the time.
q-search will use the value stored in the hash table instead of the eval
value sometimes).

One drawback is that fail-high/low only get a minimal 2 move PV.

It doesn't add any additional overhead to the non-PV codepath except an
extra eight bytes to the SearchStack structure in multi-threaded
searches.

A core part of this is not pruning based on TT score in PV nodes. This
was measured as not being a regression at multiple TCs, except for one
exception, fast TC with huge hash, which is not realistic for longer
searches.

STC - 1 thread, 128 mb hash
ELO: 1.42 +-3.1 (95%) LOS: 81.9%
Total: 20000 W: 4078 L: 3996 D: 11926

STC - 3 thread, 128 mb hash
ELO: -3.60 +-2.9 (95%) LOS: 0.8%
Total: 20000 W: 3575 L: 3782 D: 12643

STC - 3 thread, 8 mb hash
ELO: 0.12 +-2.9 (95%) LOS: 53.3%
Total: 20000 W: 3654 L: 3647 D: 12699

LTC - 3 thread, 32mb hash
ELO: 2.29 +-2.0 (95%) LOS: 98.8%
Total: 35740 W: 5618 L: 5382 D: 24740

Bench: 6984058

Resolves #102
2014-11-12 16:16:33 -05:00
Joona Kiiski
eb50793cff Retire FakeSplit
- Currently broken
    - Never been really useful
    - Does not work well with new splitting model

Verified for no regression at STC with 3 threads:
LLR: 2.96 (-2.94,2.94) [-6.00,0.00]
Total: 81905 W: 12122 L: 12381 D: 57402

No functional change
2014-07-09 07:19:06 +08:00
Marco Costalba
9f843adf89 Retire "Idle Threads Sleep" UCI option
After last Joona's patch there is no measurable
difference between the option set or unset.

Tested by Andreas Strangmüller with 16 threads
on his Dual Opteron 6376.

After 5000 games at 15+0.05 the result is:

1 Stockfish_14050822_T16_on   : 3003  5000 (+849,=3396,-755), 50.9 %
2 Stockfish_14050822_T16_off  : 2997  5000 (+755,=3396,-849), 49.1 %

bench: 880215
2014-05-11 10:29:56 +02:00
Marco Costalba
7e3dba4f4c Reformat and simplify previous patch
No functional change.
2014-05-07 08:56:16 +02:00
Joona Kiiski
f6e98a924a Allow a slave to 'late join' another splitpoint
Instead of waiting to be allocated, actively search
for another split point to join when finishes its
search. Also modify split conditions.

This patch has been tested with 7 threads SMP and
passed both STC:

LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 2885 W: 519 L: 410 D: 1956

And a reduced-LTC at  25+0.05
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 4401 W: 684 L: 566 D: 3151

Was then retested against regression in 3 thread case
at standard LTC of  60+0.05:

LLR: 2.96 (-2.94,2.94) [-4.00,0.00]
Total: 40809 W: 5446 L: 5406 D: 29957

bench: 8802105
2014-05-07 08:38:56 +02:00
Marco Costalba
55604f156b Fix issues detected by Coverity Scan
Most of Coverity Scan reports are false
positives, but in rare cases we have
confirmed (very small) issues.

No functional change.
2014-04-26 09:33:50 +02:00
Marco Costalba
aab5863dd4 Increase max threads to 128
Thanks to std::bitset we can easily increase
the limit of active threads above 64.

Thanks to Lucas Braesch for pointing at the
correct solution of using std::bitset.

No functional change.
2014-03-18 12:07:26 +01:00
Marco Costalba
a1a7bc84da Remove "Max Threads per Split Point" UCI option
Experimental patch to verify if drop of nps
in endgames at very long TC is due to this.

Suggested by Ronald de Man.

bench: 7451319
2014-03-15 21:26:04 +01:00