mirror of https://github.com/sockspls/badfish synced 2025-04-30 00:33:09 +00:00
Commit graph

182 commits

Author SHA1 Message Date
Marco Costalba
c0a1676a65 Retire spinlocks
Use Mutex instead.

This is in preparation for merging with the master branch,
where we still don't have spinlocks.

Eventually spinlocks will be re-added in some future
patch, once c++11 has been merged.

No functional change.
2015-03-10 21:50:45 +01:00
Marco Costalba
04372316b3 Disable spinlocks
To allow testing on fishtest.

No functional change.
2015-03-10 12:47:49 +01:00
Marco Costalba
8725494966 Add thread_win32.h header
Workaround slow std::thread implementation in mingw
and gcc for Windows with our own old low level thread
functions.

No functional change.
2015-03-10 12:42:40 +01:00
Marco Costalba
a590d1d52d Re-enable spinlocks
For branch C++11, that does not run on fishtest,
there is no need for this kludge; let only master
have it.

No functional change.
2015-03-07 08:38:26 +01:00
Marco Costalba
6645115377 Allow to disable spinlocks
And use mutex instead. You may never want to do this.
It is a workaround to run c++11 on fishtest, where many
machines have HT enabled and this can be a problem when the
number of cores set is higher than the number of physical cores.

To disable spinlocks, just compile with -DNO_SPINLOCK flag

No functional change.
2015-03-01 17:16:05 +01:00
Marco Costalba
63a5fc2366 Rename available_to()
Change this API to be more natural and simple.

Inspired by a patch by Joona.

No functional change.
2015-03-01 12:33:05 +01:00
Marco Costalba
d3d26a94b3 Improve spinlock implementation
Calling lock.test_and_set() in a tight loop creates expensive
memory synchronizations among processors and penalizes other
running threads. So synchronize only once at the beginning
with fetch_sub() and then loop on a simple load() that puts much
less pressure on the system.

Reported about 2-3% speed up on various systems.

Patch by Ronald de Man.

No functional change.
2015-02-23 19:48:46 +01:00
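
The approach described in "Improve spinlock implementation" can be sketched roughly as follows. This is a minimal illustration and not necessarily the code of the patch: the counter convention (1 = free, 0 or less = held), the yield() call in the wait loop and the helper count_with_threads() are assumptions made for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Sketch of a spinlock that synchronizes once with fetch_sub() and then
// waits on a plain load(). Assumed convention: 1 = free, <= 0 = held.
class Spinlock {
  std::atomic<int> lock{1};

public:
  void acquire() {
    // fetch_sub() returns the previous value: 1 means we took the lock.
    while (lock.fetch_sub(1, std::memory_order_acquire) != 1)
      // The lock was held: wait with cheap relaxed loads until it looks
      // free, then retry the expensive read-modify-write above.
      while (lock.load(std::memory_order_relaxed) <= 0)
        std::this_thread::yield(); // assumption: be polite when oversubscribed
  }

  void release() { lock.store(1, std::memory_order_release); }
};

// Hypothetical helper: every thread increments a shared counter under the lock.
inline long count_with_threads(int threads, int itersPerThread) {
  assert(threads >= 0 && itersPerThread >= 0);
  Spinlock sl;
  long counter = 0;
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t)
    pool.emplace_back([&] {
      for (int i = 0; i < itersPerThread; ++i) {
        sl.acquire();
        ++counter; // protected by the spinlock
        sl.release();
      }
    });
  for (auto& th : pool)
    th.join();
  return counter;
}
```

If the lock works, count_with_threads(4, 10000) returns exactly 40000, because no increment is lost.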
Marco Costalba
38112060dc Use spinlock instead of mutex for Threads and SplitPoint
It is reported to be definitely faster with increasing
number of threads: we go from +3.5% with 4 threads
to +15% with 16 threads.

The only drawback is that now when testing with more
threads than physically available cores, the speed slows
down to a crawl. This is expected and is similar to what
we had when setting the old sleepingThreads to false.

No functional change.
2015-02-23 13:47:07 +01:00
Marco Costalba
775f8239d3 Introduce Spinlock class
Initialization is more complex than I'd like due
to MSVC compatibility: for some reason MSVC does not like:

std::atomic_flag lock = ATOMIC_FLAG_INIT;

No functional change.
2015-02-23 13:37:46 +01:00
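
One portable way around the initializer mentioned in "Introduce Spinlock class" is to clear the flag in the constructor. The sketch below only illustrates that idea and is not necessarily what the commit does; try_acquire() and spinlock_demo() are hypothetical additions used to show the behaviour.

```cpp
#include <atomic>
#include <cassert>

class Spinlock {
  std::atomic_flag lock; // deliberately not initialized with ATOMIC_FLAG_INIT

public:
  // Put the flag into the "free" state explicitly at construction.
  Spinlock() { std::atomic_flag_clear(&lock); }

  // test_and_set() returns the previous state: false means we got the lock.
  bool try_acquire() { return !lock.test_and_set(std::memory_order_acquire); }

  void acquire() {
    while (!try_acquire()) {
    } // busy wait
  }

  void release() { lock.clear(std::memory_order_release); }
};

// Hypothetical single-threaded walk through the lock states.
inline void spinlock_demo() {
  Spinlock s;
  assert(s.try_acquire());  // free right after construction
  assert(!s.try_acquire()); // already held
  s.release();
  assert(s.try_acquire());  // free again after release
  s.release();
}
```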
Marco Costalba
7ff965eebf Improve comments in SMP code
No functional change.
2015-02-20 12:38:54 +01:00
Marco Costalba
40548c9153 Sync with master
bench: 7911944
2015-02-20 10:37:29 +01:00
Marco Costalba
950c8436ed Use size_t consistently across thread code
No functional change.
2015-02-19 10:43:28 +01:00
Marco Costalba
8d47caa16e Retire redundant sp->slavesCount field
slavesMask.count() should be used instead.

Verified 100% equivalent when sp->allSlavesSearching:

dbg_hit_on(sp->allSlavesSearching, sp->slavesCount != sp->slavesMask.count());

No functional change.
2015-02-19 10:36:15 +01:00
Marco Costalba
dccaa145d2 Compute SplitPoint::spLevel on the fly
And retire a redundant field. This is also important
from a conceptual point of view because we want to keep
SMP structures as simple as possible, with only the
strictly necessary data.

Verified with

dbg_hit_on(sp->spLevel != level)

that the values are 100% the same in more than 50K samples.

No functional change.
2015-02-18 21:50:35 +01:00
Joona Kiiski
d65f75c153 Improve smp performance for high number of threads
Balance threads between split points.

There are huge differences between different machines and autopurging makes it very difficult to measure the improvement in fishtest, but the following was recorded for 16 threads at 15+0.05:

    For Bravone (1000 games): 0 ELO
    For Glinscott (1000 games): +20 ELO
    For bKingUs (1000 games): +50 ELO
    For fastGM (1500 games): +50 ELO

The change was a regression for no one, and a big improvement for some, so it should be fine to commit it.
Also for 8 threads at 15+0.05 we measured a statistically significant improvement:
ELO: 6.19 +-3.9 (95%) LOS: 99.9%
Total: 10325 W: 1824 L: 1640 D: 6861

Finally it was verified that there was no (significant) regression for

4 threads:
ELO: 0.09 +-2.8 (95%) LOS: 52.4%
Total: 19908 W: 3422 L: 3417 D: 13069

2 threads:
ELO: 0.38 +-3.0 (95%) LOS: 60.0%
Total: 19044 W: 3480 L: 3459 D: 12105

1 thread:
ELO: -1.27 +-2.1 (95%) LOS: 12.3%
Total: 40000 W: 7829 L: 7975 D: 24196

Resolves #258
2015-02-16 20:36:13 +00:00
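
The ELO figures quoted in results like the ones above can be reproduced from the W/L/D totals with the usual logistic Elo formula, counting a draw as half a point. The sketch below is an illustration under that assumption; the function name elo_from_wld() is hypothetical and the error margins and LOS values are not computed.

```cpp
#include <cassert>
#include <cmath>

// Elo difference implied by a win/loss/draw record under the logistic model.
inline double elo_from_wld(double wins, double losses, double draws) {
  const double games = wins + losses + draws;
  assert(games > 0);
  const double score = (wins + 0.5 * draws) / games; // average points per game
  assert(score > 0.0 && score < 1.0);                // formula undefined at 0 and 1
  return -400.0 * std::log10(1.0 / score - 1.0);
}
```

For the 8 threads result, elo_from_wld(1824, 1640, 6861) evaluates to about 6.19, in line with the value reported in the commit message.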
Marco Costalba
96e36a7897 Explicitly defaulted and deleted members
Better than somewhat obscure implicit ones.

No functional change.
2015-01-21 13:18:19 +01:00
Marco Costalba
f53aea45e3 Add syzygy support
bench: 8080602
2015-01-18 08:27:46 +01:00
Marco Costalba
3c07603dac Import C++11 branch
Import C++11 branch from:

https://github.com/mcostalba/Stockfish/tree/c++11

The version imported is the last one as of today:
6670e93e50

Branch is fully equivalent to master except for syzygy
tablebases, which are missing (but will be added with
the next commit).

bench: 8080602
2015-01-18 08:00:50 +01:00
Marco Costalba
4eb2d8ce09 Assorted headers cleanup
Mostly comments fixing and other small things.

No functional change.
2015-01-11 22:56:35 +01:00
Marco Costalba
42b48b08e8 Update copyright year
No functional change.
2015-01-10 11:46:28 +01:00
Gary Linscott
4739037f96 100% accurate PV display
This gives SF accurate PVs, such that the evaluation of the leaf node in
the PV matches the score backed up to the root (99% of the time.
q-search will use the value stored in the hash table instead of the eval
value sometimes).

One drawback is that fail-high/low only get a minimal 2 move PV.

It doesn't add any additional overhead to the non-PV codepath except an
extra eight bytes to the SearchStack structure in multi-threaded
searches.

A core part of this is not pruning based on TT score in PV nodes. This
was measured as not being a regression at multiple TCs, with one
exception: fast TC with huge hash, which is not realistic for longer
searches.

STC - 1 thread, 128 mb hash
ELO: 1.42 +-3.1 (95%) LOS: 81.9%
Total: 20000 W: 4078 L: 3996 D: 11926

STC - 3 thread, 128 mb hash
ELO: -3.60 +-2.9 (95%) LOS: 0.8%
Total: 20000 W: 3575 L: 3782 D: 12643

STC - 3 thread, 8 mb hash
ELO: 0.12 +-2.9 (95%) LOS: 53.3%
Total: 20000 W: 3654 L: 3647 D: 12699

LTC - 3 thread, 32mb hash
ELO: 2.29 +-2.0 (95%) LOS: 98.8%
Total: 35740 W: 5618 L: 5382 D: 24740

Bench: 6984058

Resolves #102
2014-11-12 16:16:33 -05:00
Joona Kiiski
eb50793cff Retire FakeSplit
- Currently broken
    - Never been really useful
    - Does not work well with new splitting model

Verified for no regression at STC with 3 threads:
LLR: 2.96 (-2.94,2.94) [-6.00,0.00]
Total: 81905 W: 12122 L: 12381 D: 57402

No functional change
2014-07-09 07:19:06 +08:00
Marco Costalba
9f843adf89 Retire "Idle Threads Sleep" UCI option
After Joona's last patch there is no measurable
difference between the option set or unset.

Tested by Andreas Strangmüller with 16 threads
on his Dual Opteron 6376.

After 5000 games at 15+0.05 the result is:

1 Stockfish_14050822_T16_on   : 3003  5000 (+849,=3396,-755), 50.9 %
2 Stockfish_14050822_T16_off  : 2997  5000 (+755,=3396,-849), 49.1 %

bench: 880215
2014-05-11 10:29:56 +02:00
Marco Costalba
7e3dba4f4c Reformat and simplify previous patch
No functional change.
2014-05-07 08:56:16 +02:00
Joona Kiiski
f6e98a924a Allow a slave to 'late join' another splitpoint
Instead of waiting to be allocated, a slave actively searches
for another split point to join when it finishes its
search. Also modify split conditions.

This patch has been tested with 7 threads SMP and
passed both STC:

LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 2885 W: 519 L: 410 D: 1956

And a reduced-LTC at  25+0.05
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 4401 W: 684 L: 566 D: 3151

Was then retested against regression in 3 thread case
at standard LTC of  60+0.05:

LLR: 2.96 (-2.94,2.94) [-4.00,0.00]
Total: 40809 W: 5446 L: 5406 D: 29957

bench: 8802105
2014-05-07 08:38:56 +02:00
Marco Costalba
55604f156b Fix issues detected by Coverity Scan
Most of Coverity Scan reports are false
positives, but in rare cases we have
confirmed (very small) issues.

No functional change.
2014-04-26 09:33:50 +02:00
Marco Costalba
aab5863dd4 Increase max threads to 128
Thanks to std::bitset we can easily increase
the limit of active threads above 64.

Thanks to Lucas Braesch for pointing at the
correct solution of using std::bitset.

No functional change.
2014-03-18 12:07:26 +01:00
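
A 64-bit integer used as a bit mask can track at most 64 threads, while std::bitset offers the same set, reset, test and count operations for any fixed size. A minimal sketch of the idea, with a hypothetical SlaveSet type that is not taken from the source:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// More bits than a single 64-bit integer mask could hold.
const std::size_t MAX_THREADS = 128;

// Hypothetical record of which threads are working on a split point.
struct SlaveSet {
  std::bitset<MAX_THREADS> mask;

  void add(std::size_t idx) {
    assert(idx < MAX_THREADS);
    mask.set(idx);
  }
  void remove(std::size_t idx) {
    assert(idx < MAX_THREADS);
    mask.reset(idx);
  }
  bool contains(std::size_t idx) const { return mask.test(idx); }

  // Number of active threads, derived from the mask instead of a separate counter.
  std::size_t count() const { return mask.count(); }
};
```

Deriving the count from the mask also avoids keeping a second field in sync with it.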
Marco Costalba
a1a7bc84da Remove "Max Threads per Split Point" UCI option
Experimental patch to verify if drop of nps
in endgames at very long TC is due to this.

Suggested by Ronald de Man.

bench: 7451319
2014-03-15 21:26:04 +01:00
Marco Costalba
41641e3b1e Assorted tweaks from DON
Mainly renames and some little code style improvements,
inspired by looking at DON sources:

https://github.com/erashid/DON

No functional change.
2014-02-09 17:31:45 +01:00
Marco Costalba
c9dcda6ac4 Update copyright year
No functional change.
2014-01-02 01:49:18 +01:00
Lucas Braesch
f5727deee3 Remove threat move stuff
A great simplification that shows no regression
and it seems even a bit scalable.

Tested with fixed number of games:

Short TC
ELO: 0.60 +-2.1 (95%) LOS: 71.1%
Total: 39554 W: 7477 L: 7409 D: 24668

Long TC
ELO: 2.97 +-2.0 (95%) LOS: 99.8%
Total: 36424 W: 5894 L: 5583 D: 24947

bench: 8184352
2013-12-15 09:43:29 +01:00
Jerry Donald
a8af78c833 Another round of spelling fixes
And also renamed a loop variable while there.

No functional change.
2013-12-02 23:51:29 +01:00
Richard Lloyd
13a73f67c0 Big assorted spelling fixes
No functional change.
2013-12-02 20:29:35 +01:00
Marco Costalba
a3a0df92a3 Set timer to a fixed interval
And remove a complex (and broken) formula.

Indeed, the previous code was broken for TCs with big
time increments, where available_time() was too close
to the total time, leading to many time losses; for instance:

go wtime 2600 winc 2600
info nodes 4432770 time 2601 <-- time forfeit!

maximum search time = 2530 ms
available_time = 2300 ms

For a reference and further details see:

https://groups.google.com/forum/?fromgroups=#!topic/fishcooking/dCPAvQDcm2E

Speed tested with bench, disabling the timer altogether vs timer set at
max resolution, showed we have no speed regressions both in single
core and when using all physical cores.

No functional change.
2013-11-01 08:56:15 +01:00
Marco Costalba
cca34e234c Drop 'is' prefix from query functions
Most but not all.

No functional change.
2013-09-28 06:47:59 -07:00
Marco Costalba
c65d67feb5 Revert "Use a per-thread array"
This reverts commit 800410eef1 and instead increases
stack size.

I went through the old emails with Daylen, who reported the
crash issue on Mac OS X that was fixed by 0049d3f337.

It was reported that the default stack size for a thread in Mac OS X
is 8 megabytes, while the patch that we are reverting reduces
stack usage by at most about 217KB, so the MAX_MOVES value was
only marginally responsible for the crash. In those emails Daylen also
hinted at how to increase the stack size for Mac OS X to 16MB.

So prefer to increase stack size to 16MB instead of re-inventing
the wheel with a home-grown stack as we did in the patch
that we are now reverting (it will remain anyhow in git history
for documentation purposes).

No functional change.
2013-09-28 10:10:51 +02:00
Marco Costalba
800410eef1 Use a per-thread array for generated moves
This greatly reduces stack usage and is a
prerequisite for the next patch.

Verified with 40K games both in single and SMP
case that there are no regressions.

No functional change.
2013-09-27 08:44:36 +02:00
homoSapiensSapiens
e005270fb6 Use constants arguments where possible
No functional changes.
2013-08-16 09:57:21 +02:00
Marco Costalba
55948623e7 Rework Thread hierarchy
Introduce ThreadBase struct that is search
agnostic and just handles low level stuff,
and derive all the other specialized classes
from here.

In particular TimerThread no longer inherits
all the search related stuff from Thread.

Also some renaming while there.

Suggested by Steven Edwards

No functional change.
2013-07-31 18:35:52 +02:00
Marco Costalba
4d46d29efe Fix a race at thread creation
At thread creation start_routine() is called,
and from there the virtual function idle_loop().
Because we do this inside the Thread c'tor, where the
virtual mechanism is disabled, it could happen that
the base class idle_loop() is called instead.

The issue happens with TimerThread and MainThread
where, at launch, start_routine calls
Thread::idle_loop instead of the derived ones.

Normally this bug is hidden because c'tor finishes
before start_routine() is actually called in the
just created execution thread, but on some platforms
and in some cases this is not guaranteed and the
engine hangs.

Reported by Ted Wong on talkchess

No functional change.
2013-07-31 18:35:32 +02:00
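
The language rule behind "Fix a race at thread creation" can be shown without any threads: a virtual call made while the base class constructor is still running resolves to the base class version. The sketch below uses hypothetical Base and Derived types and is not the engine's code. In the engine the call came from a newly started thread, so whether the base or the derived function ran depended on timing.

```cpp
#include <cassert>
#include <string>

struct Base {
  std::string seenInCtor;

  // During Base construction the object is still a Base, so this call
  // dispatches to Base::name() even when a Derived is being built.
  Base() { seenInCtor = name(); }
  virtual ~Base() = default;

  virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
  std::string name() const override { return "Derived"; }
};

// Hypothetical demonstration of both dispatch outcomes.
inline void ctor_dispatch_demo() {
  Derived d;
  assert(d.seenInCtor == "Base"); // call made while Base was being constructed
  assert(d.name() == "Derived");  // normal virtual dispatch after construction
}
```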
homoSapiensSapiens
002062ae93 Use #ifndef instead of #if !defined
And #ifdef instead of #if defined

This is the more standard form (see for example iostream file).

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2013-07-24 19:49:17 +02:00
Marco Costalba
3b8f66f8ac Introduce Cut/All node definitions
Follow Don Dailey definition of cut/all node:

"If the previous node was a cut node, we consider this an ALL node.
The only exception is for PV nodes which are a special case of ALL nodes.
In the PVS framework, the first zero width window searched from a PV
node is by our definition a CUT node and if you have to do a re-search
then it is suddenly promoted to a PV nodes (as per PVS search) and only
then can the cut and all nodes swap positions. In other words, these
internal search failures can force the status of every node in the subtree
to swap if it propagates back to the last PV nodes."

http://talkchess.com/forum/viewtopic.php?topic_view=threads&p=519741&t=47577

With this definition we have a hit rate higher than 90% on:

    if (!PvNode && depth > 4 * ONE_PLY)
        dbg_hit_on_c(cutNode, (bestValue >= beta));

And a hit rate of just 28% on:

    if (!PvNode && depth > 4 * ONE_PLY)
        dbg_hit_on_c(!cutNode, (bestValue >= beta));

No functional change.
2013-06-13 19:46:49 +02:00
Marco Costalba
db322e6a63 Revert "Store moves sent with "position" UCI command"
This reverts commit 0d68b523a3.

After the easy move simplification this machinery is no
longer needed (because we don't need to know if a
root move is a recapture).

No functional change.
2013-03-04 09:29:46 +01:00
Marco Costalba
0d68b523a3 Store moves sent with "position" UCI command
Store all the game moves until current position.

This will be used by next patch.

No functional change.
2013-03-02 13:08:50 +01:00
Marco Costalba
c5ec94d0f1 Update copyright year
No functional change.
2013-02-19 07:54:14 +01:00
Marco Costalba
e5bc79fb9c Retire slavesPositions
Save the current active position in each Thread
instead of keeping a centralized array in struct
SplitPoint.

This allows us to skip a memset() call at each split.

No functional change.
2013-02-08 11:45:33 +01:00
Marco Costalba
14c2c1395b Change slave_available() API
To return a pointer to the available
thread instead of a bool. This allows
to simplify the core loop in split().

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2013-02-06 20:48:26 +01:00
Marco Costalba
bf706c4a4f Slightly change split() API
This function "returns" two values: bestValue and bestMove

Instead of returning one and passing the other as a pointer,
be consistent and pass both as pointers.

No functional change.
2013-02-05 06:35:38 +01:00
Marco Costalba
1a414cd9cb Derive ThreadPool from std::vector
Prefer sub-classing to composition in this case.

No functional change.
2013-02-04 22:59:20 +01:00
Marco Costalba
91427c8242 Move split() under Thread
Previous renaming patch suggested this reformat:
when a better naming leads to a better code!

No functional change.
2013-02-04 22:17:04 +01:00