Use an eval cache instead of TT to store node
position evaluations.
It is already an improvement and, because it frees
two TT entry slots, paves the way to extend TT to
store both upper and lower bounds.
After 4855 games, single thread, 15"+0.05
Mod vs Orig 1165 - 920 - 2770 ELO +18
bench: 5149248
And document why this is a hard limit. It
seems for some (lucky) people 32 threads
are not enough.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
With this patch series we want to introduce a per-thread
evaluation cache to store node evaluations, so that we no
longer rely on the TT for this.
This patch just introduces the infrastructure.
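A minimal sketch of what such a per-thread cache could look like. The names
EvalCache and EvalEntry, the power-of-two size and the always-replace policy
are assumptions made for illustration, not taken from the patch:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint64_t Key;

// One slot of the (hypothetical) cache
struct EvalEntry {
    Key  key;    // full position key, used to detect index collisions
    int  value;  // cached static evaluation
    bool used;   // false until the slot is written for the first time
};

// Fixed-size, always-replace cache. Each search thread would own its
// own instance, so no locking is needed.
class EvalCache {
public:
    explicit EvalCache(std::size_t sizePow2) : table(sizePow2) {
        // The size must be a power of two so that masking acts as a modulo
        assert(sizePow2 != 0 && (sizePow2 & (sizePow2 - 1)) == 0);
    }

    // Returns true and fills 'value' when the position is cached
    bool probe(Key k, int& value) const {
        const EvalEntry& e = table[k & (table.size() - 1)];
        if (!e.used || e.key != k)
            return false;
        value = e.value;
        return true;
    }

    // Overwrites whatever was stored in the slot
    void store(Key k, int value) {
        EvalEntry& e = table[k & (table.size() - 1)];
        e.key = k;
        e.value = value;
        e.used = true;
    }

private:
    std::vector<EvalEntry> table;
};
```

Because the cache is private to a thread, a probe or a store is a plain
memory access with no synchronization.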
No functional change.
Also handle the SMP case. This has been quite tricky: it is not
trivial to enforce the node limit in the SMP case because,
with the "helpful master" concept, we can have recursive split
points and we cannot lock them all at once, so there is the
risk of counting the same nodes more than once.
Anyhow this patch should be race free and counted nodes are
correct.
No functional change.
It is very difficult and risky to ensure
that a running thread doesn't access a global
variable. This is currently true, but could
change in the future and we don't want to rely
on code that works 'by accident'. The threads
are still running when ThreadPool destructor is
called (after main() returns) and this could
lead to crashes if a thread accesses a global
that has been already freed. The solution is to
use an exit() function and call it while we are
still in main(), ensuring global variables are
still alive at threads termination time.
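A minimal sketch of the idea, using C++11 std::thread as an assumption (the
patch uses the engine's own thread wrappers). SharedOption and the member
names are hypothetical:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical global that the worker threads read while running
std::atomic<int> SharedOption(1);

class ThreadPool {
public:
    void init(int n) {
        for (int i = 0; i < n; ++i)
            workers.emplace_back([this] { idle_loop(); });
    }

    // To be called explicitly before main() returns, so that every
    // worker has terminated while the globals are still alive.
    void exit() {
        quit = true;
        for (std::thread& t : workers)
            t.join();
        workers.clear();
    }

    std::size_t size() const { return workers.size(); }

private:
    void idle_loop() {
        while (!quit.load()) {
            reads += SharedOption.load(); // access to a global
            std::this_thread::yield();
        }
    }

    std::vector<std::thread> workers;
    std::atomic<bool> quit{false};
    std::atomic<long> reads{0};
};
```

With this layout the destructor has nothing left to do, because no thread is
running anymore when it is called.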
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Before the search we set up the starting position by making all the
moves (sent by the GUI) from the start position up to the position
where the search begins.
To do this we use a set of StateInfo records used by each
do_move() call. These records must stay valid for the whole
search because repetition draw detection uses them to walk
back through the keys of all the earlier positions. The problem is
that, while searching, the GUI could send another 'position' command,
which calls set_position(), which clears the states! Of course a crash
follows shortly.
Before searching all the relevant parameters are copied in
start_searching() just for this reason: to fully detach data
accessed during the search from the UCI protocol handling.
So the natural solution would be to also copy the setup states.
Unfortunately this approach does not work because StateInfo
contains a pointer to the previous record: the copies would still
point into the original records, so naively copying and then freeing
the original memory leads to a crash.
That's why we use two std::auto_ptr (one belonging to UCI and another
to Search) to safely transfer ownership of the StateInfo records to
the search, after we have set up the root position.
As a nice side effect all the possible memory leaks are magically
sorted out for us by std::auto_ptr semantics.
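A minimal sketch of the ownership transfer. It uses std::unique_ptr, the
C++11 replacement for std::auto_ptr, and a std::deque as the container; both
choices, and all the names below, are assumptions made for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>
#include <utility>

struct StateInfo {
    std::uint64_t key;
    StateInfo* previous; // link walked by repetition detection
};

// push_back() on a deque never moves the existing elements, so the
// 'previous' pointers stay valid while the list grows.
typedef std::deque<StateInfo> StateList;
typedef std::unique_ptr<StateList> StateListPtr;

StateListPtr SetupStates;  // owned by the UCI side
StateListPtr SearchStates; // owned by the search side

void set_position(const std::uint64_t* keys, int n) {
    SetupStates.reset(new StateList()); // frees only what UCI still owns
    StateInfo* prev = nullptr;
    for (int i = 0; i < n; ++i) {
        StateInfo st = { keys[i], prev };
        SetupStates->push_back(st);
        prev = &SetupStates->back();
    }
}

void start_searching() {
    // The list changes owner as a whole. Nothing is copied, so no
    // record ends up pointing into freed memory.
    if (SetupStates)
        SearchStates = std::move(SetupStates);
}

// Walks the chain the way repetition detection would
int chain_length() {
    int n = 0;
    const StateInfo* st = (SearchStates && !SearchStates->empty())
                        ? &SearchStates->back() : nullptr;
    for (; st; st = st->previous)
        ++n;
    return n;
}
```

After start_searching() a new 'position' command builds a fresh list on the
UCI side and leaves the records used by the running search untouched.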
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
To mimic C++11 std::mutex and std::condition_variable,
also rename locks and condition variables to be more
uniform across the classes.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This better mimics std::vector::operator[] and
fixes a warning with MSVC 64bit.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
We can detect the split point master also from within idle_loop,
so we can call the function without parameters and remove an
overloaded member hack in Thread class.
Note that we don't need to take a lock around curSplitPoint
when entering idle_loop(): if we are the master, then
curSplitPoint cannot change under our feet (because is_searching
is set and so we cannot be reallocated); if we are a slave,
we enter idle_loop() only upon Thread creation, and in that case
splitPointsCnt is always 0. This is true even in the very rare
case that curSplitPoint != NULL, if we have already been allocated
even before entering idle_loop().
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
After 6K games at 60"+0.1 on a quad-core with 4 threads
this implementation fails to show a measurable increase;
the result is well within the error bar.
Perhaps with 8 or more threads the result is better, but we
don't have the hardware to test. So retire it for now and
possibly re-add it in the future if it proves good on big
machines.
The only good news is that we don't have a regression and
the implementation is stable and bug-free, so it could be reused
somewhere in the future.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
The check for detecting when a split point has all the
slaves still running is done with:
slavesMask == allSlavesMask
When a thread reparents, slavesMask is increased; then, if
the same thread finishes because there are no more moves,
slavesMask returns to its original value and the above condition
becomes true again. So the thread that has just finished immediately
reparents again to the same split point, starts and
then immediately exits, in a tight loop that ends only when a
second slave finishes, so that slavesMask decrements and the
condition becomes false. This gives a spurious and anomalously
high number of fake reparents.
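The pitfall can be reproduced with a few lines of bit twiddling. This is a
simplified single-threaded model, not the engine code: the SplitPoint layout,
the helper names and the thread numbers are assumptions:

```cpp
#include <cassert>
#include <cstdint>

struct SplitPoint {
    std::uint64_t slavesMask;    // bit i set: thread i is searching here
    std::uint64_t allSlavesMask; // mask right after the initial booking
};

// The old test: are all the originally booked slaves still running?
bool all_slaves_running(const SplitPoint& sp) {
    return sp.slavesMask == sp.allSlavesMask;
}

void join_split_point(SplitPoint& sp, int thread) {
    sp.slavesMask |= 1ULL << thread;
}

void leave_split_point(SplitPoint& sp, int thread) {
    sp.slavesMask &= ~(1ULL << thread);
}

// Thread 3 tries to reparent 'attempts' times to a split point booked
// by threads 0, 1 and 2 where no moves are left to search.
int count_fake_reparents(int attempts) {
    SplitPoint sp = { 0x7, 0x7 };
    int reparents = 0;
    for (int i = 0; i < attempts; ++i)
        if (all_slaves_running(sp)) {
            join_split_point(sp, 3);  // mask becomes 0xF
            ++reparents;
            leave_split_point(sp, 3); // no moves: mask is back to 0x7
        }
    return reparents;
}
```

Every attempt succeeds because the mask goes back to its original value each
time, which is exactly the tight loop described above.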
With this patch, which rewrites the logic to avoid this pitfall,
the reparenting success rate drops to a more realistic 5-10%
for the 4 threads case.
As a side effect, note that the maxThreadsPerSplitPoint limit
no longer applies when reparenting.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
In the Young Brothers Wait Concept (YBWC) available slaves are
booked by the split point master, then start to search below
the assigned split point and, once finished, return to the idle
state waiting to be booked by another master.
This patch introduces "Active Reparenting" so that when a
slave finishes its job on the assigned split point, instead
of passively waiting to be booked, it searches for a suitable active
split point and reparents itself to that split point. Then it
immediately starts to search below the split point in exactly
the same way as the split point's other slaves. This reduces
to zero the time spent waiting in the idle loop and should increase
scalability, especially with many (8 or more) cores.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Unfortunately accessing a thread local variable
is much slower than accessing object data (see previous
patch log msg), so we have to revert to the old code
to avoid a speed regression.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Much faster than pthread_getspecific() but still a
speed regression against the original code.
Following are the nps on a bench:
Position      454165  454838  455433
tls           441046  442767  442767
ms (Win)      450521  447510  451105
ms (pthread)  422115  422115  424276
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
But use the newly introduced local storage
for this. A good code simplification and also
the correct way to go.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Use thread local storage to store a pointer to the thread we
are running on. This will allow us to remove thread info from
the Position class.
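A minimal sketch of the idea, written with the C++11 thread_local keyword as
an assumption (the patch itself may rely on platform specific storage). The
names currentThread and current_thread_index are hypothetical:

```cpp
#include <cassert>
#include <thread>

struct Thread {
    int idx;
    explicit Thread(int i) : idx(i) {}
};

// One independent pointer per OS thread
thread_local Thread* currentThread = nullptr;

// Any code can ask which Thread it runs on, without having the
// information passed down through a Position object.
int current_thread_index() {
    return currentThread ? currentThread->idx : -1;
}

// Starts an OS thread bound to 'th' and reports the index seen there
int run_on_new_thread(Thread& th) {
    int seen = -1;
    std::thread t([&] {
        currentThread = &th; // set once, at thread start
        seen = current_thread_index();
    });
    t.join();
    return seen;
}
```

The pointer set inside one thread is invisible to the others, so the caller
of run_on_new_thread() still sees its own (here unset) value.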
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
A std::set (that is a rb_tree) seems really
overkill to store at most a handful of moves
and nothing in the common case.
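For a handful of elements a linear scan over contiguous storage is one common
replacement for a tree. The sketch below assumes a std::vector and a
placeholder Move type; the log does not say which container the patch uses:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

typedef int Move; // placeholder for the engine's move type

// Linear search: with at most a few moves this does no allocation per
// lookup and touches a single contiguous block of memory.
bool contains(const std::vector<Move>& moves, Move m) {
    return std::find(moves.begin(), moves.end(), m) != moves.end();
}
```

In the common case the vector is empty and the lookup returns immediately.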
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
And add final touches to this long patch series.
The whole series has been verified against regression with
20K games at fast TC.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
We store pointers instead of Thread objects because
Thread is neither copy-constructible nor copy-assignable
and the default ones are not suitable. So we cannot store
Thread objects directly in a std::vector.
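A minimal sketch of the arrangement. The members of Thread and the add()
helper are assumptions; only the "vector of pointers" idea comes from the
text above:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Thread {
public:
    explicit Thread(int i) : idx(i) {}
    int idx;

private:
    Thread(const Thread&);            // not copy-constructible
    Thread& operator=(const Thread&); // not copy-assignable
};

class ThreadsManager {
public:
    ~ThreadsManager() {
        for (std::size_t i = 0; i < threads.size(); ++i)
            delete threads[i];
    }

    void add() { threads.push_back(new Thread(int(threads.size()))); }

    Thread& operator[](std::size_t i) { return *threads[i]; }
    std::size_t size() const { return threads.size(); }

private:
    // The vector copies and moves only the pointers, never a Thread
    std::vector<Thread*> threads;
};
```

A side benefit is that a Thread keeps its address when the vector grows, so
references handed out earlier remain valid.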
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Associate platform OS thread to the Thread class instead of
creating it from ThreadsManager.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Split the data allocation, now done (mostly once)
in read_uci_options(), from the wake up and sleeping
of the slave threads upon entering/exiting the search.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
And introduce SplitPoint bestMove to pass back the
best move after a split point.
This allows the search stack passed to split to be
defined as const.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This method belongs to Thread, not to ThreadsManager.
Reshuffle stuff in thread.cpp while there.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Release the split point lock before waking up the
master thread. This seems to increase speed
when "sleeping threads" are used:
After 7792 games with 4 threads at very fast TC (2"+0.05)
Mod vs Orig 1722 - 1627 - 4443 ELO +4 (+- 5.1)
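The ordering can be sketched as follows, with C++11 primitives standing in
for the engine's own lock wrappers. The struct layouts and the function names
are assumptions:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

struct SplitPoint {
    std::mutex mutex;
    int slavesRunning = 1;
};

struct MasterThread {
    std::mutex sleepMutex;
    std::condition_variable sleepCond;
    bool wakeUp = false;
};

// Called by a slave when it has no more moves to search
void slave_finished(SplitPoint& sp, MasterThread& master) {
    bool last;
    {
        std::lock_guard<std::mutex> guard(sp.mutex);
        last = (--sp.slavesRunning == 0);
    } // the split point lock is released here, before the wake up

    if (last) {
        {
            std::lock_guard<std::mutex> guard(master.sleepMutex);
            master.wakeUp = true;
        }
        master.sleepCond.notify_one();
    }
}

// The master sleeps until the last slave is done, then returns the
// number of slaves still running.
int wait_for_slaves() {
    SplitPoint sp;
    MasterThread master;
    std::thread slave([&] { slave_finished(sp, master); });
    {
        std::unique_lock<std::mutex> lock(master.sleepMutex);
        master.sleepCond.wait(lock, [&] { return master.wakeUp; });
    }
    slave.join();
    std::lock_guard<std::mutex> guard(sp.mutex);
    return sp.slavesRunning;
}
```

When the master wakes up it can take the split point lock at once, instead
of blocking on a lock still held by the thread that woke it.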
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
When allocating a slave we set both is_searching
and splitPoint under lock protection.
Unfortunately the order in which the two writes become
visible to a thread that reads them without taking the
lock is not defined. This article was very clarifying:
http://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/
So when in the idle loop we test is_searching and then
access splitPoint, it could happen that splitPoint has not
yet been updated, leading to a possible crash.
Fix the race by lock-protecting splitPoint access.
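A minimal sketch of the fix, assuming a per-thread std::mutex (which lock the
patch actually takes is not stated here). The names allocate_slave and
fetch_work are hypothetical:

```cpp
#include <cassert>
#include <mutex>

struct SplitPoint {
    int depth;
};

struct Thread {
    std::mutex mutex;
    bool is_searching = false;
    SplitPoint* splitPoint = nullptr;
};

// Master side: both fields are written under the lock
void allocate_slave(Thread& slave, SplitPoint* sp) {
    std::lock_guard<std::mutex> guard(slave.mutex);
    slave.splitPoint = sp;
    slave.is_searching = true;
}

// Slave side (idle loop): both fields are read under the same lock,
// so is_searching == true can never be seen with a stale splitPoint.
SplitPoint* fetch_work(Thread& self) {
    std::lock_guard<std::mutex> guard(self.mutex);
    return self.is_searching ? self.splitPoint : nullptr;
}
```

Taking the lock on the reader side is what makes the pair of writes appear
atomic; declaring the fields volatile would not give that guarantee.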
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Apart from some renaming, the biggest change
is the retirement of split_point_finished(),
replaced by slavesMask flags. As a side
effect we now also take the split point lock
when allocating available threads.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Pass references (Windows style) instead of
pointers (Posix style) as function arguments.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
And directly pass RootMoves instead of SearchMoves
to main thread. A class declaration is better suited
to a header and slims down the fat search.cpp a bit.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
We just need to verify if a legal move is among the
SearchMoves, so we don't need a vector for this.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Currently after a 'quit' command UI thread raises stop
signal, exits from uci_loop() and calls Threads.exit()
while the search threads are still active.
In Threads.exit() main thread is asked to terminate, but
if it is parked in idle_loop() it will exit and free its
resources (in particular the shared Movepicker object) while
sibling slaves are still active and this leads to a crash.
The fix is to let the UI thread always wait for the main thread
to finish the search before returning from uci_loop().
Found by Valgrind when running with 8 threads.
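The wait can be sketched with a condition variable. The class below is a
simplified model that uses C++11 primitives as an assumption, and the method
names are hypothetical:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

class MainThread {
public:
    void start_searching() {
        {
            std::lock_guard<std::mutex> guard(mutex);
            searching = true;
        }
        worker = std::thread([this] {
            // ... the search, including all its slaves, runs here ...
            {
                std::lock_guard<std::mutex> guard(mutex);
                searching = false;
            }
            cond.notify_one();
        });
    }

    // Called by the UI thread before leaving uci_loop()
    void wait_for_search_finished() {
        std::unique_lock<std::mutex> lock(mutex);
        cond.wait(lock, [this] { return !searching; });
    }

    // Only now is it safe to terminate the thread and free resources
    void exit() {
        if (worker.joinable())
            worker.join();
    }

    bool is_searching() {
        std::lock_guard<std::mutex> guard(mutex);
        return searching;
    }

private:
    std::mutex mutex;
    std::condition_variable cond;
    bool searching = false;
    std::thread worker;
};
```

The UI thread calls wait_for_search_finished() first and exit() second, so
resources shared with the slaves are released only after the search is over.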
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Detach from the UI thread the input arguments used by
the search threads so that the UI thread is able to receive
and process any command sent by the GUI while other threads
keep searching.
With this patch there is no longer any need to block the UI
thread after a "stop", so it is a more reliable and
robust solution than the previous patch.
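Detaching amounts to copying the arguments into storage owned by the search
before the threads are woken up. In this sketch LimitsType, its fields and the
Move typedef are placeholders chosen for illustration:

```cpp
#include <cassert>
#include <vector>

typedef int Move; // placeholder for the engine's move type

struct LimitsType {
    int depth;
    int movetime;
};

// Data read by the search threads; never touched by UCI parsing
namespace Search {
    LimitsType Limits;
    std::vector<Move> SearchMoves;
}

void start_searching(const LimitsType& limits,
                     const std::vector<Move>& searchMoves) {
    // Copy by value: from here on the UI thread is free to reuse or
    // overwrite its own variables while the search is running.
    Search::Limits = limits;
    Search::SearchMoves = searchMoves;
    // ... wake up the search threads here ...
}
```

Whatever the GUI sends next modifies only the UI side copies.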
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Unfortunately xboard sends the new position to search
immediately after sending "stop" when we have a ponder miss.
Because the main thread position is not copied but is referenced
directly from the root position, and the latter is modified by
the "position.." UCI command, we end up with the working position
changing under our feet while the search is still recovering
after the "stop", and this causes a crash.
This happens only with the (broken) xboard, native UCI does not
have this problem.
Reported by otello1984
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Use the starting thread to wait for GUI input and use
the other threads to search. The consequence is that now think()
is always started on a different thread than the caller, which
returns immediately to wait for input. This rework greatly
simplifies the code and is more in line with the common way
to implement this feature.
As a side effect we no longer need Makefile tricks
with sleep() to allow profile builds.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
The timer will be fired asynchronously to handle
time management flags, while other threads are
searching.
This implementation uses a thread waiting on a
timed condition variable instead of real timers.
This approach allows us to reduce platform-dependent
code to a minimum and is also the most portable, given
that timer libraries are very different among platforms
and the best ones are not compatible with old versions of
Windows.
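A minimal sketch of a timer built on a timed condition variable wait. It uses
std::condition_variable::wait_for from C++11 as an assumption, and the tick
counter stands in for the real time management check:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class TimerThread {
public:
    void start(int intervalMs) {
        exitFlag = false;
        worker = std::thread([this, intervalMs] { loop(intervalMs); });
    }

    void stop() {
        {
            std::lock_guard<std::mutex> guard(mutex);
            exitFlag = true;
        }
        cond.notify_one(); // interrupts the timed wait at once
        worker.join();
    }

    int ticks() const { return tickCount.load(); }

private:
    void loop(int intervalMs) {
        std::unique_lock<std::mutex> lock(mutex);
        while (!exitFlag) {
            // wait_for() returns false when the interval has elapsed
            // and nobody has asked the thread to exit.
            bool exiting = cond.wait_for(
                lock, std::chrono::milliseconds(intervalMs),
                [this] { return exitFlag; });
            if (!exiting)
                ++tickCount; // the time check would run here
        }
    }

    std::mutex mutex;
    std::condition_variable cond;
    bool exitFlag = false;
    std::atomic<int> tickCount{0};
    std::thread worker;
};
```

The only platform specific pieces are the mutex and the condition variable,
which the engine needs for the search threads anyway.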
Also retire the now unused polling code.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Instead of polling for input use a dedicated listener
thread to read commands from the GUI independently
from other threads.
To do this properly we have to delegate to the listener
all the reading from the GUI: while searching but also
while waiting for a command, like in std::getline().
So we have two possible behaviours: in-sync mode, in which
the thread mimics std::getline() and the caller blocks until
something is read from GUI, and async mode where the listener
continuously reads and processes GUI commands while other
threads are searching.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>