This was subtle, and Google was my friend.
The leak was in _dl_allocate_tls, called by pthread_create(), and
is due to the fact that threads are created in the joinable state, so
once terminated their resources are not freed. To make a thread
release its resources upon termination, we should create it in the
detached state.
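A minimal sketch of the fix, assuming a plain POSIX thread start: the attribute object requests PTHREAD_CREATE_DETACHED before pthread_create(), so nobody has to call pthread_join() for the thread's resources to be reclaimed. The names create_detached_thread() and worker_body() are hypothetical; calling pthread_detach() on an already running thread would be an equivalent alternative.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical thread body; a real worker would run its idle loop here.
void* worker_body(void* /*arg*/) {
    return nullptr;
}

// Start a thread in the detached state, so that its stack and TLS block
// are released automatically when it terminates, without pthread_join().
// Returns 0 on success, or the pthread error code otherwise.
int create_detached_thread(pthread_t* tid, void* (*fn)(void*), void* arg) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);

    // The attribute object is no longer needed once the thread exists.
    pthread_attr_destroy(&attr);
    return rc;
}
```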
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Fix warning: "Source and destination overlap in memcpy"
This happens when we call do_move() multiple times with the
same state, for instance when we don't need to undo the move.
This is what the Valgrind docs say:
You don't want the two blocks to overlap because one of them could
get partially overwritten by the copying.
You might think that Memcheck is being overly pedantic reporting this
in the case where 'dst' is less than 'src'. For example, the obvious way
to implement memcpy() is by copying from the first byte to the last.
However, the optimisation guides of some architectures recommend copying
from the last byte down to the first. Also, some implementations of
memcpy() zero 'dst' before copying, because zeroing the destination's
cache line(s) can improve performance.
In addition, for many of these functions, the POSIX standards have wording
along the lines "If copying takes place between objects that overlap,
the behavior is undefined." Hence overlapping copies violate the standard.
The moral of the story is: if you want to write truly portable code, don't
make any assumptions about the language implementation.
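A sketch of one way to avoid the overlap, assuming a simplified, hypothetical StateInfo layout and a copy_state() helper: skip the copy when source and destination are the same object. Using memmove(), which is defined for overlapping regions, would be another option.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified stand-in for the state that do_move() copies.
struct StateInfo {
    int castleRights;
    int rule50;
    int pliesFromNull;
    StateInfo* previous;
};

// Copy 'src' into 'dst'. memcpy() is undefined when the regions overlap,
// and they overlap completely when the caller passes the same object
// twice, so skip the copy in that case.
void copy_state(StateInfo& dst, const StateInfo& src) {
    if (&dst != &src)
        std::memcpy(&dst, &src, sizeof(StateInfo));
}
```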
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It seems we have a speed regression under Linux; anyhow,
commit and then revert, to leave some documentation in case we
want to try again in the future.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Let even the split point master go to sleep; it will be woken up
by its slaves when they return from the search.
With this patch we get the maximum HT speedup.
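A sketch of the idea using C++11 primitives rather than the pthread/Win32 calls of the time; SplitPoint, master_wait(), slave_done() and run_split() are hypothetical names. The master sleeps on a condition variable instead of spinning, and the last slave to return wakes it up.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical split point: the master sleeps here until every slave
// working on this split point has returned from the search.
struct SplitPoint {
    std::mutex mtx;
    std::condition_variable cv;
    int activeSlaves = 0;

    // Called by the master after handing work to the slaves.
    void master_wait() {
        std::unique_lock<std::mutex> lk(mtx);
        // wait() releases the lock while sleeping and re-checks the
        // predicate on every wakeup, so spurious wakeups are harmless.
        cv.wait(lk, [this] { return activeSlaves == 0; });
    }

    // Called by a slave when it returns from the search.
    void slave_done() {
        std::lock_guard<std::mutex> lk(mtx);
        if (--activeSlaves == 0)
            cv.notify_one();  // the last slave out wakes the master
    }
};

// Run 'n' dummy slaves on 'sp' and let the master sleep until they finish.
int run_split(SplitPoint& sp, int n) {
    sp.activeSlaves = n;  // set before any slave thread is started
    std::vector<std::thread> slaves;
    for (int i = 0; i < n; ++i)
        slaves.emplace_back([&sp] { sp.slave_done(); });
    sp.master_wait();
    for (auto& t : slaves)
        t.join();
    return sp.activeSlaves;
}
```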
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It makes more sense to treat the two evaluation metrics
in the same way.
As a side effect, we now use the correct eval margin when
pruning in a SplitPoint node.
No functional change in single thread.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Fix the move count updating bug and let search() completely
substitute sp_search().
No functional change even with faked split.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
There is a bug in the conversion, triggered when testing
with faked split, that I somehow missed :-(
To allow proper testing on the cluster, restore the old sp_search()
until I figure out what happened.
Restored to be functionally equivalent to the old behaviour both in
single thread and in faked split.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
When entering and exiting think() we don't need any special
wake-up / sleeping code, because we want available threads to keep
sleeping.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This simple patch has devastating consequences ;-)
Now an available thread goes to sleep and is woken up after
being allocated.
This patch allows Stockfish to dramatically increase performance
on HyperThreading systems.
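A minimal sketch of an idle thread that sleeps until it is allocated, again with C++11 primitives; the Worker type and its members are hypothetical, and the work step is a placeholder. The idle loop blocks in wait() instead of polling a flag, which presumably is what frees execution resources for the sibling hyperthread.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical per-thread state: an idle thread sleeps on 'cv' instead of
// spinning, and whoever allocates it to a split point wakes it up.
struct Worker {
    std::mutex mtx;
    std::condition_variable cv;
    bool allocated = false;  // set by the thread that books this worker
    bool quit = false;
    int jobsDone = 0;

    // Idle loop run by the worker thread.
    void idle_loop() {
        std::unique_lock<std::mutex> lk(mtx);
        while (true) {
            cv.wait(lk, [this] { return allocated || quit; });
            if (allocated) {
                allocated = false;
                ++jobsDone;  // placeholder: a real worker would unlock and search
                continue;
            }
            return;  // quit requested and no pending work
        }
    }

    // Called by the master when it assigns this worker to a split point.
    void allocate() {
        { std::lock_guard<std::mutex> lk(mtx); allocated = true; }
        cv.notify_one();
    }

    void stop() {
        { std::lock_guard<std::mutex> lk(mtx); quit = true; }
        cv.notify_one();
    }
};
```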
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
They are fast and also have the same semantics as the Linux ones.
This allows us to simplify the code and, especially, to use
SleepConditionVariableSRW() to wait on a condition while releasing the lock,
which has the same semantics as pthread_cond_wait().
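A sketch of a thin portability layer, assuming hypothetical wrapper names (lock_grab(), cond_wait(), etc.): on Windows the SRW lock and condition variable calls map one to one onto their pthread counterparts, and SleepConditionVariableSRW() releases the lock while sleeping and re-acquires it on wakeup, just like pthread_cond_wait().

```cpp
#include <cassert>

#if defined(_WIN32)
#  include <windows.h>
typedef SRWLOCK Lock;
typedef CONDITION_VARIABLE WaitCondition;
inline void lock_init(Lock* l)             { InitializeSRWLock(l); }
inline void lock_grab(Lock* l)             { AcquireSRWLockExclusive(l); }
inline void lock_release(Lock* l)          { ReleaseSRWLockExclusive(l); }
inline void lock_destroy(Lock*)            {}  // SRW locks need no cleanup
inline void cond_init(WaitCondition* c)    { InitializeConditionVariable(c); }
inline void cond_signal(WaitCondition* c)  { WakeConditionVariable(c); }
inline void cond_destroy(WaitCondition*)   {}
// Atomically releases the lock, sleeps, and re-acquires it on wakeup.
inline void cond_wait(WaitCondition* c, Lock* l) {
    SleepConditionVariableSRW(c, l, INFINITE, 0);
}
#else
#  include <pthread.h>
typedef pthread_mutex_t Lock;
typedef pthread_cond_t WaitCondition;
inline void lock_init(Lock* l)             { pthread_mutex_init(l, nullptr); }
inline void lock_grab(Lock* l)             { pthread_mutex_lock(l); }
inline void lock_release(Lock* l)          { pthread_mutex_unlock(l); }
inline void lock_destroy(Lock* l)          { pthread_mutex_destroy(l); }
inline void cond_init(WaitCondition* c)    { pthread_cond_init(c, nullptr); }
inline void cond_signal(WaitCondition* c)  { pthread_cond_signal(c); }
inline void cond_destroy(WaitCondition* c) { pthread_cond_destroy(c); }
// Same semantics: release the mutex while sleeping, re-acquire on wakeup.
inline void cond_wait(WaitCondition* c, Lock* l) { pthread_cond_wait(c, l); }
#endif
```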
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This is a prerequisite for future work and also removes
a state flag, so it is good anyway.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It is redundant and complicates the already complicated
SMP code for no reason.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
We already do this for locks. Also rename SitIdleEvent
to WaitCond to be uniform with Linux naming.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This is the native way on Windows, and we will use it
for future work, so change Linux to do the same.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Plus some other icc warnings that popped up with new and stricter
compile options.
No functional or speed change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
It makes more sense to treat the two evaluation metrics
in the same way.
As a side effect, we now use the correct eval margin when
pruning in a SplitPoint node.
No functional change in single thread.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Rewrite sp_search() to have the same signature as search()
This is the first prerequisite step toward unification.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Actually it is an error to write the moveCount value back after split(),
because it is used in update_history() to access the movesSearched[]
array. But because this array is not updated in the split point,
we end up accessing stale data.
The bug has been hidden until now because we 'forgot' to update
moveCount before returning from split().
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Mostly suggested by Justin (UncombedCoconut), the 0ULL -> 0 conversion
is mine.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
The language guarantees that the c'tor is called, but without any c'tor
it happens to work only by accident, because the OS zeroes out freshly
allocated pages. The problem is that if I deallocate and allocate
again, the second time the pages no longer come fresh from the OS and
so could contain stale info.
A practical case could be if we change the TT size or the number of
threads on the fly while already running.
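A sketch of the fix, assuming a hypothetical TTEntry layout and resize_table() helper: default member initializers give the struct a constructor, so new TTEntry[n] zeroes every entry even when the memory is recycled rather than freshly mapped by the OS. Without them (a plain struct with no constructor), the members of a new[]-allocated array would be left indeterminate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical transposition-table entry. The default member initializers
// give it a non-trivial default constructor that zeroes every field.
struct TTEntry {
    std::uint32_t key = 0;
    std::int16_t value = 0;
    std::uint8_t depth = 0;
    std::uint8_t generation = 0;
};

// Resize the table: free the old block and allocate a new one. Every entry
// is constructed, so no stale data from the previous allocation survives.
TTEntry* resize_table(TTEntry* old, std::size_t n) {
    delete[] old;
    return new TTEntry[n];
}
```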
Bug spotted by Justin Blanchard.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Fix release to work around chess960 on some GUIs
Signature is:
stockfish bench 128 1 12 default depth
Node counts: 10914593
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Get rid of macros and use templates instead;
this is safer and allows us to fix the warning:
ISO C++ forbids braced-groups within expressions
that broke compilation with the -pedantic flag under
gcc with POPCNT enabled.
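A sketch of the macro-to-template approach, assuming hypothetical names (BitCountType, count_1s()): a GCC statement-expression macro of the form ({ ... }) is what triggers the -pedantic warning, while a function template specialized on the bit-count flavour is plain ISO C++ and is resolved at compile time. The POPCNT branch relies on the GCC/Clang builtin __builtin_popcountll().

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t Bitboard;

// Hypothetical selector for the bit-count implementation.
enum BitCountType { CNT_SOFTWARE, CNT_POPCNT };

template<BitCountType T>
inline int count_1s(Bitboard b);

template<>
inline int count_1s<CNT_SOFTWARE>(Bitboard b) {
    // SWAR popcount: sum bits in pairs, then nibbles, then bytes,
    // and finally add all bytes together into the top byte.
    b = b - ((b >> 1) & 0x5555555555555555ULL);
    b = (b & 0x3333333333333333ULL) + ((b >> 2) & 0x3333333333333333ULL);
    b = (b + (b >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return int((b * 0x0101010101010101ULL) >> 56);
}

template<>
inline int count_1s<CNT_POPCNT>(Bitboard b) {
#if defined(__GNUC__)
    // Compiles to the POPCNT instruction when built with -mpopcnt.
    return __builtin_popcountll(b);
#else
    return count_1s<CNT_SOFTWARE>(b);
#endif
}
```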
No functional and no performance change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This patch from Joona greatly reduces move count pruning.
Below are the old and new move count limits, starting from
ONE_PLY with half-ply increments:
Old: 4,5,5,5, 7, 7,11,11,11,19,19,19,35,35
New: 4,5,7,9,12,15,19,23,28,33,39,45,52,59
Surprisingly, results are even a bit better at a fairly
fast time control.
After 5260 games at 30"+0.1
Mod - Orig: 864 - 806 - 3590 ELO +3 (+- 3.8)
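As a quick check on the table, the New row is reproduced exactly by 3 + d*d/4 (integer division), assuming d is the depth in half-plies with ONE_PLY equal to 2. This is a fit to the published numbers, and the function name is hypothetical; it is not necessarily the expression used in the patch.

```cpp
#include <cassert>

// Hypothetical closed form for the "New" move count limits, with 'd' the
// depth in half-plies (d = 2 is ONE_PLY). Integer division truncates.
inline int move_count_limit(int d) {
    return 3 + d * d / 4;
}
```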
Signed-off-by: Marco Costalba <mcostalba@gmail.com>