mirror of https://github.com/sockspls/badfish synced 2025-04-30 16:53:09 +00:00
Commit graph

778 commits

Author SHA1 Message Date
Marco Costalba
9e4befe3f1 Use pointers instead of array indices in MovePicker
This avoids calculating the array entry position
at each access and gives another boost of almost 1%.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-29 06:48:31 +01:00
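A minimal sketch of the idea in the commit above, with hypothetical names (not the actual MovePicker code): walk the move list through a running pointer instead of recomputing moves[curIndex] on every access.

#include <cstddef>

// Hypothetical simplified move entry; the real MovePicker types differ.
struct MoveStack { int move; int score; };

// Index-based access: the entry address is recomputed at every call.
int next_by_index(const MoveStack moves[], std::size_t& cur, std::size_t last) {
    return cur != last ? moves[cur++].move : 0;
}

// Pointer-based access: just advance and dereference the pointer.
int next_by_pointer(const MoveStack*& cur, const MoveStack* last) {
    return cur != last ? (cur++)->move : 0;
}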
Marco Costalba
6cf28d4aa7 Change the flow in which moves are generated and picked
In MovePicker we get the next move with pick_move_from_list(),
then check if the return value is equal to MOVE_NONE and
in this case we update the state to the new phase.

This patch reorders the flow so that pick_move_from_list(), renamed
get_next_move(), directly calls go_next_phase() to generate and sort
the next bunch of moves when there are no more moves to try. This
avoids always checking the value returned by get_next_move() and
makes the flow more linear and natural.

Also use a local variable instead of a pointer dereference in a
time-critical switch statement in get_next_move().

With this patch alone we get an incredible speed-up of 3.2%!

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-27 19:56:26 +01:00
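A hedged sketch of the reordered flow, not the actual Stockfish code: when the current batch is exhausted, get_next_move() advances the phase itself instead of returning MOVE_NONE for the caller to re-check at every phase boundary.

#include <cstddef>
#include <vector>

enum Move { MOVE_NONE = 0 };

struct SimplePicker {
    std::vector<Move> batch;   // moves of the current phase, already sorted
    std::size_t cur = 0;
    int phase = 0;
    int lastPhase = 3;         // hypothetical number of phases

    void go_next_phase() {     // generate and sort the next bunch of moves
        ++phase;
        cur = 0;
        batch.clear();         // the real code would fill and score the batch here
    }

    Move get_next_move() {
        while (true) {
            if (cur < batch.size())
                return batch[cur++];   // fast path: no MOVE_NONE check by the caller
            if (phase >= lastPhase)
                return MOVE_NONE;      // truly out of moves
            go_next_phase();           // otherwise switch phase and retry
        }
    }
};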
Marco Costalba
129cde008c Disable again null move at depth == OnePly
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-26 16:59:58 +01:00
Joona Kiiski
b088f0aefd Use special null move technique at low depth.
Try good captures before the null move when depth < 3 * OnePly.
Use this kind of null move also at depth == OnePly.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-26 16:30:39 +01:00
Joona Kiiski
a5d699d62f Use nullMove only through MovePicker.
No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-26 16:30:35 +01:00
Joona Kiiski
f6d2452916 Add Null move support to MovePicker.
No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-26 16:29:18 +01:00
Joona Kiiski
268c53ac51 Create useNullMove local variable
No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-26 15:42:58 +01:00
Marco Costalba
595a90dfd0 Clean up killers handling in MovePicker
Original patch from Joona with added optimizations
by me.

Great cleanup of MovePicker with a speed improvement of 1%.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-26 15:38:47 +01:00
Marco Costalba
e217407450 Micro-optimize extension()
Explicitly write the conditions for pawn to 7th
and passed pawn instead of wrapping in redundant
helpers.

Also retire the now unused move_is_pawn_push_to_7th()
and the never used move_was_passed_pawn_push() and
move_is_deep_pawn_push().

Function extension() is so time-critical that this
simple patch speeds up the pgo compile by 0.5%, and
it is also clearer what actually happens there.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-25 15:11:05 +01:00
Marco Costalba
2078878376 Merge branch 'master' of git-Stockfish@free2.projectlocker.com:sf 2009-08-23 18:57:11 +01:00
Marco Costalba
d1d4437699 Remove a local variable from pop_1st_bit()
Remove the 'b' uint32_t local variable.
The optimized assembly is more or less the same
(one 'mov' instruction less), but now it is
written in a way more similar to the final assembly
flow, so it should be easier for the compiler to optimize.

Also guarantee that BitTable[] is always aligned.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-23 18:55:07 +01:00
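For reference, a hedged sketch of what pop_1st_bit() does, using a gcc builtin rather than the table-driven implementation the commit actually refines: return the index of the lowest set bit and clear it.

#include <cstdint>

typedef uint64_t Bitboard;

// Portable equivalent of the operation: find the least significant 1 bit,
// clear it from the bitboard, and return its index (0..63).
inline int pop_1st_bit(Bitboard* bb) {
    int index = __builtin_ctzll(*bb);   // assumes *bb != 0, as in the real code
    *bb &= *bb - 1;                     // clear the lowest set bit
    return index;
}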
Marco Costalba
ba04eb0446 Poly ampli+bias values after 73831 games
Verified correct against tuning branch.

After 999 games at 1+0

Mod vs Orig +257 =510 -232 51.20%  +9 ELO

Very small increase, but an increase anyway!

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-23 18:51:01 +01:00
Tord Romstad
ed347e7cbd Added a few new targets to the Makefile for OS X with icpc.
The following new targets were added:
   * osx-icc32: 32-bit x86 compiled with icpc.
   * osx-icc64: 64-bit x86 compiled with icpc.
   * osx-icc32-profile: 32-bit x86 compiled with icpc and pgo.
   * osx-icc64-profile: 64-bit x86 compiled with icpc and pgo.
2009-08-21 10:50:34 +02:00
Marco Costalba
95af1e28be Fix some asserts raised by is_ok()
There were two asserts.

The first was raised because is_ok() was called at the
beginning of do_castle_move() and this is wrong after
the last code reformatting because at that point the state
is already modified by the caller do_move().

The second, raised by debugIncrementalEval, was due to a
rounding error in compute_value() that occurs because
TempoValueEndgame was updated to an odd number by patch

"Merge Joona Kiiski evaluation tweaks" (3ed603cd) of 13/3/2009

This line in compute_value() is the guilty one:

result += (side_to_move() == WHITE)? TempoValue / 2 : -TempoValue / 2;

The fix is to increment TempoValueEndgame so that it is even.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-20 17:48:52 +01:00
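A tiny illustration of the rounding error, with a hypothetical odd value: the two truncated halves of TempoValue do not add back up to the full value, which is exactly the off-by-one that made compute_value() disagree with the incremental evaluation.

#include <cassert>

int main() {
    int TempoValue = 21;                        // hypothetical odd value
    int white =  TempoValue / 2;                // 10 (truncated)
    int black = -(TempoValue / 2);              // -10
    assert(white - black == TempoValue - 1);    // 20: one point is lost

    TempoValue = 22;                            // an even value restores the identity
    assert(TempoValue / 2 + TempoValue / 2 == TempoValue);
    return 0;
}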
Tord Romstad
e9aa20ad13 Fixed incorrect material key update when making promotion moves. 2009-08-20 16:54:20 +02:00
Marco Costalba
e01fefbbaf More use of memset() in Position::clear()
No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-18 21:21:28 +01:00
Marco Costalba
e4fc957898 Little do_move() micro optimizations
Also a few remaining style touches.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-18 08:58:19 +01:00
Marco Costalba
693b38a5e7 Better clarify how pieceList[] and index[] work
Rearrange the code a bit to be more self-documenting.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-17 23:15:35 +01:00
Marco Costalba
fbec55e52e Unify patch series summary
This patch seems bigger than it actually is.

It just moves some code around and adds a few coding-style fixes
to do_move() and undo_move() so as to have uniform naming in both
functions.

The diffstat for the whole patch series is

239 insertions(+), 426 deletions(-)

And the final MSVC pgo build is even a bit faster:

Before 448.051 nodes/sec

After 453.810  nodes/sec (+1.3%)

No functional change (tested on more than 100M nodes).

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-17 15:09:20 +01:00
Marco Costalba
05e70d6740 Unify undo_ep_move(m)
Integrate undo_ep_move() in undo_move(); this reduces the line count
and improves code readability.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-17 14:48:45 +01:00
Marco Costalba
b4cb1a3a9e Unify undo_promotion_move()
Integrate undo_promotion_move() in undo_move(); this reduces the line count
and improves code readability.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-17 14:48:33 +01:00
Marco Costalba
ec14fb1b33 Unify do_promotion_move()
Integrate do_promotion_move() in do_move(); this reduces the line count
and improves code readability.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-17 14:48:20 +01:00
Marco Costalba
cb506d3b16 Unify do_ep_move()
Integrate do_ep_move() in do_move(); this reduces the line count
and improves code readability.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-17 14:47:12 +01:00
Marco Costalba
e0c47a6ceb L1/L2 friendly PhaseTable[]
In the MovePicker constructor we access, during initialization, one of
the MainSearchPhaseIndex..QsearchWithoutChecksPhaseIndex globals.

Postpone the definition of PhaseTable[] to just after them, so that
when PhaseTable[] is accessed later in get_next_move()
it is already present in L1/L2.

It works like an implicit prefetch of PhaseTable[].

Also shrink PhaseTable[] to 16 bytes, so that it fits in a single
L1 cache line, by using uint8_t instead of int.

This apparently innocuous patch gives an astonishing speed-up
of 1.6% under MSVC 2010 beta, pgo optimized!

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-15 16:09:10 +01:00
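A hedged sketch of the data-layout trick, with hypothetical values: defining the now 16-byte uint8_t PhaseTable[] right after the phase-index globals that the MovePicker constructor already touches means the table tends to ride into L1/L2 on the same cache lines, acting as an implicit prefetch.

#include <cstdint>

// Globals read in the MovePicker constructor (values are placeholders).
int MainSearchPhaseIndex           = 0;
int QsearchWithoutChecksPhaseIndex = 8;

// Defined immediately after them: 16 bytes as uint8_t instead of 64 as int,
// small enough to sit in a single cache line next to the indices above.
const uint8_t PhaseTable[16] = {
    0, 1, 2, 3, 4, 5, 6, 7,
    0, 1, 2, 3, 4, 5, 6, 7
};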
Marco Costalba
f3d0b76feb Use optimized pop_1st_bit() under Windows 64 with icc
The Intel compiler can handle this code even under Windows.

So lift the constraint.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-14 12:47:49 +01:00
Marco Costalba
bfd4421f49 Better naming and document some endgame functions
In particular the generic scaling functions.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-14 08:19:55 +01:00
Marco Costalba
fd12e8cb23 Finally fix prefetch on Linux
It was due to a missing -msse compiler option !

Without this option the CPU silently discards
prefetcht2 instructions during execution.

Also added a (gcc documented) hack to prevent the Intel
compiler from optimizing away the prefetches.

Special thanks to Heinz for testing and suggesting
improvements. And to Jim for testing icc on Windows.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-14 08:13:42 +01:00
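A hedged sketch of such a prefetch helper, with a hypothetical name and close to, but not necessarily identical to, the committed code: the empty asm statement is the documented gcc-style trick that keeps icc from dropping the prefetch, and the file must be compiled with -msse as noted above.

#include <xmmintrin.h>

// Prefetch the TT data at 'addr' into the cache (hint T2).
inline void prefetch_tt(char* addr) {
#if defined(__INTEL_COMPILER)
    __asm__ ("");                        // keep icc from optimizing the prefetch away
#endif
    _mm_prefetch(addr, _MM_HINT_T2);     // emits prefetcht2
}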
Marco Costalba
166c09a7a0 Reuse 5 slots instead of 4
But this time with the guarantee of an always aligned
access so that prefetching is not adversely impacted.

On Joona's PC, 1+0, 64MB hash:

Orig - Mod: 174 - 237 - 359

Instead, after 1000 games at 1+0 with 128MB hash size
we are at +1 ELO (just 4 games of difference).

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-14 08:13:13 +01:00
Marco Costalba
8d369600ec Double prefetch on Windows
After fixing the CPU frequency with the RightMark tool I was
able to speed-test all the different prefetch combinations.

Here the results:

OS Windows Vista 32bit, MSVC compile
CPU Intel Core 2 Duo T5220 1.55 GHz
bench on depth 12, 1 thread, 26552844 nodes searched
results in nodes/sec

no-prefetch
402486, 402005, 402767, 401439, 403060

single prefetch (aligned 64)
410145, 409159, 408078, 410443, 409652

double prefetch (aligned 64) 0+32
414739, 411238, 413937, 414641, 413834

double prefetch (aligned 64) 0+64
413537, 414337, 413537, 414842, 414240

And now also some crazy stuff:

single prefetch (aligned 128)
410145, 407395, 406230, 410050, 409949

double prefetch (aligned 64) 0+0
409753, 410044, 409456

single prefetch (aligned 64) +32
408379, 408272, 406809

single prefetch (aligned 64) +64
408279, 409059, 407395

So it seems the best is a double prefetch at the address +32 or +64;
I will choose the second one because it seems more natural to me.

It is still a mystery why it doesn't work under Linux :-(

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-10 22:35:08 +01:00
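A minimal sketch of the fastest variant measured above (hypothetical helper name): touch the cluster's first cache line and the one 64 bytes after it.

#include <xmmintrin.h>

// Double prefetch at +0 and +64 bytes.
inline void prefetch_two_lines(char* addr) {
    _mm_prefetch(addr,      _MM_HINT_T2);
    _mm_prefetch(addr + 64, _MM_HINT_T2);
}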
Marco Costalba
f4140ecc0c Avoid Intel compiler optimizing away prefetching
Without this hack the Intel compiler happily optimizes
away the gcc builtin call.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-10 13:49:12 +01:00
Marco Costalba
60b5da4cc8 Use aligned prefetch address
Always prefetch from a cache line boundary. It seems
that if the prefetch address is not cache-line aligned then
performance is adversely impacted.

Hopefully we will reuse those 32 bits of padding for something
useful in the future.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-10 13:49:00 +01:00
Marco Costalba
55c46b2399 Remove old BishopPairBonus constants
Now that we have the polynomial imbalance these
are no longer used.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-10 13:47:39 +01:00
Marco Costalba
76ae0e36be Enable prefetch also for gcc
This fixes a compile error under Linux with gcc when
the Intel dev libraries are not present.

Also simplify the previous patch by moving the TT definition
from search.cpp to tt.cpp, so as to avoid passing a
pointer to TT to the current position.

Finally, simplify do_move(): we now miss a prefetch in the
rare case of setting an en-passant square, but the code is
much cleaner and the performance penalty is almost zero.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-10 01:42:35 +01:00
Marco Costalba
4251eac860 Try to prefetch as soon as position key is ready
Move the prefetching code inside do_move() so as to allow
very early prefetching and to put as many instructions
as possible between the prefetch and the following retrieve().

With this patch retrieve() times are cut by another 25%.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-09 16:45:37 +01:00
Marco Costalba
cd4604b05c Add TT prefetching support
TT.retrieve() is the most time-consuming function
because it almost always involves a very slow RAM access.

The TT table is so big that it is never cached. This patch
prefetches TT data just after a move is done, so that
the subsequent TT.retrieve() will be very fast.

Profiling with VTune shows that TT.retrieve() times are
almost cut in half!

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-09 14:18:15 +01:00
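A hedged sketch of the mechanism, with assumed names and a power-of-two cluster count: as soon as the new position key is known inside do_move(), the destination cluster is prefetched, so the memory latency overlaps with the rest of the move-making work and the later retrieve() finds a warm cache line.

#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>

struct TTCluster { char bytes[64]; };     // assumed cluster size

struct SimpleTT {
    TTCluster*  clusters;
    std::size_t clusterCount;             // assumed to be a power of two

    void prefetch(uint64_t posKey) const {
        char* addr = reinterpret_cast<char*>(&clusters[posKey & (clusterCount - 1)]);
        _mm_prefetch(addr, _MM_HINT_T2);
    }
};

// Inside do_move(), once the incremental key update is complete:
//     TT.prefetch(newKey);   // issued well before the next TT.retrieve(newKey)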
Marco Costalba
e6863f46de Use 5 TTEntry slots instead of 4
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-09 04:42:26 +01:00
Marco Costalba
6f1475b6fc Use 32 bit key in TT
Shrink the key to 32 bits instead of 64. To still avoid
collisions, use the high 32 bits of the position key as the
TT key and the low 32 bits to retrieve the correct
cluster index in the table.

With this patch the size of TTEntry shrinks to 96 bits instead
of 128, and a cluster of 4 TTEntry sums to 48 bytes instead
of 64.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-09 04:42:07 +01:00
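A hedged sketch of the key split, with an assumed field packing that matches the 96-bit entry size mentioned above: the low 32 bits of the position key pick the cluster, the high 32 bits are stored in the entry and compared on probe.

#include <cstddef>
#include <cstdint>

struct TTEntry96 {          // 12 bytes = 96 bits (field layout is assumed)
    uint32_t key32;         // high half of the position key
    uint16_t move;
    int16_t  value;
    int16_t  depth;
    uint8_t  generation;
    uint8_t  bound;
};

inline std::size_t cluster_index(uint64_t posKey, std::size_t clusterCount) {
    return static_cast<uint32_t>(posKey) & (clusterCount - 1);  // low 32 bits
}

inline uint32_t stored_key(uint64_t posKey) {
    return static_cast<uint32_t>(posKey >> 32);                 // high 32 bits
}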
Marco Costalba
4a777954e1 Makefile: added 'make strip' target
Binaries are always built with the symbol table in, to ease
debugging and profiling.

It is now possible to run:

make strip

to remove the symbol table from the compiled binary. This
can be useful when preparing the release version.

Patch by Heinz van Saanen.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 17:37:13 +01:00
Marco Costalba
54382f8b07 Let LMR at root be independent of MultiPV value
The current formula enables LMR when

i + MultiPV >= LMRPVMoves

It means that, for instance, if MultiPV == 1 then LMR
starts to be considered at move i = LMRPVMoves - 1,
while if MultiPV == 3 it starts earlier,
at move i = LMRPVMoves - 3.

With this patch the formula becomes

i >= MultiPV + LMRPVMoves - 2

So LMR always starts LMRPVMoves - 1 moves
after the last PV move.

No functional change when MultiPV == 1

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 17:30:46 +01:00
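A small worked illustration of the two conditions, with a hypothetical LMRPVMoves value: for MultiPV == 1 both formulas give the same first move index (hence no functional change), while for larger MultiPV the new formula starts LMR later, a fixed distance after the last PV move.

#include <cassert>

int main() {
    const int LMRPVMoves = 14;                      // hypothetical value

    // Old: i + MultiPV >= LMRPVMoves   ->  first i = LMRPVMoves - MultiPV
    // New: i >= MultiPV + LMRPVMoves - 2
    for (int MultiPV = 1; MultiPV <= 3; ++MultiPV) {
        int firstOld = LMRPVMoves - MultiPV;
        int firstNew = MultiPV + LMRPVMoves - 2;
        if (MultiPV == 1)
            assert(firstOld == firstNew);           // no functional change
        else
            assert(firstNew > firstOld);            // LMR now starts later
    }
    return 0;
}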
Marco Costalba
339bb8a524 Speed up polynomial material imbalance loop
Access pos.piece_count() only once and avoid some
branches in the inner loop.

Profiling with VTune shows a 20% speed improvement in
get_material_info(), and the code is also a bit cleaner
this way ;-)

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 14:12:04 +01:00
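A hedged sketch of the access pattern, with hypothetical coefficient tables and a simplified shape (the real polynomial also mixes in the opponent's counts): the piece counts are read from locals and absent piece types are skipped before entering the inner loop, rather than calling pos.piece_count() and branching inside it.

enum { PIECE_TYPE_NB = 6 };

int imbalance(const int linear[PIECE_TYPE_NB],
              const int quadratic[PIECE_TYPE_NB][PIECE_TYPE_NB],
              const int count[PIECE_TYPE_NB]) {     // counts copied out once
    int value = 0;
    for (int pt1 = 0; pt1 < PIECE_TYPE_NB; ++pt1) {
        if (!count[pt1])
            continue;                               // skip absent piece types
        int v = linear[pt1];
        for (int pt2 = 0; pt2 <= pt1; ++pt2)
            v += quadratic[pt1][pt2] * count[pt2];
        value += count[pt1] * v;
    }
    return value;
}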
Marco Costalba
aa925a0e29 There is no need to special case KNNK ending
It is always a draw, so use the corresponding proper
evaluation function.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 13:10:10 +01:00
Marco Costalba
23ceb66950 Move halfOpenFiles[] calculation out of a loop
And put it in an already existing one so as to
optimize a bit.

Also additional cleanups and code shuffles
all around the place.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 09:21:42 +01:00
Marco Costalba
565d12bf42 Compile without DEBUG flag by default
And also build the symbol table. It can easily be stripped
after the .exe is done, and it is necessary for profiling.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 09:21:29 +01:00
Marco Costalba
00eab73399 Revert material balance values after 100000 games
After Joona's direct testing with ~2000 games it seems the
values after 100,000 games do not give any advantage,
so revert for now.

Score of Stockfish_0 vs Stockfish_15: 491 - 392 - 1102
Score of Stockfish_0 vs Stockfish_40: 461 - 439 - 1076
Score of Stockfish_0 vs Stockfish_65: 442 - 518 - 1018 (13 elo)
Score of Stockfish_0 vs Stockfish_100: 504 - 502 - 984

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 03:49:49 +01:00
Joona Kiiski
5be3d98d17 Do not adjust Minimum Split Depth automatically
Currently the minimum split depth is automatically set to 6
when the number of CPUs is more than 4. I believe this is a bad
idea since, for example, my quad (4 CPUs with hyperthreading) is
detected as an 8-CPU computer. I've manually lowered the number
of Threads, but so far I have played all games with Minimum
Split Depth set to 6!

Since 4-CPU computers with hyperthreading are quite common and
8-CPU computers extremely rare (I expect we will get a direct
jump to 16 or 32 cores), this automatic adjusting is likely
to do more harm than good. Add a note in Readme.txt, so that
those rare 8-CPU owners can manually tweak the "Minimum Split
Depth" parameter.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 03:36:20 +01:00
Marco Costalba
5b3fcab1ad Polished Makefile for *nix
Greatly improved Makefile from Heinz van Saanen.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 03:30:27 +01:00
Tord Romstad
977ca40d6d Supply the "upperbound" and "lowerbound" parameters in UCI search
output when the score is outside the root window.
2009-08-07 16:26:24 +02:00
Tord Romstad
ae49677446 Fixed a bug in PV extraction from the transposition table: the
previous code used move_is_legal to verify that the move from the TT
was legal, and the old version of move_is_legal only works when
the side to move is not in check. Fixed this by adding a separate,
slower version of move_is_legal which works even when the side to
move is in check.
2009-08-06 18:07:32 +02:00
Tord Romstad
2fff532f4e Moved the code for extracting the PV from the TT to tt.cpp, where
it belongs.
2009-08-06 14:02:53 +02:00
Tord Romstad
da854fe83a Added a new function build_pv(), which extends a PV by walking
down the transposition table.

When the search was stopped before a fail high at the root was
resolved, Stockfish would often print a very short PV, sometimes
consisting of just a single move. This was not only a little
user-unfriendly, but also harmed the strength a little in
ponder-on games: Single-move PVs mean that there is no ponder
move to search.

It is perhaps worth considering removing the pv[][] array
entirely, and always building the entire PV from the transposition
table. This would simplify the source code somewhat and probably
make the program infinitesimally faster, at the expense of
sometimes getting shorter PVs or PVs with rubbish moves near
the end.
2009-08-06 13:27:49 +02:00