mirror of https://github.com/sockspls/badfish synced 2025-04-30 16:53:09 +00:00
Commit graph

651 commits

Author SHA1 Message Date
Marco Costalba
bfd4421f49 Better naming and document some endgame functions
In particular the generic scaling functions.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-14 08:19:55 +01:00
Marco Costalba
fd12e8cb23 Finally fix prefetch on Linux
It was due to a missing -msse compiler option !

Without this option the CPU silently discards
prefetcht2 instructions during execution.

Also added a (gcc documented) hack to prevent the Intel
compiler from optimizing away the prefetches.

Special thanks to Heinz for testing and suggesting
improvements, and to Jim for testing icc on Windows.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-14 08:13:42 +01:00
Marco Costalba
166c09a7a0 Reuse 5 slots instead of 4
But this time with the guarantee of an always aligned
access so that prefetching is not adversely impacted.

On Joona's PC
1+0, 64Mb hash:

Orig - Mod: 174 - 237 - 359

Instead after 1000 games at 1+0 with 128MB hash size
we are at + 1 ELO (just 4 games of difference).

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-14 08:13:13 +01:00
Marco Costalba
8d369600ec Double prefetch on Windows
After fixing the CPU frequency with the RightMark tool I was
able to speed-test all the different prefetch combinations.

Here are the results:

OS Windows Vista 32bit, MSVC compile
CPU Intel Core 2 Duo T5220 1.55 GHz
bench on depth 12, 1 thread, 26552844 nodes searched
results in nodes/sec

no-prefetch
402486, 402005, 402767, 401439, 403060

single prefetch (aligned 64)
410145, 409159, 408078, 410443, 409652

double prefetch (aligned 64) 0+32
414739, 411238, 413937, 414641, 413834

double prefetch (aligned 64) 0+64
413537, 414337, 413537, 414842, 414240

And now also some crazy stuff:

single prefetch (aligned 128)
410145, 407395, 406230, 410050, 409949

double prefetch (aligned 64) 0+0
409753, 410044, 409456

single prefetch (aligned 64) +32
408379, 408272, 406809

single prefetch (aligned 64) +64
408279, 409059, 407395

So it seems the best is a double prefetch at the address + 32 or + 64.
I will choose the second one because it seems more natural to me.

It is still a mystery why it doesn't work under Linux :-(

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-10 22:35:08 +01:00
Marco Costalba
f4140ecc0c Prevent Intel compiler from optimizing away prefetching
Without this hack the Intel compiler happily optimizes
away the gcc builtin call.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-10 13:49:12 +01:00
Marco Costalba
60b5da4cc8 Use aligned prefetch address
Always prefetch from a cache line boundary. It seems
that if the prefetch address is not cache line aligned then
performance is adversely impacted.

Hopefully we will reuse those 32 bits of padding for something
useful in the future.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-10 13:49:00 +01:00
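The alignment described in 60b5da4cc8 can be sketched as rounding an address down to the start of its cache line. This is a minimal illustration, not the code from the commit: the 64-byte line size and the name align_down_to_cache_line() are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Assumed cache line size; 64 bytes is typical for x86 but is not stated
// as a constant in the commit itself.
constexpr std::uintptr_t CacheLineSize = 64;

static_assert((CacheLineSize & (CacheLineSize - 1)) == 0,
              "the mask trick below needs a power-of-two line size");

// Hypothetical helper: round an address down to its cache line boundary.
inline std::uintptr_t align_down_to_cache_line(std::uintptr_t addr) {
    const std::uintptr_t aligned = addr & ~(CacheLineSize - 1);
    assert(aligned % CacheLineSize == 0 && aligned <= addr);
    return aligned;
}
```

An address already on a boundary is returned unchanged; any other address maps to the boundary just below it.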
Marco Costalba
55c46b2399 Remove old BishopPairBonus constants
Now that we have poly imbalance these are
no longer used.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-10 13:47:39 +01:00
Marco Costalba
76ae0e36be Enable prefetch also for gcc
This fixes a compile error under Linux with gcc when
the Intel dev libraries are not installed.

Also simplify the previous patch by moving the TT definition
from search.cpp to tt.cpp, so as to avoid passing a
pointer to TT to the current position.

Finally simplify do_move(): now we miss a prefetch in the
rare case of setting an en-passant square, but the code is
much cleaner and the performance penalty is almost zero.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-10 01:42:35 +01:00
Marco Costalba
4251eac860 Try to prefetch as soon as position key is ready
Move the prefetching code inside do_move() so as to allow a
very early prefetch and to put as many instructions
as possible between the prefetch and the following retrieve().

With this patch retrieve() times are cut by another 25%.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-09 16:45:37 +01:00
Marco Costalba
cd4604b05c Add TT prefetching support
TT.retrieve() is the most time-consuming function
because it almost always involves a very slow RAM access.

The TT table is so big that it is never cached. This patch
prefetches TT data just after a move is done, so that the
subsequent TT.retrieve() will be very fast.

Profiling with VTune shows that TT.retrieve() times are
almost cut in half!

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-09 14:18:15 +01:00
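The idea in cd4604b05c (issue the prefetch early, do other work, then read) can be sketched as follows. This is an assumption-laden toy, not the actual patch: prefetch() and lookup_with_prefetch() are hypothetical names, the table is a plain vector, and the loop merely stands in for the work done between do_move() and retrieve(). The only real APIs used are gcc's __builtin_prefetch and the _mm_prefetch intrinsic from <xmmintrin.h>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#  include <xmmintrin.h>
#endif

// Hypothetical wrapper: ask the CPU to start loading the cache line at addr.
// A prefetch is only a hint, so it never changes program results.
inline void prefetch(const void* addr) {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    _mm_prefetch(static_cast<const char*>(addr), _MM_HINT_T2);
#elif defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(addr);
#else
    (void)addr;  // no prefetch support assumed on other compilers
#endif
}

// Toy lookup: compute the slot, prefetch it, do unrelated work, then read it.
inline std::uint64_t lookup_with_prefetch(const std::vector<std::uint64_t>& table,
                                          std::uint64_t key) {
    assert(!table.empty());
    const std::size_t index = static_cast<std::size_t>(key % table.size());
    prefetch(&table[index]);

    std::uint64_t work = 0;  // stand-in for the instructions executed
    for (int i = 0; i < 100; ++i)  // between the prefetch and the read
        work += static_cast<std::uint64_t>(i);
    (void)work;

    return table[index];
}
```

The more independent work that fits between the prefetch and the read, the more of the memory latency can be hidden, which is the motivation given in the later commit 4251eac860.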
Marco Costalba
e6863f46de Use 5 TTEntry slots instead of 4
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-09 04:42:26 +01:00
Marco Costalba
6f1475b6fc Use 32 bit key in TT
Shrink key to 32 bits instead of 64. To still avoid
collisions use the high 32 bits of position key as the
TT key and the low 32 bits to retrieve the correct
cluster index in the table.

With this patch the size of TTEntry shrinks to 96 bits instead
of 128, and the cluster of 4 TTEntry shrinks to 48 bytes instead
of 64.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-09 04:42:07 +01:00
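The key split described in 6f1475b6fc can be sketched like this. All names and the field layout of the entry are hypothetical; only the split (high 32 bits stored as the key, low 32 bits used for the cluster index) and the 96-bit / 48-byte sizes come from the commit message. The power-of-two cluster count is an extra assumption that lets a mask replace a modulo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 96-bit entry; the real field layout is not shown in the commit.
struct SketchTTEntry {
    std::uint32_t key32;  // high 32 bits of the position key
    std::uint32_t data;   // e.g. move and flags (assumed)
    std::int16_t  value;  // assumed
    std::int16_t  depth;  // assumed
};
static_assert(sizeof(SketchTTEntry) == 12, "96 bits per entry");

struct SketchCluster {
    SketchTTEntry slots[4];
};
static_assert(sizeof(SketchCluster) == 48, "4 entries of 12 bytes");

// High half: stored in the entry and compared on lookup.
inline std::uint32_t tt_key32(std::uint64_t posKey) {
    return static_cast<std::uint32_t>(posKey >> 32);
}

// Low half: selects the cluster. clusterCount is assumed to be a power of two.
inline std::size_t tt_cluster_index(std::uint64_t posKey, std::size_t clusterCount) {
    assert(clusterCount != 0 && (clusterCount & (clusterCount - 1)) == 0);
    return static_cast<std::uint32_t>(posKey) & (clusterCount - 1);
}
```

Because the index and the stored key come from disjoint halves of the position key, two positions that land in the same cluster are still told apart by 32 independent bits.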
Marco Costalba
4a777954e1 Makefile: added 'make strip' target
Binaries are always built with the symbol table included to ease
debugging and profiling.

It is now possible to run:

make strip

To remove symbol table from the compiled binary. This
could be useful to prepare the release version.

Patch by Heinz van Saanen.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 17:37:13 +01:00
Marco Costalba
54382f8b07 Let LMR at root be independent of MultiPV value
The current formula enables LMR when

i + MultiPV >= LMRPVMoves

It means that, for instance, if MultiPV == 1 then LMR
starts to be considered at move i = LMRPVMoves - 1,
while if MultiPV == 3 it starts earlier,
at move i = LMRPVMoves - 3.

With this patch the formula becomes

i >= MultiPV + LMRPVMoves - 2

So that LMR will always start after LMRPVMoves - 1 moves
from the last PV move.

No functional change when MultiPV == 1

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 17:30:46 +01:00
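The two conditions quoted in 54382f8b07 can be compared directly. The sketch below is only an illustration: the function names are hypothetical, i is taken to be the zero-based root move index, and the value 15 used in the note after the code is an arbitrary stand-in for LMRPVMoves.

```cpp
#include <cassert>

// Old rule from the commit message: i + MultiPV >= LMRPVMoves
inline bool lmr_old(int i, int multiPV, int lmrPVMoves) {
    return i + multiPV >= lmrPVMoves;
}

// New rule from the commit message: i >= MultiPV + LMRPVMoves - 2
inline bool lmr_new(int i, int multiPV, int lmrPVMoves) {
    return i >= multiPV + lmrPVMoves - 2;
}

// Hypothetical helper: smallest move index at which a rule allows LMR.
inline int first_lmr_index(bool (*rule)(int, int, int), int multiPV, int lmrPVMoves) {
    assert(multiPV >= 1 && lmrPVMoves >= 1);
    int i = 0;
    while (!rule(i, multiPV, lmrPVMoves))
        ++i;
    return i;
}
```

With LMRPVMoves = 15 and MultiPV = 1 both rules start at i = 14. With MultiPV = 3 the old rule starts at i = 12 while the new one starts at i = 16, which is again 14 (LMRPVMoves - 1) moves after the last PV move at index 2.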
Marco Costalba
339bb8a524 Speed up polynomial material imbalance loop
Access pos.piece_count() only once and avoid some
branches in the inner loop.

Profiling with VTune shows a 20% speed improvement in
get_material_info(), and it is also a bit more cleaned
up this way ;-)

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 14:12:04 +01:00
Marco Costalba
aa925a0e29 There is no need to special case KNNK ending
It is always a draw, so use the corresponding proper
evaluation function.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 13:10:10 +01:00
Marco Costalba
23ceb66950 Move halfOpenFiles[] calculation out of a loop
And put it in an already existing one so as to
optimize a bit.

Also additional cleanups and code shuffles
all around the place.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 09:21:42 +01:00
Marco Costalba
565d12bf42 Compile without DEBUG flag by default
And also build the symbol table. It can easily be stripped
after the .exe is built, and it is necessary for profiling.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 09:21:29 +01:00
Marco Costalba
00eab73399 Revert material balance values after 100000 games
After Joona's direct testing with ~2000 games it seems the
values after 100.000 games do not give any advantage,
so revert for now.

Score of Stockfish_0 vs Stockfish_15: 491 - 392 - 1102
Score of Stockfish_0 vs Stockfish_40: 461 - 439 - 1076
Score of Stockfish_0 vs Stockfish_65: 442 - 518 - 1018 (13 elo)
Score of Stockfish_0 vs Stockfish_100: 504 - 502 - 984

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 03:49:49 +01:00
Joona Kiiski
5be3d98d17 Do not adjust Minimum Split Depth automatically
Currently minimum split depth is set automatically to 6
when the number of CPUs is more than 4. I believe this is a bad
idea since, for example, my quad (4 CPUs with hyperthreading) is
detected as an 8-CPU computer. I've manually lowered the number
of Threads, but so far I have played all games with Minimum
Split Depth set to 6!

Since 4-CPU computers with hyperthreading are quite common and
8-CPU computers extremely rare (I expect we can get a direct
jump to 16 or 32 cores), this automatic adjusting is likely
to do more harm than good. Add a note in Readme.txt, so that
those rare 8-CPU owners can manually tweak the "Minimum Split
Depth" parameter.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 03:36:20 +01:00
Marco Costalba
5b3fcab1ad Polished Makefile for *nix
Greatly improved Makefile from Heinz van Saanen.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-08-08 03:30:27 +01:00
Tord Romstad
977ca40d6d Supply the "upperbound" and "lowerbound" parameters in UCI search
output when the score is outside the root window.
2009-08-07 16:26:24 +02:00
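The behaviour described in 977ca40d6d can be sketched as a small formatting helper. This is an assumed shape, not the code from the commit: uci_score() is a hypothetical name and centipawn scores are assumed. What is taken from the UCI protocol is that a score that failed high is only a lower bound and a score that failed low is only an upper bound.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: format the "score" part of a UCI info line for a
// root search with window (alpha, beta).
inline std::string uci_score(int cp, int alpha, int beta) {
    assert(alpha < beta);
    std::string s = "score cp " + std::to_string(cp);
    if (cp >= beta)
        s += " lowerbound";   // fail high: the true score is at least cp
    else if (cp <= alpha)
        s += " upperbound";   // fail low: the true score is at most cp
    return s;
}
```

For a window of (-20, 30), a score of 50 is reported as a lower bound, a score of -40 as an upper bound, and a score of 0 with no qualifier.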
Tord Romstad
ae49677446 Fixed a bug in PV extraction from the transposition table: The
previous code used move_is_legal to verify that the move from the TT
was legal, and the old version of move_is_legal only works when
the side to move is not in check. Fixed this by adding a separate,
slower version of move_is_legal which works even when the side to
move is in check.
2009-08-06 18:07:32 +02:00
Tord Romstad
2fff532f4e Moved the code for extracting the PV from the TT to tt.cpp, where
it belongs.
2009-08-06 14:02:53 +02:00
Tord Romstad
da854fe83a Added a new function build_pv(), which extends a PV by walking
down the transposition table.

When the search was stopped before a fail high at the root was
resolved, Stockfish would often print a very short PV, sometimes
consisting of just a single move. This was not only a little
user-unfriendly, but also harmed the strength a little in
ponder-on games: Single-move PVs mean that there is no ponder
move to search.

It is perhaps worth considering to remove the pv[][] array
entirely, and always build the entire PV from the transposition
table. This would simplify the source code somewhat and probably
make the program infinitesimally faster, at the expense of
sometimes getting shorter PVs or PVs with rubbish moves near
the end.
2009-08-06 13:27:49 +02:00
Tord Romstad
a1096e55cf Initial work towards adjustable playing strength.
Added the UCI_LimitStrength and the UCI_Elo options, with an Elo
range of 2100-2900. When UCI_LimitStrength is enabled, the number
of threads is set to 1, and the search speed is slowed down according
to the chosen Elo level.

Todo:

1. Implement Elo levels below 2100 by blundering on purpose and/or
   crippling the evaluation.
2. Automatically calibrate the maximum Elo by measuring the CPU speed
   during program initialization, perhaps by doing some bitboard
   computations and measuring the time taken.

No functional change when UCI_LimitStrength is false (the default).
2009-08-04 11:31:25 +02:00
Tord Romstad
dad632ce5b Added LMR at the root.
After 2000 games at 1+0

Mod vs Orig +534 =1033 -433 52.525%  1050.5/2000  +18 ELO
2009-08-03 09:08:59 +02:00
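The percentages and Elo figures quoted in these test results are consistent with the standard logistic Elo formula, although the log does not say which tool produced them. A minimal sketch, with hypothetical function names, that reproduces the line above (+534 =1033 -433, 52.525%, +18 ELO):

```cpp
#include <cassert>
#include <cmath>

// Score fraction from a win/draw/loss record; a draw counts as half a point.
inline double score_fraction(int wins, int draws, int losses) {
    const int games = wins + draws + losses;
    assert(games > 0);
    return (wins + 0.5 * draws) / games;
}

// Logistic Elo model: score = 1 / (1 + 10^(-diff / 400)), solved for diff.
inline double elo_difference(double score) {
    assert(score > 0.0 && score < 1.0);
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

For example, score_fraction(534, 1033, 433) gives 1050.5 / 2000 = 0.52525, and elo_difference() of that value is roughly 17.6, which rounds to the +18 reported.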
Joona Kiiski
2f7723fd44 Remove useless mate value special handling in null search
After 1200 games (1CPU), time control 1+0:

Mod vs Orig: +331 =564 -277  +16 ELO

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-26 18:55:17 +01:00
Marco Costalba
152f3b13b7 Yet another small touch to endgame functions handling
It is like a never finished painting. Everyday a little touch
more.

But this time it is very little ;-)

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-26 17:42:48 +01:00
Marco Costalba
bb1b049b83 Remove unused members in Application class
Also rearrange the remaining methods a bit.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-26 16:11:20 +01:00
Marco Costalba
50f92bed06 Fix a spurious extra space
This morning it seems there is nothing better to do...

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-26 09:07:42 +01:00
Marco Costalba
bdb586ac2b Micro optimize extension() in search.cpp
Small micro-optimization in this very
time critical function.

Use bitwise 'or' instead of logical 'or' to avoid branches
in the assembly and use the result to skip a handful of checks.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-25 16:48:28 +01:00
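The trick mentioned in bdb586ac2b can be illustrated in isolation. The sketch below is not the extension() code; the function names and the three flags are hypothetical. Logical 'or' short-circuits, which a compiler may implement with conditional jumps, while bitwise 'or' evaluates every operand and combines them without branching. The bitwise form is only a safe substitute when the operands are cheap and free of side effects, and an optimizing compiler may already make this transformation on its own.

```cpp
#include <cassert>

// Short-circuit version: operands after the first true one are not evaluated.
inline bool any_logical(bool a, bool b, bool c) {
    return a || b || c;
}

// Branch-free version: all operands are evaluated and or-ed together.
inline bool any_bitwise(bool a, bool b, bool c) {
    return (static_cast<int>(a) | static_cast<int>(b) | static_cast<int>(c)) != 0;
}
```

Both functions return the same result for every input; a single test on the combined value can then guard a whole group of rarer checks.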
Marco Costalba
1b0303b6e9 Polynomial material balance after 100.000 games
Verified it is equivalent to the tuning branch results
with parameter values sampled after 100.000 games.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-24 14:26:49 +01:00
Marco Costalba
5f232e0667 Revert Makefile changes
Some unwanted changes to the Makefile slipped in with patch
"Introduced the UCI_AnalyseMode option".

Revert them. No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-24 14:18:03 +01:00
Marco Costalba
080a4995a3 Simplify king shelter cache handling
This is more similar to how get_material_info() and
get_pawn_info() work and also removes some clutter from
evaluate_king().

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-24 14:13:13 +01:00
Marco Costalba
20224a5bbf Delay costly SEE call during captures ordering in MovePicker
When ordering moves we push all captures with negative SEE values
to badCaptures[] array during the scoring phase.

This patch delays the costly SEE call until the move has been
picked in pick_move_from_list(); this way we save some SEE calls
in case we get a cutoff.

It seems we have a speed gain of about 1-1.5 % in terms of nodes/sec
and profiling seems to confirm the small but real speed increase.

Idea from Pablo Vazquez on talkchess.com
http://www.talkchess.com/forum/viewtopic.php?t=29018&start=20

It would be a no functional change but actually it is not, because
now the set being sorted is different and so std::sort(), which is
not a stable sort, does not guarantee that the order of equally
scored moves remains the same as before.

After 952 games at 1+0 we are within the error bar, almost equal, with
just 6 games of difference (+2 ELO)

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-24 14:12:33 +01:00
Marco Costalba
8654fee18c Micro-optimization in do_evaluate()
Do not call count_1s_max_15() when it is not necessary,
which is the common case (>95%).

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-23 22:01:42 +01:00
Marco Costalba
8b45b60327 Use do_move_bb() helpers when doing a castle
Small cleanup.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-23 10:43:58 +01:00
Marco Costalba
044ad593b3 Add Tord's polynomial material balance
Use a polynomial weighted evaluation to calculate
material value.

This is far more flexible and elegant than applying
a series of single heuristic rules as before.

Also correct a design issue in which we returned two
values, one for middle game and one for endgame, while
instead, because game phase is a function of board
material itself, only one value should be calculated and
used both for mid and end game.

Verified it is equivalent to the tuning branch results with
parameter values sampled after 40.000 games.

After 999 games at 1+0

Mod vs Orig +277 =482 -240 51.85%  518.0/999  +13 ELO

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-23 00:03:30 +01:00
Marco Costalba
5600d91cff Rename int32 to int32_t
To use the same naming rule as the other types and
to be compatible with inttypes.h, used under Linux.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-20 10:53:41 +01:00
Marco Costalba
1cc44bcaae Correctly set mateThreat in search()
We do not accept null search returned mate values,
but we always do a full search in those cases.

So the variable mateThreat that is set only if null move
search returns a mate value is always false.

Restore the functionality of mateThreat by moving the
assignment to where it can be triggered.

After 999 games at 1+0

Mod vs Orig +253 =517 -229 51.20%  +8 ELO

Bug reported by xiaozhi

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-20 08:05:48 +01:00
Marco Costalba
15eb59683e Use increased LMR horizon also in PV search
Tord says that using a lower horizon at PV nodes
looks strange and inconsistent with the general
philosophy of our search (i.e. always being more
conservative at PV nodes). So set LMR at 3 also
on search_pv().

Test result after 601 games seems to confirm this.

Mod vs Orig +156 =318 -127 52.41%  315.0/601  +17 ELO

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-18 12:47:37 +02:00
Marco Costalba
620cfbb676 Reintroduce null move dynamic reduction
Test extension of LMR horizon to 3 plies alone, without
touching null move search. To keep the patch minimal we still
don't change LMR horizon in PV search. This will be the object
of the next patch.

Result seems good after 998 games:

Mod vs Orig  +252/=518/-228 51.20%  511.0/998 +8 ELO

So dynamic null move reduction seems a bit stronger than
fixed reduction even with LMR horizon set to 3.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-18 06:08:06 +01:00
Marco Costalba
fe523b2d18 Use increased LMR horizon only after a null move
Revert to an LMR horizon of 2 plies. Only if the parent move
is a null move increase it to 3, so as to avoid the bad combination
of null move reduction + LMR reduction. This is a more
aggressive patch than the previous one, but it seems we are
going in the wrong direction.

After 531 games result is not good:

Mod vs Orig  +123/=265/-143 48.12%  255.5/531  -13 ELO

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-18 06:08:02 +01:00
Marco Costalba
2a203d8d6f Combine increased LMR horizon and fixed null move reduction
Set null move reduction to R=4, but increase the LMR horizon
to 3 plies. The two tweaks are related and should compensate
the combined effect of null move + LMR reduction at shallow
depths.

Idea from Tord.

After 999 games at 1+0

Mod vs Orig  +251 =522 -225 51.30% + 9 ELO

On Tord's iMac Core 2 Duo 2.8 GHz, one thread,
Mac OS X 10.6, at 1+0 time control we have:

Mod vs Orig 994-1006  -1.4 ELO

But Orig version is pgo compiled and Mod is not.
The PGO compiled version is about 8% faster, which
corresponds to about 7 Elo points. This means that
results are reasonably consistent.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-18 06:07:58 +01:00
Tord Romstad
b8326edea3 Introduced the UCI_AnalyseMode option, and made the evaluation function
symmetrical in analyse mode.

No functional change when playing games.
2009-07-17 22:26:01 +02:00
Marco Costalba
20e8738901 Fix two compile errors in new endgame code
Code that compiles cleanly under MSVC triggers one
compile error (correct) under Intel C++ and two(!)
under gcc.

The first is the same one reported by Intel, but the second
is an interesting corner case of the C++ standard (there are many)
that is correctly spotted only by gcc.

Both MSVC and Intel pass this silently, probably to avoid
breaking people's code.

Now we are fully C++ compliant ;-)

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-17 19:29:25 +01:00
Marco Costalba
b3b1d3aaa7 Move constant bitboard arrays from header to cpp file
This avoids duplicating storage allocation in every file
where they are used.

Note that simple numeric constants can remain in the header because
they are automatically folded by the compiler.

Patch suggested by Tord.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-17 16:25:53 +01:00
Marco Costalba
0d69ac33ff Remove even more redundancy in endgame functions handling
Push on the templatization even more to chip out some code
and take the opportunity to show some neat template trick ;-)

Ok. I would say we can stop here now....it is quickly becoming
a style exercise but we are not boost developers so give it a stop.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2009-07-17 16:05:19 +01:00
Tord Romstad
342c8c883c Removed an incorrect assert() statement in search.cpp, which asserted that
a static eval cached in the transposition table would always equal the static
eval of the current position. This is in general not true, because the cached
value could be from a previous search with different evaluation parameter
settings, or from a search from the opposite side (Stockfish's evaluation
function is asymmetric by default).
2009-07-17 09:12:59 +02:00