1
0
Fork 0
mirror of https://github.com/sockspls/badfish synced 2025-05-02 09:39:36 +00:00
Commit graph

110 commits

Author SHA1 Message Date
Joost VandeVondele
c4d67d77c9 Update copyright years
No functional change
2021-01-08 17:04:23 +01:00
Tomasz Sobczyk
3f6451eff7 Manually align arrays on the stack
as a workaround to issues with overaligned alignas() on stack variables in gcc < 9.3 on windows.

closes https://github.com/official-stockfish/Stockfish/pull/3217

fixes #3216

No functional change
2020-11-04 19:52:42 +01:00
Sami Kiminki
485d517c68 Add large page support for NNUE weights and simplify TT mem management
Use TT memory functions to allocate memory for the NNUE weights. This
should provide a small speed-up on systems where large pages are not
automatically used, including Windows and some Linux distributions.

Further, since we now have a wrapper for std::aligned_alloc(), we can
simplify the TT memory management a bit:

- We no longer need to store separate pointers to the hash table and
  its underlying memory allocation.
- We also get to merge the Linux-specific and default implementations
  of aligned_ttmem_alloc().

Finally, we'll enable the VirtualAlloc code path with large page
support also for Win32.

STC: https://tests.stockfishchess.org/tests/view/5f66595823a84a47b9036fba
LLR: 2.94 (-2.94,2.94) {-0.25,1.25}
Total: 14896 W: 1854 L: 1686 D: 11356
Ptnml(0-2): 65, 1224, 4742, 1312, 105

closes https://github.com/official-stockfish/Stockfish/pull/3081

No functional change.
2020-09-21 08:43:48 +02:00
Stéphane Nicolet
406979ea12 Embed default net, and simplify using non-default nets
covers the most important cases from the user perspective:

It embeds the default net in the binary, so a download of that binary will result
in a working engine with the default net. The engine will be functional in the default mode
without any additional user action.

It allows non-default nets to be used, which will be looked for in up to
three directories (working directory, location of the binary, and optionally a specific default directory).
This mechanism is also kept for those developers that use MSVC,
the one compiler that doesn't have an easy mechanism for embedding data.

It is possible to disable embedding, and instead specify a specific directory, e.g. linux distros might want to use
CXXFLAGS="-DNNUE_EMBEDDING_OFF -DDEFAULT_NNUE_DIRECTORY=/usr/share/games/stockfish/" make -j ARCH=x86-64 profile-build

passed STC non-regression:
https://tests.stockfishchess.org/tests/view/5f4a581c150f0aef5f8ae03a
LLR: 2.95 (-2.94,2.94) {-1.25,-0.25}
Total: 66928 W: 7202 L: 7147 D: 52579
Ptnml(0-2): 291, 5309, 22211, 5360, 293

closes https://github.com/official-stockfish/Stockfish/pull/3070

fixes https://github.com/official-stockfish/Stockfish/issues/3030

No functional change.
2020-08-29 21:56:00 +02:00
Joost VandeVondele
5f1843c9cb Small trivial cleanups
closes https://github.com/official-stockfish/Stockfish/pull/2801

No functional change
2020-08-23 01:53:41 +02:00
nodchip
84f3e86790 Add NNUE evaluation
This patch ports the efficiently updatable neural network (NNUE) evaluation to Stockfish.

Both the NNUE and the classical evaluations are available, and can be used to
assign a value to a position that is later used in alpha-beta (PVS) search to find the
best move. The classical evaluation computes this value as a function of various chess
concepts, handcrafted by experts, tested and tuned using fishtest. The NNUE evaluation
computes this value with a neural network based on basic inputs. The network is optimized
and trained on the evalutions of millions of positions at moderate search depth.

The NNUE evaluation was first introduced in shogi, and ported to Stockfish afterward.
It can be evaluated efficiently on CPUs, and exploits the fact that only parts
of the neural network need to be updated after a typical chess move.
[The nodchip repository](https://github.com/nodchip/Stockfish) provides additional
tools to train and develop the NNUE networks.

This patch is the result of contributions of various authors, from various communities,
including: nodchip, ynasu87, yaneurao (initial port and NNUE authors), domschl, FireFather,
rqs, xXH4CKST3RXx, tttak, zz4032, joergoster, mstembera, nguyenpham, erbsenzaehler,
dorzechowski, and vondele.

This new evaluation needed various changes to fishtest and the corresponding infrastructure,
for which tomtor, ppigazzini, noobpwnftw, daylen, and vondele are gratefully acknowledged.

The first networks have been provided by gekkehenker and sergiovieri, with the latter
net (nn-97f742aaefcd.nnue) being the current default.

The evaluation function can be selected at run time with the `Use NNUE` (true/false) UCI option,
provided the `EvalFile` option points the the network file (depending on the GUI, with full path).

The performance of the NNUE evaluation relative to the classical evaluation depends somewhat on
the hardware, and is expected to improve quickly, but is currently on > 80 Elo on fishtest:

60000 @ 10+0.1 th 1
https://tests.stockfishchess.org/tests/view/5f28fe6ea5abc164f05e4c4c
ELO: 92.77 +-2.1 (95%) LOS: 100.0%
Total: 60000 W: 24193 L: 8543 D: 27264
Ptnml(0-2): 609, 3850, 9708, 10948, 4885

40000 @ 20+0.2 th 8
https://tests.stockfishchess.org/tests/view/5f290229a5abc164f05e4c58
ELO: 89.47 +-2.0 (95%) LOS: 100.0%
Total: 40000 W: 12756 L: 2677 D: 24567
Ptnml(0-2): 74, 1583, 8550, 7776, 2017

At the same time, the impact on the classical evaluation remains minimal, causing no significant
regression:

sprt @ 10+0.1 th 1
https://tests.stockfishchess.org/tests/view/5f2906a2a5abc164f05e4c5b
LLR: 2.94 (-2.94,2.94) {-6.00,-4.00}
Total: 34936 W: 6502 L: 6825 D: 21609
Ptnml(0-2): 571, 4082, 8434, 3861, 520

sprt @ 60+0.6 th 1
https://tests.stockfishchess.org/tests/view/5f2906cfa5abc164f05e4c5d
LLR: 2.93 (-2.94,2.94) {-6.00,-4.00}
Total: 10088 W: 1232 L: 1265 D: 7591
Ptnml(0-2): 49, 914, 3170, 843, 68

The needed networks can be found at https://tests.stockfishchess.org/nns
It is recommended to use the default one as indicated by the `EvalFile` UCI option.

Guidelines for testing new nets can be found at
https://github.com/glinscott/fishtest/wiki/Creating-my-first-test#nnue-net-tests

Integration has been discussed in various issues:
https://github.com/official-stockfish/Stockfish/issues/2823
https://github.com/official-stockfish/Stockfish/issues/2728

The integration branch will be closed after the merge:
https://github.com/official-stockfish/Stockfish/pull/2825
https://github.com/official-stockfish/Stockfish/tree/nnue-player-wip

closes https://github.com/official-stockfish/Stockfish/pull/2912

This will be an exciting time for computer chess, looking forward to seeing the evolution of
this approach.

Bench: 4746616
2020-08-06 16:37:45 +02:00
mstembera
1ea488d34c Use 128 bit multiply for TT index
Remove super cluster stuff from TT and just use a 128 bit multiply.

STC https://tests.stockfishchess.org/tests/view/5ee719b3aae8aec816ab7548
LLR: 2.94 (-2.94,2.94) {-1.50,0.50}
Total: 12736 W: 2502 L: 2333 D: 7901
Ptnml(0-2): 191, 1452, 2944, 1559, 222

LTC https://tests.stockfishchess.org/tests/view/5ee732d1aae8aec816ab7556
LLR: 2.93 (-2.94,2.94) {-1.50,0.50}
Total: 27584 W: 3431 L: 3350 D: 20803
Ptnml(0-2): 173, 2500, 8400, 2511, 208

Scheme back to being derived from https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/

Also the default optimized version of the index calculation now uses fewer instructions.
https://godbolt.org/z/Tktxbv
Might benefit from mulx (requires -mbmi2)

closes https://github.com/official-stockfish/Stockfish/pull/2744

bench: 4320954
2020-06-17 07:32:16 +02:00
Sami Kiminki
beb327f910 Fix a Windows-only crash on exit without 'quit'
There was a bug in commit d4763424d2
(Add support for Windows large pages) that could result in trying to
free memory allocated with VirtualAlloc incorrectly with free().

Fix this by reverting the TT.resize(0) logic in the previous commit,
and instead, just call aligned_ttmem_free() in
TranspositionTable::~TranspositionTable().

fixes https://github.com/official-stockfish/Stockfish/issues/2677

closes https://github.com/official-stockfish/Stockfish/pull/2679

No functional change
2020-05-14 20:35:40 +02:00
Sami Kiminki
d4763424d2 Add support for Windows large pages
for users that set the needed privilige "Lock Pages in Memory"
large pages will be automatically enabled (see Readme.md).

This expert setting might improve speed, 5% - 30%, depending
on the hardware, the number of threads and hash size. More for
large hashes, large number of threads and NUMA. If the operating
system can not allocate large pages (easier after a reboot), default
allocation is used automatically. The engine log provides details.

closes https://github.com/official-stockfish/Stockfish/pull/2656

fixes https://github.com/official-stockfish/Stockfish/issues/2619

No functional change
2020-05-13 20:57:47 +02:00
Gary Heckman
37e3863927 Fix ambiguity between clamp implementations
There is an ambiguity between global and std clamp implementations when compiling in c++17,
and on certain toolchains that are not strictly conforming to c++11.
This is solved by putting our clamp implementation in a namespace.

closes https://github.com/official-stockfish/Stockfish/pull/2572

No functional change.
2020-03-07 11:14:27 +01:00
Joost VandeVondele
0c878adb36 Small cleanups.
closes https://github.com/official-stockfish/Stockfish/pull/2532

Bench: 4869669
2020-02-05 15:32:29 +01:00
Sami Kiminki
39437f4e55 Advise the kernel to use huge pages (Linux)
Align the TT allocation by 2M to make it huge page friendly and advise the
kernel to use huge pages.

Benchmarks on my i7-8700K (6C/12T) box: (3 runs per bench per config)

                    vanilla (nps)               hugepages (nps)              avg
==================================================================================
bench             | 3012490  3024364  3036331   3071052  3067544  3071052    +1.5%
bench 16 12 20    | 19237932 19050166 19085315  19266346 19207025 19548758   +1.1%
bench 16384 12 20 | 18182313 18371581 18336838  19381275 19738012 19620225   +7.0%

On my box, huge pages have a significant perf impact when using a big
hash size. They also speed up TT initialization big time:

                                  vanilla (s)  huge pages (s)  speed-up
=======================================================================
time stockfish bench 16384 1 1  | 5.37         1.48            3.6x

In practice, huge pages with auto-defrag may always be enabled in the
system, in which case this patch has no effect. This
depends on the values in /sys/kernel/mm/transparent_hugepage/enabled
and /sys/kernel/mm/transparent_hugepage/defrag.

closes https://github.com/official-stockfish/Stockfish/pull/2463

No functional change
2020-01-27 11:16:10 +01:00
Stéphane Nicolet
9f800a2577 Show compiler info at startup
This patch shows a description of the compiler used to compile Stockfish,
when starting from the console.

Usage:

```
./stockfish
compiler
```

Example of output:

```
Stockfish 120120 64 POPCNT by T. Romstad, M. Costalba, J. Kiiski, G. Linscott

Compiled by clang++ 9.0.0 on Apple
 __VERSION__ macro expands to: 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.38)
```

No functional change
2020-01-12 11:54:15 +01:00
Alain SAVARD
09bef14c76 Update lists of authors and contributors
Preparing for version 11 of Stockfish: update lists of authors,
contributors giving CPU time to the fishtest framework, etc.

No functional change
2020-01-09 01:43:47 +01:00
Stéphane Nicolet
8726beba59 Restore development version (revert previous commit)
Revert the previous patch now that the binary for the super-final
of TCEC season 16 has been sent.

Maybe the feature of showing the name of compiler will be added to the
master branch in the future. But we may use a cleaner way to code it, see
some ideas using the Makefile approach at the end of pull request #2327 :
https://github.com/official-stockfish/Stockfish/pull/2327

Bench: 3618154
2019-09-26 23:27:48 +02:00
Stéphane Nicolet
0436f01d05 Temporary patch to show the compiler for TCEC submission
This patch shows a description of the compiler used to compile Stockfish,
when starting from the console.

Usage:

```
./stockfish
compiler
```

Example of output:

```
Stockfish 240919 64 POPCNT by T. Romstad, M. Costalba, J. Kiiski, G. Linscott

Compiled by clang++ 9.0.0 on Apple
 __VERSION__ macro expands to: 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.38)
```

No functional change
2019-09-25 22:28:51 +02:00
Marco Costalba
4e72e2a964 Assorted trivial cleanups 4/2019
No functional change.
2019-05-02 19:30:26 +02:00
Marco Costalba
82ad9ce9cf
Assorted trivial cleanups 3/2019 (#2030)
No functional change.
2019-03-31 11:47:36 +02:00
Stéphane Nicolet
cf5d683408 Stockfish 10-beta
Preparation commit for the upcoming Stockfish 10 version, giving a chance to catch last minute feature bugs and evaluation regression during the one-week code freeze period. Also changing the copyright dates to include 2019.

No functional change
2018-11-19 11:18:21 +01:00
Stéphane Nicolet
a03e98dcd3 Switch time management to 64 bits
This is a patch to fix issue #1498, switching the time management variables
to 64 bits to avoid overflow of time variables after 25 days.

There was a bug in Stockfish 9 causing the output to be wrong after
2^31 milliseconds search. Here is a long run from the starting position:

info depth 64 seldepth 87 multipv 1 score cp 23 nodes 13928920239402
nps 0 tbhits 0 time -504995523 pv g1f3 d7d5 d2d4 g8f6 c2c4 d5c4 e2e3 e7e6 f1c4
c7c5 e1g1 b8c6 d4c5 d8d1 f1d1 f8c5 c4e2 e8g8 a2a3 c5e7 b2b4 f8d8 b1d2 b7b6 c1b2
c8b7 a1c1 a8c8 c1c2 c6e5 d1c1 c8c2 c1c2 e5f3 d2f3 a7a5 b4b5 e7c5 f3d4 d8c8 d4b3
c5d6 c2c8 b7c8 b3d2 c8b7 d2c4 d6c5 e2f3 b7d5 f3d5 e6d5 c4e5 a5a4 e5d3 f6e4 d3c5
e4c5 b2d4 c5e4 d4b6 e4d6 g2g4 d6b5 b6c5 b5c7 g1g2 c7e6 c5d6 g7g6

We check at compile time that the TimePoint type is exactly 64 bits long for
the compiler (TimePoint is our alias in Stockfish for std::chrono::milliseconds
-- it is a signed integer type of at least 45 bits according to the C++ standard,
but will most probably be implemented as a 64 bits signed integer on modern
compilers), and we use this TimePoint type consistently across the code.

Bug report by user "fischerandom" on the TCEC chat (thanks), and the
patch includes code and suggestions by user "WOnder93" and Ronald de Man.

Fixes issue:          https://github.com/official-stockfish/Stockfish/issues/1498
Closes pull request:  https://github.com/official-stockfish/Stockfish/pull/1510

No functional change.
2018-03-27 16:25:41 +02:00
Joost VandeVondele
9afa1d7330 New Year 2018
Adjust copyright headers.

No functional change.
2018-01-01 13:18:10 +01:00
mstembera
d01b66ae8f Fix pawn entry prefetch
No functional change

Closes #1026
2017-03-14 20:56:26 -07:00
Joost VandeVondele
d8f683760c Adjust copyright headers to 2017 (#965)
No functional change.
2017-01-11 08:46:29 +01:00
Marco Costalba
0d9a9f5e98 Handle Windows Processors Groups
Under Windows it is not possible for a process to run on more than one
logical processor group. This usually means to be limited to use max 64
cores. To overcome this, some special platform specific API should be
called to set group affinity for each thread. Original code from Texel by
Peter sterlund.

Tested by Jean-Paul Vael on a Xeon E7-8890 v4 with 88 threads and confimed
speed up between 44 and 88 threads is about 30%, as expected.

No functional change.
2016-11-22 07:56:04 +01:00
lucasart
126036abb0 Do not hardcode Debug Log File
Allow to specifiy the log file name, this comes
handy in case of self-matches so that each SF
instance writes into a different log file.

No functional change.
2016-06-15 08:47:08 +02:00
ppigazzini
d4af15f682 Update AUTHORS and copyright notice
No functional change

Resolves #555
2016-01-02 09:43:51 +00:00
Marco Costalba
9742fb10fd Update Copyright year
No functional change.

Resolves #554
2016-01-01 10:17:36 +00:00
Marco Costalba
0b36ba74fc Don't assume the type of Time::point
But instead use the proper definition. Also
rewrite chrono functions while there.

No functional change.
2015-02-24 14:08:14 +01:00
Marco Costalba
99c9cae586 Avoid casting to char* in prefetch()
Funny enough, gcc __builtin_prefetch() expects
already a void*, instead Windows's _mm_prefetch()
requires a char*.

The patch allows to remove ugly casts from caller
sites.

No functional change.
2015-02-07 19:13:41 +01:00
Marco Costalba
8b0fee9998 Rename dbg_hit_on_c() to dbg_hit_on()
Use an overload instead of a new named function.

I have found this handier and easier when adding
some quick debug code.

No functional change.
2015-02-07 11:15:38 +01:00
Marco Costalba
96e36a7897 Explicitly defaulted and deleted members
Better than a bit obscure implicit ones.

No functional change.
2015-01-21 13:18:19 +01:00
Marco Costalba
3c07603dac Import C++11 branch
Import C++11 branch from:

https://github.com/mcostalba/Stockfish/tree/c++11

The version imported is teh last one as of today:
6670e93e50

Branch is fully equivalent with master but syzygy
tablebases that are missing (but will be added with
next commit).

bench: 8080602
2015-01-18 08:00:50 +01:00
Marco Costalba
42b48b08e8 Update copyright year
No functional change.
2015-01-10 11:46:28 +01:00
Marco Costalba
5943600a89 Assorted nitpicking code-style
No functional change.
2014-12-10 12:38:13 +01:00
Ernesto Gatti
158864270a Simpler PRNG and faster magics search
This patch replaces RKISS by a simpler and faster PRNG, xorshift64* proposed
by S. Vigna (2014). It is extremely simple, has a large enough period for
Stockfish's needs (2^64), requires no warming-up (allowing such code to be
removed), and offers slightly better randomness than MT19937.

Paper: http://xorshift.di.unimi.it/
Reference source code (public domain):
http://xorshift.di.unimi.it/xorshift64star.c

The patch also simplifies how init_magics() searches for magics:

- Old logic: seed the PRNG always with the same seed,
  then use optimized bit rotations to tailor the RNG sequence per rank.

- New logic: seed the PRNG with an optimized seed per rank.

This has two advantages:
1. Less code and less computation to perform during magics search (not ROTL).
2. More choices for random sequence tuning. The old logic only let us choose
from 4096 bit rotation pairs. With the new one, we can look for the best seeds
among 2^64 values. Indeed, the set of seeds[][] provided in the patch reduces
the effort needed to find the magics:

64-bit SF:
Old logic -> 5,783,789 rand64() calls needed to find the magics
New logic -> 4,420,086 calls

32-bit SF:
Old logic -> 2,175,518 calls
New logic -> 1,895,955 calls

In the 64-bit case, init_magics() take 25 ms less to complete (Intel Core i5).

Finally, when playing with strength handicap, non-determinism is achieved
by setting the seed of the static RNG only once. Afterwards, there is no need
to skip output values.

The bench only changes because the Zobrist keys are now different (since they
are random numbers straight out of the PRNG).

The RNG seed has been carefully chosen so that the
resulting Zobrist keys are particularly well-behaved:

1. All triplets of XORed keys are unique, implying that it
   would take at least 7 keys to find a 64-bit collision
   (test suggested by ceebo)

2. All pairs of XORed keys are unique modulo 2^32

3. The cardinality of { (key1 ^ key2) >> 48 } is as close
   as possible to the maximum (65536)

Point 2 aims at ensuring a good distribution among the bits
that determine an TT entry's cluster, likewise point 3
among the bits that form the TT entry's key16 inside a
cluster.

Details:

     Bitset   card(key1^key2)
     ------   ---------------
RKISS
     key16     64894   = 99.020% of theoretical maximum
     low18    180117   = 99.293%
     low32    305362   = 99.997%

Xorshift64*, old seed
     key16     64918   = 99.057%
     low18    179994   = 99.225%
     low32    305350   = 99.993%

Xorshift64*, new seed
     key16     65027   = 99.223%
     low18    181118   = 99.845%
     low32    305371   = 100.000%

Bench: 9324905

Resolves #148
2014-12-08 08:18:26 +08:00
hxim
fbb53524ef Rename some variables for more clarity.
No functional change.

Resolves #131
2014-12-08 07:53:33 +08:00
mstembera
bc83515c9e Removing some superfluous extern declarations
No functional change.

Resolves #93
2014-11-05 21:17:19 +00:00
Marco Costalba
ff480bdb83 Retire struct Log
No more used now that we have removed
"Write Search Log" UCI option.

No functional change.
2014-09-16 21:13:50 +01:00
Marco Costalba
e4695f15bc Additional renaming from DON
Assorted renaming and triviality.

No functional change.
2014-02-14 09:42:50 +01:00
Marco Costalba
41641e3b1e Assorted tweaks from DON
Mainly renames and some little code style improvment,
inspired by looking at DON sources:

https://github.com/erashid/DON

No functional change.
2014-02-09 17:31:45 +01:00
Marco Costalba
c9dcda6ac4 Update copyright year
No functional change.
2014-01-02 01:49:18 +01:00
Marco Costalba
f31847302d Streamline time computation
No functional change.
2013-08-03 18:30:43 +02:00
Joona Kiiski
a16ba5bbd1 Retire cpu_count()
Set threads number always to 1 at startup and let the
user explicitly to chose the number of threads.

Also preserve the useful behavior of automatically set
"Min Split Depth" according to the requested threads,
indeed this parameter is too technical for a casual user,
so, when left to zero, we set it on a sensible value.

No functional change
2013-08-02 16:48:25 +02:00
homoSapiensSapiens
002062ae93 Use #ifndef instead of #if !defined
And #ifdef instead of #if defined

This is more standard form (see for example iostream file).

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2013-07-24 19:49:17 +02:00
Marco Costalba
c5ec94d0f1 Update copyright year
No functional change.
2013-02-19 07:54:14 +01:00
Marco Costalba
b50ce5ebfb Get rid of struct Time
We just need the milliseconds of current system
time for our needs. This allows to simplify the
API.

No functional change.
2012-09-04 09:38:51 +02:00
Marco Costalba
5900ab76a0 Rename current_time() to now()
Follow C++11 naming conventions.

No functional change.
2012-09-02 17:04:22 +02:00
Marco Costalba
831f91b859 Retire Time::restart()
Simplify API.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2012-08-31 19:47:07 +02:00
Marco Costalba
1258c7aabe Don't need to memset HashTable
Default c'tor Entry() already initializes
to zero all its POD members.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2012-08-31 19:47:00 +02:00
Marco Costalba
92e759a676 Introduce serialization of accesses to std::cout
When many threds concurrently print you need to serialize
the access to std::cout to avoid output lines are intermixed
with the contents of each thread.

This is not strictly needed at the moment because
only main thread prints out, although some ad-hoc
test could trigger UCI::loop() printing while searching.

Anyhow we want to lift this pretty avoidable constrain
also as a prerequisite for future work.

This patch just introduces the support, next one will enable
the serialization.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
2012-08-29 19:11:31 +02:00