1
0
Fork 0
mirror of https://github.com/sockspls/badfish synced 2025-04-30 16:53:09 +00:00

Merge latest changes from nodchip repo

This commit is contained in:
Dariusz Orzechowski 2020-07-20 13:19:25 +02:00
commit 871e6b8c83
5 changed files with 187 additions and 19 deletions

63
README.md Normal file
View file

@ -0,0 +1,63 @@
<p align="center">
<img src="https://cdn.discordapp.com/attachments/724700045525647420/729135226365804594/SFNNUE2.png">
</p>
<h1 align="center">Stockfish NNUE</h1>
## Overview
Stockfish NNUE is a port of a shogi neural network named NNUE (efficiently updateable neural network backwards) to Stockfish 11. To learn more about the Stockfish chess engine, look [here](stockfish.md) for an overview and [here](https://github.com/official-stockfish/Stockfish) for the official repository.
## Training Guide
### Generating Training Data
Use the "no-nnue.nnue-gen-sfen-from-original-eval" binary. The given example is generation in its simplest form. There are more commands.
```
uci
setoption name Threads value x
setoption name Hash value y
setoption name SyzygyPath value path
isready
gensfen depth a loop b use_draw_in_training_data_generation 1 eval_limit 32000
```
Specify how many threads and how much memory you would like to use with the x and y values. The option SyzygyPath is not necessary, but if you would like to use it, you must first have Syzygy endgame tablebases on your computer, which you can find [here](http://oics.olympuschess.com/tracker/index.php). You will need to have a torrent client to download these tablebases, as that is probably the fastest way to obtain them. The path is the path to the folder containing those tablebases. It does not have to be surrounded in quotes.
This will save a file named "generated_kifu.bin" in the same folder as the binary. Once generation is done, rename the file to something like "1billiondepth12.bin" to remember the depth and quantity of the positions and move it to a folder named "trainingdata" in the same directory as the binaries.
#### Generation Parameters
- Depth is the searched depth per move, or how far the engine looks forward. This value is an integer.
- Loop is the amount of positions generated. This value is also an integer
### Generating Validation Data
The process is the same as the generation of training data, except for the fact that you need to set loop to 1 million, because you don't need a lot of validation data. The depth should be the same as before or slightly higher than the depth of the training data. After generation rename the validation data file to val.bin and drop it in a folder named "validationdata" in the same directory to make it easier.
### Training a Completely New Network
Use the "avx2.halfkp_256x2-32-32.nnue-learn.2020-07-11" binary. Create an empty folder named "evalsave" in the same directory as the binaries.
```
uci
setoption name SkipLoadingEval value true
setoption name Threads value x
isready
learn targetdir trainingdata loop 100 batchsize 1000000 use_draw_in_training 1 use_draw_in_validation 1 eta 1 lambda 1 eval_limit 32000 nn_batch_size 1000 newbob_decay 0.5 eval_save_interval 250000000 loss_output_interval 1000000 mirror_percentage 50 validation_set_file_name validationdata\val.bin
```
Nets get saved in the "evalsave" folder.
#### Training Parameters
- eta is the learning rate
- lambda is the amount of weight it puts to eval of learning data vs win/draw/loss results. 1 puts all weight on eval, lambda 0 puts all weight on WDL results.
### Reinforcement Learning
If you would like to do some reinforcement learning on your original network, you must first generate training data using the learn binaries. Make sure that your previously trained network is in the eval folder. Use the commands specified above. Make sure `SkipLoadingEval` is set to false so that the data generated is using the neural net's eval by typing the command `uci setoption name SkipLoadingEval value false` before typing the `isready` command. You should aim to generate less positions than the first run, around 1/10 of the number of positions generated in the first run. The depth should be higher as well. You should also do the same for validation data, with the depth being higher than the last run.
After you have generated the training data, you must move it into your training data folder and delete the older data so that the binary does not accidentally train on the same data again. Do the same for the validation data and name it to val-1.bin to make it less confusing. Make sure the evalsave folder is empty. Then, using the same binary, type in the training commands shown above. Do __NOT__ set `SkipLoadingEval` to true, it must be false or you will get a completely new network, instead of a network trained with reinforcement learning. You should also set eval_save_interval to a number that is lower than the amount of positions in your training data, perhaps also 1/10 of the original value. The validation file should be set to the new validation data, not the old data.
After training is finished, your new net should be located in the "final" folder under the "evalsave" directory. You should test this new network against the older network to see if there are any improvements.
## Using Your Trained Net
If you want to use your generated net, copy the net located in the "final" folder under the "evalsave" directory and move it into a new folder named "eval" under the directory with the binaries. You can then use the halfkp_256x2 binaries pertaining to your CPU with a standard chess GUI, such as Cutechess. Refer to the [releases page](https://github.com/nodchip/Stockfish/releases) to find out which binary is best for your CPU.
If the engine does not load any net file, or shows "Error! *** not found or wrong format", please try to sepcify the net with the full file path with the "EvalFile" option by typing the command `setoption name EvalFile value path` where path is the full file path.
## Resources
- [Stockfish NNUE Wiki](https://www.qhapaq.org/shogi/shogiwiki/stockfish-nnue/)
- [Training instructions](https://twitter.com/mktakizawa/status/1273042640280252416) from the creator of the Elmo shogi engine
- [Original Talkchess thread](http://talkchess.com/forum3/viewtopic.php?t=74059) discussing Stockfish NNUE
- [Guide to Stockfish NNUE](http://yaneuraou.yaneu.com/2020/06/19/stockfish-nnue-the-complete-guide/)
- [Unofficial Stockfish Discord](https://discord.gg/nv8gDtt)
A more updated list can be found in the #sf-nnue-resources channel in the Discord.

View file

@ -73,6 +73,7 @@ endif
# sse42 = yes/no --- -msse4.2 --- Use Intel Streaming SIMD Extensions 4.2
# avx2 = yes/no --- -mavx2 --- Use Intel Advanced Vector Extensions 2
# pext = yes/no --- -DUSE_PEXT --- Use pext x86_64 asm-instruction
# avx512 = yes/no --- -mavx512vbmi --- Use Intel Advanced Vector Extensions 512
#
# Note that Makefile is space sensitive, so when adding new architectures
# or modifying existing flags, you have to make sure there are no extra spaces
@ -86,11 +87,13 @@ bits = 64
prefetch = no
popcnt = no
sse = no
sse3 = no
ssse3 = no
sse41 = no
sse42 = no
avx2 = no
pext = no
avx512 = no
### 2.2 Architecture specific
ifeq ($(ARCH),general-32)
@ -120,13 +123,6 @@ ifeq ($(ARCH),x86-64)
sse = yes
endif
ifeq ($(ARCH),x86-64-ssse3)
arch = x86_64
prefetch = yes
sse = yes
ssse3 = yes
endif
ifeq ($(ARCH),x86-64-modern)
arch = x86_64
prefetch = yes
@ -134,10 +130,36 @@ ifeq ($(ARCH),x86-64-modern)
sse = yes
endif
ifeq ($(ARCH),x86-64-sse3)
arch = x86_64
prefetch = yes
sse = yes
sse3 = yes
ssse3 = yes
endif
ifeq ($(ARCH),x86-64-sse3-popcnt)
arch = x86_64
prefetch = yes
popcnt = yes
sse = yes
sse3 = yes
ssse3 = yes
endif
ifeq ($(ARCH),x86-64-ssse3)
arch = x86_64
prefetch = yes
sse = yes
sse3 = yes
ssse3 = yes
endif
ifeq ($(ARCH),x86-64-sse41)
arch = x86_64
prefetch = yes
sse = yes
sse3 = yes
ssse3 = yes
sse41 = yes
endif
@ -147,6 +169,7 @@ ifeq ($(ARCH),x86-64-sse42)
prefetch = yes
popcnt = yes
sse = yes
sse3 = yes
ssse3 = yes
sse41 = yes
sse42 = yes
@ -158,6 +181,7 @@ ifeq ($(ARCH),x86-64-avx2)
prefetch = yes
popcnt = yes
sse = yes
sse3 = yes
ssse3 = yes
sse41 = yes
sse42 = yes
@ -169,6 +193,7 @@ ifeq ($(ARCH),x86-64-bmi2)
prefetch = yes
popcnt = yes
sse = yes
sse3 = yes
ssse3 = yes
sse41 = yes
sse42 = yes
@ -176,6 +201,21 @@ ifeq ($(ARCH),x86-64-bmi2)
pext = yes
endif
ifeq ($(ARCH),x86-64-avx512)
arch = x86_64
bits = 64
prefetch = yes
popcnt = yes
sse = yes
sse3 = yes
ssse3 = yes
sse41 = yes
sse42 = yes
avx2 = yes
pext = yes
avx512 = yes
endif
ifeq ($(ARCH),armv7)
arch = armv7
prefetch = yes
@ -394,12 +434,11 @@ endif
### 3.6 popcnt
ifeq ($(popcnt),yes)
ifeq ($(arch),$(filter $(arch),ppc64 armv8-a))
CXXFLAGS += -DUSE_POPCNT
else ifeq ($(comp),icc)
CXXFLAGS += -msse3 -DUSE_POPCNT
else
CXXFLAGS += -msse3 -mpopcnt -DUSE_POPCNT
ifneq ($(arch),$(filter $(arch),ppc64 armv8-a))
ifeq ($(comp),$(filter $(comp),gcc clang mingw msys2))
CXXFLAGS += -mpopcnt
endif
endif
endif
@ -410,6 +449,13 @@ ifeq ($(avx2),yes)
endif
endif
ifeq ($(avx512),yes)
CXXFLAGS += -DUSE_AVX512
ifeq ($(comp),$(filter $(comp),gcc clang mingw msys2))
CXXFLAGS += -mavx512bw
endif
endif
ifeq ($(sse42),yes)
CXXFLAGS += -DUSE_SSE42
ifeq ($(comp),$(filter $(comp),gcc clang mingw msys2))
@ -431,6 +477,13 @@ ifeq ($(ssse3),yes)
endif
endif
ifeq ($(sse3),yes)
CXXFLAGS += -DUSE_SSE3
ifeq ($(comp),$(filter $(comp),gcc clang mingw msys2))
CXXFLAGS += -msse3
endif
endif
ifeq ($(arch),x86_64)
CXXFLAGS += -DUSE_SSE2
endif
@ -485,18 +538,22 @@ help:
@echo "Supported targets:"
@echo ""
@echo "build > Standard build"
@echo "profile-build > PGO build"
@echo "profile-build > Standard build with PGO"
@echo "strip > Strip executable"
@echo "install > Install executable"
@echo "clean > Clean up"
@echo ""
@echo "Supported archs:"
@echo ""
@echo "x86-64-avx512 > x86 64-bit with avx512 support"
@echo "x86-64-bmi2 > x86 64-bit with bmi2 support"
@echo "x86-64-avx2 > x86 64-bit with avx2 support"
@echo "x86-64-sse42 > x86 64-bit with sse42 support"
@echo "x86-64-sse41 > x86 64-bit with sse41 support"
@echo "x86-64-ssse3 > x86 64-bit with ssse3 support"
@echo "x86-64-sse3-popcnt > x86 64-bit with sse3 and popcnt support"
@echo "x86-64-sse3 > x86 64-bit with sse3 support"
@echo "x86-64-modern > x86 64-bit with popcnt support"
@echo "x86-64 > x86 64-bit generic"
@echo "x86-32 > x86 32-bit (also enables SSE)"
@echo "x86-32-old > x86 32-bit fall back for old hardware"
@ -593,11 +650,13 @@ config-sanity:
@echo "prefetch: '$(prefetch)'"
@echo "popcnt: '$(popcnt)'"
@echo "sse: '$(sse)'"
@echo "sse3: '$(sse3)'"
@echo "ssse3: '$(ssse3)'"
@echo "sse41: '$(sse41)'"
@echo "sse42: '$(sse42)'"
@echo "avx2: '$(avx2)'"
@echo "pext: '$(pext)'"
@echo "avx512: '$(avx512)'"
@echo ""
@echo "Flags:"
@echo "CXX: $(CXX)"
@ -616,11 +675,13 @@ config-sanity:
@test "$(prefetch)" = "yes" || test "$(prefetch)" = "no"
@test "$(popcnt)" = "yes" || test "$(popcnt)" = "no"
@test "$(sse)" = "yes" || test "$(sse)" = "no"
@test "$(sse3)" = "yes" || test "$(sse3)" = "no"
@test "$(ssse3)" = "yes" || test "$(ssse3)" = "no"
@test "$(sse41)" = "yes" || test "$(sse41)" = "no"
@test "$(sse42)" = "yes" || test "$(sse42)" = "no"
@test "$(avx2)" = "yes" || test "$(avx2)" = "no"
@test "$(pext)" = "yes" || test "$(pext)" = "no"
@test "$(avx512)" = "yes" || test "$(avx512)" = "no"
@test "$(comp)" = "gcc" || test "$(comp)" = "icc" || test "$(comp)" = "mingw" || test "$(comp)" = "clang"
$(EXE): $(OBJS)

View file

@ -67,7 +67,12 @@ namespace Eval::NNUE::Layers {
transformed_features, buffer + kSelfBufferSize);
const auto output = reinterpret_cast<OutputType*>(buffer);
#if defined(USE_AVX2)
#if defined(USE_AVX512)
constexpr IndexType kNumChunks = kPaddedInputDimensions / (kSimdWidth * 2);
const __m512i kOnes = _mm512_set1_epi16(1);
const auto input_vector = reinterpret_cast<const __m512i*>(input);
#elif defined(USE_AVX2)
constexpr IndexType kNumChunks = kPaddedInputDimensions / kSimdWidth;
const __m256i kOnes = _mm256_set1_epi16(1);
const auto input_vector = reinterpret_cast<const __m256i*>(input);
@ -85,8 +90,47 @@ namespace Eval::NNUE::Layers {
for (IndexType i = 0; i < kOutputDimensions; ++i) {
const IndexType offset = i * kPaddedInputDimensions;
#if defined(USE_AVX2)
__m256i sum = _mm256_set_epi32(0, 0, 0, 0, 0, 0, 0, biases_[i]);
#if defined(USE_AVX512)
__m512i sum = _mm512_setzero_si512();
const auto row = reinterpret_cast<const __m512i*>(&weights_[offset]);
for (IndexType j = 0; j < kNumChunks; ++j) {
#if defined(__MINGW32__) || defined(__MINGW64__)
__m512i product = _mm512_maddubs_epi16(_mm512_loadu_si512(&input_vector[j]), _mm512_load_si512(&row[j]));
#else
__m512i product = _mm512_maddubs_epi16(_mm512_load_si512(&input_vector[j]), _mm512_load_si512(&row[j]));
#endif
product = _mm512_madd_epi16(product, kOnes);
sum = _mm512_add_epi32(sum, product);
}
output[i] = _mm512_reduce_add_epi32(sum) + biases_[i];
// Note: Changing kMaxSimdWidth from 32 to 64 breaks loading existing networks.
// As a result kPaddedInputDimensions may not be an even multiple of 64(512bit)
// and we have to do one more 256bit chunk.
if (kPaddedInputDimensions != kNumChunks * kSimdWidth * 2)
{
const auto iv_256 = reinterpret_cast<const __m256i*>(input);
const auto row_256 = reinterpret_cast<const __m256i*>(&weights_[offset]);
int j = kNumChunks * 2;
#if defined(__MINGW32__) || defined(__MINGW64__) // See HACK comment below in AVX2.
__m256i sum256 = _mm256_maddubs_epi16(_mm256_loadu_si256(&iv_256[j]), _mm256_load_si256(&row_256[j]));
#else
__m256i sum256 = _mm256_maddubs_epi16(_mm256_load_si256(&iv_256[j]), _mm256_load_si256(&row_256[j]));
#endif
sum256 = _mm256_madd_epi16(sum256, _mm256_set1_epi16(1));
sum256 = _mm256_hadd_epi32(sum256, sum256);
sum256 = _mm256_hadd_epi32(sum256, sum256);
const __m128i lo = _mm256_extracti128_si256(sum256, 0);
const __m128i hi = _mm256_extracti128_si256(sum256, 1);
output[i] += _mm_cvtsi128_si32(lo) + _mm_cvtsi128_si32(hi);
}
#elif defined(USE_AVX2)
__m256i sum = _mm256_setzero_si256();
const auto row = reinterpret_cast<const __m256i*>(&weights_[offset]);
for (IndexType j = 0; j < kNumChunks; ++j) {
__m256i product = _mm256_maddubs_epi16(
@ -108,7 +152,7 @@ namespace Eval::NNUE::Layers {
sum = _mm256_hadd_epi32(sum, sum);
const __m128i lo = _mm256_extracti128_si256(sum, 0);
const __m128i hi = _mm256_extracti128_si256(sum, 1);
output[i] = _mm_cvtsi128_si32(lo) + _mm_cvtsi128_si32(hi);
output[i] = _mm_cvtsi128_si32(lo) + _mm_cvtsi128_si32(hi) + biases_[i];
#elif defined(USE_SSSE3)
__m128i sum = _mm_cvtsi32_si128(biases_[i]);

View file

@ -95,7 +95,7 @@ void init(OptionsMap& o) {
o["Syzygy50MoveRule"] << Option(true);
o["SyzygyProbeLimit"] << Option(7, 0, 7);
o["Use NNUE"] << Option(false, on_use_nnue);
o["EvalFile"] << Option("nn.bin", on_eval_file);
o["EvalFile"] << Option("./eval/nn.bin", on_eval_file);
}