From 2fd1c48e6088856fdae9bea8218d419212447a33 Mon Sep 17 00:00:00 2001
From: xXH4CKST3RXx <52459831+xXH4CKST3RXx@users.noreply.github.com>
Date: Wed, 15 Jul 2020 23:15:34 -0400
Subject: [PATCH 01/22] Rename Readme.md to stockfish.md

---
 Readme.md => stockfish.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename Readme.md => stockfish.md (100%)

diff --git a/Readme.md b/stockfish.md
similarity index 100%
rename from Readme.md
rename to stockfish.md

From 6118151c6613bfb2fb01329987ce26dd565803ce Mon Sep 17 00:00:00 2001
From: xXH4CKST3RXx <52459831+xXH4CKST3RXx@users.noreply.github.com>
Date: Thu, 16 Jul 2020 00:00:29 -0400
Subject: [PATCH 02/22] Create README.md

Added and cleaned up Gekkehenker's training guide.
---
 README.md | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 00000000..9af97bee
--- /dev/null
+++ b/README.md
@@ -0,0 +1,41 @@
+# Stockfish NNUE
+
+## Overview
+Stockfish NNUE is a port of a shogi NN called NNUE (efficiently updateable neural network backwards) to Stockfish 11.
+
+## Training Guide
+### Generating Training Data
+Use the "no-nnue.nnue-gen-sfen-from-original-eval" binary. The given example is generation in its simplest form. There are more commands. 
+```
+uci
+setoption name Threads value x
+setoption name Hash value y
+setoption name SyzygyPath value path
+isready
+gensfen depth a loop b  use_draw_in_training_data_generation 1 eval_limit 32000
+```
+Specify how many threads and how much memory you would like to use with the x and y values. The option SyzygyPath is not necessary, but if you would like to use it, you must first have Syzygy endgame tablebases on your computer, which you can find [here](http://oics.olympuschess.com/tracker/index.php). You will need to have a torrent client to download these tablebases, as that is probably the fastest way to obtain them. The path is the path to the folder containing those tablebases. It does not have to be surrounded in quotes.
+
+This will save a file named "generated_kifu.bin" in the same folder as the binary. Once generation is done, rename the file to something like "1billiondepth12.bin" to remember the depth and quantity of the positions and move it to a folder named "trainingdata" in the same directory as the binaries.
+#### Generation Parameters
+- Depth is the searched depth per move, or how far the engine looks forward. This value is an integer.
+- Loop is the amount of positions generated. This value is also an integer
+### Generating validation data
+The process is the same as the generation of training data, except for the fact that you need to set loop to 1 million, because you don't need a lot of validation data. The depth should be the same as before or a little higher than the depth of the training data. After generation rename the validation data file to val.bin and drop it in a folder named "validationdata" in the same directory to make it easier. 
+### Training a completely new network
+Use the "avx2.halfkp_256x2-32-32.nnue-learn.2020-07-11" binary. Create an empty folder named "evalsave" in the same directory as the binaries.
+```
+uci
+setoption name SkipLoadingEval value true
+setoption name Threads value x
+isready
+learn targetdir trainingdata loop 100 batchsize 1000000 use_draw_in_training 1 use_draw_in_validation 1 eta 1 lambda 1 eval_limit 32000 nn_batch_size 1000 newbob_decay 0.5 eval_save_interval 250000000 loss_output_interval 1000000 mirror_percentage 50 validation_set_file_name validationdata\val.bin
+```
+Nets get saved in the "evalsave" folder. 
+
+#### Training Parameters
+- eta is the learning rate
+- lambda is the amount of weight it puts to eval of learning data vs win/draw/loss results. 1 puts all weight on eval, lambda 0 puts all weight on WDL results.
+
+### Using the Trained Net
+If you want to use your generated net, copy the net located in the "final" folder under the "evalsave" directory and move it into the "eval" folder. You can then use the halfkp_256x2 binaries with a standard chess GUI, such as Cutechess.

From df4da8dc41381a85f7f02dbafddcced5e41c5cce Mon Sep 17 00:00:00 2001
From: xXH4CKST3RXx <52459831+xXH4CKST3RXx@users.noreply.github.com>
Date: Thu, 16 Jul 2020 00:01:02 -0400
Subject: [PATCH 03/22] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 9af97bee..5d30b021 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ setoption name Threads value x
 setoption name Hash value y
 setoption name SyzygyPath value path
 isready
-gensfen depth a loop b  use_draw_in_training_data_generation 1 eval_limit 32000
+gensfen depth a loop b use_draw_in_training_data_generation 1 eval_limit 32000
 ```
 Specify how many threads and how much memory you would like to use with the x and y values. The option SyzygyPath is not necessary, but if you would like to use it, you must first have Syzygy endgame tablebases on your computer, which you can find [here](http://oics.olympuschess.com/tracker/index.php). You will need to have a torrent client to download these tablebases, as that is probably the fastest way to obtain them. The path is the path to the folder containing those tablebases. It does not have to be surrounded in quotes.
 

From ec5ef2b6dfad8b7d33aa504afd8c28bdf2b63396 Mon Sep 17 00:00:00 2001
From: xXH4CKST3RXx <52459831+xXH4CKST3RXx@users.noreply.github.com>
Date: Thu, 16 Jul 2020 00:01:59 -0400
Subject: [PATCH 04/22] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 5d30b021..555b76b1 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # Stockfish NNUE
 
 ## Overview
-Stockfish NNUE is a port of a shogi NN called NNUE (efficiently updateable neural network backwards) to Stockfish 11.
+Stockfish NNUE is a port of a shogi neural network named NNUE (efficiently updateable neural network backwards) to Stockfish 11.
 
 ## Training Guide
 ### Generating Training Data

From be754a237972c2085a39a6947a585a283f971603 Mon Sep 17 00:00:00 2001
From: xXH4CKST3RXx <52459831+xXH4CKST3RXx@users.noreply.github.com>
Date: Thu, 16 Jul 2020 00:10:30 -0400
Subject: [PATCH 05/22] Update README.md

---
 README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 555b76b1..44a8d1e0 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # Stockfish NNUE
 
 ## Overview
-Stockfish NNUE is a port of a shogi neural network named NNUE (efficiently updateable neural network backwards) to Stockfish 11.
+Stockfish NNUE is a port of a shogi neural network named NNUE (efficiently updateable neural network backwards) to Stockfish 11. To learn more about the Stockfish chess engine, look [here](stockfish.md) for an overview and [here](https://github.com/official-stockfish/Stockfish) for the official repository.
 
 ## Training Guide
 ### Generating Training Data
@@ -20,9 +20,9 @@ This will save a file named "generated_kifu.bin" in the same folder as the binar
 #### Generation Parameters
 - Depth is the searched depth per move, or how far the engine looks forward. This value is an integer.
 - Loop is the amount of positions generated. This value is also an integer
-### Generating validation data
-The process is the same as the generation of training data, except for the fact that you need to set loop to 1 million, because you don't need a lot of validation data. The depth should be the same as before or a little higher than the depth of the training data. After generation rename the validation data file to val.bin and drop it in a folder named "validationdata" in the same directory to make it easier. 
-### Training a completely new network
+### Generating Validation Data
+The process is the same as for training data, except that loop should be set to 1 million, because far less validation data is needed. The depth should be the same as, or slightly higher than, the depth of the training data. After generation, rename the validation data file to val.bin and move it into a folder named "validationdata" in the same directory.
+### Training a Completely New Network
 Use the "avx2.halfkp_256x2-32-32.nnue-learn.2020-07-11" binary. Create an empty folder named "evalsave" in the same directory as the binaries.
 ```
 uci
@@ -38,4 +38,4 @@ Nets get saved in the "evalsave" folder.
 - lambda is the amount of weight it puts to eval of learning data vs win/draw/loss results. 1 puts all weight on eval, lambda 0 puts all weight on WDL results.
 
 ### Using the Trained Net
-If you want to use your generated net, copy the net located in the "final" folder under the "evalsave" directory and move it into the "eval" folder. You can then use the halfkp_256x2 binaries with a standard chess GUI, such as Cutechess.
+If you want to use your generated net, copy the net located in the "final" folder under the "evalsave" directory and move it into a new folder named "eval" under the directory with the binaries. You can then use the halfkp_256x2 binaries pertaining to your CPU with a standard chess GUI, such as Cutechess. Refer to the [releases page](https://github.com/nodchip/Stockfish/releases) to find out which binary is best for your CPU.

From 2b821682aa9d815c00040dd8669bcaa017119e7c Mon Sep 17 00:00:00 2001
From: nodchip <nodchip@gmail.com>
Date: Fri, 17 Jul 2020 11:55:30 +0900
Subject: [PATCH 06/22] Update README.md

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 44a8d1e0..daa8fefb 100644
--- a/README.md
+++ b/README.md
@@ -39,3 +39,5 @@ Nets get saved in the "evalsave" folder.
 
 ### Using the Trained Net
 If you want to use your generated net, copy the net located in the "final" folder under the "evalsave" directory and move it into a new folder named "eval" under the directory with the binaries. You can then use the halfkp_256x2 binaries pertaining to your CPU with a standard chess GUI, such as Cutechess. Refer to the [releases page](https://github.com/nodchip/Stockfish/releases) to find out which binary is best for your CPU.
+
+If the engine does not load any net file, or shows "Error! *** not found or wrong format", please try to sepcify the net with the full file path by the "EvalFile" option.

From 4d4c80d7fdc4bb44036644f026e6deb25a580aa4 Mon Sep 17 00:00:00 2001
From: xXH4CKST3RXx <52459831+xXH4CKST3RXx@users.noreply.github.com>
Date: Thu, 16 Jul 2020 23:34:38 -0400
Subject: [PATCH 07/22] Update README.md

Added logo, reinforcement learning instructions, and resources list.
---
 README.md | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index daa8fefb..73eec1fb 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,8 @@
-# Stockfish NNUE
+<p align="center">
+  <img src="https://cdn.discordapp.com/attachments/724700045525647420/729135226365804594/SFNNUE2.png">
+</p>
+
+<h1 align="center">Stockfish NNUE</h1>
 
 ## Overview
 Stockfish NNUE is a port of a shogi neural network named NNUE (efficiently updateable neural network backwards) to Stockfish 11. To learn more about the Stockfish chess engine, look [here](stockfish.md) for an overview and [here](https://github.com/official-stockfish/Stockfish) for the official repository.
@@ -37,7 +41,23 @@ Nets get saved in the "evalsave" folder.
 - eta is the learning rate
 - lambda is the amount of weight it puts to eval of learning data vs win/draw/loss results. 1 puts all weight on eval, lambda 0 puts all weight on WDL results.
 
-### Using the Trained Net
+### Reinforcement Learning
+If you would like to do some reinforcement learning on your original network, you must first generate training data using the learn binaries. Make sure that your previously trained network is in the eval folder, then use the generation commands specified above. Set `SkipLoadingEval` to false so that the data is generated with the neural net's eval: type the command `setoption name SkipLoadingEval value false` after `uci` and before `isready`. Aim to generate fewer positions than in the first run, around 1/10 as many, at a higher depth. Do the same for the validation data, again with a higher depth than in the last run.
+
+After you have generated the training data, move it into your training data folder and delete the older data so that the binary does not accidentally train on the same data again. Do the same for the validation data and name it val-1.bin to avoid confusion. Make sure the evalsave folder is empty. Then, using the same binary, type in the training commands shown above. Do __NOT__ set `SkipLoadingEval` to true. It must be false, or you will get a completely new network instead of a network trained with reinforcement learning. You should also set eval_save_interval to a number that is lower than the number of positions in your training data, perhaps also 1/10 of the original value. Set validation_set_file_name to the new validation data, not the old data.
+
+After training is finished, your new net should be located in the "final" folder under the "evalsave" directory. You should test this new network against the older network to see if there are any improvements.
+
+## Using Your Trained Net
 If you want to use your generated net, copy the net located in the "final" folder under the "evalsave" directory and move it into a new folder named "eval" under the directory with the binaries. You can then use the halfkp_256x2 binaries pertaining to your CPU with a standard chess GUI, such as Cutechess. Refer to the [releases page](https://github.com/nodchip/Stockfish/releases) to find out which binary is best for your CPU.
 
-If the engine does not load any net file, or shows "Error! *** not found or wrong format", please try to sepcify the net with the full file path by the "EvalFile" option.
+If the engine does not load any net file, or shows "Error! *** not found or wrong format", try specifying the net by its full file path with the "EvalFile" option: type the command `setoption name EvalFile value path`, where path is the full file path.
+
+## Resources
+- [Stockfish NNUE Wiki](https://www.qhapaq.org/shogi/shogiwiki/stockfish-nnue/)
+- [Training instructions](https://twitter.com/mktakizawa/status/1273042640280252416) from the creator of the Elmo shogi engine
+- [Original Talkchess thread](http://talkchess.com/forum3/viewtopic.php?t=74059) discussing Stockfish NNUE
+- [Guide to Stockfish NNUE](http://yaneuraou.yaneu.com/2020/06/19/stockfish-nnue-the-complete-guide/) 
+- [Unofficial Stockfish Discord](https://discord.gg/nv8gDtt)
+
+A more up-to-date list can be found in the #sf-nnue-resources channel in the Discord.

From 7a13d4ed60b09a9ce1b5aee46aa2a596bc4ca0fd Mon Sep 17 00:00:00 2001
From: nodchip <nodchip@gmail.com>
Date: Fri, 17 Jul 2020 15:40:01 +0900
Subject: [PATCH 08/22] Changed the default eval file path so that more GUIs
 can use Stockfish NNUE.

---
 src/ucioption.cpp | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/ucioption.cpp b/src/ucioption.cpp
index 8658adb4..ac5a6a16 100644
--- a/src/ucioption.cpp
+++ b/src/ucioption.cpp
@@ -81,11 +81,8 @@ void init(OptionsMap& o) {
   o["Syzygy50MoveRule"]      << Option(true);
   o["SyzygyProbeLimit"]      << Option(7, 0, 7);
   // Evaluation function file name. When this is changed, it is necessary to reread the evaluation function at the next ucinewgame timing.
-#if defined(__linux__)
-  o["EvalFile"]              << Option("eval/nn.bin", on_eval_file);
-#else
-  o["EvalFile"]              << Option("eval\\nn.bin", on_eval_file);
-#endif
+  // Without the preceding "./", some GUIs can not load he net file.
+  o["EvalFile"]              << Option("./eval/nn.bin", on_eval_file);
   // When the evaluation function is loaded at the ucinewgame timing, it is necessary to convert the new evaluation function.
   // I want to hit the test eval convert command, but there is no new evaluation function
   // It ends abnormally before executing this command.

From 961a4dad5ce83a7795a5e60f4f34dd56212621db Mon Sep 17 00:00:00 2001
From: mstembera <MissingEmail@email>
Date: Sat, 18 Jul 2020 19:21:46 -0700
Subject: [PATCH 09/22] Add AVX512 support. bench: 3909820

---
 src/Makefile                            | 28 ++++++++++++++-
 src/eval/nnue/layers/affine_transform.h | 47 ++++++++++++++++++++++---
 2 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/src/Makefile b/src/Makefile
index 585d93a4..254f9bac 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -87,6 +87,7 @@ endif
 # sse42 = yes/no      --- -msse4.2         --- Use Intel Streaming SIMD Extensions 4.2
 # avx2 = yes/no       --- -mavx2           --- Use Intel Advanced Vector Extensions 2
 # pext = yes/no       --- -DUSE_PEXT       --- Use pext x86_64 asm-instruction
+# avx512 = yes/no     --- -mavx512vbmi     --- Use Intel Advanced Vector Extensions 512
 #
 # Note that Makefile is space sensitive, so when adding new architectures
 # or modifying existing flags, you have to make sure there are no extra spaces
@@ -105,6 +106,7 @@ sse41 = no
 sse42 = no
 avx2 = no
 pext = no
+avx512 = no
 
 ### 2.2 Architecture specific
 ifeq ($(ARCH),general-32)
@@ -183,6 +185,20 @@ ifeq ($(ARCH),x86-64-bmi2)
 	pext = yes
 endif
 
+ifeq ($(ARCH),x86-64-avx512)
+	arch = x86_64
+	bits = 64
+	prefetch = yes
+	popcnt = yes
+	sse = yes
+	ssse3 = yes
+	sse41 = yes
+	sse42 = yes
+	avx2 = yes
+	pext = yes
+	avx512 = yes
+endif
+
 ifeq ($(ARCH),armv7)
 	arch = armv7
 	prefetch = yes
@@ -407,7 +423,14 @@ endif
 ifeq ($(avx2),yes)
 	CXXFLAGS += -DUSE_AVX2
 	ifeq ($(comp),$(filter $(comp),gcc clang mingw msys2))
-		CXXFLAGS += -mavx2
+	CXXFLAGS += -mavx2
+	endif
+endif
+
+ifeq ($(avx512),yes)
+	CXXFLAGS += -DUSE_AVX512
+	ifeq ($(comp),$(filter $(comp),gcc clang mingw msys2))
+	CXXFLAGS += -mavx512vbmi
 	endif
 endif
 
@@ -493,6 +516,7 @@ help:
 	@echo ""
 	@echo "Supported archs:"
 	@echo ""
+	@echo "x86-64-avx512           > x86 64-bit with avx512 support"
 	@echo "x86-64-bmi2             > x86 64-bit with bmi2 support"
 	@echo "x86-64-avx2             > x86 64-bit with avx2 support"
 	@echo "x86-64-sse42            > x86 64-bit with sse42 support"
@@ -599,6 +623,7 @@ config-sanity:
 	@echo "sse42: '$(sse42)'"
 	@echo "avx2: '$(avx2)'"
 	@echo "pext: '$(pext)'"
+	@echo "avx512: '$(avx512)'"
 	@echo ""
 	@echo "Flags:"
 	@echo "CXX: $(CXX)"
@@ -622,6 +647,7 @@ config-sanity:
 	@test "$(sse42)" = "yes" || test "$(sse42)" = "no"
 	@test "$(avx2)" = "yes" || test "$(avx2)" = "no"
 	@test "$(pext)" = "yes" || test "$(pext)" = "no"
+	@test "$(avx512)" = "yes" || test "$(avx512)" = "no"
 	@test "$(comp)" = "gcc" || test "$(comp)" = "icc" || test "$(comp)" = "mingw" || test "$(comp)" = "clang"
 
 $(EXE): $(OBJS)
diff --git a/src/eval/nnue/layers/affine_transform.h b/src/eval/nnue/layers/affine_transform.h
index cb56b07d..2db7f731 100644
--- a/src/eval/nnue/layers/affine_transform.h
+++ b/src/eval/nnue/layers/affine_transform.h
@@ -82,7 +82,11 @@ class AffineTransform {
     const auto input = previous_layer_.Propagate(
         transformed_features, buffer + kSelfBufferSize);
     const auto output = reinterpret_cast<OutputType*>(buffer);
-#if defined(USE_AVX2)
+#if defined(USE_AVX512)
+    constexpr IndexType kNumChunks = kPaddedInputDimensions / (kSimdWidth * 2);
+    const __m512i kOnes = _mm512_set1_epi16(1);
+    const auto input_vector = reinterpret_cast<const __m512i*>(input);
+#elif defined(USE_AVX2)
     constexpr IndexType kNumChunks = kPaddedInputDimensions / kSimdWidth;
     const __m256i kOnes = _mm256_set1_epi16(1);
     const auto input_vector = reinterpret_cast<const __m256i*>(input);
@@ -96,8 +100,43 @@ class AffineTransform {
 #endif
     for (IndexType i = 0; i < kOutputDimensions; ++i) {
       const IndexType offset = i * kPaddedInputDimensions;
-#if defined(USE_AVX2)
-      __m256i sum = _mm256_set_epi32(0, 0, 0, 0, 0, 0, 0, biases_[i]);
+#if defined(USE_AVX512)
+      __m512i sum = _mm512_setzero_si512();
+      const auto row = reinterpret_cast<const __m512i*>(&weights_[offset]);
+      for (IndexType j = 0; j < kNumChunks; ++j) {
+#if defined(__MINGW32__) || defined(__MINGW64__)
+          __m512i product = _mm512_maddubs_epi16(_mm512_loadu_si512(&input_vector[j]), _mm512_load_si512(&row[j]));
+#else
+          __m512i product = _mm512_maddubs_epi16(_mm512_load_si512(&input_vector[j]), _mm512_load_si512(&row[j]));
+#endif
+          product = _mm512_madd_epi16(product, kOnes);
+          sum = _mm512_add_epi32(sum, product);
+      }
+      output[i] = _mm512_reduce_add_epi32(sum) + biases_[i];
+      
+      // Note: Changing kMaxSimdWidth from 32 to 64 breaks loading existing networks.
+      // As a result kPaddedInputDimensions may not be an even multiple of 64(512bit)
+      // and we have to do one more 256bit chunk.
+      if (kPaddedInputDimensions != kNumChunks * kSimdWidth * 2)
+      {
+          const auto iv_256  = reinterpret_cast<const __m256i*>(input);
+          const auto row_256 = reinterpret_cast<const __m256i*>(&weights_[offset]);
+          int j = kNumChunks * 2;
+#if defined(__MINGW32__) || defined(__MINGW64__)  // See HACK comment below in AVX2.
+          __m256i sum256 = _mm256_maddubs_epi16(_mm256_loadu_si256(&iv_256[j]), _mm256_load_si256(&row_256[j]));
+#else
+          __m256i sum256 = _mm256_maddubs_epi16(_mm256_load_si256(&iv_256[j]), _mm256_load_si256(&row_256[j]));
+#endif
+          sum256 = _mm256_madd_epi16(sum256, _mm256_set1_epi16(1));
+
+          sum256 = _mm256_hadd_epi32(sum256, sum256);
+          sum256 = _mm256_hadd_epi32(sum256, sum256);
+          const __m128i lo = _mm256_extracti128_si256(sum256, 0);
+          const __m128i hi = _mm256_extracti128_si256(sum256, 1);
+          output[i] += _mm_cvtsi128_si32(lo) + _mm_cvtsi128_si32(hi);
+      }
+#elif defined(USE_AVX2)
+      __m256i sum = _mm256_setzero_si256();
       const auto row = reinterpret_cast<const __m256i*>(&weights_[offset]);
       for (IndexType j = 0; j < kNumChunks; ++j) {
         __m256i product = _mm256_maddubs_epi16(
@@ -117,7 +156,7 @@ class AffineTransform {
       sum = _mm256_hadd_epi32(sum, sum);
       const __m128i lo = _mm256_extracti128_si256(sum, 0);
       const __m128i hi = _mm256_extracti128_si256(sum, 1);
-      output[i] = _mm_cvtsi128_si32(lo) + _mm_cvtsi128_si32(hi);
+      output[i] = _mm_cvtsi128_si32(lo) + _mm_cvtsi128_si32(hi) + biases_[i];
 #elif defined(USE_SSSE3)
       __m128i sum = _mm_cvtsi32_si128(biases_[i]);
       const auto row = reinterpret_cast<const __m128i*>(&weights_[offset]);

From c24ad8d8b5cfa4a6b3b47b087d3fa32dfb3337c0 Mon Sep 17 00:00:00 2001
From: nodchip <nodchip@gmail.com>
Date: Sun, 19 Jul 2020 12:26:37 +0900
Subject: [PATCH 10/22] Supported sse3 build.

---
 src/Makefile | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/src/Makefile b/src/Makefile
index 254f9bac..245fda0a 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -101,6 +101,7 @@ bits = 64
 prefetch = no
 popcnt = no
 sse = no
+sse3 = no
 ssse3 = no
 sse41 = no
 sse42 = no
@@ -136,10 +137,19 @@ ifeq ($(ARCH),x86-64)
 	sse = yes
 endif
 
+ifeq ($(ARCH),x86-64-sse3)
+	arch = x86_64
+	prefetch = yes
+	sse = yes
+	sse3 = yes
+	ssse3 = yes
+endif
+
 ifeq ($(ARCH),x86-64-ssse3)
 	arch = x86_64
 	prefetch = yes
 	sse = yes
+	sse3 = yes
 	ssse3 = yes
 endif
 
@@ -147,6 +157,7 @@ ifeq ($(ARCH),x86-64-sse41)
 	arch = x86_64
 	prefetch = yes
 	sse = yes
+	sse3 = yes
 	ssse3 = yes
 	sse41 = yes
 endif
@@ -156,6 +167,7 @@ ifeq ($(ARCH),x86-64-sse42)
 	prefetch = yes
 	popcnt = yes
 	sse = yes
+	sse3 = yes
 	ssse3 = yes
 	sse41 = yes
 	sse42 = yes
@@ -167,6 +179,7 @@ ifeq ($(ARCH),x86-64-avx2)
 	prefetch = yes
 	popcnt = yes
 	sse = yes
+	sse3 = yes
 	ssse3 = yes
 	sse41 = yes
 	sse42 = yes
@@ -178,6 +191,7 @@ ifeq ($(ARCH),x86-64-bmi2)
 	prefetch = yes
 	popcnt = yes
 	sse = yes
+	sse3 = yes
 	ssse3 = yes
 	sse41 = yes
 	sse42 = yes
@@ -191,6 +205,7 @@ ifeq ($(ARCH),x86-64-avx512)
 	prefetch = yes
 	popcnt = yes
 	sse = yes
+	sse3 = yes
 	ssse3 = yes
 	sse41 = yes
 	sse42 = yes
@@ -455,6 +470,13 @@ ifeq ($(ssse3),yes)
 	endif
 endif
 
+ifeq ($(sse3),yes)
+	CXXFLAGS += -DUSE_SSE3
+	ifeq ($(comp),$(filter $(comp),gcc clang mingw msys2))
+		CXXFLAGS += -msse3
+	endif
+endif
+
 ifeq ($(arch),x86_64)
 	CXXFLAGS += -DUSE_SSE2
 endif
@@ -522,6 +544,7 @@ help:
 	@echo "x86-64-sse42            > x86 64-bit with sse42 support"
 	@echo "x86-64-sse41            > x86 64-bit with sse41 support"
 	@echo "x86-64-ssse3            > x86 64-bit with ssse3 support"
+	@echo "x86-64-sse3             > x86 64-bit with sse3 support"
 	@echo "x86-64                  > x86 64-bit generic"
 	@echo "x86-32                  > x86 32-bit (also enables SSE)"
 	@echo "x86-32-old              > x86 32-bit fall back for old hardware"
@@ -618,6 +641,7 @@ config-sanity:
 	@echo "prefetch: '$(prefetch)'"
 	@echo "popcnt: '$(popcnt)'"
 	@echo "sse: '$(sse)'"
+	@echo "sse3: '$(sse3)'"
 	@echo "ssse3: '$(ssse3)'"
 	@echo "sse41: '$(sse41)'"
 	@echo "sse42: '$(sse42)'"
@@ -642,6 +666,7 @@ config-sanity:
 	@test "$(prefetch)" = "yes" || test "$(prefetch)" = "no"
 	@test "$(popcnt)" = "yes" || test "$(popcnt)" = "no"
 	@test "$(sse)" = "yes" || test "$(sse)" = "no"
+	@test "$(sse3)" = "yes" || test "$(sse3)" = "no"
 	@test "$(ssse3)" = "yes" || test "$(ssse3)" = "no"
 	@test "$(sse41)" = "yes" || test "$(sse41)" = "no"
 	@test "$(sse42)" = "yes" || test "$(sse42)" = "no"

From a4786db4c2a0270d215550e7fec22edea691b123 Mon Sep 17 00:00:00 2001
From: nodchip <nodchip@gmail.com>
Date: Sun, 19 Jul 2020 12:41:50 +0900
Subject: [PATCH 11/22] Added support for architectures that support
 SSE3+POPCNT, SSSE3+POPCNT and SSE41+POPCNT.

---
 src/Makefile | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/src/Makefile b/src/Makefile
index 245fda0a..c1b03dd8 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -145,6 +145,15 @@ ifeq ($(ARCH),x86-64-sse3)
 	ssse3 = yes
 endif
 
+ifeq ($(ARCH),x86-64-sse3-popcnt)
+	arch = x86_64
+	prefetch = yes
+	popcnt = yes
+	sse = yes
+	sse3 = yes
+	ssse3 = yes
+endif
+
 ifeq ($(ARCH),x86-64-ssse3)
 	arch = x86_64
 	prefetch = yes
@@ -153,6 +162,15 @@ ifeq ($(ARCH),x86-64-ssse3)
 	ssse3 = yes
 endif
 
+ifeq ($(ARCH),x86-64-ssse3-popcnt)
+	arch = x86_64
+	prefetch = yes
+	popcnt = yes
+	sse = yes
+	sse3 = yes
+	ssse3 = yes
+endif
+
 ifeq ($(ARCH),x86-64-sse41)
 	arch = x86_64
 	prefetch = yes
@@ -162,6 +180,16 @@ ifeq ($(ARCH),x86-64-sse41)
 	sse41 = yes
 endif
 
+ifeq ($(ARCH),x86-64-sse41-popcnt)
+	arch = x86_64
+	prefetch = yes
+	popcnt = yes
+	sse = yes
+	sse3 = yes
+	ssse3 = yes
+	sse41 = yes
+endif
+
 ifeq ($(ARCH),x86-64-sse42)
 	arch = x86_64
 	prefetch = yes
@@ -433,19 +461,22 @@ endif
 ### 3.6 popcnt
 ifeq ($(popcnt),yes)
 	CXXFLAGS += -DUSE_POPCNT
+	ifeq ($(comp),$(filter $(comp),gcc clang mingw msys2))
+		CXXFLAGS += -mpopcnt
+	endif
 endif
 
 ifeq ($(avx2),yes)
 	CXXFLAGS += -DUSE_AVX2
 	ifeq ($(comp),$(filter $(comp),gcc clang mingw msys2))
-	CXXFLAGS += -mavx2
+		CXXFLAGS += -mavx2
 	endif
 endif
 
 ifeq ($(avx512),yes)
 	CXXFLAGS += -DUSE_AVX512
 	ifeq ($(comp),$(filter $(comp),gcc clang mingw msys2))
-	CXXFLAGS += -mavx512vbmi
+		CXXFLAGS += -mavx512vbmi
 	endif
 endif
 

From 92c21674812fc1d7dcb9baf3d7e0b0999071a17b Mon Sep 17 00:00:00 2001
From: nodchip <nodchip@gmail.com>
Date: Sun, 19 Jul 2020 12:52:20 +0900
Subject: [PATCH 12/22] Removed x86-64-ssse3-popcnt and x86-64-sse41-popcnt.

---
 src/Makefile | 19 -------------------
 1 file changed, 19 deletions(-)

diff --git a/src/Makefile b/src/Makefile
index c1b03dd8..a504ce27 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -162,15 +162,6 @@ ifeq ($(ARCH),x86-64-ssse3)
 	ssse3 = yes
 endif
 
-ifeq ($(ARCH),x86-64-ssse3-popcnt)
-	arch = x86_64
-	prefetch = yes
-	popcnt = yes
-	sse = yes
-	sse3 = yes
-	ssse3 = yes
-endif
-
 ifeq ($(ARCH),x86-64-sse41)
 	arch = x86_64
 	prefetch = yes
@@ -180,16 +171,6 @@ ifeq ($(ARCH),x86-64-sse41)
 	sse41 = yes
 endif
 
-ifeq ($(ARCH),x86-64-sse41-popcnt)
-	arch = x86_64
-	prefetch = yes
-	popcnt = yes
-	sse = yes
-	sse3 = yes
-	ssse3 = yes
-	sse41 = yes
-endif
-
 ifeq ($(ARCH),x86-64-sse42)
 	arch = x86_64
 	prefetch = yes

From 1536e31065df90060b9053acdbc21b4319da7de9 Mon Sep 17 00:00:00 2001
From: No name <no@email>
Date: Fri, 17 Jul 2020 10:25:23 +0300
Subject: [PATCH 13/22] Load the parameter set on an `isready' as well

Unbreaks Scid vs. PC, which doesn't send `ucinewgame'.
---
 src/uci.cpp | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/uci.cpp b/src/uci.cpp
index 6d86ebca..c775f333 100644
--- a/src/uci.cpp
+++ b/src/uci.cpp
@@ -383,8 +383,12 @@ void UCI::loop(int argc, char* argv[]) {
 #endif
           Search::clear();
       }
-      else if (token == "isready")    sync_cout << "readyok" << sync_endl;
-
+      else if (token == "isready") {
+#if defined(EVAL_NNUE)
+          init_nnue(true);
+#endif
+          sync_cout << "readyok" << sync_endl;
+      }
       // Additional custom non-UCI commands, mainly for debugging.
       // Do not use these commands during a search!
       else if (token == "flip")     pos.flip();

From c001a4e62d8d63de6145a45f19cd35d855444e5c Mon Sep 17 00:00:00 2001
From: nodchip <nodchip@gmail.com>
Date: Sun, 19 Jul 2020 13:58:19 +0900
Subject: [PATCH 14/22] Revert "Removed x86-64-ssse3-popcnt and
 x86-64-sse41-popcnt."

This reverts commit 92c21674812fc1d7dcb9baf3d7e0b0999071a17b.
---
 src/Makefile | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/src/Makefile b/src/Makefile
index a504ce27..c1b03dd8 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -162,6 +162,15 @@ ifeq ($(ARCH),x86-64-ssse3)
 	ssse3 = yes
 endif
 
+ifeq ($(ARCH),x86-64-ssse3-popcnt)
+	arch = x86_64
+	prefetch = yes
+	popcnt = yes
+	sse = yes
+	sse3 = yes
+	ssse3 = yes
+endif
+
 ifeq ($(ARCH),x86-64-sse41)
 	arch = x86_64
 	prefetch = yes
@@ -171,6 +180,16 @@ ifeq ($(ARCH),x86-64-sse41)
 	sse41 = yes
 endif
 
+ifeq ($(ARCH),x86-64-sse41-popcnt)
+	arch = x86_64
+	prefetch = yes
+	popcnt = yes
+	sse = yes
+	sse3 = yes
+	ssse3 = yes
+	sse41 = yes
+endif
+
 ifeq ($(ARCH),x86-64-sse42)
 	arch = x86_64
 	prefetch = yes

From 3bbe4802b12bb7dd4298173ae002f87d2e1de476 Mon Sep 17 00:00:00 2001
From: nodchip <nodchip@gmail.com>
Date: Sun, 19 Jul 2020 14:02:49 +0900
Subject: [PATCH 15/22] Removed the sse41-popcnt architecture.

---
 src/Makefile | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/src/Makefile b/src/Makefile
index c1b03dd8..984f5871 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -180,16 +180,6 @@ ifeq ($(ARCH),x86-64-sse41)
 	sse41 = yes
 endif
 
-ifeq ($(ARCH),x86-64-sse41-popcnt)
-	arch = x86_64
-	prefetch = yes
-	popcnt = yes
-	sse = yes
-	sse3 = yes
-	ssse3 = yes
-	sse41 = yes
-endif
-
 ifeq ($(ARCH),x86-64-sse42)
 	arch = x86_64
 	prefetch = yes

From 36092b855a5e2bcfb587a36e2055ea068e4bd8e5 Mon Sep 17 00:00:00 2001
From: nodchip <nodchip@gmail.com>
Date: Sun, 19 Jul 2020 14:17:35 +0900
Subject: [PATCH 16/22] Removed the x86-64-ssse3-popcnt architecture.

---
 src/Makefile | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/src/Makefile b/src/Makefile
index 984f5871..a504ce27 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -162,15 +162,6 @@ ifeq ($(ARCH),x86-64-ssse3)
 	ssse3 = yes
 endif
 
-ifeq ($(ARCH),x86-64-ssse3-popcnt)
-	arch = x86_64
-	prefetch = yes
-	popcnt = yes
-	sse = yes
-	sse3 = yes
-	ssse3 = yes
-endif
-
 ifeq ($(ARCH),x86-64-sse41)
 	arch = x86_64
 	prefetch = yes

From afd7d0ea4d8ac031386ffc27f178c6dee49e0f89 Mon Sep 17 00:00:00 2001
From: nodchip <nodchip@gmail.com>
Date: Sun, 19 Jul 2020 18:34:35 +0900
Subject: [PATCH 17/22] Fixed a bug where the Makefile specified -mpopcnt
 for armv8-a.

---
 src/Makefile | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/Makefile b/src/Makefile
index a504ce27..4d56fc01 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -442,8 +442,10 @@ endif
 ### 3.6 popcnt
 ifeq ($(popcnt),yes)
 	CXXFLAGS += -DUSE_POPCNT
-	ifeq ($(comp),$(filter $(comp),gcc clang mingw msys2))
-		CXXFLAGS += -mpopcnt
+	ifneq ($(arch),$(filter $(arch),ppc64 armv8-a))
+		ifeq ($(comp),$(filter $(comp),gcc clang mingw msys2))
+			CXXFLAGS += -mpopcnt
+		endif
 	endif
 endif
 

From fd78fb05f6fbf3cab18160ce4f0bfba9d40bf5eb Mon Sep 17 00:00:00 2001
From: No name <no@email>
Date: Sun, 19 Jul 2020 13:50:00 +0300
Subject: [PATCH 18/22] Hide NNUE options if building without NNUE support

Also remove an unused option.
---
 src/ucioption.cpp | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/ucioption.cpp b/src/ucioption.cpp
index ac5a6a16..e145c34b 100644
--- a/src/ucioption.cpp
+++ b/src/ucioption.cpp
@@ -80,6 +80,7 @@ void init(OptionsMap& o) {
   o["SyzygyProbeDepth"]      << Option(1, 1, 100);
   o["Syzygy50MoveRule"]      << Option(true);
   o["SyzygyProbeLimit"]      << Option(7, 0, 7);
+#ifdef EVAL_NNUE
   // Evaluation function file name. When this is changed, it is necessary to reread the evaluation function at the next ucinewgame timing.
   // Without the preceding "./", some GUIs can not load he net file.
   o["EvalFile"]              << Option("./eval/nn.bin", on_eval_file);
@@ -90,8 +91,8 @@ void init(OptionsMap& o) {
   // Hit the test eval convert command.
   o["SkipLoadingEval"]       << Option(false);
   // how many moves to use a fixed move
-  o["BookMoves"] << Option(16, 0, 10000);
-
+  // o["BookMoves"] << Option(16, 0, 10000);
+#endif
 #if defined(EVAL_LEARN)
   // When learning the evaluation function, you can change the folder to save the evaluation function.
   // Evalsave by default. This folder shall be prepared in advance.

From 77018c77cc736854367a918eb14b45615a1f7587 Mon Sep 17 00:00:00 2001
From: mstembera <MissingEmail@email>
Date: Sun, 19 Jul 2020 05:16:13 -0700
Subject: [PATCH 19/22] Fix profile builds for AVX512.

---
 src/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/Makefile b/src/Makefile
index 4d56fc01..cfa96694 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -459,7 +459,7 @@ endif
 ifeq ($(avx512),yes)
 	CXXFLAGS += -DUSE_AVX512
 	ifeq ($(comp),$(filter $(comp),gcc clang mingw msys2))
-		CXXFLAGS += -mavx512vbmi
+		CXXFLAGS += -mavx512bw
 	endif
 endif
 

From fbdb373b6482db2b462b30d7399c3c7fed1d4f26 Mon Sep 17 00:00:00 2001
From: nodchip <nodchip@gmail.com>
Date: Mon, 20 Jul 2020 17:17:50 +0900
Subject: [PATCH 20/22] Changed to set the current working directory to the
 binary directory.

---
 src/main.cpp | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/src/main.cpp b/src/main.cpp
index fafefee2..6001432d 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -20,6 +20,15 @@
 
 #include <iostream>
 
+#ifdef _WIN32
+#include <filesystem>
+
+#ifndef NOMINMAX
+#define NOMINMAX
+#endif
+#include <Windows.h>
+#endif
+
 #include "bitboard.h"
 #include "endgame.h"
 #include "position.h"
@@ -34,6 +43,17 @@ namespace PSQT {
 }
 
 int main(int argc, char* argv[]) {
+  // Change the current working directory to the binary directory so that a
+  // net file path can be specified as a path relative to the binary
+  // directory.
+  // TODO(someone): Implement the logic for other OSes.
+#ifdef _WIN32
+  TCHAR filename[_MAX_PATH];
+  ::GetModuleFileName(NULL, filename, sizeof(filename) / sizeof(filename[0]));
+  std::filesystem::path current_path = filename;
+  current_path.remove_filename();
+  std::filesystem::current_path(current_path);
+#endif
 
   std::cout << engine_info() << std::endl;
 

From 74049a450c99393912aa3e60da48f4fb7622fb95 Mon Sep 17 00:00:00 2001
From: No name <no@email>
Date: Mon, 20 Jul 2020 09:53:44 +0300
Subject: [PATCH 21/22] Add NNUE targets to the output of 'make help'

---
 src/Makefile | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/Makefile b/src/Makefile
index cfa96694..9c9aed5c 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -544,8 +544,17 @@ help:
 	@echo ""
 	@echo "Supported targets:"
 	@echo ""
-	@echo "build                   > Standard build"
-	@echo "profile-build           > PGO build"
+	@echo "build                   > Standard (without NNUE) build"
+	@echo "profile-build           > Standard build with PGO"
+	@echo "nnue                    > NNUE-enabled build"
+	@echo "profile-nnue            > NNUE-enabled build with PGO"
+	@echo "nnue-learn              > Produces or refines an NNUE parameter set."
+	@echo "                            Requires training data, which it can"
+	@echo "                            generate itself from an existing"
+	@echo "                            parameter set, or with the next tool"
+	@echo "nnue-gen-sfen-from-original-eval"
+	@echo "                        > Produces training data for 'nnue-learn'"
+	@echo "                        >   without using an NNUE parameter set"
 	@echo "strip                   > Strip executable"
 	@echo "install                 > Install executable"
 	@echo "clean                   > Clean up"

From c0e1235fef49b3338660c076e55cccf67331d77f Mon Sep 17 00:00:00 2001
From: nodchip <nodchip@gmail.com>
Date: Mon, 20 Jul 2020 17:36:09 +0900
Subject: [PATCH 22/22] Added a description to Makefile.

---
 src/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/Makefile b/src/Makefile
index 9c9aed5c..2e6c415d 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -567,6 +567,7 @@ help:
 	@echo "x86-64-sse42            > x86 64-bit with sse42 support"
 	@echo "x86-64-sse41            > x86 64-bit with sse41 support"
 	@echo "x86-64-ssse3            > x86 64-bit with ssse3 support"
+	@echo "x86-64-sse3-popcnt      > x86 64-bit with sse3 and popcnt support"
 	@echo "x86-64-sse3             > x86 64-bit with ssse3 support"
 	@echo "x86-64                  > x86 64-bit generic"
 	@echo "x86-32                  > x86 32-bit (also enables SSE)"