- remove the blank line between the declaration of the function and it's
comment, leads to better IDE support when hovering over a function to see it's
description
- remove the unnecessary duplication of the function name in the functions
description
- slightly refactored code for lsb, msb in bitboard.h There are still a few
things we can be improved later on, move the description of a function where
it was declared (instead of implemented) and add descriptions to functions
which are behind macros ifdefs
closes https://github.com/official-stockfish/Stockfish/pull/4840
No functional change
This introduces clang-format to enforce a consistent code style for Stockfish.
Having a documented and consistent style across the code will make contributing easier
for new developers, and will make larger changes to the codebase easier to make.
To facilitate formatting, this PR includes a Makefile target (`make format`) to format the code,
this requires clang-format (version 17 currently) to be installed locally.
Installing clang-format is straightforward on most OS and distros
(e.g. with https://apt.llvm.org/, brew install clang-format, etc), as this is part of quite commonly
used suite of tools and compilers (llvm / clang).
Additionally, a CI action is present that will verify if the code requires formatting,
and comment on the PR as needed. Initially, correct formatting is not required, it will be
done by maintainers as part of the merge or in later commits, but obviously this is encouraged.
fixes https://github.com/official-stockfish/Stockfish/issues/3608
closes https://github.com/official-stockfish/Stockfish/pull/4790
Co-Authored-By: Joost VandeVondele <Joost.VandeVondele@gmail.com>
Introduces a new NNUE network architecture and associated network parameters
The summary of the changes:
* Position for each perspective mirrored such that the king is on e..h files. Cuts the feature transformer size in half, while preserving enough knowledge to be good. See https://docs.google.com/document/d/1gTlrr02qSNKiXNZ_SuO4-RjK4MXBiFlLE6jvNqqMkAY/edit#heading=h.b40q4rb1w7on.
* The number of neurons after the feature transformer increased two-fold, to 1024x2. This is possibly mostly due to the now very optimized feature transformer update code.
* The number of neurons after the second layer is reduced from 16 to 8, to reduce the speed impact. This, perhaps surprisingly, doesn't harm the strength much. See https://docs.google.com/document/d/1gTlrr02qSNKiXNZ_SuO4-RjK4MXBiFlLE6jvNqqMkAY/edit#heading=h.6qkocr97fezq
The AffineTransform code did not work out-of-the box with the smaller number of neurons after the second layer, so some temporary changes have been made to add a special case for InputDimensions == 8. Also additional 0 padding is added to the output for some archs that cannot process inputs by <=8 (SSE2, NEON). VNNI uses an implementation that can keep all outputs in the registers while reducing the number of loads by 3 for each 16 inputs, thanks to the reduced number of output neurons. However GCC is particularily bad at optimization here (and perhaps why the current way the affine transform is done even passed sprt) (see https://docs.google.com/document/d/1gTlrr02qSNKiXNZ_SuO4-RjK4MXBiFlLE6jvNqqMkAY/edit# for details) and more work will be done on this in the following days. I expect the current VNNI implementation to be improved and extended to other architectures.
The network was trained with a slightly modified version of the pytorch trainer (https://github.com/glinscott/nnue-pytorch); the changes are in https://github.com/glinscott/nnue-pytorch/pull/143
The training utilized 2 datasets.
dataset A - https://drive.google.com/file/d/1VlhnHL8f-20AXhGkILujnNXHwy9T-MQw/view?usp=sharing
dataset B - as described in ba01f4b954
The training process was as following:
train on dataset A for 350 epochs, take the best net in terms of elo at 20k nodes per move (it's fine to take anything from later stages of training).
convert the .ckpt to .pt
--resume-from-model from the .pt file, train on dataset B for <600 epochs, take the best net. Lambda=0.8, applied before the loss function.
The first training command:
python3 train.py \
../nnue-pytorch-training/data/large_gensfen_multipvdiff_100_d9.binpack \
../nnue-pytorch-training/data/large_gensfen_multipvdiff_100_d9.binpack \
--gpus "$3," \
--threads 1 \
--num-workers 1 \
--batch-size 16384 \
--progress_bar_refresh_rate 20 \
--smart-fen-skipping \
--random-fen-skipping 3 \
--features=HalfKAv2_hm^ \
--lambda=1.0 \
--max_epochs=600 \
--default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2
The second training command:
python3 serialize.py \
--features=HalfKAv2_hm^ \
../nnue-pytorch-training/experiment_131/run_6/default/version_0/checkpoints/epoch-499.ckpt \
../nnue-pytorch-training/experiment_$1/base/base.pt
python3 train.py \
../nnue-pytorch-training/data/michael_commit_b94a65.binpack \
../nnue-pytorch-training/data/michael_commit_b94a65.binpack \
--gpus "$3," \
--threads 1 \
--num-workers 1 \
--batch-size 16384 \
--progress_bar_refresh_rate 20 \
--smart-fen-skipping \
--random-fen-skipping 3 \
--features=HalfKAv2_hm^ \
--lambda=0.8 \
--max_epochs=600 \
--resume-from-model ../nnue-pytorch-training/experiment_$1/base/base.pt \
--default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2
STC: https://tests.stockfishchess.org/tests/view/611120b32a8a49ac5be798c4
LLR: 2.97 (-2.94,2.94) <-0.50,2.50>
Total: 22480 W: 2434 L: 2251 D: 17795
Ptnml(0-2): 101, 1736, 7410, 1865, 128
LTC: https://tests.stockfishchess.org/tests/view/611152b32a8a49ac5be798ea
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 9776 W: 442 L: 333 D: 9001
Ptnml(0-2): 5, 295, 4180, 402, 6
closes https://github.com/official-stockfish/Stockfish/pull/3646
bench: 5189338
2021-08-15 12:05:43 +02:00
Renamed from src/nnue/features/half_ka_v2.cpp (Browse further)