mirror of https://github.com/sockspls/badfish synced 2025-05-01 01:03:09 +00:00
BadFish/src/nnue
Tomasz Sobczyk ba35c88ab8 AVX-512 for smaller affine and feature transforms.
For the feature transformer, the code is analogous to the AVX2 version, since it was easy to adapt to the wider SIMD registers.

For the smaller affine transforms, which have a 32-byte stride, we keep 2 columns in one zmm register. We also unroll more aggressively, so that in the end we have to do 16 parallel horizontal additions on ymm slices, each consisting of 4 32-bit integers. The slices are embedded in 8 zmm registers.
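
The packing idea can be illustrated with a portable scalar model. This is only a sketch, not the code from the commit: it uses no intrinsics, and the names `kStride`, `affine_reference` and `affine_packed_model` are hypothetical. It assumes unsigned 8-bit inputs, signed 8-bit weights and 32-bit accumulators, and it treats a 64-byte block as one "zmm register" holding two 32-byte weight vectors. Products are gathered into sixteen 32-bit lanes, and the lanes belonging to each weight vector are then summed horizontally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: every output reads exactly 32 input bytes (the "32-byte stride").
constexpr std::size_t kStride = 32;

// Plain reference: out[i] = bias[i] + sum_j weights[i][j] * input[j].
std::vector<std::int32_t> affine_reference(const std::vector<std::int8_t>& weights,
                                           const std::vector<std::int32_t>& biases,
                                           const std::vector<std::uint8_t>& input) {
    assert(input.size() == kStride);
    assert(weights.size() == biases.size() * kStride);
    std::vector<std::int32_t> out(biases);
    for (std::size_t i = 0; i < biases.size(); ++i)
        for (std::size_t j = 0; j < kStride; ++j)
            out[i] += std::int32_t(weights[i * kStride + j]) * std::int32_t(input[j]);
    return out;
}

// Scalar model of the packed layout: one 64-byte "register" holds two
// consecutive 32-byte weight vectors. Each group of 4 bytes accumulates into
// one of 16 int32 lanes; lanes 0..7 belong to the first weight vector and
// lanes 8..15 to the second, so a horizontal add per half yields two outputs.
std::vector<std::int32_t> affine_packed_model(const std::vector<std::int8_t>& weights,
                                              const std::vector<std::int32_t>& biases,
                                              const std::vector<std::uint8_t>& input) {
    assert(input.size() == kStride);
    assert(biases.size() % 2 == 0);  // two outputs per 64-byte block
    assert(weights.size() == biases.size() * kStride);
    std::vector<std::int32_t> out(biases);
    for (std::size_t pair = 0; pair < biases.size() / 2; ++pair) {
        const std::int8_t* reg = &weights[pair * 2 * kStride];  // 64 bytes
        std::int32_t lanes[16] = {};
        for (std::size_t b = 0; b < 2 * kStride; ++b)
            lanes[b / 4] += std::int32_t(reg[b]) * std::int32_t(input[b % kStride]);
        // Horizontal addition: fold each half of the lanes into its output.
        for (std::size_t l = 0; l < 16; ++l)
            out[2 * pair + l / 8] += lanes[l];
    }
    return out;
}
```

Both functions should return the same values for any valid input; the second only reorders the additions the way a two-vectors-per-register layout would, which is why such a change can be made with no functional change.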

These changes provide about 1.5% speedup for AVX-512 builds.

Closes https://github.com/official-stockfish/Stockfish/pull/3218

No functional change.
2020-11-07 16:49:49 +01:00
architectures Add NNUE evaluation 2020-08-06 16:37:45 +02:00
features More incremental accumulator updates 2020-10-22 20:50:16 +02:00
layers AVX-512 for smaller affine and feature transforms. 2020-11-07 16:49:49 +01:00
evaluate_nnue.cpp Manually align arrays on the stack 2020-11-04 19:52:42 +01:00
evaluate_nnue.h Small cleanups 12 2020-09-21 10:41:10 +02:00
nnue_accumulator.h More incremental accumulator updates 2020-10-22 20:50:16 +02:00
nnue_architecture.h Add NNUE evaluation 2020-08-06 16:37:45 +02:00
nnue_common.h Manually align arrays on the stack 2020-11-04 19:52:42 +01:00
nnue_feature_transformer.h AVX-512 for smaller affine and feature transforms. 2020-11-07 16:49:49 +01:00