1
0
Fork 0
mirror of https://github.com/sockspls/badfish synced 2025-05-01 09:13:08 +00:00
Commit graph

6 commits

Author SHA1 Message Date
mstembera
1444837887 Remove inline assembly
closes https://github.com/official-stockfish/Stockfish/pull/4698

No functional change
2023-07-19 21:39:51 +02:00
Sebastian Buchwald
b4ad3a3c4b Add support for ARM dot product instructions
The sdot instruction computes (and accumulates) a signed dot product,
which is quite handy for Stockfish's NNUE code. The instruction is
optional for Armv8.2 and Armv8.3, and mandatory for Armv8.4 and above.

The commit adds a new 'arm-dotprod' architecture with enabled dot
product support. It also enables dot product support for the existing
'apple-silicon' architecture, which is at least Armv8.5.

The following local speed test was performed on an Apple M1 with
ARCH=apple-silicon. I had to remove CPU pinning from the benchmark
script. However, the results were still consistent: Checking both
binaries against themselves reported a speedup of +0.0000 and +0.0005,
respectively.

```
Result of 100 runs
==================
base (...ish.037ef3e1) =    1917997  +/- 7152
test (...fish.dotprod) =    2159682  +/- 9066
diff                   =    +241684  +/- 2923

speedup        = +0.1260
P(speedup > 0) =  1.0000

CPU: 10 x arm
Hyperthreading: off
```

Fixes #4193

closes https://github.com/official-stockfish/Stockfish/pull/4400

No functional change
2023-02-23 13:22:03 +01:00
MinetaS
2c36d1e7e7 Fix overflow in add_dpbusd_epi32x2
This patch fixes 16bit overflow in *_add_dpbusd_epi32x2 functions,
that can be triggered in rare cases depending on the NNUE weights.

While the code leads to some slowdown on affected architectures
(most notably avx2), the fix is simpler than some of the other
options discussed in
https://github.com/official-stockfish/Stockfish/pull/4394

Code suggested by Sopel97.

Result of "bench 4096 1 30 default depth nnue":

| Architecture        | master    | patch (gcc) | patch (clang) |
|---------------------|-----------|-------------|---------------|
| x86-64-vnni512      | 762122798 | 762122798   | 762122798     |
| x86-64-avx512       | 769723503 | 762122798   | 762122798     |
| x86-64-bmi2         | 769723503 | 762122798   | 762122798     |
| x86-64-ssse3        | 769723503 | 762122798   | 762122798     |
| x86-64              | 762122798 | 762122798   | 762122798     |

Following architectures will experience ~4% slowdown due to an
additional instruction in the middle of hot path:

* x86-64-avx512
* x86-64-bmi2
* x86-64-avx2
* x86-64-sse41-popcnt (x86-64-modern)
* x86-64-ssse3
* x86-32-sse41-popcnt

This patch clearly loses Elo against master with both STC and LTC.

Failed non-regression STC (256bit fix only):
LLR: -2.95 (-2.94,2.94) <-1.75,0.25>
Total: 33528 W: 8769 L: 9049 D: 15710
Ptnml(0-2): 96, 3616, 9600, 3376, 76
https://tests.stockfishchess.org/tests/view/63e6a5b44299542b1e26a485

60+0.6 @ 30000 games:
Elo: -1.67 +-1.7 (95%) LOS: 2.8%
Total: 30000 W: 7848 L: 7992 D: 14160
Ptnml(0-2): 12, 2847, 9436, 2683, 22
nElo: -3.84 +-3.9 (95%) PairsRatio: 0.95
https://tests.stockfishchess.org/tests/view/63e7ac716d0e1db55f35a660

However, a test against nn-a3dc078bafc7.nnue, which is the latest "safe"
network not causing the bug, passed with regular bounds.

Passed STC:
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 160456 W: 42658 L: 42175 D: 75623
Ptnml(0-2): 487, 17638, 43469, 18173, 461
https://tests.stockfishchess.org/tests/view/63e89836d62a5d02b0fa82c8

closes https://github.com/official-stockfish/Stockfish/pull/4391
closes https://github.com/official-stockfish/Stockfish/pull/4394

No functional change
2023-02-18 13:23:18 +01:00
Sebastian Buchwald
da5bcec481 Fix asm modifiers in add_dpbusd_epi32x2 implementations
The accumulator should be an earlyclobber because it is written before
all input operands are read. Otherwise, the asm code computes a wrong
result if the accumulator shares a register with one of the other input
operands (which happens if we pass in the same expression for the
accumulator and the operand).

Closes https://github.com/official-stockfish/Stockfish/pull/4339

No functional change
2023-01-22 10:51:02 +01:00
Sebastian Buchwald
b60f9cc451 Update copyright years
Happy New Year!

closes https://github.com/official-stockfish/Stockfish/pull/4315

No functional change
2023-01-02 19:07:38 +01:00
mstembera
82bb21dc7a Optimize AVX2 path in NNUE evaluation
always selecting AffineTransform specialization for small inputs.

A related patch was tested as

Initially tested as a simplification
STC https://tests.stockfishchess.org/tests/view/6317c3f437f41b13973d6dff
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 58072 W: 15619 L: 15425 D: 27028
Ptnml(0-2): 241, 6191, 15992, 6357, 255

Elo gain speedup test
STC https://tests.stockfishchess.org/tests/view/63181c1b37f41b13973d79dc
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 184496 W: 49922 L: 49401 D: 85173
Ptnml(0-2): 851, 19397, 51208, 19964, 828

and this patch gained in testing

speedup        = +0.0071
P(speedup > 0) =  1.0000
on CPU: 16 x AMD Ryzen 9 3950X

closes https://github.com/official-stockfish/Stockfish/pull/4158

No functional change
2022-09-11 14:19:57 +02:00
Renamed from src/simd.h (Browse further)