BadFish

5796 commits 22 branches 47 tags 18 MiB

Author	SHA1	Message	Date
mstembera	1444837887	Remove inline assembly closes https://github.com/official-stockfish/Stockfish/pull/4698 No functional change	2023-07-19 21:39:51 +02:00
Sebastian Buchwald	b4ad3a3c4b	Add support for ARM dot product instructions The sdot instruction computes (and accumulates) a signed dot product, which is quite handy for Stockfish's NNUE code. The instruction is optional for Armv8.2 and Armv8.3, and mandatory for Armv8.4 and above. The commit adds a new 'arm-dotprod' architecture with enabled dot product support. It also enables dot product support for the existing 'apple-silicon' architecture, which is at least Armv8.5. The following local speed test was performed on an Apple M1 with ARCH=apple-silicon. I had to remove CPU pinning from the benchmark script. However, the results were still consistent: Checking both binaries against themselves reported a speedup of +0.0000 and +0.0005, respectively. ``` Result of 100 runs ================== base (...ish.037ef3e1) = 1917997 +/- 7152 test (...fish.dotprod) = 2159682 +/- 9066 diff = +241684 +/- 2923 speedup = +0.1260 P(speedup > 0) = 1.0000 CPU: 10 x arm Hyperthreading: off ``` Fixes #4193 closes https://github.com/official-stockfish/Stockfish/pull/4400 No functional change	2023-02-23 13:22:03 +01:00
MinetaS	2c36d1e7e7	Fix overflow in add_dpbusd_epi32x2 This patch fixes 16bit overflow in _add_dpbusd_epi32x2 functions, that can be triggered in rare cases depending on the NNUE weights. While the code leads to some slowdown on affected architectures (most notably avx2), the fix is simpler than some of the other options discussed in https://github.com/official-stockfish/Stockfish/pull/4394 Code suggested by Sopel97. Result of "bench 4096 1 30 default depth nnue": \| Architecture \| master \| patch (gcc) \| patch (clang) \| \|---------------------\|-----------\|-------------\|---------------\| \| x86-64-vnni512 \| 762122798 \| 762122798 \| 762122798 \| \| x86-64-avx512 \| 769723503 \| 762122798 \| 762122798 \| \| x86-64-bmi2 \| 769723503 \| 762122798 \| 762122798 \| \| x86-64-ssse3 \| 769723503 \| 762122798 \| 762122798 \| \| x86-64 \| 762122798 \| 762122798 \| 762122798 \| Following architectures will experience ~4% slowdown due to an additional instruction in the middle of hot path: x86-64-avx512 * x86-64-bmi2 * x86-64-avx2 * x86-64-sse41-popcnt (x86-64-modern) * x86-64-ssse3 * x86-32-sse41-popcnt This patch clearly loses Elo against master with both STC and LTC. Failed non-regression STC (256bit fix only): LLR: -2.95 (-2.94,2.94) <-1.75,0.25> Total: 33528 W: 8769 L: 9049 D: 15710 Ptnml(0-2): 96, 3616, 9600, 3376, 76 https://tests.stockfishchess.org/tests/view/63e6a5b44299542b1e26a485 60+0.6 @ 30000 games: Elo: -1.67 +-1.7 (95%) LOS: 2.8% Total: 30000 W: 7848 L: 7992 D: 14160 Ptnml(0-2): 12, 2847, 9436, 2683, 22 nElo: -3.84 +-3.9 (95%) PairsRatio: 0.95 https://tests.stockfishchess.org/tests/view/63e7ac716d0e1db55f35a660 However, a test against nn-a3dc078bafc7.nnue, which is the latest "safe" network not causing the bug, passed with regular bounds. Passed STC: LLR: 2.94 (-2.94,2.94) <0.00,2.00> Total: 160456 W: 42658 L: 42175 D: 75623 Ptnml(0-2): 487, 17638, 43469, 18173, 461 https://tests.stockfishchess.org/tests/view/63e89836d62a5d02b0fa82c8 closes https://github.com/official-stockfish/Stockfish/pull/4391 closes https://github.com/official-stockfish/Stockfish/pull/4394 No functional change	2023-02-18 13:23:18 +01:00
Sebastian Buchwald	da5bcec481	Fix asm modifiers in add_dpbusd_epi32x2 implementations The accumulator should be an earlyclobber because it is written before all input operands are read. Otherwise, the asm code computes a wrong result if the accumulator shares a register with one of the other input operands (which happens if we pass in the same expression for the accumulator and the operand). Closes https://github.com/official-stockfish/Stockfish/pull/4339 No functional change	2023-01-22 10:51:02 +01:00
Sebastian Buchwald	b60f9cc451	Update copyright years Happy New Year! closes https://github.com/official-stockfish/Stockfish/pull/4315 No functional change	2023-01-02 19:07:38 +01:00
mstembera	82bb21dc7a	Optimize AVX2 path in NNUE evaluation always selecting AffineTransform specialization for small inputs. A related patch was tested as Initially tested as a simplification STC https://tests.stockfishchess.org/tests/view/6317c3f437f41b13973d6dff LLR: 2.95 (-2.94,2.94) <-1.75,0.25> Total: 58072 W: 15619 L: 15425 D: 27028 Ptnml(0-2): 241, 6191, 15992, 6357, 255 Elo gain speedup test STC https://tests.stockfishchess.org/tests/view/63181c1b37f41b13973d79dc LLR: 2.94 (-2.94,2.94) <0.00,2.00> Total: 184496 W: 49922 L: 49401 D: 85173 Ptnml(0-2): 851, 19397, 51208, 19964, 828 and this patch gained in testing speedup = +0.0071 P(speedup > 0) = 1.0000 on CPU: 16 x AMD Ryzen 9 3950X closes https://github.com/official-stockfish/Stockfish/pull/4158 No functional change	2022-09-11 14:19:57 +02:00

Renamed from src/simd.h (Browse further)

6 commits