Initial support for SkylakeX / AVX512 #1589

fenrus75 · 2018-06-03T07:29:07Z

This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:

512 bit wide SIMD (2x width of AVX2)
32 SIMD registers (2x the number on AVX2)

This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".

Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.

martin-frbg · 2018-06-03T13:04:24Z

Build failures are probably due to lack of AVX512 support in the old compiler/assembler versions used on travis - same build runs fine locally. (I still wonder if we should add a test for this in c_check, as wanting to do a DYNAMIC_ARCH build on an older system may be "more legit" today than the same for Haswell/avx2 ?)
The appveyor DYNAMIC_ARCH build used to barely make it under the default 1h limit even before the addition of yet another CPU type...

fenrus75 · 2018-06-03T17:14:00Z

well I'm only half sympathetic to DYNAMIC_ARCH if you don't have a toolchain that supports AVX512... likely best to somehow not use it.

You can make an argument that on linux, with a sort of modern glibc you want something else;
you'd want a generic binary in /usr/lib64, a haswell style binary in /usr/lib64/haswell and a skylakex binary in /usr/lib64/haswell/avx512_1 .. glibc will pick up the right one. And with this you can compile the library with good compiler flags
(this is how the Clear Linux distro ships openblas)

fenrus75 · 2018-06-03T17:18:22Z

maybe SKYLAKEX shouldn't be in DYNAMIC_ARCH for now? might be the easiest path forward

it can be hooked up later once the CI infra issues are resolved

martin-frbg · 2018-06-03T17:25:25Z

I am just testing a patch to c_check locally (along the lines of the MIPS have_msa test) to set NO_AVX512=1 if the build system is not up to the task. This would indeed only be for users of DYNAMIC_ARCH who do not even expect to include support for such "modern" targets in their library.
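A build-time probe of this kind can be sketched roughly as follows. This is only an illustration in C, not the actual c_check change: the function name toolchain_has_avx512, the scratch file names, and the exact compiler invocation are assumptions. The probed instruction is the vaddps test case that comes up later in this thread, and -march=skylake-avx512 is the flag discussed there as well.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns 1 if the compiler `cc` can build a file
 * containing an AVX-512 instruction, 0 otherwise. The probe is only
 * compiled, never executed, so it is safe on hosts without AVX-512.
 * Assumes a POSIX shell (for the output redirection) and a writable
 * current directory; if either is missing the probe reports 0.
 */
int toolchain_has_avx512(const char *cc)
{
    const char *src = "avx512_probe.c";   /* scratch names are assumptions */
    const char *obj = "avx512_probe.o";
    char cmd[512];
    int ok = 0;

    FILE *f = fopen(src, "w");
    if (f == NULL)
        return 0;
    /* Basic inline asm: the '%' characters are passed through literally. */
    fputs("void probe(void) { __asm__ volatile("
          "\"vaddps %zmm1, %zmm0, %zmm0\"); }\n", f);
    fclose(f);

    int n = snprintf(cmd, sizeof cmd,
                     "%s -march=skylake-avx512 -c -o %s %s >/dev/null 2>&1",
                     cc, obj, src);
    if (n > 0 && (size_t)n < sizeof cmd)
        ok = (system(cmd) == 0);          /* exit status 0 means it built */

    remove(src);
    remove(obj);
    return ok;
}
```

A build script could call toolchain_has_avx512("cc") and set NO_AVX512=1 whenever it returns 0, so that older toolchains skip the SkylakeX kernels instead of failing.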

martin-frbg · 2018-06-03T18:33:15Z

Well, probably best to remove SKYLAKEX from DYNAMIC_ARCH for the moment, and re-enable it once appropriate autodetection is in place.
I expect one would need to expand or clone support_avx() in cpuid_x86.c (build-time) and driver/others/dynamic.c (run-time detection) to modify the target type returned for the relevant cpuid numbers.
(As it is now, your SkylakeX code would never get called in DYNAMIC_ARCH mode anyway, and without DYNAMIC_ARCH it would only get built if one forces TARGET=SKYLAKEX. Which is probably fine at this stage - though perhaps it could be unconditionally enabled for Knights Landing to improve the situation with #991 ...)

martin-frbg · 2018-06-03T18:53:16Z

Never mind - the autodetection code is basically in place already, just that it has been treating Skylake X the same as Skylake so far. I'll try to fix this up with the NO_AVX512 define for backward compatibility later today.
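As a rough illustration of what separating Skylake X from Skylake at run time involves, the sketch below checks for usable AVX512VL through CPUID and XGETBV. It is not the code from cpuid_x86.c or dynamic.c: the function names avx512vl_usable and detect_avx512vl are made up for this example, and it assumes GCC or Clang (for <cpuid.h> and inline asm). The bit positions are the architectural ones: CPUID.1:ECX bit 27 (OSXSAVE), CPUID.(7,0):EBX bit 16 (AVX512F) and bit 31 (AVX512VL), and XCR0 bits 1, 2, 5, 6, 7 for SSE, AVX, opmask and ZMM state.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang: __get_cpuid, __get_cpuid_count */
#endif

/*
 * Pure decision logic on raw register values, so it can be exercised
 * on any host. Returns 1 only if the CPU reports AVX512F and AVX512VL
 * and the OS has enabled saving of the full ZMM register state.
 */
int avx512vl_usable(uint32_t leaf1_ecx, uint32_t leaf7_ebx, uint64_t xcr0)
{
    const uint32_t OSXSAVE   = 1u << 27;  /* CPUID.1:ECX                 */
    const uint32_t AVX512F   = 1u << 16;  /* CPUID.(7,0):EBX             */
    const uint32_t AVX512VL  = 1u << 31;  /* CPUID.(7,0):EBX             */
    const uint64_t ZMM_STATE = 0xE6;      /* XCR0 bits 1, 2, 5, 6, 7     */

    if (!(leaf1_ecx & OSXSAVE))
        return 0;                         /* XGETBV not available to us  */
    if ((xcr0 & ZMM_STATE) != ZMM_STATE)
        return 0;                         /* OS does not save ZMM state  */
    return (leaf7_ebx & AVX512F) && (leaf7_ebx & AVX512VL);
}

/* Queries the running CPU; always reports 0 on non-x86 builds. */
int detect_avx512vl(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    uint32_t leaf1_ecx, leaf7_ebx;
    uint32_t lo = 0, hi = 0;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    leaf1_ecx = ecx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    leaf7_ebx = ebx;

    /* XGETBV faults unless OSXSAVE is set, so guard the call. */
    if (leaf1_ecx & (1u << 27))
        __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));

    return avx512vl_usable(leaf1_ecx, leaf7_ebx, ((uint64_t)hi << 32) | lo);
#else
    return 0;
#endif
}
```

A dispatcher could then pick the SkylakeX kernels when detect_avx512vl() returns 1 and fall back to the Haswell ones otherwise. A CPU that reports AVX512F without AVX512VL is rejected by this check.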

fenrus75 · 2018-06-03T19:13:29Z

you're clearly much more familiar with this part of the code than I am, but if there's something I can do to help let me know.

In the meantime I'll see what it takes to get avx512 DGEMM going... so far it looks like using dgemm_kernel_16x2_haswell.S and not dgemm_kernel_4x8_haswell.S as a base is the way to go for that.

fenrus75 · 2018-06-03T19:56:16Z

(also my patch won't run on KNL as it is; it's using AVX512VL not just base AVX512F. I can make it work on KNL I suppose, I'll poke around the office for a system to test if it's important)

this required switching to the generic gemm_beta code (which is faster anyway on SKX) for both DGEMM and SGEMM. Performance for the not-retuned version is in the 30% range.

martin-frbg · 2018-06-04T12:59:12Z

Not sure - should be "extern..." with the semicolon and "define" without it I think... sorry for trashing your PR like that BTW...

fenrus75 · 2018-06-04T14:08:40Z

sorry never mind, needed more coffee.

I don't think this is thrashing, it's more "collaborating" ;)

martin-frbg · 2018-06-05T20:17:05Z

Looks better now, but I have no idea why the two travis checks with the old llvm 3.4 fail - avx512 support appears to be there (at least enough of it to not trigger the check I added), and the library builds locally with llvm 3.9 (the oldest I have available).
The failing appveyor build uses llvm 6, but still cannot handle avx512 instructions for some reason
(maybe the defaults are different on windows), and the cmake system is still lacking a test for that.

fenrus75 · 2018-06-06T05:02:07Z

I seem to be missing where your check is ;-)
(but it's before coffee)

fenrus75 · 2018-06-06T10:06:50Z

it appears that for the clang build, the correct -march is somehow not being passed

martin-frbg · 2018-06-06T10:23:48Z

This is probably peculiar to the build host - at least on the next appveyor run it should fail to compile the vaddps %zmm1, %zmm0, %zmm0 test case and alias SkylakeX to Haswell instead of bombing out. (The two failing clang runs in travis must have a different problem, but I have not found out what it is.)
I do not see an easy way to add -march=skylake-avx512 to the CFLAGS if the build host does not support it (in which case tools like getarch would not run anymore)
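The "alias SkylakeX to Haswell" fallback can be pictured with a small preprocessor sketch. The names below (kernel_table, table_haswell, table_skylakex, skylakex_kernel_name) are hypothetical stand-ins rather than the identifiers used in the library; only the NO_AVX512 define comes from this thread.

```c
/* Hypothetical stand-in for a per-CPU table of kernel entry points. */
typedef struct {
    const char *name;
} kernel_table;

kernel_table table_haswell = { "HASWELL" };

#ifdef NO_AVX512
/* Toolchain cannot build AVX-512 code: the SkylakeX name simply
 * resolves to the Haswell table (a define, so no trailing semicolon). */
#define table_skylakex table_haswell
#else
/* AVX-512 kernels were built: SkylakeX gets its own table. */
kernel_table table_skylakex = { "SKYLAKEX" };
#endif

/* Callers always ask for the SkylakeX table; what they get depends on
 * whether NO_AVX512 was defined at build time. */
const char *skylakex_kernel_name(void)
{
    return table_skylakex.name;
}
```

Built normally, skylakex_kernel_name() yields "SKYLAKEX"; built with -DNO_AVX512 it yields "HASWELL", so the rest of the dispatch code needs no further conditionals.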

martin-frbg · 2018-06-06T16:43:20Z

Closing and reopening to trigger a CI rebuild

fenrus75 mentioned this pull request Jun 3, 2018

AVX512 sgemm support -- question #1588

Closed

fenrus75 force-pushed the skylakex branch from f655a85 to 99c7bba Compare June 3, 2018 07:58

fenrus75 and others added 3 commits June 3, 2018 07:58

Typo fix (misplaced parenthesis)

0023515

Propagate NO_AVX512 if needed

f1fb9a4

martin-frbg and others added 7 commits June 3, 2018 23:13

Add SKYLAKEX to DYNAMIC_CORE list only if AVX512 is available

a7d0f49

Separate Skylake X from Skylake

5a92b31

Separate Skylake X from Skylake

5a51cf4

typo fix

ef626c6

Use AVX512 also for DGEMM

89372e0

this required switching to the generic gemm_beta code (which is faster anyway on SKX) for both DGEMM and SGEMM. Performance for the not-retuned version is in the 30% range.

Fix misplaced endif

ac7b6e3

Update dynamic.c

8be027e

martin-frbg added 6 commits June 4, 2018 17:10

Update cpuid_x86.c

dc9fe05

Propagate NO_AVX512 via CCOMMON_OPT

b7feded

Extend loop range to find SkylakeX in force_coretype

38ad05b

export NO_AVX512 setting

15a78d6

disable quiet_make for the moment

e800253

Re-enable QUIET_MAKE

f6021c7

martin-frbg closed this Jun 6, 2018

martin-frbg reopened this Jun 6, 2018

martin-frbg merged commit cf234a0 into OpenMathLib:develop Jun 6, 2018

martin-frbg added this to the 0.3.1 milestone Jun 20, 2018
