Initial support for SkylakeX / AVX512 #1589
This patch adds the basic infrastructure for the SkylakeX (Intel Skylake server) target. The SkylakeX target will use the AVX512 instruction set (at the AVX512VL level), which brings two basic things:
1) 512-bit wide SIMD (twice the width of AVX2)
2) 32 SIMD registers (twice the 16 available with AVX2)
This initial patch only contains a trivial transformation of the Haswell SGEMM kernel to AVX512VL. More will follow later, but this patch aims to get the infrastructure for that "later" in place. Full performance tuning has not been done yet: with more registers and wider SIMD it is in theory possible to retune the kernels, but even without that, this change alone gives an interesting performance increase (in the 30-40% range).
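To make the width difference concrete: one 512-bit register holds 16 single-precision lanes, versus 8 in a 256-bit AVX2 register. The sketch below is not the kernel from this patch; it uses GCC/clang vector extensions instead of AVX512 intrinsics, so it builds and runs on any host (on an AVX512 target the compiler may map `v16sf` onto a single zmm register, elsewhere it splits the work). The helper name `axpy16` is hypothetical.

```c
#include <string.h>

/* 64 bytes = 512 bits: the width of one AVX512 zmm register */
typedef float v16sf __attribute__((vector_size(64)));
/* 32 bytes = 256 bits: the width of one AVX2 ymm register */
typedef float v8sf __attribute__((vector_size(32)));

/* One column update of the kind a GEMM micro-kernel performs:
   c[0..15] += a[0..15] * b, with b broadcast to all 16 lanes. */
static void axpy16(float *c, const float *a, float b)
{
    v16sf va, vc;
    v16sf vb = {0};
    memcpy(&va, a, sizeof va);      /* load 16 floats, alignment-safe */
    memcpy(&vc, c, sizeof vc);
    for (int i = 0; i < 16; i++)    /* broadcast the scalar to every lane */
        vb[i] = b;
    vc += va * vb;                  /* lane-wise multiply, then add */
    memcpy(c, &vc, sizeof vc);      /* store the result back */
}
```

With twice the lanes per register and twice the registers, a retuned kernel can keep a larger block of C resident, which is the retuning opportunity mentioned above.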
Build failures are probably due to lack of AVX512 support in the old compiler/assembler versions used on Travis; the same build runs fine locally. (I still wonder if we should add a test for this in c_check, as wanting to do a DYNAMIC_ARCH build on an older system may be "more legit" today than the same was for Haswell/AVX2?)
Well, I'm only half sympathetic to DYNAMIC_ARCH if you don't have a toolchain that supports AVX512... it's likely best to somehow not use it in that case. You can make an argument that on Linux, with a reasonably modern glibc, you want something else;
Maybe SKYLAKEX shouldn't be in DYNAMIC_ARCH for now? That might be the easiest path forward; it can be hooked up later once the CI infrastructure issues are resolved.
I am just testing a patch to c_check locally (along the lines of the MIPS have_msa test) that sets NO_AVX512=1 if the build system is not up to the task. This would indeed only affect users of DYNAMIC_ARCH who do not even expect their library to include support for such "modern" targets.
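A toolchain probe of this kind comes down to "does a tiny file containing an AVX512 instruction compile and assemble?". The C file below is a hypothetical probe source, not the actual c_check test (c_check itself is a Perl script and its logic is not shown here); a build script would try to compile it and set NO_AVX512=1 if that fails. The function and macro names are illustrative assumptions.

```c
/* Hypothetical probe source: the build script only cares whether this
   file compiles and assembles; the probe function is never executed. */
#if defined(__x86_64__) && defined(__GNUC__)
void avx512_probe_never_called(void);

/* target("avx512f") marks only this function as AVX512 code. A compiler
   too old to know the option, or an assembler that cannot encode the
   EVEX instruction below, makes the compile fail: that is the signal. */
__attribute__((target("avx512f")))
void avx512_probe_never_called(void)
{
    __asm__ volatile("vpxord %%zmm4, %%zmm4, %%zmm4" ::: "xmm4");
}
#define AVX512_PROBE_BUILT 1
#else
/* Not an x86-64 GNU-compatible compiler: the probe does not apply. */
#define AVX512_PROBE_BUILT 0
#endif

/* Lets a caller see whether the probe body was actually compiled in. */
static int avx512_probe_built(void) { return AVX512_PROBE_BUILT; }
```

Because the instruction sits in a function nobody calls, the probe is safe to build on a machine whose CPU lacks AVX512; it tests the toolchain, not the hardware.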
Well, it's probably best to remove SKYLAKEX from DYNAMIC_ARCH for the moment and re-enable it once appropriate autodetection is in place.
Never mind: the autodetection code is basically in place already; it has just been treating Skylake X the same as Skylake so far. I'll try to fix this up with the NO_AVX512 define for backward compatibility later today.
You're clearly much more familiar with this part of the code than I am, but if there's something I can do to help, let me know. In the meantime I'll see what it takes to get AVX512 DGEMM going... so far it looks like using dgemm_kernel_16x2_haswell.S rather than dgemm_kernel_4x8_haswell.S as a base is the way to go for that.
(Also, my patch won't run on KNL as it is, since it uses AVX512VL and not just base AVX512F. I can make it work on KNL, I suppose; I'll poke around the office for a system to test on if it's important.)
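Telling Skylake X apart from Skylake (and from KNL, which reports AVX512F but not AVX512VL) needs both the CPUID feature bits and a check that the OS saves the AVX512 register state. The sketch below is a hypothetical helper, not OpenBLAS's detection code; it takes the register values as parameters so the logic can be tested without AVX512 hardware. A real implementation would obtain them with the `cpuid` and `xgetbv` instructions. Bit positions follow the Intel SDM.

```c
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE   (1u << 27)  /* CPUID.1:ECX, OS uses XSAVE/XGETBV */
#define CPUID7_EBX_AVX512F   (1u << 16)  /* CPUID.(7,0):EBX, AVX512 Foundation */
#define CPUID7_EBX_AVX512VL  (1u << 31)  /* CPUID.(7,0):EBX, Vector Length ext. */
/* XCR0 bits 1 (SSE), 2 (AVX), 5 (opmask), 6 (ZMM_Hi256), 7 (Hi16_ZMM) */
#define XCR0_AVX512_STATE    0xE6u

enum avx512_level { AVX512_NONE = 0, AVX512_F_ONLY = 1, AVX512_VL = 2 };

/* Hypothetical classifier: AVX512_VL for Skylake-X-class parts,
   AVX512_F_ONLY for KNL-class parts, AVX512_NONE otherwise. */
static enum avx512_level classify_avx512(uint32_t cpuid1_ecx,
                                         uint32_t cpuid7_ebx,
                                         uint64_t xcr0)
{
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return AVX512_NONE;                 /* cannot query XCR0 safely */
    if ((xcr0 & XCR0_AVX512_STATE) != XCR0_AVX512_STATE)
        return AVX512_NONE;                 /* OS does not save zmm/k state */
    if (!(cpuid7_ebx & CPUID7_EBX_AVX512F))
        return AVX512_NONE;
    return (cpuid7_ebx & CPUID7_EBX_AVX512VL) ? AVX512_VL : AVX512_F_ONLY;
}
```

Checking XCR0 matters because a CPU can advertise AVX512 while the OS has not enabled the extended state; using the kernels in that case would fault.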
This required switching to the generic gemm_beta code (which is faster anyway on SKX) for both DGEMM and SGEMM. Performance for the not-yet-retuned version is up in the 30% range.
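For readers unfamiliar with the gemm_beta step: in a GEMM computing C := alpha*A*B + beta*C, the C matrix can first be scaled by beta, after which the compute kernel only has to accumulate alpha*A*B. The function below is a simplified, hypothetical sketch of such a scaling pass (its name and signature are illustrative, not OpenBLAS's generic gemm_beta). It follows the reference BLAS convention that with beta == 0, C need not be initialized on input, so C is overwritten rather than multiplied.

```c
#include <stddef.h>

/* Hypothetical sketch: C (m x n, column-major, leading dimension ldc)
   is overwritten with beta * C. Rows m..ldc-1 of each column are padding
   and are left untouched. */
static void gemm_beta_sketch(size_t m, size_t n, float beta,
                             float *c, size_t ldc)
{
    for (size_t j = 0; j < n; j++) {
        float *col = c + j * ldc;
        if (beta == 0.0f) {
            /* Overwrite instead of multiply, so stale NaN/Inf values in
               an uninitialised C do not survive as NaN. */
            for (size_t i = 0; i < m; i++)
                col[i] = 0.0f;
        } else if (beta != 1.0f) {
            for (size_t i = 0; i < m; i++)
                col[i] *= beta;
        }
        /* beta == 1: nothing to do */
    }
}
```

Being a simple streaming loop, a pass like this is largely memory-bound, which is one plausible reason a plain generic version can keep up with hand-written ones.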
Not sure; it should be "extern..." with the semicolon and "define" without it, I think... Sorry for trashing your PR like that, BTW...
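One plausible shape for the NO_AVX512 fallback discussed here: declare the Skylake X table `extern` when AVX512 is available, otherwise alias it to the Haswell table with a macro. All identifiers below (`cpu_table_t`, `table_HASWELL`, `table_SKYLAKEX`, `cpu_tables`) are hypothetical stand-ins rather than OpenBLAS's actual names; the sketch mainly shows why the `extern` line needs a semicolon and the `#define` must not have one.

```c
/* Hypothetical per-architecture function table; just a name here. */
typedef struct { const char *name; } cpu_table_t;

/* An extern declaration is an ordinary C declaration: it ends with ';'. */
extern cpu_table_t table_HASWELL;
#ifndef NO_AVX512
extern cpu_table_t table_SKYLAKEX;
#else
/* A macro is plain text substitution and takes no ';'. With one, the
   initializer below would expand to "&table_HASWELL; }" and not compile. */
#define table_SKYLAKEX table_HASWELL
#endif

/* Dispatch table indexed by detected CPU type. With NO_AVX512 set, the
   Skylake X slot falls back to the Haswell kernels. */
static cpu_table_t *const cpu_tables[] = { &table_HASWELL, &table_SKYLAKEX };

/* Definitions; in a real build these would live in per-target objects. */
cpu_table_t table_HASWELL = { "Haswell" };
#ifndef NO_AVX512
cpu_table_t table_SKYLAKEX = { "SkylakeX" };
#endif
```

Aliasing keeps every use site unchanged: code that selects the Skylake X entry still compiles when the toolchain cannot build AVX512 kernels, and simply gets the Haswell ones.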
Sorry, never mind, I needed more coffee. I don't think this is trashing; it's more "collaborating" ;)
Looks better now, but I have no idea why the two Travis checks with the old LLVM 3.4 fail: AVX512 support appears to be there (at least enough of it not to trigger the check I added), and the library builds locally with LLVM 3.9 (the oldest I have available).
I seem to be missing where your check is ;-)
It appears that for the clang build, a correct -march is somehow not being passed.
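A quick way to confirm which instruction-set flags actually reached the compiler is to check the predefined macros that GCC and clang derive from `-march`/`-mavx512*`. The helper below is a hypothetical diagnostic, meant to be compiled with the same flags as the build under suspicion; it reports compile-time flags only, not assembler support or the features of the running CPU.

```c
/* Hypothetical diagnostic, based on compiler-predefined macros:
   2: AVX512F + AVX512VL enabled (e.g. -march=skylake-avx512)
   1: AVX512F only enabled       (e.g. -march=knl)
   0: no AVX512 enabled at compile time */
static int compiler_avx512_level(void)
{
#if defined(__AVX512F__) && defined(__AVX512VL__)
    return 2;
#elif defined(__AVX512F__)
    return 1;
#else
    return 0;
#endif
}
```

If this returns 0 under the flags the build passes for SKYLAKEX, the `-march` option is not reaching the compiler, which would match the symptom described above.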
This is probably peculiar to the build host; at least on the next AppVeyor run it should fail to compile the
Closing and reopening to trigger a CI rebuild.