On a strategy for Porting SLEEF #10
Comments
I haven't looked at SLEEF sufficiently closely, but my experience with vectorised math libraries is that they tend to sacrifice some accuracy for performance, for example the Things can be even more problematic in the extremes, e.g. for trig functions: to compute One option would be to only expose such reduced-accuracy vectorised ops via
The benefit of having julia implementations is that we can have nice interfaces to trade off accuracy for speed, and do so transparently in user code.
Related to that, one of the things that really concerns me about SLEEF is that it is not documented (at all); in particular, neither the sources of its algorithms nor their accuracy are documented. Whereas OpenLibm (and its sources) are meticulously documented with their accuracies and references to the papers/textbooks from which the algorithms were sourced. I know if you run the test code it will spit out some accuracy information in I ran that, results are here And for these there are sin_u1, cos_u1, sincos_u1, tan_u1, asin_u1, acos_u1, atan_u1, atan2_u1, log_u1, and cbrt_u1, which do come within 1ulp. The ranges that it is tested on are pretty solid, I think.
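For anyone unfamiliar with the "within 1ulp" figure, here is a minimal sketch of how a distance in ulps between two doubles can be counted. It is not SLEEF's tester: the names `ordered_bits` and `ulp_distance` are made up for illustration, and it assumes the reference value passed in is already a correctly rounded double obtained from some higher-precision computation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map a double onto an integer line so that
   adjacent doubles differ by exactly 1 (NaNs are not handled). */
static int64_t ordered_bits(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);            /* reinterpret the IEEE 754 bit pattern */
    return (i < 0) ? INT64_MIN - i : i;  /* fold negatives so ordering is monotonic */
}

/* Hypothetical helper: number of representable doubles between a and b. */
uint64_t ulp_distance(double a, double b)
{
    int64_t ia = ordered_bits(a);
    int64_t ib = ordered_bits(b);
    /* Unsigned subtraction avoids signed overflow for far-apart inputs. */
    return (ia > ib) ? (uint64_t)ia - (uint64_t)ib
                     : (uint64_t)ib - (uint64_t)ia;
}
```

A `_u1` function would then be expected to give `ulp_distance(computed, reference) <= 1` across its tested range, under the assumption about the reference stated above.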
hi oxinabox good ideas! The ccall idea should be a good way to test the waters, so we can benchmark benefits and potential downsides
I now have SLEEF tied in the ccalls.
nice. do you plan on exposing ccalls into the intrinsics for the vectorized stuff, esp for the funs that double2 use?
Not right now for sure, prob not for a few weeks, I should really get my actual research/work done. If you need them though, I can take a crack at it (prob not for a few days)
sounds good no rush on my end for the vectorized ver yet!
actually @oxinabox all the double functions in julia are benefiting from llvm's autovectorization, so it's unlikely we need to perform any manual simd vectorization, which is great
So one of the options would be to port SLEEF, by Naoki Shibata (repo) and Hal Finkel (repo).
ViralBShah said:
I dug through the SLEEF code on the weekend, thinking about this.
SLEEF has two implementations, each in a folder in its repo: `purec`, which does not feature SIMD magic, and `simd`, which does.
In general they are very similar in implementation, it just comes down to whether the lowest level operations use SIMD (AFAICTICBWT).
In both cases, the library is branch-free and has no constant tables -- which is very different from OpenLibm.
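To make "branch-free, no constant tables" concrete, here is a rough C sketch of the general style: evaluate every candidate result, then blend with a bit mask instead of an `if`. This is not SLEEF's code. The function names are hypothetical and the coefficients are plain Taylor terms chosen for illustration, so it is only reasonable for small `x`.

```c
#include <stdint.h>
#include <string.h>

/* All-ones when cond is nonzero, all-zeros otherwise; no if/else in the source. */
static uint64_t mask_from(int cond)
{
    return (uint64_t)0 - (uint64_t)(cond != 0);
}

/* Bitwise blend: returns a where mask is all-ones, b where it is all-zeros. */
static double select_f64(uint64_t mask, double a, double b)
{
    uint64_t ua, ub, ur;
    double r;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    ur = (ua & mask) | (ub & ~mask);
    memcpy(&r, &ur, sizeof r);
    return r;
}

/* Horner evaluation with inline literals instead of a lookup table
   (Taylor terms up to x^11, illustrative only). */
static double sin_poly(double x)
{
    double x2 = x * x;
    double p = -1.0 / 39916800.0;
    p = p * x2 + 1.0 / 362880.0;
    p = p * x2 - 1.0 / 5040.0;
    p = p * x2 + 1.0 / 120.0;
    p = p * x2 - 1.0 / 6.0;
    return x + x * x2 * p;
}

/* Same idea for cosine (Taylor terms up to x^10, illustrative only). */
static double cos_poly(double x)
{
    double x2 = x * x;
    double p = -1.0 / 3628800.0;
    p = p * x2 + 1.0 / 40320.0;
    p = p * x2 - 1.0 / 720.0;
    p = p * x2 + 1.0 / 24.0;
    p = p * x2 - 0.5;
    return 1.0 + x2 * p;
}

/* Both polynomials are always evaluated; odd q selects the cosine result. */
double sin_or_cos(double x, int q)
{
    return select_f64(mask_from(q & 1), cos_poly(x), sin_poly(x));
}
```

The cost is that both polynomials are always computed, but every lane of a SIMD register can then follow the same instruction stream, which is presumably why this style suits vectorisation.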
So really we could port either the simd or the purec implementation. Or, if I was correct and it is just the low level operations that differ, we can have both by just swapping out one set of definitions for the other -- but it is probably not that simple, as otherwise it would be done that way in Naoki Shibata's C code/makefiles.
If we trusted that LLVM's autovectorization was the way to go, then porting purec would be the way to go.
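The "swap one set of definitions for the other" idea could look roughly like the sketch below: the algorithm is written once against a tiny helper layer, and a compile-time switch picks scalar or SSE2 definitions. The switch `USE_SSE2` and the helper names (`vload`, `vadd`, and so on) are hypothetical and are not SLEEF's actual helper API; the SSE2 intrinsics themselves are the standard ones from `emmintrin.h`.

```c
#include <stddef.h>

#if defined(USE_SSE2)   /* hypothetical switch, not a SLEEF macro */
#include <emmintrin.h>
typedef __m128d vdouble;
#define VLEN 2
static vdouble vload(const double *p)       { return _mm_loadu_pd(p); }
static void    vstore(double *p, vdouble v) { _mm_storeu_pd(p, v); }
static vdouble vbroadcast(double x)         { return _mm_set1_pd(x); }
static vdouble vadd(vdouble a, vdouble b)   { return _mm_add_pd(a, b); }
static vdouble vmul(vdouble a, vdouble b)   { return _mm_mul_pd(a, b); }
#else                   /* scalar fallback: a "vector" of one lane */
typedef double vdouble;
#define VLEN 1
static vdouble vload(const double *p)       { return *p; }
static void    vstore(double *p, vdouble v) { *p = v; }
static vdouble vbroadcast(double x)         { return x; }
static vdouble vadd(vdouble a, vdouble b)   { return a + b; }
static vdouble vmul(vdouble a, vdouble b)   { return a * b; }
#endif

/* Written once against the helper layer: (c2*x + c1)*x + c0 per lane. */
static vdouble poly2(vdouble x, double c0, double c1, double c2)
{
    vdouble p = vadd(vmul(vbroadcast(c2), x), vbroadcast(c1));
    return vadd(vmul(p, x), vbroadcast(c0));
}

/* Apply poly2 over an array; any tail shorter than VLEN is left untouched. */
void poly2_array(const double *in, double *out, size_t n,
                 double c0, double c1, double c2)
{
    for (size_t i = 0; i + VLEN <= n; i += VLEN)
        vstore(out + i, poly2(vload(in + i), c0, c1, c2));
}
```

Building with `-DUSE_SSE2` on x86 would select the two-lane version, and building without it gives the scalar one. Whether the real helper layer separates this cleanly is exactly the open question above.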
Related on that note: https://groups.google.com/d/msg/llvm-dev/rvLlViuu2Aw/qotzAVlQDQAJ
I'm not sure how closely related it is; in general it sounds like SIMD is sensitive.
So thinking mostly about porting the simd implementation.
I suggest that the various SIMD helper files (helperavx.h, helperavx2.h, helperfma4.h, helperneon.h, helperqpx.h, helpersse2.h) be kept as is in C for now, with a julia wrapper.
They deal with all the different processor-specific intrinsics for targeting different architectures, and that is a lot easier with Makefiles and C preprocessor stuff than in Julia.
And further to that, it seems a lot easier to `ccall` the helpers than to write `ccall`s for every different architecture's SIMD intrinsics ourselves.
Later, perhaps they can be reimplemented with something like SIMD.jl, SIMDVectors.jl, or `llvm_call`s, or with enhancements to julia's Base/codegen.
And thus the focus would be on porting the main C files: for Float64, sleefsimddp.c, and for Float32, sleefsimdsp.c
Now, because this is actually fairly simple C code, due to it being branch free -- it is just a sequence of function calls -- it's a good candidate for writing some code to assist translation (#5). (`GCC -e -d $ARCH` is probably worth running)
On the other hand, we could just wait and see if it is put into LLVM.
If it is then it will play nicer with LLVM's constant folding and other vectorisation infrastructure.
Then it is just a matter of `llvm_call`s like in #8
See also: