Add pure julia exp10 function #21445
Conversation
@nanosoldier
I haven't (yet) added any specific benchmarks for exp10, and the base benchmarks don't have any that include exp10 ...
base/special/exp10.jl
Outdated
# log2(10)
const LOG210 = 3.321928094887362347870319429489390175864831393024580612054756395815934776608624
# log10(2)
const LOG102 = 3.010299956639811952137388947244930267681898814621085413104274611271081892744238e-01
Maybe have underscores between the numbers, i.e. LOG2_10 and LOG10_2?
These (and the horner macro) seem like suspiciously many digits, when they will have to get truncated internally when they are read in (e.g. this one will become 0.3010299956639812). Is it typical to over-state the accuracy in this way?
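A quick sanity check (my own illustrative snippet, not from the PR) confirms the point about truncation: the parser rounds the long literal to the nearest Float64, so the extra digits neither help nor hurt.

```julia
# Both spellings parse to the same Float64 bit pattern: the parser rounds
# the long literal to the nearest representable value.
short_form = 3.0102999566398120e-01
long_form  = 3.010299956639811952137388947244930267681898814621085413104274611271081892744238e-01
@assert short_form === long_form

# And that value is the correctly rounded Float64 for log10(2).
@assert long_form == Float64(log10(big(2.0)))
```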
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels
Looks good in general. Do you have any accuracy results? I tested the ...
Please see the original post; test ranges: -10:0.00021:10, -35:0.00023:1000, -300:0.001:300
Accuracy tests for Float64
exp10 : max 0.91517196 at x = 275.61822 : mean 0.10907410
Base.exp10 : max 0.83238916 at x = 269.27436 : mean 0.11107440
Accuracy tests for Float32
exp10 : max 0.87275678 at x = -3.13534 : mean 0.02482310
Base.exp10 : max 0.82333934 at x = -1.05200 : mean 0.02529399
Unlike the exp function, I can find small numbers for Float32 where it's not 1 ulp. So we can either get rid of the small Taylor branch or keep it.
As I said above, I think the main issue is simply that the threshold is too high: the next term in the Taylor series comes out to about 0.66 ulps at the current threshold. If you choose the threshold to be half the current value, then you should be okay.
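As a rough back-of-the-envelope sketch (my own, assuming the small branch truncates a Taylor series in x*log(10) whose first dropped term is quadratic in x): halving the threshold shrinks the dropped term by a factor of four.

```julia
# Magnitude of a quadratic Taylor remainder term at the threshold,
# measured in ulps of 1.0f0 (illustrative only; the exact dropped term
# depends on the order at which the small branch truncates).
dropped_ulps(thres) = (thres * log(10f0))^2 / 2 / eps(Float32)

a = dropped_ulps(2f0^-13)   # at the current threshold
b = dropped_ulps(2f0^-14)   # after halving the threshold
@assert b ≈ a / 4           # quadratic term: halving x quarters the error
```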
base/special/exp10.jl
Outdated
5.2789829671382904052734375f-2)

@eval exp10_small_thres(::Type{Float64}) = $(2.0^-29)
@eval exp10_small_thres(::Type{Float32}) = $(2.0f0^-13)
Changing this to be 2.0f0^-14 gave a max ulp error of 0.9558756425976753 for all Float32 values.
Are you sure about this? How are you testing over all Float32 values? Can you post the script? For example, I'm finding many numbers greater than 1 ulp. The max Float32 ulp error I have found is at least 1.158, yet these occur at values larger than the small threshold; even the value you post (-0.000122065285f0) is larger than the threshold value.
Here is my test script to double-check:
function countulp(::Type{T}, x::AbstractFloat, y::AbstractFloat) where T
    isnan(x) && isnan(y) && return zero(T)
    # parentheses needed: && binds tighter than ||, so without them a NaN x
    # would short-circuit the || and never hit the return
    (isnan(x) || isnan(y)) && return T(1000)
    if isinf(T(x)) && isinf(T(y))
        return signbit(x) == signbit(y) ? zero(T) : T(1001)
    end
    if x == zero(x) && y == zero(y)
        return signbit(x) == signbit(y) ? zero(T) : T(1002)
    end
    if isfinite(x) && isfinite(y)
        return T(abs(x - y) / eps(T(y)))
    end
    return T(1003)
end
start32 = 0x00000000
end32 = 0xffffffff
function test_acc(f1, f2, ::Type{T}, suint::U, euint::U, spacing::Integer=1000) where T where U<:Unsigned
    spacing_uint = convert(typeof(suint), spacing)
    vuint = suint
    x = reinterpret(T, vuint)
    ulp = zero(T)
    while vuint < euint
        v1 = f1(x)
        v2 = f2(big(x))
        ulp_ = countulp(T, v1, v2)
        ulp = max(ulp, ulp_)
        if ulp_ > T(1.0)
            @show x, ulp
        end
        vuint += spacing_uint
        x = reinterpret(T, vuint)
    end
    ulp
end
@time ulp = test_acc(Amal.exp10, Base.exp10, Float32, start32, end32)  # test_acc returns only the max ulp
I just checked against Float64: I used the following:
function check32()
    maxulps = 0.0
    xmax = 0f0
    for u in typemin(UInt32):typemax(UInt32)
        x = reinterpret(Float32, u)
        isfinite(x) || continue
        y32 = exp10(x)
        y64 = exp10(Float64(x))
        if !isfinite(Float32(y64))
            continue
        end
        if !isfinite(y32)
            error("Non-finite value at $x")
        end
        d = abs(y32 - y64) / eps(Float32(y64))
        if d > maxulps
            maxulps = d
            xmax = x
        end
    end
    maxulps, xmax
end
which gives:
julia> check32()
(0.9558756425976753, -3.7376504f0)
Updated, can someone trigger nanosoldier? Thanks.
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels
benchmarks look good
The manually written-out coefficients look considerably longer than they need to be; as Float64 literals, aren't many of the trailing digits not accomplishing anything?
The constants are written out to higher precision than needed since they can be useful for implementing higher-precision functions if/when Float128 support is added to Julia. I'd rather not change that, and it is harmless anyway.
As a reminder, this should not be squashed when merged.
Does the first commit pass on its own?
Yes. The second commit fixes the fast_math variants to actually call the pure Julia exp and exp10 functions.
@simonbyrne go ahead and merge if you approve of this - though it's not the end of the world if we hold off until branching on this one.
More data points: I tested 2^30 (about 1 billion) random Float64 values (with the help of a server and Julia SharedArrays); the max error in ulps I found is 1.2623 at x = -210.2442424463067.
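A single-process sketch of that kind of random sweep (my own illustrative version; the original run used a server and SharedArrays, and the `3.0` bound below is just a loose sanity check, not the PR's measured maximum):

```julia
# Sample random Float64 arguments in (-300, 300) and track the worst
# ulp error of exp10 against a correctly rounded BigFloat reference.
function random_ulp_sweep(n)
    worst, xworst = 0.0, 0.0
    for _ in 1:n
        x = (rand() - 0.5) * 600          # uniform in (-300, 300)
        y    = exp10(x)
        yref = Float64(exp10(big(x)))     # MPFR-backed reference, rounded to Float64
        d = abs(y - yref) / eps(yref)
        if d > worst
            worst, xworst = d, x
        end
    end
    worst, xworst
end

worst, x = random_ulp_sweep(10_000)
@assert worst < 3.0   # loose bound; the PR reports a max of ~1.26 ulps
```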
This is great. I do think it is best to hold off until branching, which I suspect is this week.
base/special/exp10.jl
Outdated
    x < EXP10_MIN(T) && return T(0.0)
end
# compute approximation
if xa > reinterpret(Unsigned, T(0.5)*T(LOG10_2)) # |r| > 0.5 log10(2).
maybe change the comment to be |x| instead of |r|
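For context, the reduction this comment refers to can be sketched as follows (a simplified illustration, not the PR's exact code): write x = k*log10(2) + r with k an integer chosen so that |r| <= 0.5*log10(2); then exp10(x) = 2^k * 10^r, and only 10^r needs a polynomial.

```julia
LOG2_10 = 3.321928094887362        # log2(10), truncated to Float64
LOG10_2 = 3.0102999566398120e-01   # log10(2), truncated to Float64

x = 2.5
k = round(x * LOG2_10)   # nearest integer exponent of 2
r = x - k * LOG10_2      # reduced argument for the polynomial

@assert abs(r) <= 0.5 * LOG10_2 + 1e-12           # reduction keeps |r| small
@assert isapprox(2.0^k * exp10(r), exp10(x); rtol = 1e-13)  # reconstruction
```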
I've noticed that the accuracy can be quite a bit lower on machines without native FMA (or those using the prebuilt images). On such machines I see a lot of values with > 1 ulp error. It is sometimes possible to avoid this by rearranging the last few operations. However, I don't have time to look at it at the moment, so I think we should merge this and do some more experiments before the next release.
Please do open an issue about investigating the non-native FMA thing, since otherwise we'll forget.
Yes, without FMA it is very difficult and perhaps not possible (unless you use double-double floats, which is not really desirable, or a table). When I played around with this, I recall trying the same tricks as in exp, constructing a rational approximation, but I don't think that worked very well at all (I don't even have a copy of that test; I must have deleted it, probably because the accuracy wasn't good enough).
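To illustrate the double-rounding issue being discussed (a generic polynomial step, not the PR's actual code): with hardware FMA, a step of the form r*p + c is computed with a single rounding at the end; without it, the product is rounded first and the sum rounded again, which is where the extra fraction of an ulp creeps in.

```julia
r, p, c = 0.1, 0.3, 1.0
fused   = fma(r, p, c)   # r*p kept exact internally, one rounding
unfused = r * p + c      # r*p rounded first, then the sum rounded again

# The two results agree to within one ulp of the fused result.
@assert abs(fused - unfused) <= eps(fused)
```

Julia's `muladd` is the related portability tool here: it lets the compiler emit an fma when the hardware has one and a plain multiply-add otherwise.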
For future reference, one such value was
Did you see my post, which was testing hardware FMA?
Ah, I forgot about that. Good point (for the record, I only see that without FMA; with FMA it's ~0.74 ulps).
That was with FMA, testing against BigFloat, which is very expensive. I had to use a hefty server, and the computation ran for at least an hour.
Would you mind posting the script?
My script was here:
Thoughts?
BTW I think it's fine for now that this function is probably at most 1.5 ulps for Float64 (probably closer to 1.3 ulps), at least on FMA setups. Not many libms even have an exp10 implementation; e.g. Intel's math libs don't. I don't think you can get much better with a polynomial in this case.
base/special/exp.jl
Outdated
MAXEXP(::Type{Float64}) = 7.09782712893383996732e2 # log 2^1023*(2-2^-52)
MAXEXP(::Type{Float32}) = 88.72283905206835f0      # log 2^127 *(2-2^-23)
# max and min arguments
EXP_MAX(::Type{Float64}) = 7.09782712893383996732e2 # log 2^1023*(2-2^-52)
Bikeshedding, but do people prefer EXP_MAX, MAX_EXP, or the no-underscore variants of the above?
MAX_EXP
OK, thanks, I have changed it.
We are post 0.6; would it be possible to decide whether this is wanted or not?
Planning on merging this tomorrow unless I hear any objections.
Thanks for another cool PR showing what Julia is capable of, @musm!
Wow, nice. So this function took me a long time to write. I'm still slightly disappointed I couldn't get <1 ulp on the Float64 version on non-FMA setups, but I don't think it's a big deal. I'm not sure of @pkofod's plans, but since he is working on the libm this summer, I'll say that it is possible to get 0.5 ulp accuracy and (probably) beat the speed of the polynomial version, at least on scalars, via a table method, if that is something he is interested in pursuing. (I looked at this but don't have anything polished enough to be a PR, and I can't promise I will spend more time on it.) For reference: worst cases
edit: clarified accuracy
Need to add tests here and possibly benchmarks (JuliaCI/BaseBenchmarks.jl#73). It is expected that Base will be slightly more accurate since it is just calling the ^ function.