
Add pure julia exp10 function #21445

Merged: 2 commits merged into JuliaLang:master from the exp10 branch on Jun 27, 2017

Conversation

musm (Contributor) commented Apr 19, 2017

Need to add tests here, and possibly benchmarks (JuliaCI/BaseBenchmarks.jl#73).

julia> @benchmark Base.exp10(5.0)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     105.795 ns (0.00% GC)
  median time:      155.140 ns (0.00% GC)
  mean time:        170.898 ns (0.00% GC)
  maximum time:     4.148 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark My.exp10(5.0)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     13.816 ns (0.00% GC)
  median time:      15.395 ns (0.00% GC)
  mean time:        16.895 ns (0.00% GC)
  maximum time:     252.250 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
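
(For reproduction, a minimal sketch using BenchmarkTools.jl; `My` above is presumably the author's local module holding the new implementation, so the name is a stand-in, not anything in Base:)

using BenchmarkTools
x = 5.0
b_base = @benchmark Base.exp10($x)    # current Base version
b_new  = @benchmark My.exp10($x)      # the PR's pure-Julia version (stand-in module name)
judge(median(b_new), median(b_base))  # summarizes the relative change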


Test ranges: -10:0.00021:10, -35:0.00023:1000, -300:0.001:300

Accuracy tests for Float64
     exp10          : max 0.91517196 at x = 275.61822                : mean 0.10907410
Base.exp10          : max 0.83238916 at x = 269.27436                : mean 0.11107440

Accuracy tests for Float32
     exp10          : max 0.87275678 at x = -3.13534                   : mean 0.02482310
Base.exp10          : max 0.82333934 at x = -1.05200                   : mean 0.02529399

It is expected that Base will be slightly more accurate, since it is just calling the ^ function.
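
(For context, a sketch of the kind of accuracy harness that produces numbers like these: ulp error against a BigFloat reference over each range. An assumed reconstruction, not the author's actual script:)

setprecision(BigFloat, 256)

# ulp error of y relative to the high-precision reference ref
ulps(y::T, ref::BigFloat) where {T<:AbstractFloat} = Float64(abs(y - ref) / eps(T(ref)))

function accuracy(f, xs)
    maxulp, xmax, total = 0.0, first(xs), 0.0
    for x in xs
        u = ulps(f(x), exp10(big(x)))
        total += u
        u > maxulp && ((maxulp, xmax) = (u, x))
    end
    return maxulp, xmax, total / length(xs)   # max ulp, where it occurs, mean ulp
end

accuracy(exp10, -10:0.00021:10)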

ararslan added the maths (Mathematical functions) label Apr 19, 2017
ararslan requested a review from simonbyrne April 19, 2017 19:47
ararslan (Member) commented:

@nanosoldier runbenchmarks(ALL, vs=":master")

musm (Contributor, author) commented Apr 19, 2017

I haven't (yet) added any specific benchmarks for exp10, and BaseBenchmarks doesn't have any benchmarks that include exp10...

# log2(10)
const LOG210 = 3.321928094887362347870319429489390175864831393024580612054756395815934776608624
# log10(2)
const LOG102 = 3.010299956639811952137388947244930267681898814621085413104274611271081892744238e-01
Review comment (Member):

Maybe have underscores between the numbers, i.e. LOG2_10 and LOG10_2?

Review comment (Member):

These (and the horner macro) seem like suspiciously many digits, when they will have to get truncated internally when they are read in (e.g. this one will become 0.3010299956639812). Is it typical to over-state the accuracy in this way?
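
(A quick REPL check of that point: the long literal and its 17-significant-digit rounding parse to the same Float64, so the extra digits are inert at this precision:)

julia> 3.010299956639811952137388947244930267681898814621085413104274611271081892744238e-01
0.3010299956639812

julia> ans == 0.3010299956639812
true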

nanosoldier (Collaborator) commented:

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

musm force-pushed the exp10 branch 2 times, most recently from e16f431 to d425f39, on April 20, 2017 03:47
simonbyrne (Contributor) commented Apr 20, 2017

Looks good in general.

Do you have any accuracy results? I tested the Float32s and I noticed that some gave an error > 1ulp, e.g. -0.000122065285f0. The cause in that case seemed to be the branch to the Taylor approximation. exp10_small_thres may need to be a bit smaller.

musm (Contributor, author) commented Apr 20, 2017

Please see
https://github.com/JuliaLang/julia/pull/21445/files#diff-2b4b439b810120cb21b0b4eed48a360aR11
which is indeed <1 ulp; outside this range it may be slightly larger than 1 ulp. I think I have observed 1.1 or 1.2 ulp at most in the past, but haven't hit one yet...

and in the original post

Test ranges: -10:0.00021:10, -35:0.00023:1000, -300:0.001:300

Accuracy tests for Float64
     exp10          : max 0.91517196 at x = 275.61822                : mean 0.10907410
Base.exp10          : max 0.83238916 at x = 269.27436                : mean 0.11107440

Accuracy tests for Float32
     exp10          : max 0.87275678 at x = -3.13534                   : mean 0.02482310
Base.exp10          : max 0.82333934 at x = -1.05200                   : mean 0.02529399

> Do you have any accuracy results? I tested the Float32s and I noticed that some gave an error > 1ulp, e.g. -0.000122065285f0. The cause in that case seemed to be the branch to the Taylor approximation. exp10_small_thres may need to be a bit smaller.

Unlike with the exp function, I can find small Float32 numbers where the error is not within 1 ulp. So we can either get rid of the small Taylor branch or keep it.

simonbyrne (Contributor) commented Apr 21, 2017

As I said above, I think the main issue is simply that the threshold is too high: the next term in the Taylor series is (log(10)*x)^2/2, and so if you choose a value at the threshold, then:

julia> x = Base.Math.exp10_small_thres(Float32)
0.00012207031f0

julia> (x*log(10f0))^2/2 / eps(0.9f0)
0.6627373f0

i.e. 0.66 ulps if x is negative (so exp10(x) < 1). Combined with the potential error in evaluating 1 + x*log(10f0) (which, due to the size of the other arguments, is dominated by the +, so about ~0.5 ulps), that will be enough to push you over 1 ulp.

If you choose the threshold to be half the current value, then you should be okay.
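
(A quick check of the proposed halved threshold, plugging it into simonbyrne's expression above; halving x quarters the quadratic term:)

x = 2.0f0^-14                    # half the current exp10_small_thres(Float32)
(x*log(10f0))^2/2 / eps(0.9f0)   # ≈ 0.166, a quarter of the 0.6627373 above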


@eval exp10_small_thres(::Type{Float64}) = $(2.0^-29)
@eval exp10_small_thres(::Type{Float32}) = $(2.0f0^-13)
Review comment (Contributor):

Changing this to be 2.0f0^-14 gave a max ulp error of 0.9558756425976753 for all Float32 values.

Reply (Contributor, author):

Are you sure about this? How are you testing over all Float32 values? Can you post the script? For example, I'm finding many numbers with error greater than 1 ulp. The max Float32 ulp error I have found is at least 1.158, yet the inputs where these occur are larger than the small threshold value; even the value you posted (-0.000122065285f0) is larger than the threshold.

Here is my test script to double-check:

# ulp distance between a computed value x and a reference y (sentinels 1000+
# flag NaN/Inf/signed-zero mismatches)
function countulp(::Type{T}, x::AbstractFloat, y::AbstractFloat) where T
    isnan(x) && isnan(y) && return zero(T)
    (isnan(x) || isnan(y)) && return T(1000)
    if isinf(T(x)) && isinf(T(y))
        return signbit(x) == signbit(y) ? zero(T) : T(1001)
    end
    if x == zero(x) && y == zero(y)
        return signbit(x) == signbit(y) ? zero(T) : T(1002)
    end
    if isfinite(x) && isfinite(y)
        return T(abs(x - y)/eps(T(y)))
    end
    return T(1003)
end

start32 = 0x00000000
end32 = 0xffffffff

# sweep every `spacing`-th bit pattern in [suint, euint], comparing f1
# against the high-precision reference f2; returns the worst ulp error
function test_acc(f1, f2, ::Type{T}, suint::U, euint::U, spacing::Integer=1000) where {T, U<:Unsigned}
    ulp = zero(T)
    for vuint in suint:convert(U, spacing):euint
        x = reinterpret(T, vuint)
        v1 = f1(x)
        v2 = f2(big(x))
        ulp_ = countulp(T, v1, v2)
        ulp = max(ulp, ulp_)
        if ulp_ > T(1.0)
            @show x, ulp_
        end
    end
    return ulp
end

@time ulp = test_acc(Amal.exp10, Base.exp10, Float32, start32, end32)

Review comment (Contributor):

I just checked against Float64: I used the following:

# exhaustive sweep of all Float32 inputs, using exp10 at Float64 as the reference
function check32()
    maxulps = 0.0
    xmax    = 0f0
    for u in typemin(UInt32):typemax(UInt32)
        x = reinterpret(Float32, u)
        isfinite(x) || continue
        
        y32 = exp10(x)
        y64 = exp10(Float64(x))
        if !isfinite(Float32(y64))
            continue
        end
        if !isfinite(y32)
            error("Non-finite value at $x")
        end
        d = abs(y32-y64)/eps(Float32(y64))
        if d > maxulps
            maxulps = d
            xmax = x
        end
    end
    maxulps,xmax
end

which gives:

julia> check32()
(0.9558756425976753, -3.7376504f0)

musm (Contributor, author) commented Apr 28, 2017

Updated; can someone trigger nanosoldier? Thanks.

musm force-pushed the exp10 branch 3 times, most recently from 9d3a462 to ef06671, on April 28, 2017 05:00
tkelman (Contributor) commented Apr 28, 2017

@nanosoldier runbenchmarks(ALL, vs=":master")

nanosoldier (Collaborator) commented:

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

musm (Contributor, author) commented Apr 28, 2017

benchmarks look good

tkelman (Contributor) commented Apr 28, 2017

The manually written-out coefficients look considerably longer than they need to be; once parsed as Float64, many of the trailing digits aren't accomplishing anything, are they?

musm (Contributor, author) commented Apr 28, 2017

The constants are written out to higher precision than needed since they can be useful for implementing higher-precision functions if/when Float128 support is added in Julia. I'd rather not change that, and it is harmless anyway.

musm (Contributor, author) commented Apr 28, 2017

As a reminder, this should not be squashed when merged.

tkelman (Contributor) commented Apr 28, 2017

Does the first commit pass on its own?

musm (Contributor, author) commented Apr 28, 2017

Yes. The second commit fixes the fast_math variants to actually call the pure Julia exp and exp10 functions.

tkelman (Contributor) commented Apr 29, 2017

@simonbyrne go ahead and merge if you approve of this - though it's not the end of the world if we hold off until branching on this one

musm (Contributor, author) commented May 1, 2017

More data points: I tested 2^30 ≈ 1 billion random Float64 values (with the help of a server and Julia SharedArrays). The max ulp error I found is 1.2623 at x = -210.2442424463067, and 0.00893% of the random values had an ulp error greater than 1.0.
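
(A hypothetical sketch of what such a search could look like; an assumed illustration for current Julia, where Distributed and SharedArrays are stdlibs, and not musm's actual script, which isn't captured in this thread:)

using Distributed, SharedArrays
addprocs(4)
@everywhere begin
    setprecision(BigFloat, 256)
    ulps(y, ref) = Float64(abs(y - ref) / eps(Float64(ref)))
end

n = 2^20                                   # scale toward 2^30 on a big machine
xs = SharedArray{Float64}(n)
us = SharedArray{Float64}(n)
@sync @distributed for i in 1:n
    x = 600rand() - 300                    # uniform sample in [-300, 300)
    xs[i] = x
    us[i] = ulps(exp10(x), exp10(big(x)))
end
i = argmax(us)
us[i], xs[i], 100 * count(u -> u > 1.0, us) / n   # worst ulp, its x, % over 1 ulp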

ViralBShah (Member) commented:

This is great. I do think it is best to hold off until branching, which I suspect is this week.

x < EXP10_MIN(T) && return T(0.0)
end
# compute approximation
if xa > reinterpret(Unsigned, T(0.5)*T(LOG10_2)) # |r| > 0.5 log10(2).
Review comment (Contributor):

maybe change the comment to be |x| instead of |r|
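
(For readers following the snippet: the reduction here appears to write 10^x = 2^k * 10^r with k = round(x*log2(10)) and |r| <= 0.5*log10(2). A toy sketch of that structure, with exp standing in for the PR's polynomial kernel and with no overflow handling:)

function exp10_sketch(x::Float64)
    k = round(x * 3.321928094887362)        # k ≈ x*log2(10)
    r = muladd(k, -0.30102999566398120, x)  # r = x - k*log10(2); the real code
                                            # splits this constant hi/lo for accuracy
    return exp(r * 2.302585092994046) * exp2(k)   # 10^r * 2^k
end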

simonbyrne (Contributor) commented:

I've noticed that the accuracy can be quite a bit lower on machines without native FMA (or those using the prebuilt images). On such machines I see a lot of values with > 1 ulp error.

It is sometimes possible to avoid this by rearranging the last few operations. However I don't have time to look at it at the moment, so I think we should merge this and do some more experiments before the next release.

StefanKarpinski (Member) commented:

Please do open an issue about investigating the non-native FMA thing since otherwise we'll forget.

musm (Contributor, author) commented May 5, 2017

Yes, without FMA it is very difficult and perhaps not possible (unless you use double-double floats, which is not really desirable, or a table). I recall trying the same tricks as in exp, constructing a rational approximation, but I don't think that worked very well at all (I don't even have a copy of that test; I must have deleted it, probably because the accuracy wasn't good enough).
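
(Background on why FMA matters here; this is standard error-free-transformation folklore, not code from this PR. With fma, the exact rounding error of a product costs one extra instruction, which is the building block of double-double arithmetic:)

# hi + lo == a*b exactly; without hardware fma, fma() is emulated and slow
function two_prod(a::Float64, b::Float64)
    hi = a * b
    lo = fma(a, b, -hi)   # exact residual of the rounded product
    return hi, lo
end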

simonbyrne (Contributor) commented:

For future reference, one such value was x = -1.6515175124292971, which gives 1.06 ulps error.

musm (Contributor, author) commented May 5, 2017

Did you see my post, which was testing with hardware FMA?

> More data points: I tested 2^30 ≈ 1 billion random Float64 values (with the help of a server and Julia SharedArrays). The max ulp error I found is 1.2623 at x = -210.2442424463067, and 0.00893% of the random values had an ulp error greater than 1.0.

simonbyrne (Contributor) commented:

Ah, I forgot about that. Good point (for the record, I only see that without FMA; with FMA it is ~0.74 ulps).

musm (Contributor, author) commented May 5, 2017

That was with FMA, testing against BigFloats, which is very expensive: I had to use a hefty server, and the computation ran for at least an hour.

pkofod (Contributor) commented May 5, 2017

Would you mind posting the script?


musm (Contributor, author) commented May 10, 2017

thoughts?

musm (Contributor, author) commented May 10, 2017

BTW, I think it's fine for now that this function is probably at most 1.5 ulps for Float64 (probably closer to 1.3 ulps...), at least on FMA setups. Not many libms even have an exp10 implementation; e.g. Intel's math libs don't....

I don't think you can get much better with a polynomial in this case.
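
(For reference, the polynomial branch is evaluated with Base's internal @horner macro, which expands to nested muladd calls. A hand-rolled equivalent with made-up coefficients:)

horner(x, cs...) = foldr((c, acc) -> muladd(x, acc, c), cs)
horner(0.5, 1.0, 0.25, 0.125)   # 1.0 + 0.25*0.5 + 0.125*0.5^2 == 1.15625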

MAXEXP(::Type{Float64}) = 7.09782712893383996732e2 # log 2^1023*(2-2^-52)
MAXEXP(::Type{Float32}) = 88.72283905206835f0 # log 2^127 *(2-2^-23)
# max and min arguments
EXP_MAX(::Type{Float64}) = 7.09782712893383996732e2 # log 2^1023*(2-2^-52)
Review comment (Contributor, author):

Bikeshedding, but do people prefer
EXP_MAX,
MAX_EXP,
or the no-underscore variants of the above?

Reply (Member):

MAX_EXP

Reply (Contributor, author):

OK, thanks, I have changed it.

musm (Contributor, author) commented May 12, 2017

We are post-0.6; would it be possible to decide whether this is wanted or not?

KristofferC (Member) commented:

Planning on merging this tomorrow unless I hear any objections

KristofferC merged commit ed1105e into JuliaLang:master on Jun 27, 2017
KristofferC (Member) commented:

Thanks for another cool PR showing what Julia is capable of, @musm!

musm (Contributor, author) commented Jun 27, 2017

wow nice

So this function took me a long time to write. I'm still slightly disappointed I couldn't get <1 ulp for the Float64 version on non-FMA setups, but I don't think it's a big deal. I'm not sure of @pkofod's plans, but since he is working on the libm this summer, I'll note that it is possible to get 0.5 ulp accuracy and (probably) beat the speed of the polynomial version, at least on scalars, via a table method, if that is something he is interested in pursuing. (I looked at this but don't have anything polished enough to be a PR, and I can't promise I will spend more time on it.)

For reference: worst cases

  • non-FMA Float64: 1.2623 ulp at x = -210.2442424463067 (random search of 1 billion numbers)

  • FMA Float64: 0.9839 ulp at x = 206.98334259614455 (random search of 3 billion numbers)

edit: clarified accuracy
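
(A sketch of the table method alluded to above; an assumed illustration, not code from this PR. Reduce with a step of log10(2)/32, look up 2^(j/32) in a small table, and correct with a very short kernel:)

const EXP2_TBL = [exp2(j/32) for j in 0:31]     # real libms store these split hi/lo
function exp10_table_sketch(x::Float64)
    k = round(32 * x * 3.321928094887362)       # k ≈ 32*x*log2(10)
    m, j = fldmod(Int(k), 32)                   # k = 32m + j with 0 <= j < 32
    r = muladd(k, -0.30102999566398120 / 32, x) # r = x - k*log10(2)/32, tiny
    return EXP2_TBL[j + 1] * exp(r * 2.302585092994046) * exp2(m)   # 2^(j/32)*10^r*2^m
end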

musm deleted the exp10 branch June 27, 2017 04:08