
Add pure julia exp10 function #21445

Merged: 2 commits merged into JuliaLang:master from the exp10 branch on Jun 27, 2017

Conversation

musm (Contributor) commented Apr 19, 2017

Need to add tests here, and possibly benchmarks (JuliaCI/BaseBenchmarks.jl#73).

julia> @benchmark Base.exp10(5.0)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     105.795 ns (0.00% GC)
  median time:      155.140 ns (0.00% GC)
  mean time:        170.898 ns (0.00% GC)
  maximum time:     4.148 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark My.exp10(5.0)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     13.816 ns (0.00% GC)
  median time:      15.395 ns (0.00% GC)
  mean time:        16.895 ns (0.00% GC)
  maximum time:     252.250 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
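
(For reproduction, a minimal sketch using BenchmarkTools.jl; `My` above is presumably the author's local module holding the new implementation, so the name is a stand-in, not anything in Base:)

using BenchmarkTools
x = 5.0
b_base = @benchmark Base.exp10($x)    # current Base version
b_new  = @benchmark My.exp10($x)      # the PR's pure-Julia version (stand-in module name)
judge(median(b_new), median(b_base))  # summarizes the relative change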


Test ranges: -10:0.00021:10, -35:0.00023:1000, -300:0.001:300

Accuracy tests for Float64
     exp10          : max 0.91517196 at x = 275.61822                : mean 0.10907410
Base.exp10          : max 0.83238916 at x = 269.27436                : mean 0.11107440

Accuracy tests for Float32
     exp10          : max 0.87275678 at x = -3.13534                   : mean 0.02482310
Base.exp10          : max 0.82333934 at x = -1.05200                   : mean 0.02529399

It is expected that Base will be slightly more accurate, since it is just calling the ^ function.
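
(For context, a sketch of the kind of accuracy harness that produces numbers like these: ulp error against a BigFloat reference over each range. An assumed reconstruction, not the author's actual script:)

setprecision(BigFloat, 256)

# ulp error of y relative to the high-precision reference ref
ulps(y::T, ref::BigFloat) where {T<:AbstractFloat} = Float64(abs(y - ref) / eps(T(ref)))

function accuracy(f, xs)
    maxulp, xmax, total = 0.0, first(xs), 0.0
    for x in xs
        u = ulps(f(x), exp10(big(x)))
        total += u
        u > maxulp && ((maxulp, xmax) = (u, x))
    end
    return maxulp, xmax, total / length(xs)   # max ulp, where it occurs, mean ulp
end

accuracy(exp10, -10:0.00021:10)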

ararslan added the maths (Mathematical functions) label Apr 19, 2017
ararslan requested a review from simonbyrne April 19, 2017 19:47
ararslan (Member) commented:

@nanosoldier runbenchmarks(ALL, vs=":master")

musm (Contributor, author) commented Apr 19, 2017

I haven't (yet) added any specific benchmarks for exp10, and BaseBenchmarks doesn't have any benchmarks that include exp10...

# log2(10)
const LOG210 = 3.321928094887362347870319429489390175864831393024580612054756395815934776608624
# log10(2)
const LOG102 = 3.010299956639811952137388947244930267681898814621085413104274611271081892744238e-01
Review comment (Member):

Maybe have underscores between the numbers, i.e. LOG2_10 and LOG10_2?

Review comment (Member):

These (and the horner macro) seem like suspiciously many digits, when they will have to get truncated internally when they are read in (e.g. this one will become 0.3010299956639812). Is it typical to over-state the accuracy in this way?
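
(A quick REPL check of that point: the long literal and its 17-significant-digit rounding parse to the same Float64, so the extra digits are inert at this precision:)

julia> 3.010299956639811952137388947244930267681898814621085413104274611271081892744238e-01
0.3010299956639812

julia> ans == 0.3010299956639812
true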

nanosoldier (Collaborator) commented:

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

musm force-pushed the exp10 branch 2 times, most recently from e16f431 to d425f39, on April 20, 2017 03:47
simonbyrne (Contributor) commented Apr 20, 2017

Looks good in general.

Do you have any accuracy results? I tested the Float32s and I noticed that some gave an error > 1ulp, e.g. -0.000122065285f0. The cause in that case seemed to be the branch to the Taylor approximation. exp10_small_thres may need to be a bit smaller.

musm (Contributor, author) commented Apr 20, 2017

Please see
https://github.com/JuliaLang/julia/pull/21445/files#diff-2b4b439b810120cb21b0b4eed48a360aR11
which is indeed <1 ulp; outside this range it may be slightly larger than 1 ulp. I think I have observed 1.1 or 1.2 ulp at most in the past, but haven't hit one yet...

and in the original post

Test ranges: -10:0.00021:10, -35:0.00023:1000, -300:0.001:300

Accuracy tests for Float64
     exp10          : max 0.91517196 at x = 275.61822                : mean 0.10907410
Base.exp10          : max 0.83238916 at x = 269.27436                : mean 0.11107440

Accuracy tests for Float32
     exp10          : max 0.87275678 at x = -3.13534                   : mean 0.02482310
Base.exp10          : max 0.82333934 at x = -1.05200                   : mean 0.02529399

> Do you have any accuracy results? I tested the Float32s and I noticed that some gave an error > 1ulp, e.g. -0.000122065285f0. The cause in that case seemed to be the branch to the Taylor approximation. exp10_small_thres may need to be a bit smaller.

Unlike with the exp function, I can find small Float32 numbers where the error is not within 1 ulp. So we can either get rid of the small Taylor branch or keep it.

simonbyrne (Contributor) commented Apr 21, 2017

As I said above, I think the main issue is simply that the threshold is too high: the next term in the Taylor series is (log(10)*x)^2/2, and so if you choose a value at the threshold, then:

julia> x = Base.Math.exp10_small_thres(Float32)
0.00012207031f0

julia> (x*log(10f0))^2/2 / eps(0.9f0)
0.6627373f0

i.e. 0.66 ulps if x is negative (so exp10(x) < 1). Combined with the potential error in evaluating 1 + x*log(10f0) (which, due to the size of the other arguments, is dominated by the +, so about ~0.5 ulps), that will be enough to push you over 1 ulp.

If you choose the threshold to be half the current value, then you should be okay.
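
(A quick check of the proposed halved threshold, plugging it into simonbyrne's expression above; halving x quarters the quadratic term:)

x = 2.0f0^-14                    # half the current exp10_small_thres(Float32)
(x*log(10f0))^2/2 / eps(0.9f0)   # ≈ 0.166, a quarter of the 0.6627373 above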


@eval exp10_small_thres(::Type{Float64}) = $(2.0^-29)
@eval exp10_small_thres(::Type{Float32}) = $(2.0f0^-13)
Review comment (Contributor):

Changing this to be 2.0f0^-14 gave a max ulp error of 0.9558756425976753 for all Float32 values.

Reply (Contributor, author):

Are you sure about this? How are you testing over all Float32 values? Can you post the script? For example, I'm finding many numbers with error greater than 1 ulp. The max Float32 ulp error I have found is at least 1.158, yet the inputs where these occur are larger than the small threshold value; even the value you posted (-0.000122065285f0) is larger than the threshold.

Here is my test script to double-check:

# ulp distance between a computed value x and a reference y (sentinels 1000+
# flag NaN/Inf/signed-zero mismatches)
function countulp(::Type{T}, x::AbstractFloat, y::AbstractFloat) where T
    isnan(x) && isnan(y) && return zero(T)
    (isnan(x) || isnan(y)) && return T(1000)
    if isinf(T(x)) && isinf(T(y))
        return signbit(x) == signbit(y) ? zero(T) : T(1001)
    end
    if x == zero(x) && y == zero(y)
        return signbit(x) == signbit(y) ? zero(T) : T(1002)
    end
    if isfinite(x) && isfinite(y)
        return T(abs(x - y)/eps(T(y)))
    end
    return T(1003)
end

start32 = 0x00000000
end32 = 0xffffffff

# sweep every `spacing`-th bit pattern in [suint, euint], comparing f1
# against the high-precision reference f2; returns the worst ulp error
function test_acc(f1, f2, ::Type{T}, suint::U, euint::U, spacing::Integer=1000) where {T, U<:Unsigned}
    ulp = zero(T)
    for vuint in suint:convert(U, spacing):euint
        x = reinterpret(T, vuint)
        v1 = f1(x)
        v2 = f2(big(x))
        ulp_ = countulp(T, v1, v2)
        ulp = max(ulp, ulp_)
        if ulp_ > T(1.0)
            @show x, ulp_
        end
    end
    return ulp
end

@time ulp = test_acc(Amal.exp10, Base.exp10, Float32, start32, end32)

Review comment (Contributor):

I just checked against Float64: I used the following:

# exhaustive sweep of all Float32 inputs, using exp10 at Float64 as the reference
function check32()
    maxulps = 0.0
    xmax    = 0f0
    for u in typemin(UInt32):typemax(UInt32)
        x = reinterpret(Float32, u)
        isfinite(x) || continue
        
        y32 = exp10(x)
        y64 = exp10(Float64(x))
        if !isfinite(Float32(y64))
            continue
        end
        if !isfinite(y32)
            error("Non-finite value at $x")
        end
        d = abs(y32-y64)/eps(Float32(y64))
        if d > maxulps
            maxulps = d
            xmax = x
        end
    end
    maxulps,xmax
end

which gives:

julia> check32()
(0.9558756425976753, -3.7376504f0)

musm (Contributor, author) commented Apr 28, 2017

Updated; can someone trigger nanosoldier? Thanks.

musm force-pushed the exp10 branch 3 times, most recently from 9d3a462 to ef06671, on April 28, 2017 05:00
tkelman (Contributor) commented Apr 28, 2017

@nanosoldier runbenchmarks(ALL, vs=":master")

nanosoldier (Collaborator) commented:

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

musm (Contributor, author) commented Apr 28, 2017

benchmarks look good

tkelman (Contributor) commented Apr 28, 2017

The manually written-out coefficients look considerably longer than they need to be; once parsed as Float64, many of the trailing digits aren't accomplishing anything, are they?

musm (Contributor, author) commented Apr 28, 2017

The constants are written out to higher precision than needed since they can be useful for implementing higher-precision functions if/when Float128 support is added in Julia. I'd rather not change that, and it is harmless anyway.

musm (Contributor, author) commented Apr 28, 2017

As a reminder, this should not be squashed when merged.

tkelman (Contributor) commented Apr 28, 2017

Does the first commit pass on its own?

musm (Contributor, author) commented Apr 28, 2017

Yes. The second commit fixes the fast_math variants to actually call the pure Julia exp and exp10 functions.

tkelman (Contributor) commented Apr 29, 2017

@simonbyrne go ahead and merge if you approve of this - though it's not the end of the world if we hold off until branching on this one

musm (Contributor, author) commented May 1, 2017

More data points: I tested 2^30 ≈ 1 billion random Float64 values (with the help of a server and Julia SharedArrays). The max ulp error I found is 1.2623 at x = -210.2442424463067, and 0.00893% of the random values had an ulp error greater than 1.0.
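
(A hypothetical sketch of what such a search could look like; an assumed illustration for current Julia, where Distributed and SharedArrays are stdlibs, and not musm's actual script, which isn't captured in this thread:)

using Distributed, SharedArrays
addprocs(4)
@everywhere begin
    setprecision(BigFloat, 256)
    ulps(y, ref) = Float64(abs(y - ref) / eps(Float64(ref)))
end

n = 2^20                                   # scale toward 2^30 on a big machine
xs = SharedArray{Float64}(n)
us = SharedArray{Float64}(n)
@sync @distributed for i in 1:n
    x = 600rand() - 300                    # uniform sample in [-300, 300)
    xs[i] = x
    us[i] = ulps(exp10(x), exp10(big(x)))
end
i = argmax(us)
us[i], xs[i], 100 * count(u -> u > 1.0, us) / n   # worst ulp, its x, % over 1 ulp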

ViralBShah (Member) commented:

This is great. I do think it is best to hold off until branching, which I suspect is this week.

x < EXP10_MIN(T) && return T(0.0)
end
# compute approximation
if xa > reinterpret(Unsigned, T(0.5)*T(LOG10_2)) # |r| > 0.5 log10(2).
Review comment (Contributor):

maybe change the comment to be |x| instead of |r|
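
(For readers following the snippet: the reduction here appears to write 10^x = 2^k * 10^r with k = round(x*log2(10)) and |r| <= 0.5*log10(2). A toy sketch of that structure, with exp standing in for the PR's polynomial kernel and with no overflow handling:)

function exp10_sketch(x::Float64)
    k = round(x * 3.321928094887362)        # k ≈ x*log2(10)
    r = muladd(k, -0.30102999566398120, x)  # r = x - k*log10(2); the real code
                                            # splits this constant hi/lo for accuracy
    return exp(r * 2.302585092994046) * exp2(k)   # 10^r * 2^k
end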

simonbyrne (Contributor) commented:

I've noticed that the accuracy can be quite a bit lower on machines without native FMA (or those using the prebuilt images). On such machines I see a lot of values with > 1 ulp error.

It is sometimes possible to avoid this by rearranging the last few operations. However I don't have time to look at it at the moment, so I think we should merge this and do some more experiments before the next release.

StefanKarpinski (Member) commented:

Please do open an issue about investigating the non-native FMA thing since otherwise we'll forget.

musm (Contributor, author) commented May 5, 2017

Yes, without FMA it is very difficult and perhaps not possible (unless you use double-double floats, which is not really desirable, or a table). I recall trying the same tricks as in exp, constructing a rational approximation, but I don't think that worked very well at all (I don't even have a copy of that test; I must have deleted it, probably because the accuracy wasn't good enough).
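
(Background on why FMA matters here; this is standard error-free-transformation folklore, not code from this PR. With fma, the exact rounding error of a product costs one extra instruction, which is the building block of double-double arithmetic:)

# hi + lo == a*b exactly; without hardware fma, fma() is emulated and slow
function two_prod(a::Float64, b::Float64)
    hi = a * b
    lo = fma(a, b, -hi)   # exact residual of the rounded product
    return hi, lo
end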

simonbyrne (Contributor) commented:

For future reference, one such value was x = -1.6515175124292971, which gives 1.06 ulps error.

musm (Contributor, author) commented May 5, 2017

Did you see my post, which was testing with hardware FMA?

> More data points: I tested 2^30 ≈ 1 billion random Float64 values (with the help of a server and Julia SharedArrays). The max ulp error I found is 1.2623 at x = -210.2442424463067, and 0.00893% of the random values had an ulp error greater than 1.0.

simonbyrne (Contributor) commented:

Ah, I forgot about that. Good point (for the record, I only see that without FMA; with FMA it is ~0.74 ulps).

musm (Contributor, author) commented May 5, 2017

That was with FMA, testing against BigFloats, which is very expensive: I had to use a hefty server, and the computation ran for at least an hour.

pkofod (Contributor) commented May 5, 2017

Would you mind posting the script?


musm (Contributor, author) commented May 10, 2017

thoughts?

musm (Contributor, author) commented May 10, 2017

BTW, I think it's fine for now that this function is probably at most 1.5 ulps for Float64 (probably closer to 1.3 ulps...), at least on FMA setups. Not many libms even have an exp10 implementation; e.g. Intel's math libs don't....

I don't think you can get much better with a polynomial in this case.
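
(For reference, the polynomial branch is evaluated with Base's internal @horner macro, which expands to nested muladd calls. A hand-rolled equivalent with made-up coefficients:)

horner(x, cs...) = foldr((c, acc) -> muladd(x, acc, c), cs)
horner(0.5, 1.0, 0.25, 0.125)   # 1.0 + 0.25*0.5 + 0.125*0.5^2 == 1.15625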

MAXEXP(::Type{Float64}) = 7.09782712893383996732e2 # log 2^1023*(2-2^-52)
MAXEXP(::Type{Float32}) = 88.72283905206835f0 # log 2^127 *(2-2^-23)
# max and min arguments
EXP_MAX(::Type{Float64}) = 7.09782712893383996732e2 # log 2^1023*(2-2^-52)
Review comment (Contributor, author):

Bikeshedding, but do people prefer
EXP_MAX,
MAX_EXP,
or the no-underscore variants of the above?

Reply (Member):

MAX_EXP

Reply (Contributor, author):

OK, thanks, I have changed it.

musm (Contributor, author) commented May 12, 2017

We are post-0.6; would it be possible to decide whether this is wanted or not?

KristofferC (Member) commented:

Planning on merging this tomorrow unless I hear any objections

KristofferC merged commit ed1105e into JuliaLang:master on Jun 27, 2017
KristofferC (Member) commented:

Thanks for another cool PR showing what Julia is capable of, @musm!

musm (Contributor, author) commented Jun 27, 2017

wow nice

So this function took me a long time to write. I'm still slightly disappointed I couldn't get <1 ulp for the Float64 version on non-FMA setups, but I don't think it's a big deal. I'm not sure of @pkofod's plans, but since he is working on the libm this summer, I'll note that it is possible to get 0.5 ulp accuracy and (probably) beat the speed of the polynomial version, at least on scalars, via a table method, if that is something he is interested in pursuing. (I looked at this but don't have anything polished enough to be a PR, and I can't promise I will spend more time on it.)

For reference: worst cases

  • non-FMA Float64: 1.2623 ulp at x = -210.2442424463067 (random search of 1 billion numbers)

  • FMA Float64: 0.9839 ulp at x = 206.98334259614455 (random search of 3 billion numbers)

edit: clarified accuracy
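
(A sketch of the table method alluded to above; an assumed illustration, not code from this PR. Reduce with a step of log10(2)/32, look up 2^(j/32) in a small table, and correct with a very short kernel:)

const EXP2_TBL = [exp2(j/32) for j in 0:31]     # real libms store these split hi/lo
function exp10_table_sketch(x::Float64)
    k = round(32 * x * 3.321928094887362)       # k ≈ 32*x*log2(10)
    m, j = fldmod(Int(k), 32)                   # k = 32m + j with 0 <= j < 32
    r = muladd(k, -0.30102999566398120 / 32, x) # r = x - k*log10(2)/32, tiny
    return EXP2_TBL[j + 1] * exp(r * 2.302585092994046) * exp2(m)   # 2^(j/32)*10^r*2^m
end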

musm deleted the exp10 branch June 27, 2017 04:08