Returning a tuple will affect performance #330

Closed
msekino opened this issue Jun 30, 2021 · 4 comments
@msekino

msekino commented Jun 30, 2021

My application computes logbeta a very large number of times.
This resulted in periodic memory exhaustion and full GC, as shown in the attached image.
I found that this was because logabsbeta returns a tuple.
The following is my investigation.
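
For context, my understanding (a rough sketch of the relationship, not the library's actual source) is that logbeta keeps only the first element of the tuple returned by logabsbeta:

using SpecialFunctions

# Rough sketch (not the actual SpecialFunctions.jl source): logabsbeta(a, b)
# returns a tuple (log|B(a, b)|, sign(B(a, b))), and a logbeta built on top of it
# keeps only the first element. `mylogbeta` is just an illustrative name.
mylogbeta(a, b) = first(SpecialFunctions.logabsbeta(a, b))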

First, I calculated the sum of logbeta using broadcasting.

using SpecialFunctions
using BenchmarkTools

a = 1000rand(10000000)
b = 1000rand(10000000)

function testlogbeta(a, b)
    sum(logbeta.(a, b))
end

@btime testlogbeta(a, b)
> 1.105 s (20000011 allocations: 534.06 MiB)

A large number of allocations occurred.
I suspected that this was due to the use of broadcasting.
So I tried multi-threading without broadcasting.

using Base.Threads

# Split 1:N into nthreads() contiguous chunks and return the chunk for thread `ithread`
# (the last chunks may be shorter, or empty if N < nthreads()).
function allocateindexrange(N, ithread)::UnitRange{Int}
    nperthread = ceil(N / nthreads()) |> Int  # chunk size via ceiling division
    from = (ithread - 1) * nperthread + 1
    to = min(ithread * nperthread, N)
    from:to
end

function testlogbeta2(a, b)
    sumlb = 0.0
    slock = SpinLock()
    @threads for ithread in 1:nthreads()
        is = allocateindexrange(length(a), ithread)
        lb = sumlogbeta(is, a, b)
        lock(slock) do
            sumlb += lb
        end
    end
    sumlb
end

function sumlogbeta(is, a, b)
    sumlb = 0.0
    for i in is
        sumlb += logbeta(a[i], b[i])
    end
    sumlb
end

@btime testlogbeta2(a, b)
> 144.828 ms (20000460 allocations: 457.80 MiB)

It's 7.6 times faster, but a large number of allocations are still occurring.
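
The counts above work out to roughly two allocations per logbeta call, which suggests the scalar call itself allocates. A check like the following should confirm that (my own side check; I am not quoting its output here, since it depends on the Julia and SpecialFunctions versions):

using SpecialFunctions
using BenchmarkTools

# benchmark a single scalar call; nonzero allocations here would explain
# the roughly two allocations per element seen in the loop above
@btime logbeta(500.0, 500.0)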
I tried a logbeta calculation that does not involve tuples.

# A variant of logbeta that returns a plain Float64 instead of going through
# the (value, sign) tuple of logabsbeta. The branches mirror the logabsbeta logic.
function logbeta_float(a::Number, b::Number)
    if a > b
        return logbeta_float(b, a)  # ensure a <= b
    end

    if a <= 0 && isinteger(a)
        if a + b <= 0 && isinteger(b)
            return logbeta_float(1 - a - b, b)
        else
            return -log(zero(a))  # the beta function diverges here: log|B| = +Inf
        end
    end

    if a > 0 && b > 8
        return SpecialFunctions.loggammadiv(a, b) + SpecialFunctions.loggamma(a)
    end

    # generic case: log|Γ(a)| + log|Γ(b)| - log|Γ(a + b)|
    ya, _ = SpecialFunctions.logabsgamma(a)
    yb, _ = SpecialFunctions.logabsgamma(b)
    yab, _ = SpecialFunctions.logabsgamma(a + b)
    ya + yb - yab
end

function testlogbeta_float(a, b)
    sumlb = 0.0
    slock = SpinLock()
    @threads for ithread in 1:nthreads()
        is = allocateindexrange(length(a), ithread)
        lb = sumlogbeta_float(is, a, b)
        lock(slock) do
            sumlb += lb
        end
    end
    sumlb
end

function sumlogbeta_float(is, a, b)
    sumlb = 0.0
    for i in is
        sumlb += logbeta_float(a[i], b[i])
    end
    sumlb
end

@btime testlogbeta_float(a, b)
> 10.197 ms (451 allocations: 34.67 KiB)

It's 14.5 times faster than the second version, and this approach keeps allocations to a very small number.
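
As a quick sanity check (my addition, a spot check rather than a rigorous test), logbeta_float should agree with logbeta on random positive inputs:

using SpecialFunctions

# spot check: the tuple-free variant should match logbeta for positive arguments
for _ in 1:1000
    x, y = 1000rand(), 1000rand()
    @assert logbeta_float(x, y) ≈ logbeta(x, y)
end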

Since SpecialFunctions.jl functions may be called a very large number of times in an application, it would be appreciated if they could return primitive types as much as possible.

Best regards.
(Attached image periodicgc: memory usage plot showing the periodic exhaustion and full GC.)

@stevengj
Member

stevengj commented Jul 2, 2021

Tuples are generally cheap (and don't require heap allocations) in Julia, so I'm skeptical that this is the source of your problem here.

Have you checked type stability with @code_warntype?
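
For example, something like this on the scalar call (watch for Union or abstract types in the reported return type):

using SpecialFunctions
using InteractiveUtils  # provides @code_warntype outside the REPL

# inspect the inferred types of a single scalar call
@code_warntype logbeta(1.0, 2.0)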

@msekino
Author

msekino commented Jul 2, 2021

@stevengj
I found that just

for i in 1:100
    testlogbeta2(a, b)
end

can reproduce the memory consumption and GC behavior (as shown in the attached image above).
Could you try to run it?

I did @code_warntype testlogbeta2(a, b) but could not figure out what the problem was.
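
In case it helps me narrow it down, here are a couple of more targeted checks I can try next (my own idea, drilling down from the outer threaded function toward the scalar call):

using InteractiveUtils  # @code_warntype
using Test              # @inferred throws if the return type cannot be inferred concretely

@code_warntype sumlogbeta(1:10, a, b)
@code_warntype logbeta(a[1], b[1])
Test.@inferred logbeta(a[1], b[1])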

@msekino
Author

msekino commented Jul 2, 2021

I'm starting to think that maybe the behavior is specific to my environment...

@stevengj
Member

stevengj commented Jul 2, 2021

It looks like logbeta is type-unstable; I filed a separate issue, #331. That looks like the reason why you have so many allocations.

By the way, it seems like you are trying to do a parallel reduction, but doing this with a spinlock seems very suboptimal. See e.g. this discussion. You might want to use a package like ThreadsX.jl, which provides efficient multi-threaded reductions.
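
For instance, something along these lines (an untested sketch; ThreadsX.sum follows the same calling convention as Base.sum, and testlogbeta_threadsx is just an illustrative name):

using SpecialFunctions
using ThreadsX

# multi-threaded sum over index pairs, without manual chunking or locks
testlogbeta_threadsx(a, b) = ThreadsX.sum(i -> logbeta(a[i], b[i]), eachindex(a, b))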

@stevengj stevengj closed this as completed Jul 2, 2021