Large numbers of threads hang in @threads #32511

Closed
staticfloat opened this issue Jul 6, 2019 · 2 comments · Fixed by #32551
Labels: bug, multithreading

Comments

@staticfloat
Member

This is non-deterministic (huzzah) but luckily easy enough to trigger. I think using @benchmark more reliably triggers it since it's running threaded loops many thousands of times. Here's a test script:

using BenchmarkTools, Base.Threads

function func_threaded(val, N)
    sums = [0*(1 .^ val) for thread_idx in 1:nthreads()]
    @threads for idx in 1:N
        sums[threadid()] += idx.^val
    end
    return sum(sums)
end

@benchmark func_threaded(2.0, 1<<10)

And here's a log showing that it hangs in roughly 2 out of every 5 runs (4 of the 10 runs below had to be interrupted):

$ for idx in $(seq 1 10); do echo "Run ${idx}:"; JULIA_NUM_THREADS=8 time julia-master thread_lock.jl; done
Run 1:
^C
signal (2): Interrupt: 2
in expression starting at /Users/sabae/src/julia/thread_lock.jl:11
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
kevent at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
Allocations: 2855175 (Pool: 2854832; Big: 343); GC: 11
       24.81 real         5.27 user         0.34 sys
Run 2:
^C
signal (2): Interrupt: 2
in expression starting at /Users/sabae/src/julia/thread_lock.jl:11
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
kevent at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
Allocations: 2129862 (Pool: 2129518; Big: 344); GC: 8
       28.74 real         1.38 user         0.21 sys
Run 3:
        6.94 real        39.42 user         1.78 sys
Run 4:
        7.02 real        39.16 user         1.79 sys
Run 5:
        7.07 real        40.29 user         1.81 sys
Run 6:
^C
signal (2): Interrupt: 2
in expression starting at /Users/sabae/src/julia/thread_lock.jl:11
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
kevent at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
Allocations: 8821151 (Pool: 8820777; Big: 374); GC: 39
       28.69 real        36.36 user         1.67 sys
Run 7:
        6.85 real        40.36 user         1.77 sys
Run 8:
        6.82 real        40.23 user         1.76 sys
Run 9:
        6.99 real        40.07 user         1.79 sys
Run 10:
^C
signal (2): Interrupt: 2
in expression starting at /Users/sabae/src/julia/thread_lock.jl:11
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
kevent at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0xffffffffffffffff)
Allocations: 3580053 (Pool: 3579711; Big: 342); GC: 14
       19.33 real        10.11 user         0.55 sys
staticfloat added the bug and multithreading labels Jul 6, 2019
@JeffBezanson
Member

So far I can't reproduce this on Linux, even with 16 threads.

@vtjnash
Member

vtjnash commented Jul 10, 2019

Note that you don't seem to need to do anything in the loop, just:

while true; do JULIA_NUM_THREADS=8 time ./usr/bin/julia -e 'for i = 1:10^5; Threads.@threads for idx in 1:Threads.nthreads(); end; end' || break; done

(Restarting Julia isn't necessary either; we could also just make i bigger.)
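
For anyone who prefers a script file to a shell one-liner, the same empty-loop reproducer can be sketched as below. This is only an illustration of the idea in the comment above: the function name `spin_empty_threaded_loops` and the `iters` argument are made up here and do not come from the thread or from the fix.

```julia
using Base.Threads

# Hypothetical helper (name and signature are illustrative, not from the issue).
# Runs `iters` threaded loops with an empty body and returns how many finished.
# On an affected build the call may never return; on a fixed build it should
# return `iters`.
function spin_empty_threaded_loops(iters::Integer)
    completed = 0
    for i in 1:iters
        # One iteration per thread, no work in the body, as in the one-liner.
        @threads for idx in 1:nthreads()
        end
        completed += 1
    end
    return completed
end

println("threads: ", nthreads())
println("completed: ", spin_empty_threaded_loops(10^5))
```

Saved under any file name and run with JULIA_NUM_THREADS=8, a hang would show up as the second line never being printed. Raising the iteration count plays the same role as "make i bigger" above.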

vtjnash added a commit that referenced this issue Jul 10, 2019
gotta keep system vs runtime and global vs local straight!

fix #32511
JeffBezanson pushed a commit that referenced this issue Jul 16, 2019
gotta keep system vs runtime and global vs local straight!

fix #32511