Jemalloc causes os-level deadlock #69165

Closed
git-blame opened this issue Feb 14, 2020 · 3 comments
Labels
C-bug Category: This is a bug.

Comments

@git-blame

My Rust application has a C component that calls LuaJIT to run Lua scripts. These scripts sometimes execute system tools through Lua's support for launching external processes or shells. When the scripts are used heavily, the application sometimes simply hangs.

I was able to attach gdb to a debug build on one occasion. The stack traces show that most threads are inside the Lua code, waiting on a low-level lock taken during some OS-level call. One thread shows that Rust's jemalloc (the allocator for all code in this app, including the C components) is also waiting on a low-level lock.

Based on these tests:

  • The Windows build does not exhibit this behavior (it uses the system allocator)
  • The Linux build hangs with jemalloc
  • The Linux build is fine with jemalloc disabled (by specifying the system allocator in the Rust code)

I think that Rust's bundled version of jemalloc is causing a deadlock. I'm not sure whether this would also affect pure Rust code or Rust + C code that doesn't make many system calls, or whether it is specific to the combination of Rust + jemalloc + LuaJIT making lots of system calls (popen, pclose, etc.).
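For reference, the "system allocator" workaround from the third test can be sketched as below. This is a minimal illustration, not the code from my application: the static name `GLOBAL` and the helper `sum_on_heap` are made up for the example. It relies only on `std::alloc::System` and the `#[global_allocator]` attribute, both of which are stable in the 1.30.0 toolchain listed under Meta.

```rust
use std::alloc::System;

// Route every heap allocation made by Rust code in this binary through the
// platform allocator (malloc/free on Linux) instead of the bundled jemalloc.
// `GLOBAL` is an arbitrary name chosen for this sketch.
#[global_allocator]
static GLOBAL: System = System;

// Hypothetical helper that forces a heap allocation so the allocator is used.
fn sum_on_heap() -> u32 {
    let values: Vec<u32> = (1..=4).collect();
    values.iter().sum()
}

fn main() {
    println!("sum = {}", sum_on_heap());
}
```

Note that this attribute only selects the allocator used by Rust's own allocation APIs; which `malloc` the C components end up calling depends on how the binary is linked.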

Backtrace below.

This may be related to #31030 or jemalloc/jemalloc/issues/315

Meta

rustc --version --verbose:

rustc 1.30.0 (da5f414c2 2018-10-24)
binary: rustc
commit-hash: da5f414c2c0bfe5198934493f04c676e2b23ff2e
commit-date: 2018-10-24
host: x86_64-unknown-linux-gnu
release: 1.30.0
LLVM version: 8.0

Backtrace

(gdb) thr 3
[Switching to thread 3 (Thread 0x7f7e3c9ff700 (LWP 21340))]
#0  __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
95    ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) bt 5
#0  __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1  0x00007f7e41a183ba in _IO_flush_all_lockp (do_lock=1) at genops.c:774
#2  0x000055ae035e436a in lj_cf_io_popen (L=0x41776378) at lib_io.c:416
#3  0x000055ae035ccb0e in lj_BC_FUNCC ()
#4  0x000055ae035b930a in lua_pcall (L=0x7f7e41d3e740 <list_all_lock>, nargs=<optimized out>, nresults=<optimized out>,
    errfunc=<optimized out>) at lj_api.c:1129
(More stack frames follow...)
(gdb) thr 4
[Switching to thread 4 (Thread 0x7f7e3c1fe700 (LWP 21341))]
#0  __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
95    in ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) bt 5
#0  __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1  0x00007f7e41a1846f in _IO_flush_all_lockp (do_lock=1) at genops.c:783
#2  0x000055ae035e436a in lj_cf_io_popen (L=0x40d5c378) at lib_io.c:416
#3  0x000055ae035ccb0e in lj_BC_FUNCC ()
#4  0x000055ae035b930a in lua_pcall (L=0x7f7e390981f0, nargs=<optimized out>, nresults=<optimized out>, errfunc=<optimized out>)
    at lj_api.c:1129
(More stack frames follow...)
(gdb) thr 5
[Switching to thread 5 (Thread 0x7f7e3b9fd700 (LWP 21342))]
#0  __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
95    in ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) bt 5
#0  __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1  0x00007f7e41a16f25 in __GI__IO_un_link (fp=0x7f7e39298100) at genops.c:65
#2  0x00007f7e41a1717a in __GI__IO_un_link (fp=<optimized out>) at genops.c:60
#3  0x00007f7e41a09c55 in _IO_new_fclose (fp=0x7f7e39298100) at iofclose.c:54
#4  0x000055ae035e3695 in io_file_close (L=<optimized out>, iof=<optimized out>) at lib_io.c:101
(More stack frames follow...)
(gdb) thr 6
[Switching to thread 6 (Thread 0x7f7e3b1fc700 (LWP 21343))]
#0  __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
95    in ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) bt 5
#0  __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1  0x00007f7e41a18f1a in __GI__IO_list_lock () at genops.c:1216
#2  0x00007f7e41a5b505 in __libc_fork () at ../sysdeps/nptl/fork.c:125
#3  0x00007f7e41a0bc71 in _IO_new_proc_open (fp=fp@entry=0x7f7e38e98200, command=command@entry=0x404e4f90 "ps p  2> /dev/null",
    mode=<optimized out>, mode@entry=0x55ae0384b868 "r") at iopopen.c:180
#4  0x00007f7e41a0bf68 in _IO_new_popen (command=0x404e4f90 "ps p  2> /dev/null", mode=0x55ae0384b868 "r") at iopopen.c:296
(More stack frames follow...)
(gdb) thr 7
[Switching to thread 7 (Thread 0x7f7e3a9fb700 (LWP 21344))]
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135    ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) bt 18
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f7e41f62cc6 in __GI___pthread_mutex_lock (mutex=0x7f7e414a0560) at ../nptl/pthread_mutex_lock.c:135
#2  0x000055ae03568ea5 in je_malloc_mutex_lock (tsdn=<optimized out>, mutex=<optimized out>)
    at /rustc/da5f414c2c0bfe5198934493f04c676e2b23ff2e/src/liballoc_jemalloc/../jemalloc/include/jemalloc/internal/mutex.h:101
#3  je_arena_tcache_fill_small (tsdn=0x7f7e3a9fb4c8, arena=0x7f7e4149e980, tbin=0x7f7e40e223a8, binind=1106677596,
    prof_accumbytes=140180237976928)
    at /rustc/da5f414c2c0bfe5198934493f04c676e2b23ff2e/src/liballoc_jemalloc/../jemalloc/src/arena.c:2442
#4  0x000055ae0358278a in je_tcache_alloc_small_hard (tsdn=0x7f7e414a0560, arena=0x80, tcache=<optimized out>,
    tbin=0x7f7e40e223a8, binind=1106677596, tcache_success=0x7f7e3a9f920e)
    at /rustc/da5f414c2c0bfe5198934493f04c676e2b23ff2e/src/liballoc_jemalloc/../jemalloc/src/tcache.c:82
#5  0x000055ae0355d9ce in je_tcache_alloc_small (arena=<optimized out>, size=0, tsd=<optimized out>, tcache=<optimized out>,
    binind=<optimized out>, zero=<optimized out>, slow_path=<optimized out>)
    at /rustc/da5f414c2c0bfe5198934493f04c676e2b23ff2e/src/liballoc_jemalloc/../jemalloc/include/jemalloc/internal/tcache.h:301
#6  je_arena_malloc (tsdn=<optimized out>, size=<optimized out>, zero=<optimized out>, tcache=<optimized out>,
    slow_path=<optimized out>, arena=<optimized out>, ind=<optimized out>)
    at /rustc/da5f414c2c0bfe5198934493f04c676e2b23ff2e/src/liballoc_jemalloc/../jemalloc/include/jemalloc/internal/arena.h:1346
#7  je_iallocztm (size=<optimized out>, zero=<optimized out>, tcache=<optimized out>, is_metadata=<optimized out>,
    slow_path=<optimized out>, tsdn=<optimized out>, ind=<optimized out>, arena=<optimized out>)
    at include/jemalloc/internal/jemalloc_internal.h:1067
#8  je_ialloc (tsd=<optimized out>, size=<optimized out>, ind=<optimized out>, zero=<optimized out>, slow_path=<optimized out>)
    at include/jemalloc/internal/jemalloc_internal.h:1079
#9  ialloc_body (slow_path=false, size=<optimized out>, zero=<optimized out>, tsdn=<optimized out>, usize=<optimized out>)
    at /rustc/da5f414c2c0bfe5198934493f04c676e2b23ff2e/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:1605
#10 malloc (size=size@entry=4096)
    at /rustc/da5f414c2c0bfe5198934493f04c676e2b23ff2e/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:1644
#11 0x00007f7e41a09a62 in __GI__IO_file_doallocate (fp=0x7f7e39098100) at filedoalloc.c:101
#12 0x00007f7e41a17a76 in __GI__IO_doallocbuf (fp=fp@entry=0x7f7e39098100) at genops.c:398
#13 0x00007f7e41a16ae4 in _IO_new_file_underflow (fp=0x7f7e39098100) at fileops.c:564
#14 0x00007f7e41a17b32 in __GI__IO_default_uflow (fp=0x7f7e39098100) at genops.c:413
#15 0x00007f7e41a0b54a in __GI__IO_getline_info (fp=fp@entry=0x7f7e39098100,
    buf=buf@entry=0x41f50a10 "dpkg -S \"//var/lib/docker/overlay2/24dfba067742bebc0aef4ae9e772d072484ae200c22aa9e9defbe98c3aa61efb/diff/usr/share/ca-certificates/mozilla/Hellenic_Academic_and_Research_Institutions_RootCA_2015.crt\" "..., n=8191,
    delim=delim@entry=10, extract_delim=extract_delim@entry=1, eof=eof@entry=0x0) at iogetline.c:60
#16 0x00007f7e41a0b658 in __GI__IO_getline (fp=fp@entry=0x7f7e39098100,
    buf=buf@entry=0x41f50a10 "dpkg -S \"//var/lib/docker/overlay2/24dfba067742bebc0aef4ae9e772d072484ae200c22aa9e9defbe98c3aa61efb/diff/usr/share/ca-certificates/mozilla/Hellenic_Academic_and_Research_Institutions_RootCA_2015.crt\" "..., n=<optimized out>,
    delim=delim@entry=10, extract_delim=extract_delim@entry=1) at iogetline.c:34
#17 0x00007f7e41a0a3eb in _IO_fgets (
    buf=0x41f50a10 "dpkg -S \"//var/lib/docker/overlay2/24dfba067742bebc0aef4ae9e772d072484ae200c22aa9e9defbe98c3aa61efb/diff/usr/share/ca-certificates/mozilla/Hellenic_Academic_and_Research_Institutions_RootCA_2015.crt\" "..., n=<optimized out>,
    fp=0x7f7e39098100) at iofgets.c:53
(More stack frames follow...)
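
The pattern visible in these threads (several threads concurrently inside `popen`, `fgets`, and `fclose`/`pclose`) can be approximated without LuaJIT by a small stress program. The sketch below is only an assumption about what a standalone reproduction attempt could look like; it is not taken from the application. The function names `run_once` and `stress`, the `echo hello` command, and the thread and iteration counts are all hypothetical. The C functions are declared by hand with `FILE` treated as an opaque pointer, so no external crate is needed. It assumes a Unix-like system where `popen` runs commands through `sh`.

```rust
use std::ffi::{c_void, CStr, CString};
use std::os::raw::{c_char, c_int};
use std::thread;

// Hand-written declarations for the C stdio functions seen in the backtrace.
// `FILE *` is represented as an opaque `*mut c_void`.
extern "C" {
    fn popen(command: *const c_char, mode: *const c_char) -> *mut c_void;
    fn pclose(stream: *mut c_void) -> c_int;
    fn fgets(buf: *mut c_char, n: c_int, stream: *mut c_void) -> *mut c_char;
}

/// Runs `cmd` through popen, reads its first output line, and closes the pipe.
/// Returns None if the process could not be started or produced no output.
fn run_once(cmd: &str) -> Option<String> {
    let cmd = CString::new(cmd).ok()?;
    let mode = CString::new("r").ok()?;
    let mut buf = [0 as c_char; 256];
    unsafe {
        let fp = popen(cmd.as_ptr(), mode.as_ptr());
        if fp.is_null() {
            return None;
        }
        // fgets NUL-terminates the buffer whenever it returns non-null.
        let got = fgets(buf.as_mut_ptr(), buf.len() as c_int, fp);
        pclose(fp);
        if got.is_null() {
            return None;
        }
        let line = CStr::from_ptr(buf.as_ptr()).to_string_lossy();
        Some(line.trim_end().to_string())
    }
}

/// Hypothetical stress loop: `threads` workers each call popen `iters` times.
/// Returns how many calls produced the expected output.
fn stress(threads: usize, iters: usize) -> usize {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                (0..iters)
                    .filter(|_| run_once("echo hello").as_deref() == Some("hello"))
                    .count()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let completed = stress(4, 50);
    println!("{} popen calls completed", completed);
}
```

If a program like this hangs under the affected toolchain and allocator, attaching gdb and comparing the thread stacks with the ones above would show whether it is the same lock cycle. If it runs to completion, that does not rule out the bug, since the hang in the application was intermittent.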

@git-blame added the C-bug (Category: This is a bug.) label Feb 14, 2020
@sfackler
Member

Your compiler version is a year and a half out of date. Rust applications have used the system allocator by default since the 1.32.0 release (13 months ago).

@nagisa
Member

nagisa commented Feb 14, 2020

It is worth noting that we do not actively support versions that are so outdated. It would be great if you could try this same code with the current stable compiler with jemalloc enabled and see if it reproduces. Otherwise this issue should be closed.

@git-blame
Author

Tested with 1.41.0 with jemalloc enabled as the allocator. The issue was not seen.
