EXC_BAD_ACCESS with multiple local processes and pmap #3745

Closed
jxy opened this issue Jul 18, 2013 · 8 comments
Labels: bug, parallelism

Comments

@jxy

jxy commented Jul 18, 2013

Either using a single process or changing the offending pmap to map works around the bug.

* thread #1: tid = 0x1803, 0x0000000100098111 libjulia-debug.dylib`jl_deserialize_value(s=0x00007fff5f400fe8) + 33 at dump.c:474, stop reason = EXC_BAD_ACCESS (code=2, address=0x7fff5f3fff58)
    frame #0: 0x0000000100098111 libjulia-debug.dylib`jl_deserialize_value(s=0x00007fff5f400fe8) + 33 at dump.c:474
   471  
   472  static jl_value_t *jl_deserialize_value(ios_t *s)
   473  {
-> 474      int pos = ios_pos(s);
   475      int32_t tag = read_uint8(s);
   476      if (tag == Null_tag)
   477          return NULL;
(lldb) bt
* thread #1: tid = 0x1803, 0x0000000100098111 libjulia-debug.dylib`jl_deserialize_value(s=0x00007fff5f400fe8) + 33 at dump.c:474, stop reason = EXC_BAD_ACCESS (code=2, address=0x7fff5f3fff58)
    frame #0: 0x0000000100098111 libjulia-debug.dylib`jl_deserialize_value(s=0x00007fff5f400fe8) + 33 at dump.c:474
    frame #1: 0x00000001000983f9 libjulia-debug.dylib`jl_deserialize_value(s=0x00007fff5f400fe8) + 777 at dump.c:511
    frame #2: 0x00000001000992b4 libjulia-debug.dylib`jl_deserialize_value(s=0x00007fff5f400fe8) + 4548 at dump.c:703
    frame #3: 0x000000010009878a libjulia-debug.dylib`jl_deserialize_value(s=0x00007fff5f400fe8) + 1690 at dump.c:559
    frame #4: 0x00000001000987b7 libjulia-debug.dylib`jl_deserialize_value(s=0x00007fff5f400fe8) + 1735 at dump.c:561
    frame #5: 0x00000001000987b7 libjulia-debug.dylib`jl_deserialize_value(s=0x00007fff5f400fe8) + 1735 at dump.c:561
    frame #6: 0x00000001000987b7 libjulia-debug.dylib`jl_deserialize_value(s=0x00007fff5f400fe8) + 1735 at dump.c:561
    frame #7: 0x00000001000987b7 libjulia-debug.dylib`jl_deserialize_value(s=0x00007fff5f400fe8) + 1735 at dump.c:561
    frame #8: 0x00000001000987b7 libjulia-debug.dylib`jl_deserialize_value(s=0x00007fff5f400fe8) + 1735 at dump.c:561
    frame #9: 0x000000010009982c libjulia-debug.dylib`jl_uncompress_ast(li=0x000000011699d6c0, data=0x000000011741bbe0) + 204 at dump.c:873
    frame #10: 0x000000010d7011f4
    frame #11: 0x000000010d700158
    frame #12: 0x000000010307f7a3
    frame #13: 0x00000001000160d5 libjulia-debug.dylib`jl_apply(f=0x000000010eb8c1a0, args=0x00007fff5f4012e8, nargs=2) + 69 at julia.h:1018
    frame #14: 0x0000000100017f88 libjulia-debug.dylib`jl_apply_generic(F=0x0000000103dddb80, args=0x00007fff5f4012e8, nargs=2) + 520 at gf.c:1401
    frame #15: 0x0000000112c8d095

and after an enormous number of repetitions with different args...

    frame #75327: 0x00000001000160d5 libjulia-debug.dylib`jl_apply(f=0x00000001025bb280, args=0x00007fff5fbff210, nargs=4) + 69 at julia.h:1018
    frame #75328: 0x0000000100017f88 libjulia-debug.dylib`jl_apply_generic(F=0x0000000103bc6ae0, args=0x00007fff5fbff210, nargs=4) + 520 at gf.c:1401
    frame #75329: 0x0000000112cbcc21
    frame #75330: 0x00000001000160d5 libjulia-debug.dylib`jl_apply(f=0x000000011713eae0, args=0x00007fff5fbff390, nargs=3) + 69 at julia.h:1018
    frame #75331: 0x0000000100017f88 libjulia-debug.dylib`jl_apply_generic(F=0x0000000102f2a480, args=0x00007fff5fbff390, nargs=3) + 520 at gf.c:1401
    frame #75332: 0x0000000100021675 libjulia-debug.dylib`jl_apply(f=0x0000000102f2a480, args=0x00007fff5fbff390, nargs=3) + 69 at julia.h:1018
    frame #75333: 0x00000001000213ae libjulia-debug.dylib`jl_f_apply(F=0x000000010404bfc0, args=0x00007fff5fbff4c8, nargs=3) + 1294 at builtins.c:291
    frame #75334: 0x0000000112cbc9c8
    frame #75335: 0x00000001000160d5 libjulia-debug.dylib`jl_apply(f=0x00000001172fd460, args=0x00007fff5fbff640, nargs=3) + 69 at julia.h:1018
    frame #75336: 0x0000000100017f88 libjulia-debug.dylib`jl_apply_generic(F=0x0000000102f2a480, args=0x00007fff5fbff640, nargs=3) + 520 at gf.c:1401
    frame #75337: 0x0000000100021675 libjulia-debug.dylib`jl_apply(f=0x0000000102f2a480, args=0x00007fff5fbff640, nargs=3) + 69 at julia.h:1018
    frame #75338: 0x00000001000213ae libjulia-debug.dylib`jl_f_apply(F=0x000000010404bfc0, args=0x00007fff5fbff860, nargs=3) + 1294 at builtins.c:291
    frame #75339: 0x0000000112cbc3d2
    frame #75340: 0x0000000100093d05 libjulia-debug.dylib`jl_apply(f=0x0000000117d40980, args=0x0000000000000000, nargs=0) + 69 at julia.h:1018
    frame #75341: 0x0000000100093b75 libjulia-debug.dylib`start_task(t=0x0000000113a595c0) + 229 at task.c:397
    frame #75342: 0x0000000100092019 libjulia-debug.dylib`switch_stack(t=0x0000000113a595c0, where=0x0000000113a59608) + 137 at task.c:198
    frame #75343: 0x0000000100093b45 libjulia-debug.dylib`start_task(t=0x0000000104024340) + 181 at task.c:393
    frame #75344: 0x0000000100092019 libjulia-debug.dylib`switch_stack(t=0x0000000104024340, where=0x0000000104024388) + 137 at task.c:198
    frame #75345: 0x0000000100091f6b libjulia-debug.dylib`jl_switch_stack(t=0x0000000104024340, where=0x0000000104024388) + 43 at task.c:208
    frame #75346: 0x0000000100091b04 libjulia-debug.dylib`julia_trampoline(argc=3, argv=0x00007fff5fbffbb0, pmain=0x0000000100001ff0) + 180 at init.c:693
    frame #75347: 0x000000010000271b julia-debug-basic`main(argc=3, argv=0x00007fff5fbffbb0) + 171 at repl.c:303
    frame #75348: 0x00000001000016c4 julia-debug-basic`start + 52
(lldb) up
frame #1: 0x00000001000983f9 libjulia-debug.dylib`jl_deserialize_value(s=0x00007fff5f400fe8) + 777 at dump.c:511
   508          if (usetable)
   509              ptrhash_put(&backref_table, (void*)(ptrint_t)pos, (jl_value_t*)tu);
-> 511              jl_tupleset(tu, i, jl_deserialize_value(s));
   512          return (jl_value_t*)tu;
   513      }
   514      else if (vtag == (jl_value_t*)jl_symbol_type ||
(lldb) print *s
(ios_t) $0 = {
  (char *) buf = 0x0000000112d2a0a0 "\x15\a\x03\e\x15\x06<D6>\x81\x02\x04name\x06<D6>\x83\x06<D6>\x80\x06<D6>\x81\x06<D6>\x83\x02\x04name\x15\x80\x06<D6>\x84\x06<D6>\x83\x02\nlat_volume\x15\x93\x06<D6>\x83\x02\x0espatial_volume\x15\x83\x06<D6>\x83\x02\x11T3_spatial_volume\x15\x93\x06<D6>\x83\x02\nnum_flavor\x15\x93\a\e\x1c\x15\a\x02\x02\x04line\x15\f6\x05"
  (bufmode_t) bm = bm_mem
  (int) errcode = 0
  (bufstate_t) state = bst_rd
  (off_t) maxsize = 615
  (off_t) size = 615
  (off_t) bpos = 587
  (off_t) ndirty = 0
  (off_t) fpos = -1
  (size_t) lineno = 7
  (long) fd = -1
  (unsigned char:1) readonly = '\0'
  (unsigned char:1) ownbuf = '\0'
  (unsigned char:1) ownfd = '\0'
  (unsigned char:1) _eof = '\0'
  (unsigned char:1) rereadable = '\x01'
  (int64_t) userdata = 4344617760
  (char [54]) local = "<F0>h\x95\x01\x01" {
    (char) [0] = '<F0>'
    (char) [1] = 'h'
    (char) [2] = '\x95'
    (char) [3] = '\x01'
    (char) [4] = '\x01'
    (char) [5] = '\0'
    (char) [6] = '\0'
    (char) [7] = '\0'
    (char) [8] = '<D0>'
    (char) [9] = '\x10'
    (char) [10] = '@'
    (char) [11] = '_'
    (char) [12] = '<FF>'
    (char) [13] = '\x7f'
    (char) [14] = '\0'
    (char) [15] = '\0'
    (char) [16] = 'p'
    (char) [17] = '\x10'
    (char) [18] = '@'
    (char) [19] = '_'
    (char) [20] = '<FF>'
    (char) [21] = '\x7f'
    (char) [22] = '\0'
    (char) [23] = '\0'
    (char) [24] = '\f'
    (char) [25] = '\0'
    (char) [26] = '\0'
    (char) [27] = '\0'
    (char) [28] = '\0'
    (char) [29] = '\0'
    (char) [30] = '\0'
    (char) [31] = '\0'
    (char) [32] = '<E0>'
    (char) [33] = '\x10'
    (char) [34] = '@'
    (char) [35] = '_'
    (char) [36] = '<FF>'
    (char) [37] = '\x7f'
    (char) [38] = '\0'
    (char) [39] = '\0'
    (char) [40] = '\x80'
    (char) [41] = '<C3>'
    (char) [42] = '\x81'
    (char) [43] = '\x02'
    (char) [44] = '\x01'
    (char) [45] = '\0'
    (char) [46] = '\0'
    (char) [47] = '\0'
    (char) [48] = '\0'
    (char) [49] = '\0'
    (char) [50] = '\0'
    (char) [51] = '\0'
    (char) [52] = '\0'
    (char) [53] = '\0'
  }
}

There are some non-printable characters above, which were copied and formatted like <FF>. It looks like a very deep recursion happened, but there is not much recursion in my code. I'll keep the lldb session open in case you need more info.

@JeffBezanson
Member

Certainly looks like a stack overflow.

@jxy
Author

jxy commented Jul 18, 2013

Yeah. Looks like serialize is called recursively, for some reason.

frame #75329: 0x0000000112cbcc21
-> 0x112cbcc21:  movabsq $4347929056, %rax
   0x112cbcc2b:  movq   -96(%rbp), %rcx
   0x112cbcc2f:  movq   %rcx, -80(%rbp)
   0x112cbcc33:  movabsq $4361506816, %rdi
(lldb) down
frame #75328: 0x0000000100017f88 libjulia-debug.dylib`jl_apply_generic(F=0x0000000103bc6ae0, args=0x00007fff5fbff210, nargs=4) + 520 at gf.c:1401
   1398     }
   1399     assert(!mfunc->linfo || !mfunc->linfo->inInference);
   1400 
-> 1401     return jl_apply(mfunc, args, nargs);
   1402 }
   1403 
   1404 // invoke()
(lldb) print mfunc->linfo->name->name
(char [1]) $84 = "send_msg_" {
  (char) [0] = 's'
}
(lldb) down
frame #75327: 0x00000001000160d5 libjulia-debug.dylib`jl_apply(f=0x00000001025bb280, args=0x00007fff5fbff210, nargs=4) + 69 at julia.h:1018
   1015 static inline
   1016 jl_value_t *jl_apply(jl_function_t *f, jl_value_t **args, uint32_t nargs)
   1017 {
-> 1018     return f->fptr((jl_value_t*)f, args, nargs);
   1019 }
   1020 
   1021 #define JL_NARGS(fname, min, max)                               \
(lldb) print f->linfo->name->name
(char [1]) $85 = "send_msg_" {
  (char) [0] = 's'
}
(lldb) down
frame #75326: 0x00000001030710b6
-> 0x1030710b6:  popq   %rbp
   0x1030710b7:  ret    
(lldb) down
frame #75325: 0x000000010307121b
-> 0x10307121b:  movq   8(%r13), %rax
   0x10307121f:  leaq   1(%rbx), %rcx
   0x103071223:  addq   $2, %rbx
   0x103071227:  cmpq   %rbx, %rax
(lldb) down
frame #75324: 0x0000000100017f88 libjulia-debug.dylib`jl_apply_generic(F=0x0000000103dddb80, args=0x00007fff5fbff0a0, nargs=2) + 520 at gf.c:1401
   1398     }
   1399     assert(!mfunc->linfo || !mfunc->linfo->inInference);
   1400 
-> 1401     return jl_apply(mfunc, args, nargs);
   1402 }
   1403 
   1404 // invoke()
(lldb) print mfunc->linfo->name->name
(char [1]) $86 = "serialize" {
  (char) [0] = 's'
}
(lldb) down
frame #75323: 0x00000001000160d5 libjulia-debug.dylib`jl_apply(f=0x000000010eb8c1a0, args=0x00007fff5fbff0a0, nargs=2) + 69 at julia.h:1018
   1015 static inline
   1016 jl_value_t *jl_apply(jl_function_t *f, jl_value_t **args, uint32_t nargs)
   1017 {
-> 1018     return f->fptr((jl_value_t*)f, args, nargs);
   1019 }
   1020 
   1021 #define JL_NARGS(fname, min, max)                               \
(lldb) print f->linfo->name->name
(char [1]) $87 = "serialize" {
  (char) [0] = 's'
}
(lldb) down
frame #75322: 0x000000010307f779
-> 0x10307f779:  movq   %r12, -72(%rbp)
   0x10307f77d:  movq   -88(%rbp), %rax
   0x10307f781:  movq   %rax, -64(%rbp)
   0x10307f785:  movabsq $4347928736, %rax
(lldb) down
frame #75321: 0x0000000100017f88 libjulia-debug.dylib`jl_apply_generic(F=0x0000000103dddb80, args=0x00007fff5fbfef48, nargs=2) + 520 at gf.c:1401
   1398     }
   1399     assert(!mfunc->linfo || !mfunc->linfo->inInference);
   1400 
-> 1401     return jl_apply(mfunc, args, nargs);
   1402 }
   1403 
   1404 // invoke()
(lldb) print mfunc->linfo->name->name
(char [1]) $88 = "serialize" {
  (char) [0] = 's'
}
(lldb) frame select 14
frame #14: 0x0000000100017f88 libjulia-debug.dylib`jl_apply_generic(F=0x0000000103dddb80, args=0x00007fff5f4012e8, nargs=2) + 520 at gf.c:1401
   1398     }
   1399     assert(!mfunc->linfo || !mfunc->linfo->inInference);
   1400 
-> 1401     return jl_apply(mfunc, args, nargs);
   1402 }
   1403 
   1404 // invoke()
(lldb) print mfunc->linfo->name->name
(char [1]) $89 = "serialize" {
  (char) [0] = 's'
}
(lldb) up
frame #15: 0x0000000112c8d095
-> 0x112c8d095:  movq   8(%r13), %rax
   0x112c8d099:  leaq   1(%r14), %rcx
   0x112c8d09d:  addq   $2, %r14
   0x112c8d0a1:  cmpq   %r14, %rax
(lldb) up
frame #16: 0x00000001000160d5 libjulia-debug.dylib`jl_apply(f=0x0000000115c96180, args=0x00007fff5f401448, nargs=2) + 69 at julia.h:1018
   1015 static inline
   1016 jl_value_t *jl_apply(jl_function_t *f, jl_value_t **args, uint32_t nargs)
   1017 {
-> 1018     return f->fptr((jl_value_t*)f, args, nargs);
   1019 }
   1020 
   1021 #define JL_NARGS(fname, min, max)                               \
(lldb) print f->linfo->name->name
(char [1]) $90 = "serialize" {
  (char) [0] = 's'
}
(lldb) up
frame #17: 0x0000000100017f88 libjulia-debug.dylib`jl_apply_generic(F=0x0000000103dddb80, args=0x00007fff5f401448, nargs=2) + 520 at gf.c:1401
   1398     }
   1399     assert(!mfunc->linfo || !mfunc->linfo->inInference);
   1400 
-> 1401     return jl_apply(mfunc, args, nargs);
   1402 }
   1403 
   1404 // invoke()
(lldb) print mfunc->linfo->name->name
(char [1]) $91 = "serialize" {
  (char) [0] = 's'
}

So I guess there are two separate bugs. One is that Julia failed to catch this stack overflow. The other is that serialize certainly shouldn't recurse so deep, should it?

@JeffBezanson
Member

Ah, is there a chance you have a circular data structure? Serializing those does not work yet, and would explain this.

Catching stack overflows simply does not work on Mac. So far nobody knows why.

@jxy
Author

jxy commented Jul 18, 2013

I don't think I have any circular data structure. I have only defined immutable types, which do not refer to themselves. I do pass a lot of function objects around (I ported my code from Haskell), but I don't think that could explain 75000 levels of structures. Can you suggest a way to debug my code? How can I make serialize tell me some useful information?

@Keno
Member

Keno commented Jun 2, 2015

Catching stack overflows has since been fixed. Is the original issue still reproducible?

@jxy
Author

jxy commented Jun 2, 2015

The stack overflow is still there. It seems that Julia's pmap dislikes multiple curried lambdas. I've worked around that by changing my data structure from my naive Haskell implementation.

@jxy jxy closed this as completed Jun 2, 2015
@kmsquire
Member

kmsquire commented Jun 2, 2015

@jxy, if the stack overflow is still there, can you provide a small code snippet that causes it (on a recent version of Julia, v0.3.x or master)?

@kmsquire kmsquire reopened this Jun 2, 2015
@vtjnash
Member

vtjnash commented Nov 3, 2015

Multiple curried lambdas are probably recursive data structures, which are now handled correctly by the serializer code.


5 participants