-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: fatal error: malloc deadlock with Go 1.5.1 #13143
Comments
Just to add to the above report, although this is the first dump we manage to capture (was running on an Raspberry Pi), we've also seen two eerily similar scenarios (no crash dumps were generated for those so we couldn't place them), where our code crashed after processing the same amount of data. One of them was an Android cross compilation running on on Galaxy Nexus phone (direct cross compile, not through |
The real bug here is this:
I suspect it can happen on OOM, some of the paths in mHeap_SysAlloc are not well tested. |
CL https://golang.org/cl/16622 mentions this issue. |
I have now proven both by hand and using an automated theorem prover* that either:
The automated proof did, however, uncover a wrap-around bug where a very large allocation could cause an out of bounds index rather than rejecting the allocation. That would suggest case 2. Assuming I can reproduce this in real code, the fix is trivial. @fjl, @karalabe, is it possible your code attempted a particularly large allocation (like over 3GB)? Also, could you run "getconf PAGESIZE" on the system that generated the traceback? I assumed it was 4096 because that's what the runtime uses on 32-bit non-NACL ARM, but if it isn't that could be important. * I used concolic execution of a model of the memory system, which doesn't lend itself to infinite code paths, so specifically I proved that no sequence of four allocations ending in an allocation of 8 pages can cause an index out of bounds on the last allocation. I chose four to exercise all combinations of code paths within the same final program state and because running four already took a really long time. :) |
Scratch the wrap-around I'd found. The wrap-around does happen, but assuming the kernel is correctly bounds-checking the mmap call itself (which I hadn't previously modeled in my proof), the allocation will fail anyway, just at a slightly later point. |
Also, what kernel version were you running? |
I increased the fidelity of the system call model in my proof (and fixed a few bugs in STP) and finally found a way this can happen. I've also managed to reproduce it on linux/386 with a few helping hacks in the runtime. The cause is ultimately quite simple: In mHeap_SysAlloc, if an allocation at arena_used would exceed arena_end (but wouldn't yet push us past arena_start+_MaxArean32), we try to extend the arena reservation by another 256 MB. Note that this could be a small allocation, which is consistent with this bug. We extend the arena by calling sysReserve, which calls mmap without MAP_FIXED on 32-bit, which means the address is just a hint and the kernel can put the mapping wherever it wants. If it happens to choose an address below arena_start (the kernel also chose arena_start, so there could be lots of space below it), then it's the beginning of the end. We don't detect that case in mHeap_SysAlloc. In fact, mHeap_SysAlloc will totally screw up arena_end and arena_used and then return the low pointer to mHeap_Grow, which will subtract arena_start from the pointer, which will underflow and produce a huge value of p, which we'll then use as an index in to h_spans, which will panic. The fix is just to check for this case in mHeap_SysAlloc. I'll roll a CL tomorrow or Monday. |
Nicely done... On Saturday, November 14, 2015, Austin Clements [email protected]
|
CL https://golang.org/cl/16926 mentions this issue. |
CL https://golang.org/cl/16927 mentions this issue. |
If the area returned by sysReserve in mheap.sysAlloc is outside the usable arena, we sysFree it. We pass a fake stat pointer to sysFree because we haven't added the allocation to any stat at that point. However, we pass a 0 stat, so sysFree panics when it decrements the stat because the fake stat underflows. Fix this by setting the fake stat to the allocation size. Updates #13143 (this is a prerequisite to fixing that bug). Change-Id: I61a6c9be19ac1c95863cf6a8435e19790c8bfc9a Reviewed-on: https://go-review.googlesource.com/16926 Reviewed-by: Ian Lance Taylor <[email protected]>
CL https://golang.org/cl/16961 mentions this issue. |
CL https://golang.org/cl/16987 mentions this issue. |
CL https://golang.org/cl/16988 mentions this issue. |
Currently, if an allocation is large enough that arena_end + size overflows (which is not hard to do on 32-bit), we go ahead and call sysReserve with the impossible base and length and depend on this to either directly fail because the kernel can't possibly fulfill the requested mapping (causing mheap.sysAlloc to return nil) or to succeed with a mapping at some other address which will then be rejected as outside the arena. In order to make this less subtle, less dependent on the kernel getting all of this right, and to eliminate the hopeless system call, add an explicit overflow check. Updates #13143. This real issue has been fixed by 0de59c2, but this is a belt-and-suspenders improvement on top of that. It was uncovered by my symbolic modeling of that bug. Change-Id: I85fa868a33286fdcc23cdd7cdf86b19abf1cb2d1 Reviewed-on: https://go-review.googlesource.com/16961 Run-TryBot: Austin Clements <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]>
If the area returned by sysReserve in mheap.sysAlloc is outside the usable arena, we sysFree it. We pass a fake stat pointer to sysFree because we haven't added the allocation to any stat at that point. However, we pass a 0 stat, so sysFree panics when it decrements the stat because the fake stat underflows. Fix this by setting the fake stat to the allocation size. Updates #13143 (this is a prerequisite to fixing that bug). Change-Id: I61a6c9be19ac1c95863cf6a8435e19790c8bfc9a Reviewed-on: https://go-review.googlesource.com/16926 Reviewed-by: Ian Lance Taylor <[email protected]> Reviewed-on: https://go-review.googlesource.com/16987 Run-TryBot: Austin Clements <[email protected]> Reviewed-by: Russ Cox <[email protected]>
… below the arena In mheap.sysAlloc, if an allocation at arena_used would exceed arena_end (but wouldn't yet push us past arena_start+_MaxArean32), it trie to extend the arena reservation by another 256 MB. It extends the arena by calling sysReserve, which, on 32-bit, calls mmap without MAP_FIXED, which means the address is just a hint and the kernel can put the mapping wherever it wants. In particular, mmap may choose an address below arena_start (the kernel also chose arena_start, so there could be lots of space below it). Currently, we don't detect this case and, if it happens, mheap.sysAlloc will corrupt arena_end and arena_used then return the low pointer to mheap.grow, which will crash when it attempts to index in to h_spans with an underflowed index. Fix this by checking not only that that p+p_size isn't too high, but that p isn't too low. Fixes #13143. Change-Id: I8d0f42bd1484460282a83c6f1a6f8f0df7fb2048 Reviewed-on: https://go-review.googlesource.com/16927 Run-TryBot: Austin Clements <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]> Reviewed-on: https://go-review.googlesource.com/16988 Reviewed-by: Russ Cox <[email protected]>
For reference, this is the symbolic model and execution engine I used: https://gist.github.com/aclements/cd6cd1a93732099ff982 This requires fixes to the STP Python bindings that are currently in STP pull request stp/stp#204 |
A go-ethereum user reported a crash with the following stack trace:
https://gist.github.com/CJentzsch/31aab5b31fae6805f893
The issue occurred on linux/arm with go-ethereum built using Go 1.5.1.
The binary in the trace can be downloaded here.
It does not seem to be reproducible, but looks similar to #10469 (x/mobile issue on android/arm).
The text was updated successfully, but these errors were encountered: