runtime: incorrect synchronization around arena_used/h_spans during concurrent GC #9984
I think this isn't actually a problem, though (if I'm right) it may be worth writing down the reason in the code. Writing p into a global isn't a problem. We rescan globals during mark termination, which means we've stopped the world and synchronized with any other thread that may have changed arena_used or h_spans. I claim it also isn't a problem if a thread writes p into a heap object. This will invoke the write barrier, which will publish p to the garbage collector by (eventually) putting a work buffer on the global work buffer list. The arena_used update happens before p is added to the local work buffer (*), which happens before the work buffer is pushed to the global list, which happens before the work buffer pop by the GC, which happens before GC observes p. Hence GC will observe a sufficiently up-to-date value of arena_used.

(*) Unless (maybe this is what you're getting at?) there was bad synchronization on p in the first place: the allocating thread and the thread that published p to the heap are different threads, and p was obtained without establishing happens-before. But this would seem to be an extension of our policy of allowing memory safety violations when there are races in user code.
@aclements I have not looked at the GC code recently, but I would assume that GC threads are still able to discover pointers by means other than picking up write barrier buffers. Consequently, something along the following lines should be possible. Thread 1 allocates an object O and updates arena_used/h_spans. Then thread 1 writes a pointer to O into the heap. Then a GC thread discovers the pointer on its own (by following pointers from roots). The GC thread sees the updated value of arena_used, so it starts processing the pointer. But the GC thread does not see the updated value of h_spans, so when it consults the span table it finds a stale span or no span at all. At this point the GC decides that the pointer is bogus and terminates the program.
CL https://golang.org/cl/10817 mentions this issue.
There are two other reincarnations of the multiprocessor ordering problem:
GC bitmaps can be out of date; they just can't say an object is marked when it isn't.
They also can say a slot is not a pointer when it is a pointer.
I don't see why.
The GC won't look at bitmaps unless it either has the pointer returned by …
I don't follow, Rick. I mean the scenario that I already described above. Namely:
I agree we need a StoreStore barrier just before gcmalloc returns the pointer.
Barriers are irrelevant here, because there is always a data dependency on the pointer value for the GC bitmap and object contents. Object contents are dependent on the pointer value. The GC bitmap address is dependent on the pointer value. Spans and arena_used are still broken, though.
I'm going to take a stab at arena_used based on the idea in your "Alpha barrier elimination trick" plus fixing places where we explicitly cache the value of arena_used (which is equivalent to really bad load reordering that will also happen on TSO :) |
@aclements Can you construct a test that crashes on TSO? That would be useful.
I think I was wrong about being able to trigger this on TSO, for basically the reason I gave in #9984 (comment). scanobject/etc can observe a pointer that's in the heap but larger than the cached arena_used (adding a simple panic to scanobject can demonstrate this), but if the inheap check in the write barrier is safe (which it seems to be on TSO with Russ' CL to fix the MapBits ordering problem), then any such pointer will have been marked by the write barrier that made the newly allocated object reachable, so it doesn't need to be marked by scanobject. (Of course, this is still a problem on non-TSO, as you pointed out in #9984 (comment).)
The write barrier writes a pointer into the heap and marks the object, yes. But it does not do it atomically. There is a window in which the pointer is in the heap but the object is not marked (even on TSO).
Yes, that would be bad. Doing it the other way around (shade, then install) would be fine for TSO machines, but not for RMO machines.
Upon reflection, the issue is whether the WB maintains the no-black-to-white invariant.
... except when we miss the inheap check because of a reordered store or load of arena_used on a non-TSO machine, which is what this bug started out as.
I definitely agree that the GC thread could observe a nil *mspan if the object came from a freshly allocated span. However, I argue that this is safe because scanobject will simply ignore the pointer and whatever thread published it to the heap will eventually execute the write barrier, which will grey the object. In a race-free program, the publishing thread will observe sufficiently up-to-date arena_used and spans because it either allocated the object itself or got it through some means that created a happens-before edge.

I think you're also right that the GC can observe a non-nil stale span, which is much more of a danger. One way this can happen right now is that we publish span pointers without a barrier, so the GC could observe a pointer in h_spans but observe stale values of the pointed-to mspan. Half of Russ' CL 10817 addresses this by making stores to h_spans atomic. The data dependency takes care of the load side.

I think the only other way this can happen is when an existing span transitions from free to in-use during a GC cycle. In this case, the GC could observe a pointer into the span, but observe the stale span state and crash. Somewhat ironically, what's currently on HEAD simply ignores this case, which I believe is a correct way to fix it since the write barrier on the mutator thread will take care of greying the object. However, CL 10818 proposes re-enabling this test.
I think we're all in agreement that there definitely needs to be a store-store barrier at the end of allocation, at least if GC is currently running. Dmitry, do you know of any store-store-type barrier mechanism already in the runtime? We have the full barriers for atomics, but I don't see anything lighter weight.
No, we don't. Russ is strongly against anything weaker than a full barrier.
Are you sure that an invalid span will necessarily be detected as invalid?
I'm not too worried about that, since the invalid span check is just for debugging (it hasn't even been on for months). A false positive is bad here because it can crash the program even when nothing has gone wrong (except the span check), but I'm much less concerned about failing to crash the program at this point when something has gone wrong.
If an invalid span is not necessarily detected as invalid, then scanblock can incorrectly infer the object start and/or object size. Are you sure it is OK?
CL https://golang.org/cl/11083 mentions this issue.
CL https://golang.org/cl/11081 mentions this issue.
If there's an invalid span, we're going to crash somehow. It's certainly preferable to crash with a nice panic that tells us there was an invalid span, but this doesn't seem worth worrying a great deal about.
Now that I've added the publication barrier to mallocgc (pending CL https://golang.org/cl/11083), I'm wondering if that's also the solution to the stale span problem. I may have just tied myself in knots at this point, but it seems like that publication barrier at the end of mallocgc should, in addition to ensuring GC sees an up-to-date object and heap bitmap after it follows a pointer to the object, also ensure it sees the up-to-date *mspan and mspan fields. In this case we'd need to do the publication barrier in the noscan path as well (the CL currently skips that), but that's easy enough. I believe this addresses your scenario in #9984 (comment). Am I right?
x86 is TSO so the publication barrier is a nop, or are we talking about ARM?
ARM and PPC.
You are answering a different question again.
Sorry; I was answering the question I thought you were asking. Can you restate your question? And, if relevant and possible, explain why a store/store barrier at the end of mallocgc doesn't address it. I don't understand what you mean by "heap corruption in a correct program" (I can make guesses, but clearly my past interpretations of your meaning have been wrong).
I mean the following situation. Assume that a GC thread can see invalid spans for some pointers. GC does the following check: if s == nil || pageID(k) < s.start || p >= s.limit || s.state != mSpanInUse { Your explanation assumes that this check can yield only true for invalid spans. And this is indeed OK as of now (GC will ignore the pointer; later the pointer will be picked up from the write buffer and scanned). My concern is the case when this check yields false for invalid spans (assuming that an invalid span can have unpredictable values, why not?). And this case looks very, very bad to me, because the invalid span can have a wrong sizeclass, which in turn will lead to incorrect inference of the object base and size, which I presume can lead to bad things (I have not looked at the latest code, so I can't say what exactly those bad things are). My question is: are you sure that the check cannot yield false for invalid spans? I don't know yet whether a barrier at the end of malloc fixes it or not. For now I am just trying to understand the problems that we are trying to solve.
I think I see what you're getting at. I claimed that it is in fact okay to let scanobject see stale/invalid spans as long as it can correctly ignore them and let the write barrier in the user code mark the object. If I understand, you're questioning whether it can reliably ignore them, since if it can't and it winds up using information from a stale span, very bad things will happen (which is true).

I convinced myself last Friday that it can reliably ignore them. However, the reasoning was convoluted and not something I would trust to remain true. And I think there's a much better answer wherein scanobject simply will not see stale spans. I think I can reconstruct my reasoning if you want, but do you agree that if scanobject never sees a stale span, then the question of whether it can detect a stale span is moot?

I assert that a store/store barrier at the end of mallocgc ensures that scanobject will not see a stale span for pointers into the heap. This barrier ensures that all writes performed by mallocgc, including initializing an mspan and storing it to h_spans if it got a fresh span, occur before the caller can perform any writes that make the object visible to the garbage collector (through a write barrier or just regular tracing). Once this span has been allocated from, the start, limit, and state fields will not change until the sweeper reclaims it, at which point there are no more objects to lead the garbage collector to this span.

When scanobject gets the heap bits of an object, its data dependencies form the load/load side of the barrier. It's a somewhat unusual data dependency, but it is a data dependency: scanobject must load the pointer before it can compute the offset in h_spans and look up the span, so the load of the pointer must happen before the load of the *mspan or any fields of the mspan. Thus, it will see the up-to-date span, since the span cannot have changed since the store/store barrier in mallocgc.
I have to think more about the special pointers from the heap into the stack that scanobject is supposed to ignore. Stack spans don't have the nice monotonic properties of heap spans, since they can be freed at any time. So, it's possible scanobject could load a pointer into a stack from an object, then some other thread grows that stack and frees the original stack span, and then heapBitsForObject sees a freed span (or even a reused one). If that's the case, I think this could happen regardless of memory ordering.
This is definitely a problem. Since it's unrelated to missing memory barriers, I opened issue #11267 to avoid cluttering this issue (of course, the solution may require barriers, but first we have to figure out what the solution is).
Yes, I think that if the GC does not see invalid spans, it gives us a much better foundation for the future. And it will allow us to enable the invalid span debug check, which I think is a very good thing to have. I think I constructed a case where the GC can see an invalid span (a span is freed, then a part of it is reallocated, then the other part of it is allocated, and for that other part the GC can see the original span descriptor with the original bounds but the new in-use state). I can well admit that there is something that prevents this scenario from happening. But as you said, it does not look like a good foundation for future changes. The following looks like a good scheme to solve these races:
Yeah, stack pointers discovered by the GC thread are an issue.
I had been thinking we would just do the release barrier at the end of mallocgc for both scan and noscan objects, but this works, too, and is probably slightly cheaper since the release barrier at the end of heap alloc won't happen on every allocation (even though this means two barriers on scan allocations that also allocate a span). The h_spans part of bullet 2 was the intent of https://go-review.googlesource.com/#/c/11081/, but you seemed to think that wasn't sufficient to deal with accesses to h_spans. You're right that there's no data dependency on arena_used. I'd overlooked that.
I'm not convinced that this part is necessary. It's okay if the GC thread comes across a pointer and incorrectly rejects it because it has a stale value of arena_used. The thread that published the pointer will invoke the write barrier, which will mark the object (there is a window between when the pointer is published and it is marked, but this is okay as long as it is eventually marked and this window cannot be preempted). This seems fairly fundamental to a Dijkstra-style write barrier.
I don't see any point in making malloc slower.
I don't think it is at all important performance-wise.
Your scheme works too.
Currently it's possible for the garbage collector to observe uninitialized memory or stale heap bitmap bits on weakly ordered architectures such as ARM and PPC. On such architectures, the stores that zero newly allocated memory and initialize its heap bitmap may move after a store in user code that makes the allocated object observable by the garbage collector.

To fix this, add a "publication barrier" (also known as an "export barrier") before returning from mallocgc. This is a store/store barrier that ensures any write done by user code that makes the returned object observable to the garbage collector will be ordered after the initialization performed by mallocgc. No barrier is necessary on the reading side because of the data dependency between loading the pointer and loading the contents of the object.

Fixes one of the issues raised in #9984.

Change-Id: Ia3d96ad9c5fc7f4d342f5e05ec0ceae700cd17c8
Reviewed-on: https://go-review.googlesource.com/11083
Reviewed-by: Rick Hudson <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Reviewed-by: Minux Ma <[email protected]>
Reviewed-by: Martin Capitanio <[email protected]>
Reviewed-by: Russ Cox <[email protected]>
I certainly like the idea of never seeing an invalid pointer in principle, but I worry that it would make the code uglier or slower or quite possibly both because of the current limitations of our inliner. Specifically, inheap and spanOf are currently inlined in several hot paths, and this would make them non-inlinable. This makes me sad, but it is what it is for now.
Actually, isn't there already a release barrier at the end of heap alloc? h_spans is only manipulated under the heap lock, so the unlock should act as a release barrier. |
There is, by coincidence. It deserves at least a bold comment. I can imagine that we could have a per-P heap cache; or sweep a span and reuse it for a different size class without accessing the heap; or, when the heap is empty, mmap and set up a new span without holding the heap mutex.
The unsynchronized accesses to mheap_.arena_used in the concurrent part of the garbage collector look like a problem waiting to happen. In fact, they are safe, but the reason is somewhat subtle and undocumented. This commit documents this reasoning.

Related to issue #9984.

Change-Id: Icdbf2329c1aa11dbe2396a71eb5fc2a85bd4afd5
Reviewed-on: https://go-review.googlesource.com/11254
Reviewed-by: Dmitry Vyukov <[email protected]>

Definitely.

h_spans can be accessed concurrently without synchronization from other threads, which means it needs the appropriate memory barriers on weakly ordered machines. It happens to already have the necessary memory barriers because all accesses to h_spans are currently protected by the heap lock and the unlocks happen in exactly the places where release barriers are needed, but it's easy to imagine that this could change in the future. Document the fact that we're depending on the barrier implied by the unlock.

Related to issue #9984.

Change-Id: I1bc3c95cd73361b041c8c95cd4bb92daf8c1f94a
Reviewed-on: https://go-review.googlesource.com/11361
Reviewed-by: Rick Hudson <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
I believe all of the issues in this issue have now been addressed. Thanks for sticking it out with me. :) Please reopen if you think there's something we didn't take care of. |
Arena_used/h_spans updates during memory allocation are not synchronized with concurrent GC.
Consider a thread that grows the heap, allocates a new object p, and then writes p into an existing object or a global. Concurrent GC reaches the global and extracts the pointer p, then checks it against arena_start/arena_used; but it may not see the updated value of arena_used, and thus ignore p, leaving the object unmarked and unscanned.
A similar scenario can happen with h_spans: GC sees the new value of arena_used but does not see the new value in h_spans[spanIdx].
This will probably work today on x86 due to the conservative compiler, which does not reorder memory accesses aggressively. But it should break on arm and power.
@RLH @aclements @rsc