
runtime: incorrect synchronization around arena_used/h_spans during concurrent GC #9984

Closed
dvyukov opened this issue Feb 24, 2015 · 44 comments

@dvyukov
Member

dvyukov commented Feb 24, 2015

Arena_used/h_spans updates during memory allocation are not synchronized with concurrent GC.
Consider a thread that grows the heap, allocates a new object p, and then writes p into an existing object or a global. Concurrent GC reaches the global and extracts the pointer p, then checks it against arena_start/arena_used, but it may not see the updated value of arena_used and thus leave the object p unmarked and unscanned.
A similar scenario can happen with h_spans: GC sees the new value of arena_used but does not see the new value in h_spans[spanIdx].

This will probably work today on x86, because the compiler is conservative and does not reorder memory accesses aggressively. But it should break on ARM and POWER.

@RLH @aclements @rsc

@dvyukov dvyukov added this to the Go1.5 milestone Feb 24, 2015
@aclements
Member

I think this isn't actually a problem, though (if I'm right), it may be worth writing down the reason in the code.

Writing p into a global isn't a problem. We rescan globals during mark termination, which means we've stopped the world and synchronized with any other thread that may have changed arena_used or h_spans.

I claim it also isn't a problem if a thread writes p into a heap object. This will invoke the write barrier, which will publish p to the garbage collector by (eventually) putting a work buffer on the global work buffer list. The arena_used update happens before p is added to the local work buffer (*), which happens before the work buffer is pushed to the global list, which happens before the work buffer pop by the GC, which happens before GC observes p. Hence GC will observe a sufficiently up-to-date value of arena_used.

(*) Unless (maybe this is what you're getting at?), there was bad synchronization on p in the first place and the allocating thread and the thread that published p to the heap are different threads and p was obtained without establishing happens-before. But this would seem to be an extension of our policy of allowing memory safety violations when there are races in user code.

@dvyukov
Member Author

dvyukov commented May 18, 2015

@aclements I have not looked at the GC code recently, but I would assume that GC threads are still able to discover pointers by means other than picking up write barrier buffers. And consequently something along the following lines should be possible. Thread 1 allocates an object O and updates arena_used/h_spans. Then thread 1 writes a pointer to O into the heap. Then a GC thread discovers the pointer on its own (by following pointers from roots). The GC thread sees the updated value of arena_used, so it starts processing the pointer. But the GC thread does not see the updated value of h_spans, so when it consults the span table it finds a stale span or no span at all. At this point GC decides that the pointer is bogus and terminates the program.

@gopherbot
Contributor

CL https://golang.org/cl/10817 mentions this issue.

@dvyukov
Member Author

dvyukov commented Jun 8, 2015

There are two other reincarnations of the multiprocessor ordering problem:

  1. GC bitmap. It must be visible to GC threads along with the object.
  2. Object zeroing. At least the zero state must be visible to GC threads. If GC will see values from the previous life of the object, it will crash or retain irrelevant objects.
    I think that both cases are solvable by issuing a StoreStore barrier after object initialization (both zeroing and bitmap setup). On the reader side there is a data dependency on the pointer value, so a data-dependency barrier is enough (which is a no-op on everything except Alpha).
    This can also probably add very bad failure modes for user code with data races on pointers, as visibility of the bitmap won't be propagated to GC threads.

@RLH
Contributor

RLH commented Jun 9, 2015

GC bitmaps can be out of date, they just can't say an object is marked when
it isn't.
Isn't 2 an indication of a run of the mill race condition?


@dvyukov
Member Author

dvyukov commented Jun 9, 2015

@RLH

GC bitmaps can be out of date, they just can't say an object is marked when it isn't.

They can also say a slot is not a pointer when it is a pointer.

Isn't 2 an indication of a run of the mill race condition?

I don't see why.

@RLH
Contributor

RLH commented Jun 9, 2015

The GC won't look at bitmaps unless it either has the pointer returned by
the allocation or in a sweep after it has passed through a barrier. The GC
map information will be data dependent on the pointer returned, just like
the zeroing of the actual object.


@dvyukov
Member Author

dvyukov commented Jun 9, 2015

I don't follow, Rick.
What is "pointer returned by the allocation"?

I mean the scenario that I already described in:
#9984 (comment)

Namely:
Object at pointer P is free and contains garbage from the previous life.
Thread 1 allocates P and zeroes the object at P.
Thread 1 writes pointer P into the heap.
GC thread 2 independently discovers pointer P and starts scanning it.
Thread 2 reads a pointer from object pointed to by P.
Now, there is nothing that guarantees that thread 2 won't see the relic garbage in the pointer slot.
GC explodes on invalid pointer.

@RLH
Contributor

RLH commented Jun 9, 2015

I agree we need a StoreStore barrier just before gcmalloc returns the
pointer. I don't see how the GC can independently discover such a pointer
without either a barrier which creates a happens-before edge or a data
dependency which should prevent the invalid pointer.


@dvyukov
Member Author

dvyukov commented Jun 9, 2015

I don't see how the GC can independently discover such a pointer
without either a barrier which creates a happens-before edge or a data
dependency which should prevent the invalid pointer.

Barriers are irrelevant here, because there is always a data dependency on pointer value for GC bitmap and object contents. Object contents are dependent on pointer value. GC bitmap address is dependent on pointer value.

Spans and arena_used are still broken, though.

@aclements
Member

I'm going to take a stab at arena_used based on the idea in your "Alpha barrier elimination trick" plus fixing places where we explicitly cache the value of arena_used (which is equivalent to really bad load reordering that will also happen on TSO :)

@dvyukov
Member Author

dvyukov commented Jun 9, 2015

@aclements can you construct a test that crashes on TSO? that would be useful.

@aclements
Member

I think I was wrong about being able to trigger this on TSO, for basically the reason I gave in #9984 (comment). scanobject etc. can observe a pointer that's in the heap but above the cached arena_used (adding a simple panic to scanobject can demonstrate this). But if the inheap check in the write barrier is safe (which it seems to be on TSO with Russ' CL to fix the MapBits ordering problem), then any such pointer will have been marked by the write barrier that made the newly allocated object reachable, so it doesn't need to be marked by scanobject.

(Of course, this is still a problem on non-TSO, as you pointed out in #9984 (comment))

@dvyukov
Member Author

dvyukov commented Jun 10, 2015

then any such pointer will have been marked by the write barrier that made the newly allocated object reachable, so it doesn't need to be marked by scanobject.

The write barrier writes a pointer into the heap and marks the object, yes. But it does not do it atomically. There is a window in which the pointer is in the heap but the object is not marked (even on TSO).

@RLH
Contributor

RLH commented Jun 10, 2015

Yes that would be bad. Doing it the other way around shade then install
would be fine.


@dvyukov
Member Author

dvyukov commented Jun 10, 2015

Doing it the other way around shade then install would be fine...

... would be fine for TSO machines, but not for RMO machines.

@RLH
Contributor

RLH commented Jun 10, 2015

Upon reflection the issue is whether the WB maintains the no black to white
invariant, which I think it does no matter which order we do this in. I
don't think we miss greying any white objects.


@aclements
Member

Upon reflection the issue is whether the WB maintains the no black to white
invariant, which I think it does no matter which order we do this in. I
don't think we miss greying any white objects.

... except when we miss the inheap check because of a reordered store or load of arena_used on a non-TSO machine, which is what this bug started out as.

@aclements
Member

I have not looked at the GC code recently, but I would assume that GC threads are still able to discover pointers by means other than picking up write barrier buffers. And consequently something along the following lines should be possible. Thread 1 allocates an object O and updates arena_used/h_spans. Then thread 1 writes a pointer to O into the heap. Then a GC thread discovers the pointer on its own (by following pointers from roots). The GC thread sees the updated value of arena_used, so it starts processing the pointer. But the GC thread does not see the updated value of h_spans, so when it consults the span table it finds a stale span or no span at all. At this point GC decides that the pointer is bogus and terminates the program.

I definitely agree that the GC thread could observe a nil *mspan if the object came from a freshly allocated span. However, I argue that this is safe because scanobject will simply ignore the pointer and whatever thread published it to the heap will eventually execute the write barrier, which will grey the object. In a race-free program, the publishing thread will observe sufficiently up-to-date arena_used and spans because it either allocated the object itself or got it through some means that created a happens-before edge.

I think you're also right that the GC can observe a non-nil stale span, which is much more of a danger. One way this can happen right now is that we publish span pointers without a barrier, so the GC could observe a pointer in h_spans but observe stale values of the pointed-to mspan. Half of Russ' CL 10817 addresses this by making stores to h_spans atomic. The data dependency takes care of the load side.

I think the only other way this can happen is when an existing span transitions from free to in-use during a GC cycle. In this case, the GC could observe a pointer into the span, but observe the stale span state and crash. Somewhat ironically, what's currently on HEAD simply ignores this case, which I believe is a correct way to fix it since the write barrier on the mutator thread will take care of greying the object. However, CL 10818 proposes re-enabling this test.

@aclements
Member

There are two other reincarnations of the multiprocessor ordering problem:

  1. GC bitmap. It must be visible to GC threads along with the object.
  2. Object zeroing. At least the zero state must be visible to GC threads. If GC will see values from the previous life of the object, it will crash or retain irrelevant objects.
    I think that both cases are solvable by issuing a StoreStore barrier after object initialization (both zeroing and bitmap setup). On the reader side there is a data dependency on the pointer value, so a data-dependency barrier is enough (which is a no-op on everything except Alpha).

I think we're all in agreement that there definitely needs to be a store-store barrier at the end of allocation, at least if GC is currently running. Dmitry, do you know of any store-store-type barrier mechanism already in the runtime? We have the full barriers for atomics, but I don't see anything lighter weight.

@dvyukov
Member Author

dvyukov commented Jun 13, 2015

Dmitry, do you know of any store-store-type barrier mechanism already in the runtime?

No, we don't have one. Russ is strongly against anything weaker than a full barrier.

@dvyukov
Member Author

dvyukov commented Jun 13, 2015

I think you're also right that the GC can observe a non-nil stale span, which is much more of a danger.

Are you sure that an invalid span will necessarily be detected as invalid?

@aclements
Member

Are you sure that an invalid span will necessarily be detected as invalid?

I'm not too worried about that, since the invalid span check is just for debugging (it hasn't even been on for months). A false positive is bad here because it can crash the program even when nothing has gone wrong (except the span check), but I'm much less concerned about failing to crash the program at this point when something has gone wrong.

@dvyukov
Member Author

dvyukov commented Jun 13, 2015

If an invalid span is not necessarily detected as invalid, then scanblock can incorrectly infer the object start and/or object size. Are you sure that is OK?
Note that unused spans have old start/npages, so they can be just right to cover the pointer; and the state can be concurrently changed to in-use.

@gopherbot
Contributor

CL https://golang.org/cl/11083 mentions this issue.

@gopherbot
Contributor

CL https://golang.org/cl/11081 mentions this issue.

@aclements
Member

If an invalid span is not necessarily detected as invalid, then scanblock can incorrectly infer the object start and/or object size. Are you sure that is OK?

If there's an invalid span, we're going to crash somehow. It's certainly preferable to crash with a nice panic that tells us there was an invalid span, but this doesn't seem worth worrying a great deal about.

@aclements
Member

Now that I've added the publication barrier to mallocgc (pending CL https://golang.org/cl/11083), I'm wondering if that's also the solution to the stale span problem. I may have just tied myself in knots at this point, but it seems like that publication barrier at the end of mallocgc should, in addition to ensuring GC sees an up-to-date object and heap bitmap after it follows a pointer to the object, also ensure it sees the up-to-date *mspan and mspan fields. In this case we'd need to do the publication barrier in the noscan path as well (the CL currently skips that), but that's easy enough. I believe this addresses your scenario in #9984 (comment). Am I right?

@RLH
Contributor

RLH commented Jun 16, 2015

x86 is TSO so the publication barrier is a nop, or are we talking about ARM?


@aclements
Member

x86 is TSO so the publication barrier is a nop, or are we talking about ARM?

ARM and PPC.

@dvyukov
Member Author

dvyukov commented Jun 17, 2015

If there's an invalid span, we're going to crash somehow. It's certainly preferable to crash with a nice panic that tells us there was an invalid span, but this doesn't seem worth worrying a great deal about.

You are answering a different question again.
I am not worried about diagnostics; I am worried about heap corruption in correct programs.

@aclements
Member

Sorry; I was answering the question I thought you were asking. Can you restate your question? And, if relevant and possible, explain why a store/store barrier at the end of mallocgc doesn't address it. I don't understand what you mean by "heap corruption in a correct program" (I can make guesses, but clearly my past interpretations of your meaning have been wrong).

@dvyukov
Member Author

dvyukov commented Jun 17, 2015

I mean the following situation.

Assume that a GC thread can see invalid spans for some pointers.

GC does the following check:

if s == nil || pageID(k) < s.start || p >= s.limit || s.state != mSpanInUse {

Your explanation assumes that this check can yield only true for invalid spans. And this is indeed OK as of now (GC will ignore the pointer, later the pointer will be picked up from write buffer and scanned).

My concern is the case when this check yields false for invalid spans (assuming that an invalid span can have unpredictable values, why not?). And this case looks very, very bad to me, because the invalid span can have a wrong sizeclass, which in turn will lead to incorrect inference of the object base and size, which I presume can lead to bad things (I have not looked at the latest code, so I can't say what exactly those bad things are).

My question is: are you sure that the check cannot yield false for invalid spans?
I don't see any obvious reason for that.

I don't know yet whether a barrier at the end of malloc fixes it or not. For now I am just trying to understand the problems that we are trying to solve.

@aclements
Member

I think I see what you're getting at. I claimed that it is in fact okay to let scanobject see stale/invalid spans as long as it can correctly ignore them and let the write barrier in the user code mark the object. If I understand, you're questioning whether it can reliably ignore them, since if it can't and it winds up using information from a stale span, very bad things will happen (which is true).

I convinced myself last Friday that it can reliably ignore them. However, the reasoning was convoluted and not something I would trust to remain true. And I think there's a much better answer wherein the scanobject simply will not see stale spans. I think I can reconstruct my reasoning if you want, but do you agree that if scanobject never sees a stale span, that the question of whether it can detect a stale span is moot?

I assert that a store/store barrier at the end of mallocgc ensures that scanobject will not see a stale span for pointers into the heap. This barrier ensures that all writes performed by mallocgc, including initializing an mspan and storing it to h_spans if it got a fresh span, occur before the caller can perform any writes that make the object visible to the garbage collector (through a write barrier or just regular tracing). Once this span has been allocated from, the start, limit, and state fields will not change until the sweeper reclaims it, at which point there are no more objects to lead the garbage collector to this span. When scanobject gets the heap bits of an object, its data dependencies form the load/load side of the barrier. It's a somewhat unusual data dependency, but it is a data dependency: scanobject must load the pointer before it can compute the offset in h_spans and look up the span, so the load of the pointer must happen before the load of the *mspan or any fields of the mspan. Thus, it will see the up-to-date span, since the span cannot have changed since the store/store barrier in mallocgc.

I have to think more about the special pointers from the heap into the stack that scanobject is supposed to ignore. Stack spans don't have the nice monotonic properties of heap spans, since they can be freed at any time. So, it's possible scanobject could load a pointer into a stack from an object, then some other thread grows that stack and frees the original stack span, and then heapBitsForObject sees a freed span (or even a reused one). If that's the case, I think this could happen regardless of memory ordering.

@aclements
Member

I have to think more about the special pointers from the heap into the stack that scanobject is supposed to ignore.

This is definitely a problem. Since it's unrelated to missing memory barriers, I opened issue #11267 to avoid cluttering this issue (of course, the solution may require barriers, but first we have to figure out what the solution is).

@dvyukov
Member Author

dvyukov commented Jun 18, 2015

Yes, I think that if GC does not see invalid spans, it gives us a much better foundation for the future. And it will allow us to enable the invalid-span debug check, which I think is a very good thing to have.

I think I constructed a case where GC can see an invalid span (a span is freed, then a part of it is reallocated, then the other part of it is allocated, and for that other part GC can see the original span descriptor with the original bounds but the new in-use state). I can well admit that there is something that prevents this scenario from happening. But, as you said, it does not look like a good foundation for future changes.

The following looks like a good scheme to solve these races:

  • Issue a release barrier at the end of mallocgc for non-noscan objects (as you did in the CL). This provides visibility over the bitmap and the zeroed object state for non-noscan objects. For noscan objects the bitmap must already be in the proper state, and that state is visible to GC due to sweep->mark ordering (which must be ensured elsewhere).
  • Issue a release barrier at the end of heap alloc. This provides visibility over the slot in h_spans, and part of the visibility for arena_used (because there is no data dependency on the pointer value on the reader side).
  • Do the reload-if-invalid trick for arena_used in the GC thread.

@dvyukov
Member Author

dvyukov commented Jun 18, 2015

Yeah, stack pointers discovered by GC thread are an issue.

@aclements
Member

The following looks like a good scheme to solve these races:

  • Issue a release barrier at the end of mallocgc for non-noscan objects (as you did in the CL). This provides visibility over the bitmap and the zeroed object state for non-noscan objects. For noscan objects the bitmap must already be in the proper state, and that state is visible to GC due to sweep->mark ordering (which must be ensured elsewhere).
  • Issue a release barrier at the end of heap alloc. This provides visibility over the slot in h_spans, and part of the visibility for arena_used (because there is no data dependency on the pointer value on the reader side).

I had been thinking we would just do the release barrier at the end of mallocgc for both scan and noscan objects, but this works, too, and is probably slightly cheaper since the release barrier at the end of heap alloc won't happen on every allocation (even though this means two barriers on scan allocations that also allocate a span). The h_spans part of bullet 2 was the intent of https://go-review.googlesource.com/#/c/11081/, but you seemed to think that wasn't sufficient to deal with accesses to h_spans.

You're right that there's no data dependency on arena_used. I'd overlooked that.

  • Do the reload-if-invalid trick for arena_used in GC thread.

I'm not convinced that this part is necessary. It's okay if the GC thread comes across a pointer and incorrectly rejects it because it has a stale value of arena_used. The thread that published the pointer will invoke the write barrier, which will mark the object (there is a window between when the pointer is published and it is marked, but this is okay as long as it is eventually marked and this window cannot be preempted). This seems fairly fundamental to a Dijkstra-style write barrier.

@dvyukov
Member Author

dvyukov commented Jun 19, 2015

I had been thinking we would just do the release barrier at the end of mallocgc for both scan and noscan objects

I don't see any point in making malloc slower.

even though this means two barriers on scan allocations that also allocate a span

I don't think it is any important performance-wise.

I'm not convinced that this part is necessary.

Your scheme works too.
My idea was to ensure the invariant that if GC sees a valid pointer then it always recognizes it as a valid pointer.

aclements added a commit that referenced this issue Jun 19, 2015
Currently it's possible for the garbage collector to observe
uninitialized memory or stale heap bitmap bits on weakly ordered
architectures such as ARM and PPC. On such architectures, the stores
that zero newly allocated memory and initialize its heap bitmap may
move after a store in user code that makes the allocated object
observable by the garbage collector.

To fix this, add a "publication barrier" (also known as an "export
barrier") before returning from mallocgc. This is a store/store
barrier that ensures any write done by user code that makes the
returned object observable to the garbage collector will be ordered
after the initialization performed by mallocgc. No barrier is
necessary on the reading side because of the data dependency between
loading the pointer and loading the contents of the object.

Fixes one of the issues raised in #9984.

Change-Id: Ia3d96ad9c5fc7f4d342f5e05ec0ceae700cd17c8
Reviewed-on: https://go-review.googlesource.com/11083
Reviewed-by: Rick Hudson <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Reviewed-by: Minux Ma <[email protected]>
Reviewed-by: Martin Capitanio <[email protected]>
Reviewed-by: Russ Cox <[email protected]>
@aclements
Member

I'm not convinced that this part is necessary.
Your scheme works too.
My idea was to ensure the invariant that if GC sees a valid pointer then it always recognizes it as a valid pointer.

I certainly like the idea of never seeing an invalid pointer in principle, but I worry that it would make the code uglier or slower or quite possibly both because of the current limitations of our inliner. Specifically, inheap and spanOf are currently inlined in several hot paths, and this would make them non-inlinable. This makes me sad, but it is what it is for now.

@aclements
Member

  • Issue release barrier at the end of heap alloc. This provides visibility over the slot in h_spans, and part of visibility for arena_used (because there is no data-dependency on the pointer value on reader size).

Actually, isn't there already a release barrier at the end of heap alloc? h_spans is only manipulated under the heap lock, so the unlock should act as a release barrier.

@dvyukov
Member Author

dvyukov commented Jun 20, 2015

There is. By coincidence. It deserves at least a bold comment. I can imagine that we could have a per-P heap cache; or sweep a span and reuse it for a different size class without touching the heap; or, when the heap is empty, mmap and set up a new span without holding the heap mutex.

aclements added a commit that referenced this issue Jun 22, 2015
The unsynchronized accesses to mheap_.arena_used in the concurrent
part of the garbage collector look like a problem waiting to happen.
In fact, they are safe, but the reason is somewhat subtle and
undocumented. This commit documents this reasoning.

Related to issue #9984.

Change-Id: Icdbf2329c1aa11dbe2396a71eb5fc2a85bd4afd5
Reviewed-on: https://go-review.googlesource.com/11254
Reviewed-by: Dmitry Vyukov <[email protected]>
@aclements
Member

There is. By coincidence. It deserves at least a bold comment.

Definitely.

aclements added a commit that referenced this issue Jun 23, 2015
h_spans can be accessed concurrently without synchronization from
other threads, which means it needs the appropriate memory barriers on
weakly ordered machines. It happens to already have the necessary
memory barriers because all accesses to h_spans are currently
protected by the heap lock and the unlocks happen in exactly the
places where release barriers are needed, but it's easy to imagine
that this could change in the future. Document the fact that we're
depending on the barrier implied by the unlock.

Related to issue #9984.

Change-Id: I1bc3c95cd73361b041c8c95cd4bb92daf8c1f94a
Reviewed-on: https://go-review.googlesource.com/11361
Reviewed-by: Rick Hudson <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
@aclements
Member

I believe all of the issues in this issue have now been addressed. Thanks for sticking it out with me. :) Please reopen if you think there's something we didn't take care of.

@golang golang locked and limited conversation to collaborators Jun 25, 2016