You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This optimizes deferproc and deferreturn in various ways.
The most important optimization is that it more carefully arranges to
prevent preemption or stack growth. Currently we do this by switching
to the system stack on every deferproc and every deferreturn. While we
need to be on the system stack for the slow path of allocating and
freeing defers, in the common case we can fit in the nosplit stack.
Hence, this change pushes the system stack switch down into the slow
paths and makes everything now exposed to the user stack nosplit. This
also eliminates the need for various acquirem/releasem pairs, since we
are now preventing preemption by preventing stack split checks.
As another smaller optimization, we special case the common cases of
zero-sized and pointer-sized defer frames to respectively skip the
copy and perform the copy in line instead of calling memmove.
This speeds up the runtime defer benchmark by 42%:
name old time/op new time/op delta
Defer-4 75.1ns ± 1% 43.3ns ± 1% -42.31% (p=0.000 n=8+10)
In reality, this speeds up defer by about 2.2X. The two benchmarks
below compare a Lock/defer Unlock pair (DeferLock) with a Lock/Unlock
pair (NoDeferLock). NoDeferLock establishes a baseline cost, so these
two benchmarks together show that this change reduces the overhead of
defer from 61.4ns to 27.9ns.
name old time/op new time/op delta
DeferLock-4 77.4ns ± 1% 43.9ns ± 1% -43.31% (p=0.000 n=10+10)
NoDeferLock-4 16.0ns ± 0% 15.9ns ± 0% -0.39% (p=0.000 n=9+8)
This also shaves 34ns off cgo calls:
name old time/op new time/op delta
CgoNoop-4 122ns ± 1% 88.3ns ± 1% -27.72% (p=0.000 n=8+9)
Updates #14939, #16051.
Change-Id: I2baa0dea378b7e4efebbee8fca919a97d5e15f38
Reviewed-on: https://go-review.googlesource.com/29656
Reviewed-by: Keith Randall <[email protected]>
0 commit comments