Commit 17db6e0

runtime: use heap scan size as estimate of GC scan work

Currently, the GC uses a moving average of recent scan work ratios to estimate the total scan work required by this cycle. This is in turn used to compute how much scan work should be done by mutators when they allocate in order to perform all expected scan work by the time the allocated heap reaches the heap goal.

However, our current scan work estimate can be arbitrarily wrong if the heap topography changes significantly from one cycle to the next. For example, in the go1 benchmarks, at the beginning of each benchmark, the heap is dominated by a 256MB no-scan object, so the GC learns that the scan density of the heap is very low. In benchmarks that then rapidly allocate pointer-dense objects, by the time of the next GC cycle, our estimate of the scan work can be too low by a large factor. This in turn lets the mutator allocate faster than the GC can collect, allowing it to get arbitrarily far ahead of the scan work estimate, which leads to very long GC cycles with very little mutator assist that can overshoot the heap goal by large margins. This is particularly easy to demonstrate with BinaryTree17:

$ GODEBUG=gctrace=1 ./go1.test -test.bench BinaryTree17
gc #1 @0.017s 2%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 4->262->262 MB, 4 MB goal, 1 P
gc #2 @0.026s 3%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 262->262->262 MB, 524 MB goal, 1 P
testing: warning: no tests to run
PASS
BenchmarkBinaryTree17
gc #3 @1.906s 0%: 0+0+0+0+7 ms clock, 0+0+0+0/0/0+7 ms cpu, 325->325->287 MB, 325 MB goal, 1 P (forced)
gc #4 @12.203s 20%: 0+0+0+10067+10 ms clock, 0+0+0+0/2523/852+10 ms cpu, 430->2092->1950 MB, 574 MB goal, 1 P
	1	9150447353 ns/op

Note gc #4: a roughly ten-second cycle that grows the heap to 2092 MB against a 574 MB goal.

Change this estimate to instead use the *current* scannable heap size. This has the advantage of being based solely on the current state of the heap, not on past densities or reachable heap sizes, so it isn't susceptible to falling behind during these sorts of phase changes.

This is strictly an over-estimate, but it's better to over-estimate and get more assist than necessary than it is to under-estimate and potentially spiral out of control. Experiments with scaling this estimate back showed no obvious benefit for mutator utilization, heap size, or assist time.
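As a reading aid, here is a minimal sketch of the pacing arithmetic described above. It is a simplification under assumptions, not the runtime's code: heapScan, heapLive, and heapGoal stand in for memstats.heap_scan, the current allocated heap, and next_gc, and the clamp on heapDistance is invented for the sketch.

package main

import "fmt"

// reviseAssistRatio sketches the pacer update described above
// (assumed, simplified names; the runtime's actual revise()
// differs in detail).
func reviseAssistRatio(heapScan, heapLive, heapGoal uint64) float64 {
	// With this change, the expected scan work is simply the
	// scannable bytes currently in the heap, a strict upper bound.
	scanWorkExpected := heapScan

	// Bytes the mutator may still allocate before reaching the goal.
	heapDistance := int64(heapGoal) - int64(heapLive)
	if heapDistance <= 0 {
		heapDistance = 1 // assumed clamp to avoid dividing by zero
	}

	// Scan work each allocated byte must pay for so that all expected
	// scan work finishes by the time the heap reaches the goal.
	return float64(scanWorkExpected) / float64(heapDistance)
}

func main() {
	// 256 MB scannable, 300 MB live, 512 MB goal: every allocated
	// byte must cover about 1.21 bytes of scan work.
	fmt.Printf("assist ratio: %.2f\n", reviseAssistRatio(256<<20, 300<<20, 512<<20))
}

Under the old scheme, scanWorkExpected would instead be last cycle's marked heap multiplied by the learned work ratio, which is exactly what fell behind in the trace above.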
This new estimate has little effect for most benchmarks, including most go1 benchmarks, x/benchmarks, and the 6g benchmark. It has a huge effect for benchmarks that triggered the bad pacer behavior:

name                   old mean              new mean              delta
BinaryTree17            10.0s × (1.00,1.00)    3.5s × (0.98,1.01)  -64.93% (p=0.000)
Fannkuch11              2.74s × (1.00,1.01)   2.65s × (1.00,1.00)   -3.52% (p=0.000)
FmtFprintfEmpty        56.4ns × (0.99,1.00)  57.8ns × (1.00,1.01)   +2.43% (p=0.000)
FmtFprintfString        187ns × (0.99,1.00)   185ns × (0.99,1.01)   -1.19% (p=0.010)
FmtFprintfInt           184ns × (1.00,1.00)   183ns × (1.00,1.00)  (no variance)
FmtFprintfIntInt        321ns × (1.00,1.00)   315ns × (1.00,1.00)   -1.80% (p=0.000)
FmtFprintfPrefixedInt   266ns × (1.00,1.00)   263ns × (1.00,1.00)   -1.22% (p=0.000)
FmtFprintfFloat         353ns × (1.00,1.00)   353ns × (1.00,1.00)   -0.13% (p=0.035)
FmtManyArgs            1.21µs × (1.00,1.00)  1.19µs × (1.00,1.00)   -1.33% (p=0.000)
GobDecode              9.69ms × (1.00,1.00)  9.59ms × (1.00,1.00)   -1.07% (p=0.000)
GobEncode              7.89ms × (0.99,1.01)  7.74ms × (1.00,1.00)   -1.92% (p=0.000)
Gzip                    391ms × (1.00,1.00)   392ms × (1.00,1.00)     ~    (p=0.522)
Gunzip                 97.1ms × (1.00,1.00)  97.0ms × (1.00,1.00)   -0.10% (p=0.000)
HTTPClientServer       55.7µs × (0.99,1.01)  56.7µs × (0.99,1.01)   +1.81% (p=0.001)
JSONEncode             19.1ms × (1.00,1.00)  19.0ms × (1.00,1.00)   -0.85% (p=0.000)
JSONDecode             66.8ms × (1.00,1.00)  66.9ms × (1.00,1.00)     ~    (p=0.288)
Mandelbrot200          4.13ms × (1.00,1.00)  4.12ms × (1.00,1.00)   -0.08% (p=0.000)
GoParse                3.97ms × (1.00,1.01)  4.01ms × (1.00,1.00)   +0.99% (p=0.000)
RegexpMatchEasy0_32     114ns × (1.00,1.00)   115ns × (0.99,1.00)     ~    (p=0.070)
RegexpMatchEasy0_1K     376ns × (1.00,1.00)   376ns × (1.00,1.00)     ~    (p=0.900)
RegexpMatchEasy1_32    94.9ns × (1.00,1.00)  96.3ns × (1.00,1.01)   +1.53% (p=0.001)
RegexpMatchEasy1_1K     568ns × (1.00,1.00)   567ns × (1.00,1.00)   -0.22% (p=0.001)
RegexpMatchMedium_32    159ns × (1.00,1.00)   159ns × (1.00,1.00)     ~    (p=0.178)
RegexpMatchMedium_1K   46.4µs × (1.00,1.00)  46.6µs × (1.00,1.00)   +0.29% (p=0.000)
RegexpMatchHard_32     2.37µs × (1.00,1.00)  2.37µs × (1.00,1.00)     ~    (p=0.722)
RegexpMatchHard_1K     71.1µs × (1.00,1.00)  71.2µs × (1.00,1.00)     ~    (p=0.229)
Revcomp                 565ms × (1.00,1.00)   562ms × (1.00,1.00)   -0.52% (p=0.000)
Template               81.0ms × (1.00,1.00)  80.2ms × (1.00,1.00)   -0.97% (p=0.000)
TimeParse               380ns × (1.00,1.00)   380ns × (1.00,1.00)     ~    (p=0.148)
TimeFormat              405ns × (0.99,1.00)   385ns × (0.99,1.00)   -5.00% (p=0.000)

Change-Id: I11274158bf3affaf62662e02de7af12d5fb789e4
Reviewed-on: https://go-review.googlesource.com/9696
Reviewed-by: Russ Cox <[email protected]>
Run-TryBot: Austin Clements <[email protected]>
1 parent 3be3cbd commit 17db6e0

1 file changed (+15, -38)

src/runtime/mgc.go

@@ -238,13 +238,6 @@ const (
 // GOMAXPROCS. The high-level design of this algorithm is documented
 // at http://golang.org/s/go15gcpacing.
 var gcController = gcControllerState{
-	// Initial work ratio guess.
-	//
-	// TODO(austin): This is based on the work ratio of the
-	// compiler on ./all.bash. Run a wider variety of programs and
-	// see what their work ratios are.
-	workRatioAvg: 0.5 / float64(ptrSize),
-
 	// Initial trigger ratio guess.
 	triggerRatio: 7 / 8.0,
 }
@@ -254,6 +247,10 @@ type gcControllerState struct {
 	// is updated atomically during the cycle. Updates may be
 	// batched arbitrarily, since the value is only read at the
 	// end of the cycle.
+	//
+	// Currently this is the bytes of heap scanned. For most uses,
+	// this is an opaque unit of work, but for estimation the
+	// definition is important.
 	scanWork int64
 
 	// bgScanCredit is the scan work credit accumulated by the
@@ -299,10 +296,6 @@ type gcControllerState struct {
 	// dedicated mark workers get started.
 	dedicatedMarkWorkersNeeded int64
 
-	// workRatioAvg is a moving average of the scan work ratio
-	// (scan work per byte marked).
-	workRatioAvg float64
-
 	// assistRatio is the ratio of allocated bytes to scan work
 	// that should be performed by mutator assists. This is
 	// computed at the beginning of each cycle.
@@ -399,21 +392,16 @@ func (c *gcControllerState) startCycle() {
 // improved estimates. This should be called periodically during
 // concurrent mark.
 func (c *gcControllerState) revise() {
-	// Estimate the size of the marked heap. We don't have much to
-	// go on, so at the beginning of the cycle this uses the
-	// marked heap size from last cycle. If the reachable heap has
-	// grown since last cycle, we'll eventually mark more than
-	// this and we can revise our estimate. This way, if we
-	// overshoot our initial estimate, the assist ratio will climb
-	// smoothly and put more pressure on mutator assists to finish
-	// the cycle.
-	heapMarkedEstimate := memstats.heap_marked
-	if heapMarkedEstimate < work.bytesMarked {
-		heapMarkedEstimate = work.bytesMarked
-	}
-
-	// Compute the expected work based on this estimate.
-	scanWorkExpected := uint64(float64(heapMarkedEstimate) * c.workRatioAvg)
+	// Compute the expected scan work. This is a strict upper
+	// bound on the possible scan work in the current heap.
+	//
+	// You might consider dividing this by 2 (or by
+	// (100+GOGC)/100) to counter this over-estimation, but
+	// benchmarks show that this has almost no effect on mean
+	// mutator utilization, heap size, or assist time and it
+	// introduces the danger of under-estimating and letting the
+	// mutator outpace the garbage collector.
+	scanWorkExpected := memstats.heap_scan
 
 	// Compute the mutator assist ratio so by the time the mutator
 	// allocates the remaining heap bytes up to next_gc, it will
@@ -443,9 +431,6 @@ func (c *gcControllerState) endCycle() {
 	// transient changes. Values near 1 may be unstable.
 	const triggerGain = 0.5
 
-	// EWMA weight given to this cycle's scan work ratio.
-	const workRatioWeight = 0.75
-
 	// Stop the revise timer
 	deltimer(&c.reviseTimer)
 
@@ -484,12 +469,6 @@ func (c *gcControllerState) endCycle() {
 		c.triggerRatio = goalGrowthRatio * 0.95
 	}
 
-	// Compute the scan work ratio for this cycle.
-	workRatio := float64(c.scanWork) / float64(work.bytesMarked)
-
-	// Update EWMA of recent scan work ratios.
-	c.workRatioAvg = workRatioWeight*workRatio + (1-workRatioWeight)*c.workRatioAvg
-
 	if debug.gcpacertrace > 0 {
 		// Print controller state in terms of the design
 		// document.
@@ -502,14 +481,12 @@ func (c *gcControllerState) endCycle() {
 		u_a := utilization
 		u_g := gcGoalUtilization
 		W_a := c.scanWork
-		w_a := workRatio
-		w_ewma := c.workRatioAvg
 		print("pacer: H_m_prev=", H_m_prev,
 			" h_t=", h_t, " H_T=", H_T,
 			" h_a=", h_a, " H_a=", H_a,
 			" h_g=", h_g, " H_g=", H_g,
 			" u_a=", u_a, " u_g=", u_g,
-			" W_a=", W_a, " w_a=", w_a, " w_ewma=", w_ewma,
+			" W_a=", W_a,
 			" goalΔ=", goalGrowthRatio-h_t,
 			" actualΔ=", h_a-h_t,
 			" u_a/u_g=", u_a/u_g,

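As a closing illustration of why the deleted estimator could fall so far behind, here is a hedged numeric sketch. The EWMA weight and the initial ratio guess are taken from the code removed above; the per-cycle heap figures are invented stand-ins for the BinaryTree17 trace, not measurements.

package main

import "fmt"

func main() {
	// Both constants come from the code deleted in this commit: the
	// EWMA weight, and the initial work ratio guess of 0.5 scan bytes
	// per marked byte divided by an 8-byte pointer size.
	const workRatioWeight = 0.75
	workRatioAvg := 0.5 / 8.0

	// Cycle 1 (illustrative figures): the heap is dominated by a
	// 256 MB no-scan object, so only ~1 MB of the ~262 MB marked
	// heap is scannable. The learned ratio collapses toward zero.
	workRatio := float64(1<<20) / float64(262<<20)
	workRatioAvg = workRatioWeight*workRatio + (1-workRatioWeight)*workRatioAvg

	// Cycle 2: the benchmark now allocates pointer-dense tree nodes.
	// The old estimator extrapolates from the stale average...
	oldEstimate := uint64(float64(262<<20) * workRatioAvg)

	// ...while the new estimator reads the scannable bytes actually
	// in the heap (illustrative value).
	newEstimate := uint64(430 << 20)

	fmt.Printf("old estimate: %d MB of expected scan work\n", oldEstimate>>20)
	fmt.Printf("new estimate: %d MB of expected scan work\n", newEstimate>>20)
}

The sketch prints an expected-scan-work estimate of about 4 MB under the old scheme versus 430 MB under the new one; a gap of that size is what let the mutator outpace the collector.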