Integrator Change: Performance Tracking #13

Closed

ChrisRackauckas opened this issue Jan 28, 2017 · 23 comments

ChrisRackauckas commented Jan 28, 2017

The integrator change is almost complete, but there is a performance issue to track down. The test problem I am using is:

using StochasticDiffEq, DiffEqProblemLibrary
srand(200)
prob = oval2ModelExample(largeFluctuations=true,useBigs=false)
quick_prob = deepcopy(prob)
quick_prob.tspan = (0.0,1.0)

using BenchmarkTools
@benchmark begin
  srand(100)
  sol = solve(quick_prob,SRIW1(),dt=(1/2)^(18),progress_steps=Int(1e5),
        adaptivealg=:RSwM3,progress=false,qmax=4,save_timeseries=false,
        timeseries_steps=1000,abstol=1e-5,reltol=1e-3)
end

To run these, you need to be on the bench2 branch of DiffEqBase.
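(In the Julia 0.5-era package manager that's something like:)

```julia
# Check out the bench2 branch of DiffEqBase (Julia 0.5-era Pkg API):
Pkg.checkout("DiffEqBase", "bench2")
```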

I thoroughly checked the accuracy of the calculations: except in early commits, everything computes the exact same trajectory. Note that for most commits the branch needs a fix: there are two locations, both in the rejection handling, where the sqrt needs to be updated. See f3b0979 for an example.
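The shape of that fix is roughly the following (a minimal sketch; `sqdt` is the cached value in this codebase, but the surrounding control flow and names here are illustrative):

```julia
# Illustrative: when a step is rejected, dt shrinks, and the cached sqrt(dt)
# must be recomputed, otherwise the noise terms use a stale square root.
dt, q = 1/2^18, 0.5   # assumed current step size and shrink factor
reject_step = true    # hypothetical rejection flag
if reject_step
  dt = q * dt         # shrink the step
  sqdt = sqrt(dt)     # the fix: keep sqdt in sync with the new dt
end
```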

So at any point, a branch can be made off a commit and benchmarks can be run. Here's what happens over time.

The "Base" branch is bench_pi: it's before the most recent changes, only uses the macros, and is the fastest.

  # Bench PI

  BenchmarkTools.Trial:
  memory estimate:  61.67 mb
  allocs estimate:  487561
  --------------
  minimum time:     45.840 ms (9.37% GC)
  median time:      48.309 ms (13.53% GC)
  mean time:        48.174 ms (13.15% GC)
  maximum time:     52.718 ms (7.41% GC)
  --------------
  samples:          104
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

  BenchmarkTools.Trial:
  memory estimate:  61.67 mb
  allocs estimate:  487562
  --------------
  minimum time:     44.596 ms (8.16% GC)
  median time:      46.701 ms (12.55% GC)
  mean time:        46.558 ms (12.04% GC)
  maximum time:     52.336 ms (6.89% GC)
  --------------
  samples:          108
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

Next we have bench_mid: it has most of the changes, but still uses lots of the @def macros.

# Bench Mid

BenchmarkTools.Trial: 
  memory estimate:  61.26 mb
  allocs estimate:  461027
  --------------
  minimum time:     46.949 ms (8.61% GC)
  median time:      49.106 ms (12.10% GC)
  mean time:        48.942 ms (11.64% GC)
  maximum time:     54.735 ms (6.63% GC)
  --------------
  samples:          103
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

BenchmarkTools.Trial: 
  memory estimate:  61.26 mb
  allocs estimate:  461028
  --------------
  minimum time:     47.503 ms (8.61% GC)
  median time:      49.853 ms (12.14% GC)
  mean time:        49.618 ms (11.65% GC)
  maximum time:     55.661 ms (6.90% GC)
  --------------
  samples:          101
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

It's acceptably close to bench_pi, though I would like to find out why it's slower if possible. After that comes bench_single_solve, where most of the macros are gone:

  # Bench Single Solve

BenchmarkTools.Trial: 
  memory estimate:  61.73 mb
  allocs estimate:  476536
  --------------
  minimum time:     48.410 ms (8.36% GC)
  median time:      50.787 ms (12.60% GC)
  mean time:        51.230 ms (12.67% GC)
  maximum time:     75.402 ms (16.45% GC)
  --------------
  samples:          98
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

BenchmarkTools.Trial: 
  memory estimate:  61.73 mb
  allocs estimate:  476537
  --------------
  minimum time:     47.717 ms (8.66% GC)
  median time:      50.336 ms (12.84% GC)
  mean time:        50.223 ms (12.43% GC)
  maximum time:     55.817 ms (6.87% GC)
  --------------
  samples:          100
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

Again, a small performance loss, and no idea why. Right after that is bench_accept_header, where all of the solvers now use the same solve! command and there's a new loop header. Here's where it gets super interesting. If I comment out one line, I get:

BenchmarkTools.Trial: 
  memory estimate:  61.73 mb
  allocs estimate:  476535
  --------------
  minimum time:     49.142 ms (8.27% GC)
  median time:      51.910 ms (12.56% GC)
  mean time:        54.284 ms (13.12% GC)
  maximum time:     99.443 ms (13.03% GC)
  --------------
  samples:          93
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

BenchmarkTools.Trial: 
  memory estimate:  61.73 mb
  allocs estimate:  476535
  --------------
  minimum time:     49.518 ms (7.91% GC)
  median time:      57.708 ms (13.50% GC)
  mean time:        57.122 ms (14.73% GC)
  maximum time:     85.435 ms (32.41% GC)
  --------------
  samples:          88
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

and then if I uncomment it (the branch bench_accept_header_uncomment), I get:

BenchmarkTools.Trial: 
  memory estimate:  62.67 mb
  allocs estimate:  538423
  --------------
  minimum time:     118.871 ms (3.78% GC)
  median time:      122.657 ms (5.41% GC)
  mean time:        122.573 ms (5.20% GC)
  maximum time:     126.492 ms (3.40% GC)
  --------------
  samples:          41
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

BenchmarkTools.Trial: 
  memory estimate:  62.67 mb
  allocs estimate:  538423
  --------------
  minimum time:     114.601 ms (3.86% GC)
  median time:      120.697 ms (5.60% GC)
  mean time:        120.593 ms (5.35% GC)
  maximum time:     126.930 ms (3.28% GC)
  --------------
  samples:          42
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

The line is just a check: if isempty(tstops).
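The pattern being toggled looks roughly like this (a hedged sketch; only the isempty(tstops) check is from the actual code, the stepping helpers are hypothetical stubs):

```julia
# Illustrative: the loop-header check being toggled. Stubs stand in for
# the real stepping functions (hypothetical names).
perform_step!(integrator) = nothing          # stub: plain stepping path
step_to_tstop!(integrator, tstop) = nothing  # stub: tstop-respecting path

function step_loop!(integrator, t, t_end, tstops)
  while t < t_end
    if isempty(tstops)                # the line being commented out
      perform_step!(integrator)
    else
      step_to_tstop!(integrator, first(tstops))
    end
    t += 1.0  # stand-in for dt accumulation
  end
end
```

Then some features are added to get to the master branch. It benchmarks at: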

BenchmarkTools.Trial: 
  memory estimate:  122.52 mb
  allocs estimate:  4460374
  --------------
  minimum time:     285.688 ms (3.92% GC)
  median time:      288.482 ms (4.54% GC)
  mean time:        288.774 ms (4.40% GC)
  maximum time:     293.832 ms (3.65% GC)
  --------------
  samples:          18
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

BenchmarkTools.Trial: 
  memory estimate:  122.52 mb
  allocs estimate:  4460374
  --------------
  minimum time:     281.921 ms (3.75% GC)
  median time:      284.899 ms (4.44% GC)
  mean time:        284.920 ms (4.26% GC)
  maximum time:     289.296 ms (3.51% GC)
  --------------
  samples:          18
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

Thus the performance regression is huge! I then go in and comment out some if isempty(tstops) conditionals to get the branch fast_master, which gives:

BenchmarkTools.Trial: 
  memory estimate:  120.87 mb
  allocs estimate:  4352465
  --------------
  minimum time:     160.632 ms (6.72% GC)
  median time:      163.469 ms (7.91% GC)
  mean time:        163.188 ms (7.85% GC)
  maximum time:     169.087 ms (6.31% GC)
  --------------
  samples:          31
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

BenchmarkTools.Trial: 
  memory estimate:  120.87 mb
  allocs estimate:  4352465
  --------------
  minimum time:     158.549 ms (6.58% GC)
  median time:      161.241 ms (7.50% GC)
  mean time:        161.133 ms (7.59% GC)
  maximum time:     167.428 ms (6.21% GC)
  --------------
  samples:          32
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

The last branch is features, which adds some features on top of master, but it isn't worth looking at until we find out why master is so slow and how to get around it.


ChrisRackauckas commented Jan 28, 2017

Note that these exact same changes and design were used in OrdinaryDiffEq.jl, and the pre- vs. post-change benchmarks show little to no difference (the post-change benchmarks were about 2% better). So I have no idea why it would fail so badly here. Though I did see some odd behavior like this there too. What I did there was: whenever weird stuff like this happened (I saw the tstops issue there as well), I just gave up and restarted from where the benchmarks were fine. I am hoping to find out what's actually going on here, and not lose all of this work.

ChrisRackauckas commented:

While this is all frustrating, what I can say is that I can prove they are all solving the same problem, and I did extensive testing on the master branch to show that this is once again a correct method (with the sqdt fixes). So it's only performance that's the problem.


ChrisRackauckas commented Jan 28, 2017

DataStructures.jl had non-strict typing, which caused a type instability:

https://github.com/JuliaLang/DataStructures.jl/blob/v0.5.2/src/heaps/binary_heap.jl#L107
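The general shape of the problem (a minimal sketch; the type and field names are illustrative, not the actual DataStructures.jl code):

```julia
# Illustrative: an abstractly typed field makes every access type-unstable,
# forcing boxing and dynamic dispatch in the solver's hot loop.
mutable struct LooseHeap
  valtree::AbstractVector{Float64}  # abstract field type: unstable
end

mutable struct StrictHeap
  valtree::Vector{Float64}          # concrete field type: stable
end

heap_top(h) = h.valtree[1]  # return type inferable only for StrictHeap
```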

After fixing this, bench_accept_header_uncomment goes down significantly:

BenchmarkTools.Trial: 
  memory estimate:  61.73 mb
  allocs estimate:  476530
  --------------
  minimum time:     48.717 ms (7.82% GC)
  median time:      52.125 ms (13.19% GC)
  mean time:        51.902 ms (12.68% GC)
  maximum time:     57.910 ms (18.68% GC)
  --------------
  samples:          97
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

A PR is being made. Note that this puts master at:

BenchmarkTools.Trial: 
  memory estimate:  120.87 mb
  allocs estimate:  4352456
  --------------
  minimum time:     168.849 ms (6.89% GC)
  median time:      171.353 ms (8.04% GC)
  mean time:        171.250 ms (7.89% GC)
  maximum time:     176.079 ms (6.44% GC)
  --------------
  samples:          30
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%


ChrisRackauckas commented Jan 28, 2017

There was a change to ChunkedArrays. To update, change like this:

77f7bce

The test

srand(100)
sol = solve(quick_prob,SRIW1(),dt=(1/2)^(18),progress_steps=Int(1e5),
      adaptivealg=:RSwM3,progress=false,qmax=4,save_timeseries=false,
      timeseries_steps=1000,abstol=1e-5,reltol=1e-3)
println(sol.u[end])

shows that using the faster branch of ChunkedArrays solves the same problem, with solution:

[0.011503,0.939809,0.00312214,0.00155873,0.0172325,0.0577331,0.237757,0.00134921,0.000238022,4.19916e-5,7.40824e-6,1.307e-6,0.0621091,1.24463,0.0483949,199.901,137.457,0.0177237,0.132583]

With this, we now have (note: I was not able to reproduce a timing this fast with stricter benchmarking parameters):

BenchmarkTools.Trial: 
  memory estimate:  54.25 mb
  allocs estimate:  373479
  --------------
  minimum time:     38.878 ms (8.52% GC)
  median time:      39.825 ms (9.81% GC)
  mean time:        40.278 ms (11.12% GC)
  maximum time:     47.821 ms (7.06% GC)
  --------------
  samples:          125
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

on bench_pi, and on fast_master we have

BenchmarkTools.Trial: 
  memory estimate:  113.45 mb
  allocs estimate:  4238383
  --------------
  minimum time:     151.883 ms (6.04% GC)
  median time:      153.031 ms (6.59% GC)
  mean time:        153.426 ms (6.66% GC)
  maximum time:     160.144 ms (5.64% GC)
  --------------
  samples:          33
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

while on master we have:

BenchmarkTools.Trial: 
  memory estimate:  113.45 mb
  allocs estimate:  4238383
  --------------
  minimum time:     161.828 ms (6.00% GC)
  median time:      163.158 ms (6.55% GC)
  mean time:        163.833 ms (6.59% GC)
  maximum time:     174.442 ms (6.87% GC)
  --------------
  samples:          31
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

ChrisRackauckas commented:

Declaring types on dWtmp, dZtmp, and dttmp (sketched below) did nothing on master:

BenchmarkTools.Trial: 
  memory estimate:  113.45 mb
  allocs estimate:  4238383
  --------------
  minimum time:     162.326 ms (6.06% GC)
  median time:      163.283 ms (6.57% GC)
  mean time:        163.773 ms (6.68% GC)
  maximum time:     171.105 ms (5.63% GC)
  --------------
  samples:          31
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
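The kind of annotation tried is roughly the following (a hedged sketch; the variable names are the ones above, while the container types and the surrounding function are assumptions):

```julia
# Illustrative: typed locals, so inference cannot widen these past the
# declared types. The element/container types are assumptions.
function init_tmp_caches(n)
  dWtmp::Vector{Float64} = zeros(n)
  dZtmp::Vector{Float64} = zeros(n)
  dttmp::Vector{Float64} = zeros(n)
  return dWtmp, dZtmp, dttmp
end
```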


ChrisRackauckas commented Jan 28, 2017

Fully declared typing on bench_pi gets it to (note: I was not able to recreate these fast timings with stricter benchmarks):

BenchmarkTools.Trial: 
  memory estimate:  54.25 mb
  allocs estimate:  373479
  --------------
  minimum time:     39.818 ms (10.01% GC)
  median time:      40.497 ms (10.62% GC)
  mean time:        41.059 ms (11.72% GC)
  maximum time:     48.047 ms (7.27% GC)
  --------------
  samples:          122
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

This is the timing to reach for.


ChrisRackauckas commented Jan 28, 2017

Chunked changes for bench_single_solve:

BenchmarkTools.Trial: 
  memory estimate:  54.31 mb
  allocs estimate:  362451
  --------------
  minimum time:     42.992 ms (10.49% GC)
  median time:      45.045 ms (10.69% GC)
  mean time:        44.504 ms (12.45% GC)
  maximum time:     50.782 ms (8.08% GC)
  --------------
  samples:          113
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

Chunked changes for bench_accept_header

BenchmarkTools.Trial: 
  memory estimate:  54.31 mb
  allocs estimate:  362457
  --------------
  minimum time:     42.437 ms (9.58% GC)
  median time:      43.796 ms (10.15% GC)
  mean time:        43.973 ms (11.75% GC)
  maximum time:     50.403 ms (7.37% GC)
  --------------
  samples:          114
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

and accept_header_uncommented

BenchmarkTools.Trial: 
  memory estimate:  54.31 mb
  allocs estimate:  362457
  --------------
  minimum time:     42.197 ms (8.79% GC)
  median time:      43.176 ms (9.94% GC)
  mean time:        43.709 ms (11.41% GC)
  maximum time:     50.080 ms (7.48% GC)
  --------------
  samples:          115
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

It looks like the difference between single_solve and accept_header is now gone. This makes the branch ordering:

bench_pi
bench_mid
bench_single_solve
bench_accept_header
master / fast_master
features

Now I should check what changes at mid, and between accept_header and master.

Deleting single_solve and accept_header_uncommented.

ChrisRackauckas commented:

bench_post_accept is the first merge after bench_accept_header, and it has:

BenchmarkTools.Trial: 
  memory estimate:  113.69 mb
  allocs estimate:  4253848
  --------------
  minimum time:     167.096 ms (6.10% GC)
  median time:      168.456 ms (6.74% GC)
  mean time:        168.720 ms (6.82% GC)
  maximum time:     175.747 ms (5.93% GC)
  --------------
  samples:          30
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

ChrisRackauckas commented:

master

BenchmarkTools.Trial: 
  memory estimate:  113.45 mb
  allocs estimate:  4238383
  --------------
  minimum time:     164.145 ms (6.51% GC)
  median time:      165.540 ms (6.76% GC)
  mean time:        166.303 ms (7.07% GC)
  maximum time:     175.030 ms (11.25% GC)
  --------------
  samples:          31
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

fast_master

BenchmarkTools.Trial: 
  memory estimate:  113.45 mb
  allocs estimate:  4238383
  --------------
  minimum time:     157.001 ms (8.04% GC)
  median time:      158.338 ms (8.44% GC)
  mean time:        160.537 ms (8.47% GC)
  maximum time:     184.700 ms (7.34% GC)
  --------------
  samples:          32
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

ChrisRackauckas commented:

Adding @inbounds was the difference between master and fast_master; a sketch of the change follows.
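(A hedged illustration; the actual hot loops are the solver's internals, and this loop body is a generic stand-in:)

```julia
# Illustrative: eliding bounds checks in a hot inner loop with @inbounds.
u, k = rand(1000), rand(1000)
utmp = similar(u)
dt = 1/2^18
@inbounds for i in eachindex(u)
  utmp[i] = u[i] + dt * k[i]  # generic update standing in for the real loops
end
```

New fast master, without any commented-out tstops parts: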

BenchmarkTools.Trial: 
  memory estimate:  113.45 mb
  allocs estimate:  4238383
  --------------
  minimum time:     151.013 ms (6.45% GC)
  median time:      152.661 ms (6.88% GC)
  mean time:        157.836 ms (7.68% GC)
  maximum time:     191.556 ms (8.76% GC)
  --------------
  samples:          32
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

merging into master

ChrisRackauckas commented:

Now the branches are:

bench_pi
bench_mid
bench_accept_header
bench_post_accept
master
features

ChrisRackauckas commented:

bench_pi

BenchmarkTools.Trial: 
  memory estimate:  54.25 mb
  allocs estimate:  373479
  --------------
  minimum time:     47.844 ms (7.67% GC)
  median time:      48.733 ms (8.72% GC)
  mean time:        49.191 ms (8.39% GC)
  maximum time:     52.673 ms (8.52% GC)
  --------------
  samples:          11
  evals/sample:     1
  time tolerance:   1.00%
  memory tolerance: 1.00%

bench_mid

  BenchmarkTools.Trial: 
  memory estimate:  53.84 mb
  allocs estimate:  346942
  --------------
  minimum time:     47.779 ms (6.37% GC)
  median time:      47.927 ms (6.40% GC)
  mean time:        48.061 ms (6.49% GC)
  maximum time:     49.012 ms (7.23% GC)
  --------------
  samples:          12
  evals/sample:     1
  time tolerance:   1.00%
  memory tolerance: 1.00%

With these changes, going all the way back to bench_pi is no longer necessary. Deleting the branch.

ChrisRackauckas commented:

bench_accept_header with GC running between samples to make it more accurate:

BenchmarkTools.Trial: 
  memory estimate:  54.31 mb
  allocs estimate:  362457
  --------------
  minimum time:     48.888 ms (6.41% GC)
  median time:      49.069 ms (6.31% GC)
  mean time:        49.112 ms (6.33% GC)
  maximum time:     49.363 ms (6.35% GC)
  --------------
  samples:          12
  evals/sample:     1
  time tolerance:   1.00%
  memory tolerance: 1.00%
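(Running GC between samples corresponds to BenchmarkTools' gcsample option; a sketch, with the solve arguments abbreviated from the test at the top of this issue:)

```julia
using BenchmarkTools
# gcsample=true runs the garbage collector before every sample, so GC
# pressure left over from one sample doesn't pollute the next timing.
@benchmark solve($quick_prob, SRIW1(), dt=(1/2)^18, abstol=1e-5, reltol=1e-3) gcsample=true
```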

ChrisRackauckas commented:

Very strict tolerance benchmark:

bench_mid

  BenchmarkTools.Trial: 
  memory estimate:  53.84 mb
  allocs estimate:  346942
  --------------
  minimum time:     48.785 ms (6.33% GC)
  median time:      49.196 ms (6.61% GC)
  mean time:        49.152 ms (6.63% GC)
  maximum time:     49.326 ms (6.49% GC)
  --------------
  samples:          12
  evals/sample:     1
  time tolerance:   0.01%
  memory tolerance: 1.00%

bench_accept_header

BenchmarkTools.Trial: 
  memory estimate:  54.31 mb
  allocs estimate:  362457
  --------------
  minimum time:     49.319 ms (6.18% GC)
  median time:      49.427 ms (6.27% GC)
  mean time:        49.443 ms (6.26% GC)
  maximum time:     49.616 ms (6.46% GC)
  --------------
  samples:          11
  evals/sample:     1
  time tolerance:   0.01%
  memory tolerance: 1.00%

The difference is due to a slightly higher startup time to set up all of the new features, part of which is a dynamic dispatch that is fixed later. Thus bench_mid is done; going forward, benchmarking starts from bench_accept_header.


ChrisRackauckas commented Jan 28, 2017

Current setup:

With stricter benchmarks and some improvements to the upstream libraries, the difference between bench_pi and bench_accept_header has eroded. Now the timing to match is accept_header, which is shown above. The branches are:

bench_accept_header
bench_post_accept
master
features

with the faster branch on ChunkedArrays.

post_accept is the merge just after accept_header, and its timings are:

BenchmarkTools.Trial: 
  memory estimate:  113.69 mb
  allocs estimate:  4253848
  --------------
  minimum time:     164.253 ms (5.09% GC)
  median time:      164.395 ms (5.10% GC)
  mean time:        164.843 ms (5.29% GC)
  maximum time:     168.654 ms (6.83% GC)
  --------------
  samples:          9
  evals/sample:     1
  time tolerance:   0.01%
  memory tolerance: 1.00%

while current master is:

BenchmarkTools.Trial: 
  memory estimate:  113.45 mb
  allocs estimate:  4238383
  --------------
  minimum time:     159.332 ms (5.33% GC)
  median time:      159.459 ms (5.35% GC)
  mean time:        159.488 ms (5.36% GC)
  maximum time:     159.647 ms (5.35% GC)
  --------------
  samples:          9
  evals/sample:     1
  time tolerance:   0.01%
  memory tolerance: 1.00%

So the main difference is found somewhere between accept_header and post_accept.

ChrisRackauckas commented:

There are three commits between bench_accept_header and bench_post_accept. bench_commit_one is the first commit:

BenchmarkTools.Trial: 
  memory estimate:  54.55 mb
  allocs estimate:  377929
  --------------
  minimum time:     58.250 ms (6.00% GC)
  median time:      58.686 ms (6.40% GC)
  mean time:        59.259 ms (6.60% GC)
  maximum time:     65.205 ms (7.59% GC)
  --------------
  samples:          10
  evals/sample:     1
  time tolerance:   0.01%
  memory tolerance: 1.00%

Ouch, already a decent performance hit. Then bench_commit_two has:

BenchmarkTools.Trial: 
  memory estimate:  113.69 mb
  allocs estimate:  4253848
  --------------
  minimum time:     176.014 ms (4.72% GC)
  median time:      176.255 ms (4.73% GC)
  mean time:        176.223 ms (4.73% GC)
  maximum time:     176.321 ms (4.72% GC)
  --------------
  samples:          9
  evals/sample:     1
  time tolerance:   0.01%
  memory tolerance: 1.00%

Then commit 3 is bench_post_accept. Commit 2 is the main problem.

ChrisRackauckas commented:

Just by inlining and adding @inbounds, I get the timing down for commit one:

BenchmarkTools.Trial: 
  memory estimate:  54.55 mb
  allocs estimate:  377929
  --------------
  minimum time:     48.251 ms (6.50% GC)
  median time:      48.362 ms (6.57% GC)
  mean time:        49.361 ms (6.42% GC)
  maximum time:     60.290 ms (5.31% GC)
  --------------
  samples:          12
  evals/sample:     1
  time tolerance:   0.01%
  memory tolerance: 1.00%


ChrisRackauckas commented Jan 28, 2017

Current state:

bench_accept_header is good. The first commit after it is bench_commit_one, whose benchmark just above matches bench_accept_header.

Then, with inlining and @inbounds, bench_commit_two is:

BenchmarkTools.Trial: 
  memory estimate:  113.69 mb
  allocs estimate:  4253848
  --------------
  minimum time:     175.758 ms (5.31% GC)
  median time:      176.086 ms (5.33% GC)
  mean time:        176.034 ms (5.35% GC)
  maximum time:     176.323 ms (5.32% GC)
  --------------
  samples:          9
  evals/sample:     1
  time tolerance:   0.01%
  memory tolerance: 1.00%

which is even worse than bench_post_accept. So the bad change is in this commit. Current master is then slightly better.

This is running with the PR'd heap fix on DataStructures (my fork) and the faster branch on ChunkedArrays.

ChrisRackauckas commented:

BenchmarkTools.Trial: 
  memory estimate:  113.69 mb
  allocs estimate:  4253848
  --------------
  minimum time:     159.439 ms (5.45% GC)
  median time:      160.165 ms (5.45% GC)
  mean time:        160.460 ms (5.48% GC)
  maximum time:     163.774 ms (5.66% GC)
  --------------
  samples:          9
  evals/sample:     1
  time tolerance:   0.01%
  memory tolerance: 1.00%

is after cleaning every single change. The total diff is the following:

adding Roots to REQUIRE
using Roots in the main file

...


ChrisRackauckas commented Jan 28, 2017

Adding using Roots to bench_commit_one gives:

BenchmarkTools.Trial: 
  memory estimate:  113.69 mb
  allocs estimate:  4253848
  --------------
  minimum time:     161.812 ms (5.43% GC)
  median time:      161.964 ms (5.45% GC)
  mean time:        161.962 ms (5.45% GC)
  maximum time:     162.081 ms (5.44% GC)
  --------------
  samples:          9
  evals/sample:     1
  time tolerance:   0.01%
  memory tolerance: 1.00%

We hit JuliaLang/julia#18465

ChrisRackauckas commented:

Removing using Roots brings master to:

BenchmarkTools.Trial: 
  memory estimate:  54.31 mb
  allocs estimate:  362458
  --------------
  minimum time:     47.875 ms (6.31% GC)
  median time:      47.901 ms (6.32% GC)
  mean time:        47.919 ms (6.33% GC)
  maximum time:     48.006 ms (6.32% GC)
  --------------
  samples:          12
  evals/sample:     1
  time tolerance:   0.01%
  memory tolerance: 1.00%

ChrisRackauckas commented:

Small known performance hit due to using copyat_or_push!:

BenchmarkTools.Trial: 
  memory estimate:  54.31 mb
  allocs estimate:  362450
  --------------
  minimum time:     49.798 ms (6.41% GC)
  median time:      49.877 ms (6.42% GC)
  mean time:        49.918 ms (6.47% GC)
  maximum time:     50.250 ms (6.51% GC)
  --------------
  samples:          12
  evals/sample:     1
  time tolerance:   0.01%
  memory tolerance: 1.00%
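(For context, a hedged sketch of copyat_or_push!'s semantics as I understand them; the _sketch suffix marks this as an illustration, not the actual implementation:)

```julia
# Write a copy of x into slot i if it exists, otherwise grow the vector.
# The defensive copy is the small known performance hit mentioned above.
function copyat_or_push_sketch!(a::Vector, i::Int, x)
  if i <= length(a)
    a[i] = copy(x)     # overwrite the existing slot
  else
    push!(a, copy(x))  # allocate and append a new slot
  end
  return a
end
```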

ChrisRackauckas commented:

Closing this since this part is solved.
