Integrator Change: Performance Tracking #13
Note that these same exact changes and design were used in OrdinaryDiffEq.jl and the pre- vs post-change benchmarks have little to no difference (with the post-change benchmarks being about 2% better). So I have no idea why it would fail so badly here. Though I did see some odd behavior like this. What I did there was, whenever weird stuff like this happened (I saw the
While this is all frustrating, what I can say is that I can prove they are all solving the same problem, and I did extensive testing on the master branch to show that this is once again a correct method (with the
DataStructures.jl had non-strict typing which caused a type instability. After fixing this, a PR is being made. Note that this puts master at:
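For anyone unfamiliar with the issue class, this is the general shape of that kind of type instability; a minimal, hypothetical sketch (the `LooseQueue`/`StrictQueue` names are made up for illustration and are not the actual DataStructures.jl code):

```julia
# Hypothetical illustration: an abstractly-typed field forces boxed values and
# dynamic dispatch every time it is read inside a hot loop.
struct LooseQueue
    items::Vector{Number}      # abstract element type => boxed values
end

struct StrictQueue{T<:Number}
    items::Vector{T}           # concrete once T is known => fully inferred
end

sumall(q) = sum(q.items)

q1 = LooseQueue([1.0, 2.0, 3.0])
q2 = StrictQueue([1.0, 2.0, 3.0])
# @code_warntype sumall(q1)   # abstract return type shows the instability
# @code_warntype sumall(q2)   # concrete, no boxing
```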
There was a change to ChunkedArrays. To update, change like this: The test
shows that using the ChunkedArrays branch
with this, we now have (note: I was not able to reproduce this fast a timing with stricter benchmarking parameters)
on
while on
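By "stricter benchmarking parameters" I mean something along these lines; a sketch using BenchmarkTools with a dummy workload standing in for the solver call:

```julia
using BenchmarkTools

# Dummy stand-in for the solver call being benchmarked.
work(n) = sum(sqrt(i) for i in 1:n)

# Loose, quick benchmark: few samples, short time budget.
@benchmark work(10^6) samples=10 seconds=1

# Stricter benchmark: more samples, longer budget, one eval per sample,
# so a single lucky run can't dominate the reported minimum.
@benchmark work(10^6) samples=1000 seconds=30 evals=1
```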
Declaring types on dWtmp, dZtmp, and dttmp did nothing on master:
Fully declared typing on
This is the timing to reach for.
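What "fully declared typing" on the temporaries means in practice is roughly the following; a hypothetical sketch (only the names dWtmp, dZtmp, dttmp come from the comment above, the struct itself is invented):

```julia
# Hypothetical cache carrying the temporaries with concrete (parametric) types,
# so every field access is inferred instead of returning `Any`.
mutable struct StepCache{uType,tType}
    dWtmp::uType
    dZtmp::uType
    dttmp::tType
end

cache = StepCache(zeros(4), zeros(4), 0.01)   # StepCache{Vector{Float64},Float64}

# Versus the untyped equivalent, where every field read is `Any`:
mutable struct UntypedStepCache
    dWtmp
    dZtmp
    dttmp
end
```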
Chunked changes for
Chunked changes for
and accept_header_uncommented
Looks like the difference between single_solve and accept_header is now gone. This means the branch ordering is now: bench_pi. Now I should check what changes at mid and between accept_header and master. Deleting single_solve and accept_header_uncommented.
The @inbounds was the difference on
Merging into master.
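For reference, this is the kind of @inbounds difference being talked about; a toy sketch, not the solver's actual inner loop:

```julia
# Toy inner loop: eliding bounds checks on the hot array accesses.
function accumulate_nocheck!(out, u, dW)
    @inbounds for i in eachindex(u)
        out[i] = u[i] + dW[i]    # no bounds check emitted here
    end
    return out
end

u   = rand(10^6)
dW  = rand(10^6)
out = similar(u)
accumulate_nocheck!(out, u, dW)
```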
Now the branches are:
bench_pi
bench_mid
With these changes, going all the way back to bench_pi is no longer necessary. Deleting branch.
bench_accept_header with gc running between samples to make it more accurate:
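"GC running between samples" corresponds to something like the gcsample option in BenchmarkTools; a sketch with a dummy workload:

```julia
using BenchmarkTools

work(n) = sum(abs2, rand(n))

# gcsample=true runs the garbage collector before every sample, so GC pauses
# triggered by a previous sample's allocations don't pollute this one.
@benchmark work(10^5) gcsample=true samples=200 seconds=20
```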
Very strict tolerance benchmark: bench_mid
bench_accept_header
The difference is due to a slightly higher startup time to capture all of the new features, but that is also due to a dynamic dispatch which is fixed later. Thus bench_mid is done; going forward, work just from bench_accept_header.
Current setup: with stricter benchmarks and some improvements to outlying libraries, compare bench_accept_header with post_accept. post_accept is the merge just after accept_header, and its timings are:
while current master is:
So the main difference is found somewhere between accept_header and post_accept.
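The dynamic dispatch mentioned above is the sort of thing you can spot with @code_warntype or an allocation check; a minimal illustration, not the actual call site:

```julia
# A non-concretely-typed collection in the loop forces runtime dispatch on `+`.
function step_all(ts)
    t = 0.0
    for dt in ts
        t += dt          # dispatches at runtime when eltype(ts) is not concrete
    end
    return t
end

concrete = collect(0.1:0.1:1.0)        # Vector{Float64}: fully inferred
mixed    = Any[0.1, 0.2f0, 1//10]      # Vector{Any}: dynamic dispatch each iteration
# @code_warntype step_all(mixed)       # the `Any` annotations mark the dispatch cost
step_all(concrete); step_all(mixed)
```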
There are 3 commits between
Ouch, already a decent performance hit. Then
Then commit 3 is
Just by inlining and inboundsing I get the timing down for commit one:
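The inlining part is just forcing the small per-step kernels to inline into the loop; roughly like this toy example (not the real commit):

```julia
# Small per-step kernel: forcing inlining removes call overhead inside the
# tight loop, and @inbounds elides the bounds checks on the hot accesses.
@inline stepval(u, dt, dW) = u + dt * u + dW

function integrate(u0, dts, dWs)
    u = u0
    @inbounds for i in eachindex(dts)
        u = stepval(u, dts[i], dWs[i])
    end
    return u
end

integrate(1.0, fill(0.001, 1000), randn(1000) .* 0.03)
```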
Current state:
Then, with inlining and inbounding,
which is even worse than
Running with
is after cleaning every single change. The total diff is the following: adding Roots to REQUIRE ...
adding
We hit JuliaLang/julia#18465
Removing
Small known performance hit due to using copyat_or_push!
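For anyone following along, copyat_or_push! is, as I understand it, roughly this kind of save helper; a sketch of the behavior, not the actual DiffEqBase implementation:

```julia
# Sketch of a copy-or-push save helper: overwrite slot i if it already exists,
# otherwise grow the vector. The extra copy is the small known hit mentioned
# above, but it keeps saved timepoints from aliasing the same work array.
function copyat_or_push_sketch!(a::Vector{Vector{Float64}}, i::Int, x::Vector{Float64})
    if i <= length(a)
        copyto!(a[i], x)     # reuse the existing slot
    else
        push!(a, copy(x))    # grow storage with an independent copy
    end
    return a
end

saved = Vector{Vector{Float64}}()
copyat_or_push_sketch!(saved, 1, [1.0, 2.0])
copyat_or_push_sketch!(saved, 2, [3.0, 4.0])
```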
Closing this since this part is solved. |
The integrator change is almost complete, but there is an issue with performance tracking. For the test problem I am using:
To run these you need to be on bench2 for DiffEqBase. I thoroughly checked the accuracy of the calculations. Except in early commits, everything calculates the exact same trajectory. For most commits, note that the branch needs to have a fix: there are two locations where sqrt needs to be updated, both in the rejections. See f3b0979 for an example. So at any point, a branch can be made off a commit and benchmarks can occur. Here's what happens over time.
The "Base" branch is
bench_pi
: it's before the most recent changes, only uses the macros, and is the fastest.Next we have
bench_mid
: it's most of the changes, but still lots of the@def
macrosIt's acceptably close to bench PI, though I would like to find out why it's slower if possible. After that it's
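The @def macros are the usual code-pasting trick; roughly this pattern (the unpack_common block below is just an illustration, not a real solver macro):

```julia
# @def foo block defines a macro @foo that splices the block in verbatim
# wherever it is used: shared solver boilerplate with zero call overhead.
macro def(name, definition)
    return quote
        macro $(esc(name))()
            esc($(Expr(:quote, definition)))
        end
    end
end

@def unpack_common begin
    t  = 0.0
    dt = 0.01
end

function demo()
    @unpack_common          # pastes the block above into this function body
    return t + dt
end

demo()                      # returns 0.01
```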
After that it's bench_single_solve, where most of the macros are gone:
Again, a small performance loss, no idea why. Right after that is bench_accept_header, where now all of the solvers use the same solve! command and there's a new loop header. Here's where it gets super interesting. If I comment out one line, I get:
and then if I uncomment it (the branch bench_accept_header_uncomment) then I get:
The line is just a check: if isempty(tstops). Then some features are added to get to the master branch. It benchmarks at:
Thus the performance regression is huge! I then go in and comment out some if isempty(tstops) conditionals and get the branch fast_master, which gives:
The last branch is features, which adds some features to master, but isn't worth looking at until it's found out why master is so slow and how to get around it.
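To illustrate why a tiny check like if isempty(tstops) in the loop header even comes up, here are the two shapes being compared in a toy form (purely illustrative, not the actual integrator loop):

```julia
# Toy comparison: an isempty(tstops) check evaluated every iteration versus
# the same check hoisted out of the loop. The check itself is cheap, so a
# large timing swing from commenting it out suggests something else (e.g. an
# inference or dispatch problem introduced alongside it), which is what this
# issue is chasing.
function loop_with_check(u, dts, tstops)
    for dt in dts
        if !isempty(tstops)
            # handle hitting a stopping time (elided)
        end
        u += dt * u
    end
    return u
end

function loop_hoisted(u, dts, tstops)
    has_tstops = !isempty(tstops)
    for dt in dts
        if has_tstops
            # handle hitting a stopping time (elided)
        end
        u += dt * u
    end
    return u
end

dts = fill(1e-3, 10^6)
loop_with_check(1.0, dts, Float64[])
loop_hoisted(1.0, dts, Float64[])
```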