Performance regression in the past 9 days #118501
Did some more testing; removing …
Bisects to #110303, which does have an apparent perf impact. Interestingly, that PR seems to improve performance on my machine (Intel i7-6700K / Arch Linux), so maybe we can blame this all on inlining changes? before:
after:
Can you share what you used to bisect this? Running the benchmarks myself on a quiet system with a fixed CPU frequency, I can get either the slow number or the fast number just by running the benchmark executable repeatedly. The amount of spurious variation I'm seeing in the results is pretty typical of the built-in benchmarking. Based on my own investigation, I do not think there has been a codegen change. The two benchmarks are completely inlined into a single function, so I highly doubt there was a relevant inlining change. @RivenSkaye Running …
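(For context, "the built-in benchmarking" here means libtest's nightly `#[bench]` harness. A minimal sketch of that kind of benchmark is below; the `solve` function and its input are hypothetical placeholders, not the actual AoC code.)

```rust
// Sketch only: a nightly libtest benchmark. `solve` and its input are
// stand-ins for the real AoC solution.
#![feature(test)]
extern crate test;

fn solve(input: &str) -> usize {
    input.lines().count() // placeholder workload
}

#[bench]
fn bench_solve(b: &mut test::Bencher) {
    let input = "1abc2\npqr3stu8vwx\ntreb7uchet";
    // black_box keeps the optimizer from removing the measured work.
    b.iter(|| solve(test::black_box(input)));
}
```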
First off: I ran the benchmarks in a loop before opening the report, and they pretty consistently reported averages within 800 ns of one another across runs. My system is a tad noisier than I'd like, but I'm still getting pretty close results across runs, usually with a ~2 µs difference. There are outliers on both, but that's to be expected. I do understand I can't come in saying coarse benchmarks report slightly different numbers and expect a magical fix, so I come bearing gifts. And I come bearing questions, as I'd like to know what, if anything, I could do myself to further investigate possible differences in the output files. The gifts: the output of …
My best guess, then, is that your system somehow ends up being very sensitive to code alignment in a way that mine isn't. I've disassembled both … If you want to diff the output of objdump, you'll need to strip out the instruction addresses. If you can find a diff in the resulting instructions between the toolchains, that might indicate this perf difference is due to something more than lucky/unlucky alignment.
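(One possible way to do that stripping, as a rough sketch rather than anything from the thread: pipe `objdump -d` output through a small filter that drops the leading address column before diffing.)

```rust
// Minimal filter: read `objdump -d` output on stdin and drop the leading
// "address:" column so two disassemblies can be diffed meaningfully.
// Lines that don't start with a hex address (headers, symbol labels) pass through.
use std::io::{self, BufRead, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut out = io::stdout().lock();
    for line in stdin.lock().lines() {
        let line = line?;
        match line.split_once(':') {
            // Instruction lines look like "  140001000:\t48 83 ec 28 \tsub $0x28,%rsp"
            Some((addr, rest))
                if !addr.trim().is_empty()
                    && addr.trim().chars().all(|c| c.is_ascii_hexdigit()) =>
            {
                writeln!(out, "{}", rest.trim_start())?;
            }
            _ => writeln!(out, "{line}")?,
        }
    }
    Ok(())
}
```

Operand and symbol addresses inside the instructions will still differ between builds, so expect some residual noise in the diff even after stripping the address column.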
I used … I also tried testing it with criterion now and got around the same results initially (criterion test here). However, I found out something interesting: sticking …
The only way I can explain these results is some hardware prefetch gone wrong.
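(For anyone reproducing, a Criterion harness along these lines would look roughly as follows; `solve` and the input path are hypothetical placeholders, not the code from the AoC repo.)

```rust
// benches/day1.rs — sketch of a Criterion benchmark; `solve` and the input
// file are stand-ins for the actual day-1 solution and puzzle input.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn solve(input: &str) -> u32 {
    // placeholder workload
    input.lines().filter(|l| !l.is_empty()).count() as u32
}

fn bench_day1(c: &mut Criterion) {
    // Hypothetical input path; the real repo layout may differ.
    let input = std::fs::read_to_string("inputs/day1.txt").unwrap_or_default();
    c.bench_function("day1", |b| b.iter(|| solve(black_box(input.as_str()))));
}

criterion_group!(benches, bench_day1);
criterion_main!(benches);
```

Criterion's warm-up phase and outlier reporting make it somewhat more robust against run-to-run noise than the built-in harness, which is why it is worth cross-checking against.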
Maybe try running this with …
WG-prioritization assigning priority (Zulip discussion). @rustbot label -I-prioritize +P-medium +I-slow
Okay, sorry for stalling so long. There are loads of differences other than addresses, mostly in ordering within the output, with the remainder in code outside of my control (and also not in places that would impact the benchmarks). I decided to run against yesterday's nightly too for completeness' sake and got results similar to the old situation; taking a peek at the objdump from that reveals slightly improved codegen, with ordering very similar to the slower output. Sorry for the wasted time, as this does indeed seem to have been an issue with either alignment or binary layout. Whatever the case, this isn't on the …
I'm doing the AoC and I figured I might as well upgrade after finishing day one. I re-ran the benchmarks, but I'm consistently seeing slowdowns.
`rustup update` updated my nightly toolchains as follows:
…
I tried this code: AoC repo, run from the `2023` directory. I'll happily try and make an MRE over the weekend if anyone prefers that.
I expected to see this happen: same speed, as I'm not using any nightly-only APIs in the code being run in the benchmarks. The `common` utility module only uses `#![feature(byte_slice_trim_ascii)]`, and that's only being called once.
Instead, this happened: the same safe code is consistently running slower with no changes in system load.
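(For context, a rough sketch of what that feature gate enables on nightly; the helper below is hypothetical and not the actual `common` module.)

```rust
// Sketch of what `byte_slice_trim_ascii` provides on nightly: `trim_ascii`
// on byte slices. The `clean` helper is hypothetical, not the repo's code.
#![feature(byte_slice_trim_ascii)]

fn clean(line: &[u8]) -> &[u8] {
    // Strips leading/trailing ASCII whitespace from a byte slice.
    line.trim_ascii()
}

fn main() {
    assert_eq!(clean(b"  42\r\n"), b"42");
}
```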
Old situation on my hardware:
New:
CPU: AMD Ryzen 7 5800H
Memory: 15.4 / 16 GB usable
OS: Windows 10 Home
Version: 22H2
Build: 19045.3693
Version it worked on
It most recently worked on: rustc 1.76.0-nightly (2f8d81f 2023-11-21)
Possibly later; I usually don't grab daily updates. I'll go through the other 8 nightlies to find where the performance characteristics changed.
Version with regression
`rustc --version --verbose`:
…
I have tried both the windows-gnu and windows-msvc toolchains; they both hit around the same speeds, and show the same slowdown.