@inbounds slowing down a loop #26261
Comments
This seems to be system dependent. On my Windows laptop, I see the same slowdown you show here. On my Linux workstation, the |
Could it be that the boundscheck is: What does `code_native` say?
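One way to answer the `code_native` question is to dump the generated assembly for a checked and an `@inbounds` version of the same loop and compare them. The sketch below uses two hypothetical toy kernels (`sum_checked` and `sum_inbounds` are illustrative names, not the code from this issue) and assumes a Julia version where `InteractiveUtils` is a standard library:

```julia
using InteractiveUtils  # provides code_native outside the REPL

# Hypothetical toy kernels for illustration; not the code from this issue.
function sum_checked(v::Vector{UInt64})
    acc = zero(UInt64)
    for i in 1:length(v)
        acc += v[i]      # checked access (the compiler may still prove it safe)
    end
    return acc
end

function sum_inbounds(v::Vector{UInt64})
    acc = zero(UInt64)
    @inbounds for i in 1:length(v)
        acc += v[i]      # bounds check explicitly disabled
    end
    return acc
end

# Capture the assembly as strings so the two versions can be diffed.
native_checked  = sprint(io -> code_native(io, sum_checked,  (Vector{UInt64},)))
native_inbounds = sprint(io -> code_native(io, sum_inbounds, (Vector{UInt64},)))

println(native_inbounds)
```

Comparing the two strings (or using `@code_llvm` on a call in the REPL) shows whether the difference lies in the bounds check itself or in how the rest of the loop body was optimised.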
It would be interesting if we could separate out the informative part of bounds checks from the error part: i.e. allow the high-level code to communicate guarantees that it believes must be true even when we turn off checks that they are in fact true. That could, of course, lead to undefined, dangerous behavior if the claimed facts are not true, but that's always the case with incorrect usage of |
If the bounds assertion has the right branch prediction hints, its cost should be close to nothing (assuming that fetching the bounds is cheap and is not done on every iteration of a loop).
I've found (mostly working with C code) that a liberal sprinkling of |
The problem with bounds checks is that they generally/currently prevent some optimizations (e.g. #21402).
A cursory glance at the assembly shows a difference in the remainder calculation that LLVM optimised differently. Replacing |
Seems fixed now?
Closing. Can reopen if still an issue.
Some of the first Julia code I ever wrote was a `xorshift1024*` implementation, to try and compare speed against our native `rand()` code. I was looking at it again, and decided I would try and sprinkle some of the new Julia magic speed dust on it such as `@inbounds`, and was surprised to find that it actually made the whole thing slower:

Function in question:

I perform the allocation within `xorshift1024()` so as to better compare against `rand(UInt64, N)`. Which I do down here:

Running this code gives:

E.g. there is ~2x overhead for small numbers, but as we increase `N`, it slowly converges. However, if I introduce `@inbounds` to the for loop within the function:

This is a surprising amount of performance drop, in my opinion.