float ranges: if a+n*s == b, length(a:s:b) should be n+1 (#20373)
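For concreteness, the requested invariant, using a triple that comes up later in this thread:

```julia
a, s, b, n = 0.1, 0.1, 0.30000000000000004, 2

a + n*s == b            # true: the endpoint is hit exactly in Float64
length(a:s:b) == n + 1  # the property this issue asks for
```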
Interestingly, this does work with #18777 (comment). That might introduce problems of its own, of course. A more conservative version is:

```julia
function mycolon(a, s, b)
    len_est = (b - a)/s + 1
    len_test = floor(Int, len_est + 10*eps(len_est))
    len = a + (len_test - 1)*s <= b ? len_test : len_test - 1  # needs modification for s < 0
    StepRangeLen(a, s, len)
end
```
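As a quick sanity check (mine, not from the thread), the sketch returns the expected length for the triple above:

```julia
julia> length(mycolon(0.1, 0.1, 0.30000000000000004))
3
```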
I can't find the comment that links to – there are way too many. Which proposal fixes it?
Why not this:
In case you haven't figured it out, I'm vehemently against arbitrary ulp fuzzing. Why is 10 the correct number of ulps?

For this case, would it be enough to check whether adding 1 to the length exactly hits the stop value?

Agree – that's all that's needed, I think.

The problem comes if you've already committed to rational approximation of the inputs – at that point you may need some ulp fuzzing.

I don't disagree with your vehement distaste for arbitrary constants, however.

If you've already committed to rational approximation, then you do that – that guarantees hitting the endpoint exactly or you would bail out. This is a slightly better way of picking the length in the case where rational lifting has already failed.

Yeah, you're probably right. Seems like it should work, and it would be a very simple tweak. Are you trying it, or should I?
I'm trying it already. In other words, just this:

```diff
diff --git a/base/twiceprecision.jl b/base/twiceprecision.jl
index 80d1917ea..05c7d1153 100644
--- a/base/twiceprecision.jl
+++ b/base/twiceprecision.jl
@@ -149,6 +149,7 @@ function colon{T<:Union{Float16,Float32,Float64}}(start::T, step::T, stop::T)
         end
     end
     # Fallback, taking start and step literally
+    start + len*step <= stop && (len += 1)
     StepRangeLen(TwicePrecision(start, zero(T)), twiceprecision(step, nbitslen(T, len, 1)), len)
 end
```

Not sure where else I'd need to apply this, though.
That's probably it. I think the only other place with similar logic is for `linspace`.
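For context, a sketch of the failure mode the one-line check guards against (my paraphrase, assuming step > 0, not Base's exact code): the fallback derives the length from a rounded quotient, which can land just below the exact integer count.

```julia
# If (stop - start)/step rounds below the exact real quotient, the initial
# floor-based length undercounts by one; probing one extra step repairs it.
function fallback_len(start, step, stop)
    len = max(0, floor(Int, (stop - start)/step) + 1)
    start + len*step <= stop && (len += 1)   # the one-line fix in the diff above
    return len
end
```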
I also just noticed that the definition of

? If you're looking at the same function, here's a resetting of
So there's good news and bad news. The good news is that I fixed this problem. The bad news is that in writing tests for it, I found that there's a fair number of cases where checks like these fail:

```julia
@test first(start:step:stop) == start
@test last(start:step:stop) == stop
@test first(linspace(start,stop,n)) == start
@test last(linspace(start,stop,n)) == stop
```

The test cases are randomly generated, so not terribly typical, and it does take a lot of randomly generated cases to hit a bad one, but this probably should not happen. The tests that sometimes fail are included as
I think this being false is expected behavior, unavoidable given the interface. With

In this case, I've constructed the values so that

It can't possibly miss the endpoints if you use

There are failures for the

Oh, I see, I missed "lowercase version" in there.
Some numbers (obtained by making the obvious modifications to Stefan's new test script):

```
Fraction that fail:
colon:
len 0.0, start 6.103515625e-6, stop 0.0545379638671875
linspace:
len 0.0, start 2.13623046875e-5, stop 1.52587890625e-5
```

julia-0.5:

```
Fraction that fail:
colon:
len 0.085711669921875, start 0.0284698486328125, stop 0.085711669921875
linspace:
len 0.0, start 0.02374267578125, stop 0.028363037109375
```

So we're better off in every respect, typically by several orders of magnitude, but still not perfect.
Definitely not a show-stopper, but dayum, this is a hard problem!
I'm pretty sure the framework is capable of hitting all of these cases exactly. There's an exception for really long ranges, because of this, which deliberately breaks @simonbyrne's criterion:

> It turns out to be counterproductive to truncate more than half the bits of the mantissa of

But that doesn't apply here. It's presumably "just" a matter of tweaking those TwicePrecision constants. The mathematics is very clear (it's just 2 linear equations with 2 unknowns), but the handling of twice-precision arithmetic and nonassociativity makes it a little more "exciting." Fortunately, you're good at that kind of thing, so I'm sure you'll pull it off 😉.
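For what it's worth, a rough sketch of the "2 linear equations with 2 unknowns" idea (an illustration of the concept only, not Base's actual code): split the step into hi and lo parts so that (len-1)*hi multiplies exactly, and let lo absorb the residual needed to land on stop.

```julia
# Zero out the low `nbits` significand bits so that products with small
# integers incur no rounding.
trunclow(x::Float64, nbits) =
    reinterpret(Float64, reinterpret(UInt64, x) & ~((UInt64(1) << nbits) - 1))

# The two unknowns are hi and lo; the two constraints are that hi is
# truncated (so (len-1)*hi is exact) and that start + (len-1)*(hi+lo)
# comes out equal to stop.
function step_hilo(start, stop, len)
    hi = trunclow((stop - start)/(len - 1), 27)
    lo = ((stop - start) - (len - 1)*hi)/(len - 1)
    return hi, lo
end
```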
This is the 0.5 edition: with this change, a, b, and n are always hit exactly by the `a:s:a+n*s` construction tests. There are two parts to the fix: 1) check that the lifted rational endpoint is exact even *after* we've reduced terms of the fractions; 2) apply the same fix that I've proposed for the length in the non-lifted case.
See #20396 – with that patch the 0.5 float ranges always hit the start, length and endpoint exactly.
Combined backport of these commits:
- e849169
- b7ad743 range tests: allow any range length that hits stop (#20532) (cherry picked from commit b7ad743)
[release-0.5] Fix computation of range length in certain cases (fix 0.5/#20373)
I don't think that's true. For example, my impression is that you think it's desirable that currently we have

IIRC (it's been a while), I was excluding such cases and only testing cases where the end point is hit exactly by the

Sure, but what should
I don't get it – the last value of

```julia
julia> a, s, b, n = 0.1, 0.1, 0.30000000000000004, 2
(0.1, 0.1, 0.30000000000000004, 2)

julia> a + n*s == b
true

julia> length(a:s:b) == n+1
true
```

Perhaps I'm being dense here and not seeing what the problem is. Can you give me a problematic example?
We have:

```julia
julia> 0.1:0.1:0.30000000000000004
0.1:0.1:0.30000000000000004

julia> 0.1:0.1:0.3
0.1:0.1:0.3

julia> 0.1:0.1:0.31
0.1:0.1:0.3
```

but this is fairly crazy and definitely not defensible. In #23194 you sometimes get this and sometimes don't hit the endpoint exactly, depending on whether either variant hits the rational approximation criterion. The point being that the test you devised is too stringent to be sensible with non-associative arithmetic. We have to drop the
These seem fine to me... What's the principle that it violates?
Ah, I do see what you're getting at here. The issue is that the stop value

There used to be a check at the very top of the lifting code that made sure that
I keep looking at this code trying to figure out where that check should go, but I don't really understand the logic here:

```julia
# Integer ops could overflow, so check that this makes sense
if isbetween(start, start + (len-1)*step, stop + step/2) &&
        !isbetween(start, start + len*step, stop)
```
From the standpoint of exact arithmetic (i.e., math, not floating-point approximations of it),

Consequently, all 3 of these should produce
I don't disagree with that comment, but
So then

```julia
julia> collect(0.0:0.1:1.0)
11-element Array{Float64,1}:
 0.0
 0.1
 0.2
 0.3
 0.4
 0.5
 0.6
 0.7
 0.8
 0.9
 1.0
```

needs to switch to

```
 0.0
 0.1
 0.2
 0.30000000000000004
 ...
```

? Because of course
Ah, here's the disagreement. I don't think that's true at all. We can think of the

In the inexact endpoint case, when the stop point cannot be hit exactly, there's no justification for non-literal interpretation of the start and step. So while

Part of the trouble here is that there's no set of principles written down for what these float ranges should do – without that it's impossible to verify that those principles are consistent and can actually all be satisfied at the same time. That, of course, is significantly my fault for not writing them down sufficiently clearly when I first implemented the lifting behavior.
I worry that the behavior you want is much more difficult to formalize/document than what I was aiming for. But I want to give you space to try, so I'll focus my efforts solely on fixing linspace and leave the rest to you. Note this is a 1.0 issue since the values might change.
I guess there are several interpretations we've "floated" (get it?) for the meaning of

**The literal interpretation.** Here a is taken to be the real value of

This interpretation has the benefit that it is very simple and easy to understand. Its most significant drawback is that

**The lifted exact + literal inexact interpretation.** If there exist real numbers a, s and b and an integer n such that a rounds to

This is what I implemented with my

**The always lifted interpretation?** This is what we currently have. I'm not sure what principle this follows, to be honest, because I don't fully understand it. Can you write up what the rules for this are? I know that it uses double-precision numbers to allow linear interpolation to get behavior similar to my rational lifting technique. I thought it was the same as my approach but with a different lifting method, but that seems not to be the case since it no longer respects the rule that inexact ranges must be interpreted literally. E.g.
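For concreteness, a sketch of how the first two interpretations differ for a = s = 0.1, b = 0.3 (my illustration, not part of the original comment):

```julia
# Literal: step through the exact binary values of the inputs.
[0.1 + k*0.1 for k in 0:2]                 # last element: 0.30000000000000004

# Lifted: read the inputs as 1//10 and 3//10, compute exactly, round each element.
[Float64(1//10 + k*(1//10)) for k in 0:2]  # last element: 0.3
```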
My current best understanding of what you implemented, Tim, is that you still use rational lifting but then approximate the lifted rationals with double-precision values – which must still round to the original inputs (i.e., the high part must equal the input). Intermediate values are then computed using linear interpolation on double-precision values, which allows the type to represent both float ranges and linear spaces accurately, and avoids doing an expensive division operation (although I don't recall if the end result ended up being faster or not).

My only objection to any of this is doing rational lifting in the case where the stop value cannot be hit exactly – because that's the case where deviating from the user's literal input value cannot be justified. I suspect that others (e.g. @andreasnoack) may object even in some cases where the stop value is hit exactly, because there do exist ranges where a literal interpretation hits the same stop value but gives different intermediate values. Of course, those are precisely the cases where people find the literal interpretation unintuitive and they file issues, but I can understand the objection.
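If that understanding is right, the element computation is roughly the following (a minimal sketch; `TwoFloat` and `element` are names I made up, not Base's):

```julia
struct TwoFloat      # a value stored as an unevaluated hi + lo sum
    hi::Float64
    lo::Float64
end

# i-th element of a range with a twice-precision reference and step:
# accumulate hi and lo contributions separately and fold in lo last, so
# its low-order bits can cancel the rounding error of the hi part.
element(ref::TwoFloat, step::TwoFloat, i::Integer) =
    (ref.hi + (i - 1)*step.hi) + ((i - 1)*step.lo + ref.lo)
```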
On my phone (no wifi), but briefly: I think we have an awkward hybrid now and something has to be changed. I would probably change it so we decide whether to lift to rational based on a and s alone. Either way, compute the length in 2x precision and call range (thus discarding b from further participation).
I suspect I would prefer to go the other way and do the lifting and interpolation the way you're doing it now but reinstate the exact interpretation rule in cases where the stop point can't be hit. We should figure out what the pros and cons of each direction are.
Shouldn't a Julian user use something like
Decimal only solves the problem if the numbers are representable exactly in decimal, which most aren't. The rational numbers already work (and always have). Arguably, that's the correct way to do this, except that it's quite slow. It's a considerably better API, however, since you can at least express what you'd like to produce accurately.
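For example, the existing Rational ranges already hit every point exactly (just slowly):

```julia
julia> r = 1//10 : 1//10 : 3//10
1//10:1//10:3//10

julia> length(r), last(r)
(3, 3//10)
```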
Is there a way that we could spell fancy float ranges as float(rational range) or something similar and optimize it correctly?
Just to clarify my view here: yes, I do think that we should base floating point operations on the exact binary floating point values of the inputs and not on low-denominator rational numbers in small neighborhoods around the inputs. My view is not that
The good news is that now at least one has control over what happens using the exported interface:

```julia
julia> StepRangeLen(0.0, 0.1, 4)  # literal
0.0:0.1:0.30000000000000004

julia> range(0.0, 0.1, 4)  # allows lifting
0.0:0.1:0.3

julia> typeof(range(0.0, 0.1, 4))
StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}
```
So it seems that the consensus is that the traditional way of inputting floating-point ranges is simply inherently flawed, and that a better way of writing them would be the only way to really fix the "float range problem". However, we don't know what that would look like, and the place to experiment with it is in packages, not the base language. To that end, I think that specifying how we want floating-point ranges to behave and making sure our implementation matches that specification is the best we can do for the base language, short of taking float ranges out entirely (which I don't think is viable).
Like `float(0:1//10:3//10)`? That could give speed and clarity.
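Something close to that already works via broadcasting, for what it's worth (a sketch of the idea, not a proposed API):

```julia
julia> Float64.(0 : 1//10 : 3//10)
4-element Array{Float64,1}:
 0.0
 0.1
 0.2
 0.3
```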
See http://stackoverflow.com/questions/41985718/why-the-values-are-not-correctly-iterated-using-range-arrays/41986875#41986875.
Example:
Rational lifting doesn't address this case since `0.15000000000000002 != 3/20 == 0.15`.
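For illustration (my reconstruction, using only well-known Float64 facts): 0.15000000000000002 is what literal accumulation of 0.05 steps produces, and it differs from the Float64 nearest 3/20:

```julia
julia> 0.05 + 0.05 + 0.05   # literal accumulation of the step
0.15000000000000002

julia> Float64(3//20)       # what rational lifting would target
0.15
```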