Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Range first and last can be misleading #22354

Closed
andyferris opened this issue Jun 13, 2017 · 31 comments
Closed

Range first and last can be misleading #22354

andyferris opened this issue Jun 13, 2017 · 31 comments
Labels
needs decision A decision on this change is needed
Milestone

Comments

@andyferris
Copy link
Member

I noticed that first(r) and last(r) where r is one of various range types can give spurious results in the case the range is empty.

julia> first(1:-1:2)
1

julia> last(1:-1:2)
2

I can see these are used to enable nice generic code for ranges, for various Base functions like isempty, issorted and so-on. However, to me, this seems to violate the iterator interface pretty badly, and I would expect a BoundsError thrown. Would it be more appropriate to use some specialized functions like rangestart and rangestop as an interface to get these numbers for ranges?

@timholy
Copy link
Member

timholy commented Jun 13, 2017

👍 to the BoundsError. I'm not sure we need a replacement? EDIT: non-exported makes sense for implementing the methods you mention.

@JeffBezanson
Copy link
Member

I recall we looked at this once quite a while ago, and I think fixing it caused some performance problems. Maybe we'll fare better now though.

@timholy
Copy link
Member

timholy commented Jun 13, 2017

In particular it might make sense to put it inside a @boundcheck?

@nalimilan
Copy link
Member

An even more interesting/disturbing case:

julia> last(1:-1:3)
2

julia> 1:-1:3 # Explanation
1:-1:2

@omus
Copy link
Member

omus commented Jun 13, 2017

There should probably be a special display for an empty range:

julia> 1:1:-3
1:1:0

julia> 0:1:-3
0:1:-1

julia> -3:-1:1
-3:-1:-2

@mbauman
Copy link
Member

mbauman commented Jun 13, 2017

I agree this is strange, but it's used in APIs like searchsorted, where you're interested in the insertion point of an empty range. It's documented:

  first(coll)

  Get the first element of an iterable collection. Returns the start point of a Range even if it is empty.

It'd definitely be worth seeing if we can replace it with a special function.

@timholy
Copy link
Member

timholy commented Jun 13, 2017

I'd favor undocumenting that, though, and going the route suggested by @andyferris of adding rangestart and rangestop.

@kmsquire
Copy link
Member

I agree this is strange, but it's used in APIs like searchsorted, where you're interested in the insertion point of an empty range.

I'm responsible for the return of a range by searchsorted, and I think that the combination of searchsorted, searchsortedfirst, and searchsortedlast should be cleaned up. The names are awkward, and I'm pretty sure they were added before keyword arguments were part of the language.

@nalimilan
Copy link
Member

I'm responsible for the return of a range by searchsorted, and I think that the combination of searchsorted, searchsortedfirst, and searchsortedlast should be cleaned up. The names are awkward, and I'm pretty sure they were added before keyword arguments were part of the language.

Funny how often this comes up this week. In the Search & Find Julep a solution to this was to wrap sorted vectors in a SortedVector object, which would allow generic find/search functions to be used instead of the searchsorted* functions. Keyword arguments wouldn't be enough since the return type (range vs. vector) would have to change, which would create a type instability.

But AFAICT it would still be useful to know the insertion point, so a rangestart function would be needed.

@nalimilan nalimilan added the triage This should be discussed on a triage call label Nov 29, 2017
@nalimilan
Copy link
Member

For triage: I think we'd better raise a BoundsError for now, and add rangestart and rangestop functions for the searchsorted use case. We could even recommend using r.start for searchsorted, since the result is necessarily a UnitRange{Int}: that would leave us more time to decide whether adding rangestart is a good idea.

@JeffBezanson
Copy link
Member

I don't love the rangestart and rangestop functions, but agree it would be nice to explore giving a BoundsError here. I agree that since rangestart is range-specific anyway, you might as well just access r.start directly. This can be considered if somebody happens to get to it in time.

@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Nov 30, 2017
@timholy
Copy link
Member

timholy commented Dec 1, 2017

But not all ranges have a start field:

julia> fieldnames(StepRangeLen)
4-element Array{Symbol,1}:
 :ref   
 :step  
 :len   
 :offset

(Explanation: for AbstractFloat, ref is the smallest-magnitude element in the range, even if that element occurs somewhere in the middle. This is necessary for high-precision arithmetic, otherwise you might, e.g., miss zero by an eps.)

@andyferris
Copy link
Member Author

Is this curious behavior only required for our implementation of searchsorted and so-on?

@timholy
Copy link
Member

timholy commented Dec 3, 2017

I may be misunderstanding your question here, but it's used in packages too. For example, I'm almost sure it's exploited in AxisArrays.

@nalimilan
Copy link
Member

For example, I'm almost sure it's exploited in AxisArrays.

For what purposes?

Regarding searchsorted, I'd like to be sure the current behavior is really needed. See #24883. If you want to know whether an element is already in the vector and to insert it if it isn't, wouldn't you use searchsortedfirst instead (which returns a single index)? At least what I can say is that searchsorted is used only once in Base, and no insertion happens.

@timholy
Copy link
Member

timholy commented Dec 3, 2017

Looks like it's basically the same purpose as in Base: https://github.com/JuliaArrays/AxisArrays.jl/blob/master/src/search.jl

@nalimilan
Copy link
Member

OK. Though AFAICT the returned value is not used to insert entries, so we still don't have an example where the start of an empty index is useful.

@nalimilan nalimilan added this to the 1.0 milestone Dec 29, 2017
@StefanKarpinski
Copy link
Member

StefanKarpinski commented Dec 29, 2017

What's up with adding this to the milestone without triage, @nalimilan? And yes, I do keep a list of 1.0 issues open and refresh it several times a day to see if there are any more or less issues on it.

@StefanKarpinski StefanKarpinski added the triage This should be discussed on a triage call label Dec 29, 2017
@nalimilan
Copy link
Member

Sorry, I was going to add a comment saying that the milestone should probably have been added when this issue was triaged. I then started to see whether it would be hard to make a PR and got distracted by other things.

@nalimilan
Copy link
Member

See PR #25385. It would be cool if triage could decide which approach is most appropriate.

@andyferris
Copy link
Member Author

Nice.

I had been thinking of making a simple PR which kind-of puns off @inbounds and @boundscheck for first and last (so is safe and correct by default), but I still felt a little conflicted if that was good...

@mbauman
Copy link
Member

mbauman commented Jan 4, 2018

Unfortunately @inbounds won't work here since it's not a guaranteed transform (e.g., running in --boundcheck=yes will break it).

I think we definitely need a completely new name for this — the question is which one: rangestart, r.start, or Range.first (or some such module as a namespace). Note that there's a third property in this set, step, which is a name we've talked about removing in the past since it's not really a general enough to be worth an exported generic function. Whichever scheme we choose, we may want to fold that one into it, too.

@StefanKarpinski
Copy link
Member

In the other thread I just proposed using properties to do r.start, r.step and r.stop uniformly across different range types.

@JeffBezanson
Copy link
Member

I think we can punt on this for 1.0 --- it doesn't seem to be an urgent problem, largely a theoretical concern. In fact I think #25385 demonstrates that the existing behavior was largely ok and convenient.

@JeffBezanson JeffBezanson removed this from the 1.0 milestone Jan 4, 2018
@JeffBezanson JeffBezanson added needs decision A decision on this change is needed and removed triage This should be discussed on a triage call labels Jan 4, 2018
@JeffBezanson JeffBezanson added this to the 2.0+ milestone Jan 4, 2018
@User-764Q
Copy link

Hi,

I tested this today on Julia 1.6.2 and got the same issue as the 'concerning' example above. Has there been a decision to not change this behaviour or is it worth keeping this open.

julia> last(1:-1:3)
2

julia> 1:-1:3 # Explanation
1:-1:2


@oscardssmith
Copy link
Member

We should fix this.

@oscardssmith oscardssmith added the good first issue Indicates a good issue for first-time contributors to Julia label Nov 8, 2021
@gbaraldi
Copy link
Member

gbaraldi commented Nov 8, 2021

I've gotten bitten by empty ranges before in #40331. Wouldn't changing this be breaking tho?

Also there should really be a special way to print an empty range, or make it clear that it is empty.

@vtjnash
Copy link
Member

vtjnash commented Nov 8, 2021

We previously had defined it this way to avoid a branch when computing length (and instead do the branch on construction), but now we've put that branch back in most cases for length, so it would likely be worthwhile for someone to experiment with this change now, making it a possible "good first issue".

@nalimilan
Copy link
Member

What do you mean by "this change"? Clearly, changing the behavior of first and last like #25385 experimented with isn't possible until 2.0. Do you mean adding rangestart and rangestop (which that PR also does)?

@goretkin
Copy link
Contributor

I think the surprising behavior is that the property last(a) == last(collect(a)) fails to hold:

julia> last(5:4)
4

julia> last(collect(5:4))
ERROR: BoundsError: attempt to access 0-element Vector{Int64} at index [0]

But I'm not sure what can be done about that, since the behavior of first/last on empty ranges is relied upon by firstindex/lastindex, which is in turn relied upon by begin and end as array indices (see #34697 (comment)).

If last(5:4) throws BoundsError, then so would

julia> ([])[begin:end]
Any[]

@vtjnash vtjnash removed the good first issue Indicates a good issue for first-time contributors to Julia label Nov 19, 2021
@vtjnash vtjnash closed this as completed Nov 19, 2021
@vtjnash
Copy link
Member

vtjnash commented Nov 19, 2021

Closing since we decided to keep this behavior (#25385) and have another issue already open for the misleading display #40331

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs decision A decision on this change is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.