Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String nextind is inconsistent about what to do with out-of-bounds indices #26511

Closed
Keno opened this issue Mar 18, 2018 · 10 comments
Closed

String nextind is inconsistent about what to do with out-of-bounds indices #26511

Keno opened this issue Mar 18, 2018 · 10 comments
Labels
error handling Handling of exceptions by Julia or the user strings "Strings!"

Comments

@Keno
Copy link
Member

Keno commented Mar 18, 2018

The docstring states that an out of bounds index returns i+1, but:

julia> nextind(SubString("abc"), 10, 1)
ERROR: BoundsError: attempt to access "abc"
  at index [10]
Stacktrace:
 [1] nextind(::SubString{String}, ::Int64, ::Int64) at ./strings/basic.jl:468
 [2] top-level scope

Somewhat more concerning, these cases actually happen a bunch during the test suite, but get suppressed by @inbounds, which is not how that functionality is supposed to be used.
We should figure out what this function does and adjust implementations accordingly.

@ararslan ararslan added error handling Handling of exceptions by Julia or the user strings "Strings!" labels Mar 19, 2018
@StefanKarpinski
Copy link
Member

The doc string is out of date in that case. I thought @bkamins or I had updated them, but perhaps that didn't happen. The valid index range is 0:ncodeunits(s)+1 for thisind, 0:ncodeunits(s) for nextind and 1:ncodeunits(s)+1 for prevind.

@Keno
Copy link
Member Author

Keno commented Mar 20, 2018

Ok, but then we have bigger problems, because the string tests call nextind with invalid indices all over the place, only hidden by @inbounds. I suspect running the tests individually with check-bounds=yes would reveal all of them without hacking inference.

@KristofferC
Copy link
Member

Why isn't the sysimg built with bounds checking always on when testing?

@bkamins
Copy link
Member

bkamins commented Mar 20, 2018

This is exactly the reason why I have opened #25566.
The documentation update issue is here #25478. I did not update it as I was not 100% clear about the final decision about thisind discussed there (if we leave it as is the documentation should be updated now as all functionality is there).

@Keno
Copy link
Member Author

Keno commented Mar 20, 2018

List of failures I see on my branch with broken boundscheck elim: https://gist.github.com/Keno/4bd3892b8c2cd431f9084858d3bb09df

Can't promise that some of these aren't due to optimizer bugs on that branch.

@bkamins
Copy link
Member

bkamins commented Mar 20, 2018

It seems to me that it should be rathe optimizer bug on that branch. The relevant part of the code is:

ss = SubString(s, a:b)
s2 = s[a:b]
@test ncodeunits(ss) == ncodeunits(s2) # this test passes
for i in 0:ncodeunits(ss)
    @test nextind(ss, i) == nextind(s2, i) # this test fails
end

Given the test above (that passes) ss and s2 have the same number of code units and we are not calling nextind on index outside the accepted range.

@Keno
Copy link
Member Author

Keno commented Mar 20, 2018

I think what looks like is happening in at least some of those examples is that nextind on the last index in the range accesses the underlying string data past the acceptable range.

@bkamins
Copy link
Member

bkamins commented Mar 20, 2018

Yes, there seems to be a bug in the line 141 of string.jl:

@inbounds b = codeunit(s, i += 1)

which causes problem if i==n. I will make a PR to fix that.

@bkamins
Copy link
Member

bkamins commented Mar 20, 2018

PR submitted in #26549. But anyway it would be nice to have some way to enable all bounds checking (as indicated in #25566) at least in development mode.

@stevengj stevengj changed the title String nextind is schizophrenic about what to do with out-of-bounds indicies String nextind is inconsistent about what to do with out-of-bounds indicies May 5, 2020
@stevengj stevengj changed the title String nextind is inconsistent about what to do with out-of-bounds indicies String nextind is inconsistent about what to do with out-of-bounds indices May 5, 2020
@vtjnash
Copy link
Member

vtjnash commented Feb 1, 2024

AFAICT, the issues Keno found are fixed now (where sys.ji is a copy of sys.so without native code)

$ JULIA_CPU_THREADS=1 ../julia -J ../sys.ji --check-bounds=yes runtests.jl strings
Running parallel tests with:
  getpid() = 1072057
  nworkers() = 1
  nthreads() = 1
  Sys.CPU_THREADS = 1
  Sys.total_memory() = 1007.709 GiB
  Sys.free_memory() = 957.726 GiB

Test          (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
strings/basic      (1) |        started at 2024-02-01T17:30:36.579
strings/basic      (1) |    31.25 |   0.10 |  0.3 |     698.21 |   680.33
strings/search     (1) |        started at 2024-02-01T17:31:08.593
strings/search     (1) |     7.20 |   0.03 |  0.4 |     183.98 |   680.33
strings/util       (1) |        started at 2024-02-01T17:31:15.800
strings/util       (1) |    14.97 |   0.22 |  1.4 |     742.99 |   807.01
strings/io         (1) |        started at 2024-02-01T17:31:30.769
strings/io         (1) |     5.37 |   0.05 |  0.9 |     314.65 |   807.01
strings/types      (1) |        started at 2024-02-01T17:31:36.143
strings/types      (1) |     3.43 |   0.13 |  3.8 |    1066.93 |   807.01
strings/annotated  (1) |        started at 2024-02-01T17:31:39.573
strings/annotated  (1) |     6.87 |   0.05 |  0.8 |     349.25 |   807.01

Test Summary: |    Pass    Total     Time
  Overall     | 2677329  2677329  1m13.8s
    SUCCESS

@vtjnash vtjnash closed this as completed Feb 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
error handling Handling of exceptions by Julia or the user strings "Strings!"
Projects
None yet
Development

No branches or pull requests

6 participants