Suggestion: Make "findall" work on a string #31788
Since we have `findnext` for strings, a simple implementation is:

```julia
function findall(t::Union{AbstractString,Regex}, s::AbstractString)
    found = UnitRange{Int}[]
    i = firstindex(s)
    while true
        r = findnext(t, s, i)
        isnothing(r) && return found
        push!(found, r)
        i = nextind(s, last(r))   # resume right after the end of this match
    end
end
```

which gives output like

```julia
julia> findall("ing", "Spinning laughing dancing")
3-element Array{UnitRange{Int64},1}:
 6:8
 15:17
 23:25

julia> findall(r"\w+", "Spinning laughing dancing")
3-element Array{UnitRange{Int64},1}:
 1:8
 10:17
 19:25
```
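Note that my implementation above does not terminate when the pattern produces an empty match (e.g. `findall("", s)`): for an empty range `r`, `last(r)` is one less than `first(r)`, so `nextind(s, last(r))` returns the same starting index and the loop never advances.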
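A minimal illustration of that non-termination, assuming the first implementation above: an empty match comes back as an empty range, and stepping forward from its last index lands on the same start index again. The variable names here are only for illustration.

```julia
# Illustration only: why the first loop never advances on an empty match.
s = "foo"
i = firstindex(s)             # 1
r = findnext("", s, i)        # an empty match is reported as the empty range 1:0
@show r isempty(r)
i_next = nextind(s, last(r))  # last(r) == 0, and nextind(s, 0) == 1
@show i_next                  # same start index as before, so the loop repeats forever
```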
To handle the empty-range case, maybe:

```julia
function findall(t::Union{AbstractString,Regex}, s::AbstractString)
    found = UnitRange{Int}[]
    i = firstindex(s)
    while true
        r = findnext(t, s, i)
        isnothing(r) && return found
        push!(found, r)
        # for an empty match, step forward from its start rather than its (smaller) end
        j = isempty(r) ? first(r) : last(r)
        j > lastindex(s) && return found
        @inbounds i = nextind(s, j)
    end
end
```

which gives, e.g.,

```julia
julia> findall("", "foo")
4-element Array{UnitRange{Int64},1}:
 1:0
 2:1
 3:2
 4:3

julia> findall(r"\w*", "Spinning laughing dancing")
6-element Array{UnitRange{Int64},1}:
 1:8
 9:8
 10:17
 18:17
 19:25
 26:25
```
It's also not clear to me what the desired behavior is for overlapping substrings. Do we want to find only disjoint substrings (the code above), or do we want all substrings even if they overlap? For the latter behavior, just change `j = isempty(r) ? first(r) : last(r)` to `j = first(r)` in my code above, which gives:

```julia
julia> findall(r"\w*", "Spinning laughing dancing")
26-element Array{UnitRange{Int64},1}:
 1:8
 2:8
 3:8
 4:8
 5:8
 6:8
 7:8
 8:8
 9:8
 10:17
 11:17
 12:17
 13:17
 14:17
 15:17
 16:17
 17:17
 18:17
 19:25
 20:25
 21:25
 22:25
 23:25
 24:25
 25:25
 26:25
```

I can imagine both possibilities being useful. Maybe a keyword argument to choose between them?
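For comparison, Base's `eachmatch` already draws the same disjoint/overlapping distinction through its `overlap` keyword (default `false`). A small sketch with a non-empty pattern, reading the match positions from the `offset` field of each `RegexMatch`:

```julia
s = "banana"
# Disjoint matches: after a match, the search resumes after its last character.
disjoint = [m.offset for m in eachmatch(r"ana", s)]
# Overlapping matches: the search resumes one character after the match start.
overlapping = [m.offset for m in eachmatch(r"ana", s; overlap=true)]
@show disjoint overlapping   # disjoint = [2], overlapping = [2, 4]
```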
Personally, I like @fredrikekre's suggestion of reusing the …
So maybe just an empty `UnitRange` array instead? Thirdly, I wondered why you sometimes get ranges like `1:8` followed by `9:8`: is it also reading the string backwards? Maybe this should be an optional kwarg as well. Just suggestions from a user's point of view.
Kind regards
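@AhmedSalih3d, a range like `9:8` is an empty range (its stop is one less than its start): it denotes a zero-length match starting at index 9, i.e. between indices 8 and 9, so nothing is being read backwards.
… because the first word boundary after position 2 is between indices 3 and 4 in this string. Although the result my code returns for …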
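A quick check that a range like `9:8` holds no indices, plus one possible way (a caller-side choice, not part of the proposal) to drop such zero-length matches when only non-empty ones are wanted. The `ranges` vector is the `r"\w*"` output quoted earlier in the thread for the non-overlapping version.

```julia
s = "Spinning laughing dancing"
r = 9:8                              # stop < start, so this range contains no indices
@show isempty(r) length(r) s[r]      # true, 0, "" (a zero-length match at index 9)

# Ranges returned for r"\w*" by the non-overlapping version earlier in the thread
ranges = [1:8, 9:8, 10:17, 18:17, 19:25, 26:25]
# Keep only the non-empty matches and extract the matched text
words = [s[x] for x in ranges if !isempty(x)]
@show words                          # ["Spinning", "laughing", "dancing"]
```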
So, in summary, it seems that we want:

```julia
function findall(t::Union{AbstractString,Regex}, s::AbstractString; overlap::Bool=false)
    found = UnitRange{Int}[]
    i, e = firstindex(s), lastindex(s)
    while true
        r = findnext(t, s, i)
        isnothing(r) && return found
        push!(found, r)
        j = overlap || isempty(r) ? first(r) : last(r)
        j > e && return found
        @inbounds i = nextind(s, j)
    end
end
```

or equivalent? Anyone who wants to put together a PR with some tests and documentation is welcome to grab this code.
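A self-contained sketch of the summary proposal for trying it out, renamed to the hypothetical `findall_ranges` so that pasting it into a session neither shadows nor extends `Base.findall`. The logic follows the code above (with `@inbounds` dropped); the calls at the end exercise disjoint, overlapping, and empty-pattern matching.

```julia
# Hypothetical name `findall_ranges`; same loop as the summary proposal.
function findall_ranges(t::Union{AbstractString,Regex}, s::AbstractString; overlap::Bool=false)
    found = UnitRange{Int}[]
    i, e = firstindex(s), lastindex(s)
    while true
        r = findnext(t, s, i)
        isnothing(r) && return found
        push!(found, r)
        # overlapping or empty match: resume one character after the match start;
        # otherwise resume one character after the match end
        j = overlap || isempty(r) ? first(r) : last(r)
        j > e && return found
        i = nextind(s, j)
    end
end

@show findall_ranges("ana", "banana")                   # [2:4]
@show findall_ranges("ana", "banana"; overlap=true)     # [2:4, 4:6]
@show findall_ranges("ing", "Spinning laughing dancing") # [6:8, 15:17, 23:25]
@show findall_ranges("", "foo")                         # [1:0, 2:1, 3:2, 4:3]
```

Stepping from `first(r)` for empty matches is what guarantees progress: every iteration moves the start index forward by at least one character, so the loop always terminates.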
Yes, that looks correct to me.
Hello all
Currently it is possible in Julia:
If I use a string array, it still works:
But if I instead have one complete string, i.e.:
This was inspired by posts I made earlier on Discourse:
https://discourse.julialang.org/t/find-index-of-all-occurences-of-a-string/23044
https://discourse.julialang.org/t/suggestion-findall-to-work-on-strings/23143
Kind regards