implemented intelligent slice functionality #414
I would prefer to avoid keywords as (dummy) variable names; how about end -> last?
Using |
I like |
Is there any reason to keep this in |
So we will have an
interface slice
    module procedure :: slice_character_sequence
    module procedure :: slice_string
end interface slice
and implementation of |
I'm guessing you have the right idea in mind. You can drop the
|
… dependencies accordingly
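Yeah, I got your point. Your above comments start a new discussion here:
sstr = string(slice(cstr,last=6)) ! sstr contains "Hello,", call to slice_character
print *, slice(sstr,first=3) ! prints "llo,", call to slice_string
Currently the function slice is first index and last index inclusive, but some other languages like Java as well as Python keep the first index inclusive and the last index exclusive. So the output of slice(cstr, last=6) will be 'Hello,' for Fortran (output includes index 6 as well), whereas for Python index 6 will not be included:
str = 'Hello, World'
print(str[:6])
# prints 'Hello,' (index 0 inclusive but index 6 exclusive)
This might be linked with the fact that Java & Python use 0-based indexing whereas Fortran doesn't. |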
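To put the two conventions side by side, here is a minimal Python sketch. The helper slice_inclusive is hypothetical (it is not part of stdlib or of this PR); it emulates 1-based slicing with both ends inclusive on top of Python's 0-based, half-open slices, and it assumes the bounds are valid.

```python
def slice_inclusive(text, first=1, last=None):
    """Hypothetical helper: 1-based slice with both ends inclusive.

    Assumes 1 <= first <= last <= len(text); no bounds handling is attempted.
    """
    if last is None:
        last = len(text)
    # 1-based inclusive [first, last] maps to 0-based half-open [first - 1, last).
    return text[first - 1:last]


cstr = "Hello, World"
print(slice_inclusive(cstr, last=6))  # characters 1..6, counted from 1
print(cstr[:6])                       # characters 0..5, counted from 0
```

Both calls select the same six characters: moving the start from 1 to 0 and making the end exclusive cancel each other out.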
I reckon the first and last should behave inclusively in agreement with the
intrinsic slice syntax.
…On Mon, 24 May 2021, 14:38 Aman Godara, ***@***.***> wrote:
Yeah, I got your point.
Your above comments start a new discussion here:
sstr = string(slice(cstr,last=6)) ! sstr contains "Hello,", call to slice_character
print *, slice(sstr,first=3) ! prints "llo,", call to slice_string
Currently the function slice is first index and last index inclusive but
some other languages like Java as well as Python keep first index
inclusive and last index exclusive. So output of slice(cstr, last=6) will
be 'Hello,' for Fortran (output included index 6 as well)
whereas for Python index 6 will not be included:
str = 'Hello, World'
print(str[:6])
# prints 'Hello,' (index 0 inclusive but index 6 exclusive)
This might be linked with the fact that Java & Python use 0-based
indexing whereas Fortran doesn't.
|
Thanks. This is a very useful addition to stdlib.
|
Indeed, they perform roughly the same operation. But On the other hand |
LGTM.
Only thing I was confused about was what happens in case of input like:
result = slice(str,8,2,stride=-2)
Are the characters now taken in reverse? I would need to perform some tests with the function, to make sure all "invalid" input cases still produce sensible results.
Behaviour of the function is similar to this Fortran loop:
j = 1
do i = start, end, stride
    << insert the character present at index i of input_string to index j of output_string >>
    j = j + 1
end do
let's say I am showing few iterations as the function progresses here:
function is at index = 6
function is at index = 4
function is at index = 2
so we get a reversed output "hfdb". Had the stride been positive 2 ( |
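The loop described above can be modelled in a few lines of Python. This is only an illustration of the stated semantics, not the PR's implementation: slice_loop is a hypothetical name, the input 'abcdefgh' is assumed because it is consistent with the quoted output "hfdb", and no handling of out-of-range indexes is attempted.

```python
def slice_loop(input_string, start, end, stride):
    """Model of `do i = start, end, stride` over 1-based indexes (illustration only)."""
    if stride == 0:
        raise ValueError("stride must be nonzero")
    # range() excludes its stop value, so step one past `end` in the direction of travel.
    stop = end + 1 if stride > 0 else end - 1
    # Every visited index is assumed to lie within 1..len(input_string).
    return "".join(input_string[i - 1] for i in range(start, stop, stride))


print(slice_loop("abcdefgh", 8, 2, -2))  # visits indexes 8, 6, 4, 2
print(slice_loop("abcdefgh", 2, 8, 2))   # visits indexes 2, 4, 6, 8
```

As with a Fortran do loop, a positive stride with start greater than end performs zero iterations and yields an empty string.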
If input_string = 'abcdefgh'
output_string = slice(input_string, 12, 16)
|
I've written a new comment in #413 (comment) The implementation of In the case above where the |
In the Zen of Python it is stated that,
What is the value in allowing a negative stride, if we could also achieve this using the nested calls, e.g.
I'm happy to hear what are some possible counter-arguments. One is of course shorter call syntax. But it comes at the price of the complicated logic inside |
I can try implementing slice which takes positive strides only but I think it will still look more or less equally complicated. |
I'd stay with the current behavior for now. If we can get the corner cases right, I have nothing against the current behavior. It is just a matter of preference in the end. |
! Clamp x into the closed interval [xmin, xmax]
pure function clip(x, xmin, xmax) result(res)
    integer, intent(in) :: x
    integer, intent(in) :: xmin
    integer, intent(in) :: xmax
    integer :: res
    res = max(min(x, xmax), xmin)
end function clip

function slice(string, first, last, stride) result(sliced_string)
    character(len=*), intent(in) :: string
    integer, intent(in), optional :: first, last, stride
    integer :: first_index, last_index, stride_vector, strides_taken, length_string, i, j
    character(len=:), allocatable :: sliced_string

    length_string = len(string)
    if (length_string > 0) then
        ! Defaults: the whole string with unit stride
        first_index = 1
        last_index = length_string
        stride_vector = 1
        if (present(stride)) then
            ! Positive strides only: the sign is dropped and zero becomes 1
            stride_vector = max(1, abs(stride))
        end if
        if (present(first)) then
            first_index = first
        end if
        if (present(last)) then
            last_index = last
        end if
        ! Empty result if the range is reversed or lies entirely outside the string
        if((last_index < first_index) .or. &
            (first_index < 1 .and. last_index < 1) .or. &
            (first_index > length_string .and. last_index > length_string)) then
            sliced_string = ""
        else
            ! Clamp both bounds into 1..length_string before copying
            first_index = clip(first_index, 1, length_string)
            last_index = clip(last_index, 1, length_string)
            strides_taken = (last_index - first_index) / stride_vector
            allocate(character(len=strides_taken + 1) :: sliced_string)
            j = 1
            do i = first_index, last_index, stride_vector
                sliced_string(j:j) = string(i:i)
                j = j + 1
            end do
        end if
    else
        sliced_string = ""
    end if
end function slice
Actually most of the complexity came when I started handling invalid cases. With invalid cases here I was referring to the case when user has given wrong indexes as inputs in |
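For experimenting with the corner cases, the positive-stride variant above can be transcribed roughly into Python. This is a sketch under assumptions: slice_positive is a hypothetical name, optional arguments are modelled with None, and the 1-based indexes are shifted by one when reading from the Python string.

```python
def clip(x, xmin, xmax):
    """Clamp x into the closed interval [xmin, xmax]."""
    return max(min(x, xmax), xmin)


def slice_positive(string, first=None, last=None, stride=None):
    """Rough transcription of the positive-stride Fortran variant (1-based, inclusive)."""
    length = len(string)
    if length == 0:
        return ""
    first_index = 1 if first is None else first
    last_index = length if last is None else last
    # The sign of the stride is dropped and zero becomes 1, as in the Fortran code.
    stride_vector = 1 if stride is None else max(1, abs(stride))
    # Empty result if the range is reversed or lies entirely outside the string.
    if (last_index < first_index
            or (first_index < 1 and last_index < 1)
            or (first_index > length and last_index > length)):
        return ""
    first_index = clip(first_index, 1, length)
    last_index = clip(last_index, 1, length)
    return "".join(string[i - 1]
                   for i in range(first_index, last_index + 1, stride_vector))


print(slice_positive("abcdefgh", 12, 16))     # both bounds beyond the string
print(slice_positive("abcdefgh", -2, 10, 2))  # bounds clipped to 1..8, then every 2nd character
```

Note that clipping moves a start of -2 to index 1, so the second call walks the odd indexes 1, 3, 5, 7; a loop that stepped from -2 in steps of 2 would instead land on the even indexes.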
I tried to break the implementation and haven't found any deviating behavior from the array slice so far (see 81f028a for a hacky |
What would this return?
array = [1, 2, 3, 4, 5]
print *, array(-2 : 10 : 2)
in my case it returns commit fa88905 version and commit 42a905d version of strings' |
You can think of the slice syntax
If you add the In other words, the Fortran intrinsic slicing does not do any smart resetting. If you access out of bounds you will get undefined behavior. Moreover, negative indexes are allowed because Fortran arrays can use custom bounds:
|
We need a couple more reviewers here to move this patch forward. |
Add general tester against intrinsic array slice
bce562c to a895085
subroutine test_slice_gen
    character(len=*), parameter :: test = &
        & "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    integer :: i, j, k
    integer, parameter :: offset = 3

    do i = 1 - offset, len(test) + offset
        call check_slicer(test, first=i)
    end do

    do i = 1 - offset, len(test) + offset
        call check_slicer(test, last=i)
    end do

    do i = -len(test) - offset, len(test) + offset
        call check_slicer(test, stride=i)
    end do

    do i = 1 - offset, len(test) + offset
        do j = 1 - offset, len(test) + offset
            call check_slicer(test, first=i, last=j)
        end do
    end do

    do i = 1 - offset, len(test) + offset
        do j = -len(test) - offset, len(test) + offset
            call check_slicer(test, first=i, stride=j)
        end do
    end do

    do i = 1 - offset, len(test) + offset
        do j = -len(test) - offset, len(test) + offset
            call check_slicer(test, last=i, stride=j)
        end do
    end do

    do i = 1 - offset, len(test) + offset
        do j = 1 - offset, len(test) + offset
            do k = -len(test) - offset, len(test) + offset
                call check_slicer(test, first=i, last=j, stride=k)
            end do
        end do
    end do
end subroutine test_slice_gen
This makes it even harder to see what the expected behavior is supposed to be. The goal of a good test suite is to serve as an example of how to use the code, and as a definition of its expected behavior. You should go back to the specific tests you had and just give them more meaningful descriptions. The message from a failing test should be a hint as to what aspect of the code is not working correctly. Something as vague as "it failed" doesn't accomplish that. Ideally, I should be able to read the test suite alone and understand the expected behavior.
Thanks everybody for the discussion on this patch. To move #433 forward I will go ahead and merge this PR. |
Status: code is open for review
resolves #413
Tasks:
slice interface in stdlib_strings.f90: add two procedures under slice interface
slice in stdlib_string_functions.f90
include_last feature after getting the basic slice function merged
slice function and correct documentation of to_title and to_sentence