Include a `split` function (202X feature) #241
Comments
I put an implementation here: https://github.com/milancurcic/fortran202x_split. This is a "naive" implementation; I went for what seemed to me the simplest solution. For tests so far I only used the three examples from 16.9.194 in 20-007. They seem okay. More tests will be needed at the time of PR for stdlib. At this time, I'd like to get feedback on this before I prepare a PR for stdlib.
Thank you for the implementation. I played a bit with it and it looks good to me. The API seemed a bit strange at the start. |
Here's my impression of the API.
character, intent(in) :: set(:)
instead of
character(*), intent(in) :: set
Then you'd pass it as

pure function split(string, set) result(tokens)
character(*), intent(in) :: string
character(*), intent(in) :: set
character(:), allocatable :: tokens(:)

and then you call it like this:
tokens = split(string, set)
which would allow a more functional style by passing
So, to that end, I'd propose that in the stdlib we also include this 4th specific procedure, even if it ends up being non-standard, or an extension. So we'd have a total of 4 forms of
Unfortunately, in Fortran you cannot put functions and subroutines under the same generic interface, so the last `split` has to be named differently.
Yes, I remembered this rule later. If people desire this as a function, maybe |
First I expected a function (similar to
Second, I don't see the value of the return value
Finally, I was wondering why
Anyway, this version would already be a great addition in
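Yes, `pos` is useful for efficiency if you're searching for a delimiter at a specific position in the string. Both other forms parse the whole string. You can't predict the values of `last` based on values of `first` because if `token(n)` is an empty string, `last(n)` is `first(n)-1`.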
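The relationship between `first` and `last` for empty tokens can be illustrated with a small sketch. The module and the subroutine name `token_bounds` are hypothetical, chosen for illustration only; this is neither the 202X intrinsic nor the code in the linked repository, and it assumes that every character in `set` acts as a single-character delimiter.

```fortran
module token_bounds_demo
  implicit none
  private
  public :: token_bounds
contains
  ! Hypothetical helper, not the standard or stdlib API.
  ! Returns the start and end position of every token in string,
  ! where any character in set is treated as a delimiter.
  pure subroutine token_bounds(string, set, first, last)
    character(*), intent(in) :: string
    character(*), intent(in) :: set
    integer, allocatable, intent(out) :: first(:), last(:)
    integer :: i, n, start

    ! The number of tokens is the number of delimiters plus one.
    n = 1
    do i = 1, len(string)
      if (index(set, string(i:i)) > 0) n = n + 1
    end do
    allocate(first(n), last(n))

    ! Record the bounds of each token as delimiters are encountered.
    n = 1
    start = 1
    do i = 1, len(string)
      if (index(set, string(i:i)) > 0) then
        first(n) = start
        last(n) = i - 1      ! equals first(n) - 1 for an empty token
        n = n + 1
        start = i + 1
      end if
    end do
    first(n) = start
    last(n) = len(string)
  end subroutine token_bounds
end module token_bounds_demo

program token_bounds_example
  use token_bounds_demo, only: token_bounds
  implicit none
  character(*), parameter :: s = 'a,,b'
  integer, allocatable :: first(:), last(:)
  integer :: n

  call token_bounds(s, ',', first, last)
  do n = 1, size(first)
    print '(i0, 1x, i0, 1x, a)', first(n), last(n), '"' // s(first(n):last(n)) // '"'
  end do
end program token_bounds_example
```

For the input `'a,,b'` the second token is empty, so its bounds come out as `first(2) = 3` and `last(2) = 2`, which is why `last` cannot be derived from `first` alone.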
We should also do comparisons with Python and other languages, as we usually do for stdlib.
…On Sun, Nov 8, 2020, at 11:06 PM, Milan Curcic wrote:
Yes, `pos` is useful for efficiency if you're searching for a delimiter
at a specific position in the string. Both other forms parse the whole
string.
You can't predict the values of `last` based on values of `first`
because if `token(n)` is an empty string, `last(n)` is `first(n)-1`.
Similar capability in other languages: |
The key differences seem to be:
|
Thanks @milancurcic, very helpful, exactly what I was looking for. The first example: I think is natural. That corresponds to the function So I think I like the API of I am not fond of bundling the other functionality into the same
I am also not fond of the current API, but even like that, it would be a nice addition. |
I added the
This allows the user to do:
tokens = string_tokens(string, set)
There is also a simple benchmark program in app/main.f90 to compare the run-time between
Thank you @milancurcic for the function and the test. Can the difference be explained by calling the subroutine inside the function? Or was the subroutine inlined in the function? |
The subroutine is not inlined:
pure function string_tokens(string, set) result(tokens)
  !! Splits a string into tokens using characters in set as token delimiters.
  character(*), intent(in) :: string
  character(*), intent(in) :: set
  character(:), allocatable :: tokens(:)
  call split_tokens(string, set, tokens)
end function string_tokens
but it allocates the function result before returning it to the caller. It's this extra allocation that makes the difference. There is probably some minimal overhead with calling the subroutine, but it should be negligible.
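To make the wrapper above concrete, here is a minimal, self-contained sketch of how a subroutine and a function form could sit side by side under different names. The body of `split_tokens` below is an assumption written for illustration (a simple two-pass scan); it is not the implementation from the linked repository, and the module name `split_sketch` is hypothetical. Tokens shorter than the longest one are blank-padded.

```fortran
module split_sketch
  implicit none
  private
  public :: split_tokens, string_tokens
contains
  ! Illustrative two-pass splitter (an assumption, not the repository code).
  pure subroutine split_tokens(string, set, tokens)
    character(*), intent(in) :: string
    character(*), intent(in) :: set
    character(:), allocatable, intent(out) :: tokens(:)
    integer :: i, n, start, maxlen

    ! First pass: count the tokens and find the longest one.
    n = 1
    maxlen = 0
    start = 1
    do i = 1, len(string)
      if (index(set, string(i:i)) > 0) then
        maxlen = max(maxlen, i - start)
        n = n + 1
        start = i + 1
      end if
    end do
    maxlen = max(maxlen, len(string) - start + 1)

    allocate(character(maxlen) :: tokens(n))

    ! Second pass: copy each token; shorter tokens are blank-padded.
    n = 1
    start = 1
    do i = 1, len(string)
      if (index(set, string(i:i)) > 0) then
        tokens(n) = string(start:i-1)
        n = n + 1
        start = i + 1
      end if
    end do
    tokens(n) = string(start:)
  end subroutine split_tokens

  ! Function form: a thin wrapper with a different name, because a generic
  ! interface cannot mix functions and subroutines.
  pure function string_tokens(string, set) result(tokens)
    character(*), intent(in) :: string
    character(*), intent(in) :: set
    character(:), allocatable :: tokens(:)
    call split_tokens(string, set, tokens)
  end function string_tokens
end module split_sketch

program split_example
  use split_sketch, only: string_tokens
  implicit none
  character(:), allocatable :: tokens(:)
  integer :: n

  tokens = string_tokens('first,second,third', ',')
  do n = 1, size(tokens)
    print '(a)', '"' // trim(tokens(n)) // '"'
  end do
end program split_example
```

The assignment `tokens = string_tokens(...)` is where the function result is copied into the caller's variable, which is the extra allocation discussed above; calling `split_tokens` directly fills the caller's array in place.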
A long discussion on the |
Comment from YouTube user ES on https://youtube.com/watch?v=HI-Yhn7Q8Ko:
@esterjo thank you and welcome! I personally agree and would like new additions to Fortran to first go into a library, get the API ironed out, get some usage, and only then propose it for the Fortran standard itself if there is interest. But other people at the committee have a different opinion on this and voted to have this in the language itself right away. So the second best we can do is to implement this into |
@certik Got it. Thank you for clarifying the situation for me, I wasn't aware. |
Apologies if this isn't the right place, but as a broad comment, if I could make one wish to the ISO gods it would be to make it easier for a user to build libraries. This would make it easier for the language to grow with the community.
@esterjo, you can propose that at https://github.com/j3-fortran/fortran_proposals, just open a new issue and try to describe the features you have in mind in more detail that would make it easier to build libraries. We can discuss it there. |
Hi @esterjo, I think your idea with a "split string" type is good. First we would need to agree upon some encapsulated string type (see #69). There are several disjoint community efforts in this direction already. With the new deferred-length character strings, it is quite simple to use something like:
type :: string
  character(len=:), allocatable :: s
end type
A generic interface can be used to overload the
With the great strides
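As an illustration of the overloading idea, the sketch below wraps a deferred-length character in a derived type and extends `len` and the `//` operator through generic interfaces. Which operations the comment had in mind is not recoverable from the text, so the choice of `len` and `//`, the module name `string_type_sketch`, and the procedure names are all assumptions made for this example.

```fortran
module string_type_sketch
  implicit none
  private
  public :: string, len, operator(//)

  type :: string
    character(len=:), allocatable :: s
  end type string

  ! Extend the intrinsic generic len to accept the derived type.
  interface len
    module procedure string_len
  end interface len

  ! Overload concatenation for two string objects.
  interface operator(//)
    module procedure string_concat
  end interface
contains
  pure integer function string_len(str)
    type(string), intent(in) :: str
    ! An unallocated component is treated as an empty string.
    string_len = 0
    if (allocated(str%s)) string_len = len(str%s)
  end function string_len

  pure function string_concat(a, b) result(c)
    ! Assumes both components are allocated.
    type(string), intent(in) :: a, b
    type(string) :: c
    c%s = a%s // b%s
  end function string_concat
end module string_type_sketch

program string_type_example
  use string_type_sketch
  implicit none
  type(string) :: a, b, c

  a%s = 'split'
  b%s = '_tokens'
  c = a // b
  print '(a, 1x, i0)', c%s, len(c)
end program string_type_example
```

Because the component is allocatable with deferred length, each `string` carries its own length, so an array of such objects can hold tokens of different lengths without padding.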
How about this: There should be a string_array type. This type:
Inheriting from this string_array would be a split_string type:
Some things I like about this kind of implementation:
Fortran 202X will include a new intrinsic `split` function: j3-fortran/fortran_proposals#187

It was approved, and then the API was changed after approving it. We should implement the latest approved version in `stdlib`, and play with it and ensure that the API looks good. And if we discover some improvements, we should propose them at the February 2021 Fortran Standards meeting.

Then once `split` becomes part of the next Fortran standard, we can have a section in `stdlib` called "backwards compatibility", where we can have a "reference implementation" of such new features, so that people can use them right away even if some compiler might not support them yet.