Routines to handle (allocatable) character arrays #315

Open
awvwgk opened this issue Feb 10, 2021 · 14 comments
Labels
implementation Implementation in experimental and submission of a PR topic: strings String processing

Comments

Member

awvwgk commented Feb 10, 2021

Besides fixed-length and deferred-length character variables, strings can also be represented as (allocatable) character arrays. Converting between deferred-length character variables and allocatable character arrays is usually possible with the intrinsic transfer function. This is frequently needed, especially for C-compatible routines that do not yet use the ISO-Fortran binding header on the C side.

Should stdlib provide helper functions for operations on character arrays? Most of those will be automatic due to elemental functions in stdlib_ascii at some point, but (safe) conversion routines between allocatable character arrays and deferred length characters might be a useful addition (module namespace?).
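
For illustration, a minimal sketch of what such conversion helpers could look like. The names to_c_chars and from_c_chars are hypothetical and not part of stdlib, and the sketch assumes that kind c_char is the same as the default character kind, which holds for common compilers but is not guaranteed.

```fortran
program char_array_demo
  use, intrinsic :: iso_c_binding, only: c_char, c_null_char
  implicit none
  character(len=:), allocatable :: str
  character(kind=c_char, len=1), allocatable :: buf(:)

  buf = to_c_chars("hello")      ! five characters plus a trailing null
  str = from_c_chars(buf)        ! back to a scalar, null dropped
  if (size(buf) /= 6) error stop "unexpected array size"
  if (str /= "hello") error stop "round trip failed"
  print '(a)', str

contains

  ! Hypothetical helper: scalar string -> null-terminated character array.
  pure function to_c_chars(string) result(array)
    character(len=*), intent(in) :: string
    character(kind=c_char, len=1) :: array(len(string) + 1)
    ! Assumes kind c_char equals the default character kind.
    array = transfer(string // c_null_char, c_null_char, size=size(array))
  end function to_c_chars

  ! Hypothetical helper: character array -> scalar, up to the first null.
  pure function from_c_chars(array) result(string)
    character(kind=c_char, len=1), intent(in) :: array(:)
    character(len=:), allocatable :: string
    integer :: i, n
    n = size(array)              ! use the whole array if no null is found
    do i = 1, size(array)
      if (array(i) == c_null_char) then
        n = i - 1
        exit
      end if
    end do
    allocate(character(len=n) :: string)
    do i = 1, n
      string(i:i) = array(i)
    end do
  end function from_c_chars

end program char_array_demo
```

The array-to-scalar direction copies element by element rather than using transfer, so that a zero-length result needs no special handling.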

@awvwgk awvwgk added the topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ... label Feb 10, 2021
Member

ivan-pi commented Mar 5, 2021

Can't one just recover a pointer to the first element of the string array (assuming contiguous memory layout)? Then all of the current functions for character scalars can be reused.

The C_F_STRPOINTER subroutine to be introduced in Fortran 202X provides exactly this functionality, with the important difference that the length of the string is determined by the position of the null character.

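! assumes: use, intrinsic :: iso_c_binding, only: c_loc, c_f_pointer
character(len=1), target :: char_array(20)
character(len=:), pointer :: p

block ! to be hidden in a procedure
  ! the second argument of c_f_pointer must have the pointer attribute
  character(len=size(char_array)), pointer :: inner_ptr
  call c_f_pointer(c_loc(char_array),inner_ptr)
  p => inner_ptr
end block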

Member

ivan-pi commented Mar 30, 2021

Is there any interest in having a helper function to convert between a literal character set like 'abc' and a set of characters ['a', 'b', 'c']?

For character variables one can get away with [(set(i:i), i = 1, len_trim(set))], which is ok, but still annoying since a new integer variable needs to be declared (at least Fortran 202X will introduce the possibility of declaration in implied do loops, i.e. [(set(i:i), integer :: i = 1, len_trim(set))]).
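
As a concrete sketch of the implied-do form above (the program name and the sample value of set are made up for illustration):

```fortran
program set_constructor_demo
  implicit none
  character(len=*), parameter :: set = 'abc  '   ! two trailing blanks
  character(len=1), allocatable :: chars(:)
  integer :: i   ! the extra loop variable that has to be declared

  ! One array element per character; len_trim drops the trailing blanks.
  chars = [(set(i:i), i = 1, len_trim(set))]

  if (size(chars) /= 3) error stop "expected three characters"
  if (any(chars /= ['a', 'b', 'c'])) error stop "unexpected contents"
  print '(*(a,1x))', chars
end program set_constructor_demo
```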

This proposal is partially motivated by the newly proposed chomp API (see #343), but it would likely be useful in other use cases too.

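pure function str_to_arr_char(string) result(array)
  character(len=*), intent(in) :: string
  character(len=1), dimension(len_trim(string)) :: array
  integer :: i
  do i = 1, size(array)
    array(i) = string(i:i)
  end do
end function
pure function str_to_arr_chunk(string,len) result(array)
  character(len=*), intent(in) :: string
  integer, intent(in) :: len
  ! number of chunks, rounded up (cannot be a parameter: it depends on the arguments)
  character(len=len), dimension(len_trim(string)/len + merge(0,1,mod(len_trim(string),len) == 0)) :: array
  integer :: i
  do i = 1, size(array)
    ! the last chunk is blank-padded by the assignment
    array(i) = string((i-1)*len+1:min(i*len, len_trim(string)))
  end do
end function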

A second version allows a string to be broken into an array of fixed-length chunks. Both versions could have an optional trim or asis argument.

Member

arjenmarkus commented Mar 30, 2021 via email

Member Author

awvwgk commented Mar 30, 2021

You could always transfer instead of using an array constructor:

pure function string_to_set(string) result(set)
    character(len=*), intent(in) :: string
    character(len=1) :: set(len(string))

    set = transfer(string, ' ', size=len(string))
end function string_to_set

pure function set_to_string(set) result(string)
    character(len=1), intent(in) :: set(:)
    character(len=size(set)) :: string

    string = transfer(set, string)
end function set_to_string
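
A small round-trip check using the two functions exactly as written above; the surrounding program is only a sketch for illustration. It also shows that trailing blanks are kept, since the size comes from len rather than len_trim.

```fortran
program roundtrip_demo
  implicit none
  character(len=1), allocatable :: set(:)

  set = string_to_set('abc')
  if (size(set) /= 3) error stop "expected three elements"
  if (set_to_string(set) /= 'abc') error stop "round trip failed"

  ! Trailing blanks are preserved: 'ab ' yields three elements.
  set = string_to_set('ab ')
  if (size(set) /= 3) error stop "trailing blank was dropped"
  print '(a,i0)', 'elements: ', size(set)

contains

  pure function string_to_set(string) result(set)
    character(len=*), intent(in) :: string
    character(len=1) :: set(len(string))

    set = transfer(string, ' ', size=len(string))
  end function string_to_set

  pure function set_to_string(set) result(string)
    character(len=1), intent(in) :: set(:)
    character(len=size(set)) :: string

    string = transfer(set, string)
  end function set_to_string

end program roundtrip_demo
```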

Member

ivan-pi commented Mar 30, 2021

Is it possible to merge the two? I am not sure if present() can be used to calculate the result array's size, but otherwise you could implement str_to_arr_char by calling str_to_arr_chunk. That saves a specific function.

I vaguely recall having some problems when using present in a length declaration like:

character(len=merge(len,1,present(len))), dimension( ??? ) :: array

but would need to double check this.

Member Author

awvwgk commented Mar 30, 2021

Note that GCC 8 won't be able to evaluate more complex expressions in the declaration; merge usually fails with GCC 8 but is fine with 9 and newer.

Member

ivan-pi commented Mar 30, 2021

You could always transfer instead of using an array constructor:

I guess transfer is the better option for inlining.

But putting it into a clear function helps clarify meaning. If I met a statement like chomp(string,transfer('abc', 'a', size=len('abc'))), it would take me a while to decipher what is going on, not to mention the ugliness of having to repeat 'abc' twice and the cryptic 'a' constant.

Member Author

awvwgk commented Mar 30, 2021

I have such routines in #343 already, but they are not part of the proposed public API:

last = verify(string, set_to_string(set), back=.true.)
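
To make the line above concrete, here is a hedged sketch of how it could strip trailing characters that belong to a set. The program and its sample data are invented for illustration and are not taken from #343; only the verify call and set_to_string follow the thread.

```fortran
program chomp_demo
  implicit none
  character(len=*), parameter :: string = 'hello!?!'
  character(len=1), parameter :: set(2) = ['!', '?']
  integer :: last

  ! Position of the last character that is NOT in the set (0 if none).
  last = verify(string, set_to_string(set), back=.true.)

  if (string(:last) /= 'hello') error stop "unexpected result"
  print '(a)', string(:last)

contains

  pure function set_to_string(set) result(string)
    character(len=1), intent(in) :: set(:)
    character(len=size(set)) :: string

    string = transfer(set, string)
  end function set_to_string

end program chomp_demo
```

If every character of the string is in the set, verify returns 0 and string(:last) is the empty string.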

Member

ivan-pi commented Mar 30, 2021

On more thought, I would skip any optional trimming and simply follow @awvwgk's version. Users can always do string_to_set(trim(string)) if they want to trim. I doubt there would be many cases where the optional trim argument would be used in a variable context, and this can be replicated using a merge if truly necessary (i.e. merge(string_to_set(trim(string)),string_to_set(string),trim) ).

Addendum: since trim would be optional, we can always add it later in a backwards compatible way if the community requests it.

I am afraid the word set might make users think that only a unique set of characters is preserved.

Member

ivan-pi commented Mar 30, 2021

pure function set_to_string(set) result(string)
    character(len=1), intent(in) :: set(:)
    character(len=size(set)) :: string

    string = transfer(set, string)
end function set_to_string

I would prefer to see this delegated to a function called join. Too often we are thinking in terms of types and kinds, instead of the operation we are trying to perform.

Makes me wonder if the opposite direction should also be called split, or whether that would clash with the API of #241.

@awvwgk awvwgk added the implementation Implementation in experimental and submission of a PR label Mar 30, 2021
Member Author

awvwgk commented Mar 30, 2021

Fine with me, let's schedule this feature after we arrive at join / split. Since those would all go into stdlib_strings as well we have to plan them sequentially for now.

@arjenmarkus
Member

The changes look good to me as well.

Member

ivan-pi commented Apr 26, 2021

The string_functions module of Dave Frank also provides functions similar to string_to_set and set_to_string, but puts them under the same generic interface copy (see below). I don't really like this idea, but it does show that someone might find these functions useful.

INTERFACE Copy    ! generic
   MODULE PROCEDURE copy_a2s, copy_s2a
END INTERFACE Copy

CONTAINS
! ------------------------
PURE FUNCTION Copy_a2s(a)  RESULT (s)    ! copy char array to string
CHARACTER,INTENT(IN) :: a(:)
CHARACTER(SIZE(a)) :: s
INTEGER :: i
DO i = 1,SIZE(a)
   s(i:i) = a(i)
END DO
END FUNCTION Copy_a2s

! ------------------------
PURE FUNCTION Copy_s2a(s)  RESULT (a)   ! copy s(1:Clen(s)) to char array
CHARACTER(*),INTENT(IN) :: s
CHARACTER :: a(LEN(s))
INTEGER :: i
DO i = 1,LEN(s)
   a(i) = s(i:i)
END DO
END FUNCTION Copy_s2a

Member

ivan-pi commented Sep 14, 2021

What is the preferred approach to convert an array of type(string_type) into a fixed-length character array? Would a helper function be useful? (Example is shown below).

The opposite direction I believe is solved by the elemental assignment(=) operator.

type(string_type), allocatable :: st_names(:)
character(len=32), allocatable :: names(:)
integer :: i

allocate(st_names(5))
st_names(1) = string_type("Adam")
! ..
st_names(5) = string_type("Zion")

! the type-spec gives every element the same length
names = [character(len=32) :: (char(st_names(i)), i = 1, size(st_names))]

! helper subprogram (the actual argument must be declared character(len=:), allocatable)
call string_type_to_character(st_names,names,len=maxval(len(st_names)))

subroutine string_type_to_character(sa,ca,len)
  type(string_type), intent(in) :: sa(:)
  character(len=:), allocatable, intent(out) :: ca(:)
  integer, intent(in) :: len
  integer :: i

  allocate(character(len=len) :: ca(size(sa)))

  do i = 1, size(sa)
    ca(i) = char(sa(i))
  end do
end subroutine

@awvwgk awvwgk added topic: strings String processing and removed topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ... labels Sep 18, 2021