Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String list new #311

Closed
wants to merge 2 commits into from
Closed

Conversation

arjenmarkus
Copy link
Member

No description provided.

The API for the module to manipulate lists of strings has been discussed and this has resulted in the current implementation
There was a typo in some comments - corrected
@milancurcic
Copy link
Member

Thank you for pushing this PR forward. Note #320 by @awvwgk that proposes a non-fancy string_type.

stdlib should provide a uniform and consistent model of strings, whatever that model may look like. So if there's a string_type that provides the string essentials, stringlist_type should be a list of string_type instances. Do you agree?

It seems to me that the stringlist_type can be relatively easily refactored to work with string_type from #320.

@ivan-pi
Copy link
Member

ivan-pi commented Feb 15, 2021

@epagone wrote:

For example, I did raise a related issue while reviewing the string list PR (together with other observations) where the namespace was polluted with a list_end variable and I suggested to encapsulated it within the derived type. Well, although we got to string list "new", well, let's say that my comment did not get much traction smiley

I must admit, when doing a preliminary review of this pull request I was confused by the list_head and list_end derived-type instances. Are these two objects part of the public API and can they be reused for multiple string_list_type instances simultaneously?

@ivan-pi
Copy link
Member

ivan-pi commented Feb 15, 2021

stdlib should provide a uniform and consistent model of strings, whatever that model may look like. So if there's a string_type that provides the string essentials, stringlist_type should be a list of string_type instances. Do you agree?

Not @arjenmarkus, but the same train of thought emerged from discussion with @epagone in #268 (comment).

I guess it just depends which PR (#311 or #320) reaches maturity first. As long as the behavior of the stringlist_type is not coupled tightly to the string objects in the list, I don't see any problems.

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Feb 15, 2021 via email

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Feb 15, 2021 via email

@ivan-pi
Copy link
Member

ivan-pi commented Feb 15, 2021

Thanks @arjenmarkus for the explanation, it is consistent with my understanding of the desired behavior. I am impressed by the smart design, that a single list_end can be (re-)used with multiple objects of type(stringlist_type). (EDIT) An example might be more meaningful:

use stdlib_stringlist

type(stringlist_type) :: list1, list2

call list1%insert(1, ["A", "B"] )
call list2%insert(1, ["C", "D", "E", "F"])

print '(A,1X,A)', list1%get(list_end), list2%get(list_end) ! prints "B F"

Does the list_head object remain private because it is equivalent to the integer index 1?

@ivan-pi
Copy link
Member

ivan-pi commented Feb 15, 2021

Is this allowed (with cnversion to string_type under the hood): call list%insert( 1, "A new string" ) Or do we require: call list%insert( 1, string_type("A new string") )

My gut feeling is that if the non-fancy string type in #320 is accepted we should allow both. This could imply using the string_type proposed in #320 as the underlying data type, but it is not required. As Arjen emphasized initially in #269 (highlights are mine):

This proposal considers the features that would be useful for this derived type, a list of strings. It does not prescribe the methods for implementing the type. Given the ease by which arrays are handled in Fortran, it is possible to use allocatable arrays, but perhaps a linked list may prove to be more efficient, at least in certain scenarios of use.

Perhaps the more interesting question is whether list%get() should only return the intrinsic character type (as it does currently) or a similar method should exist that returns an object of type(string_type) (whatever this object might look like). I would solicit the opinion of @awvwgk.

@awvwgk
Copy link
Member

awvwgk commented Feb 15, 2021

Preferably we return a fixed length character whenever possible, string_type instances can be generated on-the-fly by assignment. This was my original plan for the string_type as well, turns out this is not working with GCC 8, so I had to go for returning string_type instances instead.

@epagone
Copy link

epagone commented Feb 15, 2021

I am not sure I understand this. The two objects are intended to be convenient offsets into the lists and as such are part of the public API and you can use them for any list. They translate to integer indices specific for the list in question.

@epagone, can you elaborate? How would you encapsulate list_end with the derived type? I thought I solved your critique by not using special integer values but instead independent derived types (actually there are only two objects of that type, as the type itself is private). Perhaps I misunderstood what you suggested.

@arjenmarkus I very much appreciate that you considered my previous comments (and I think that you also understood my comments correctly). I think that your design to make a "generic" list_end derived type might be too clever for some less-CS-literate Fortraneers like me: I did not understand that it was meant to be valid for any list. It might also confuse some users that might be wondering where list_start or list_head is (the latter is private instead, IIUC).

Ideally, I would like to use this feature of stdlib according to two alternative scenarios.

  1. This module will become a dependency (or an extension?) of a generic list module;
  2. list_end is an integer encapsulated in type(stringlist_type).

PS: I still cannot bite my tongue and avoid saying that I do not like that we can create "holes" in the list with acceptable "out-of-bounds" indices (unless I have misunderstood your design again 😅 ).

@epagone
Copy link

epagone commented Feb 15, 2021

Is this allowed (with cnversion to string_type under the hood):

call list%insert( 1, "A new string" )

Or do we require:

call list%insert( 1, string_type("A new string") )

FWIW, I'd prefer the first option.

@ivan-pi
Copy link
Member

ivan-pi commented Feb 15, 2021

PS: I still cannot bite my tongue and avoid saying that I do not like that we can create "holes" in the list with acceptable "out-of-bounds" indices (unless I have misunderstood your design again sweat_smile ).

I think the idea of the proposal is that an "out-of-bounds" access is simply translated to an empty string.

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Feb 15, 2021 via email

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Feb 15, 2021 via email

@epagone
Copy link

epagone commented Feb 15, 2021

Preferably we return a fixed length character whenever possible, string_type instances can be generated on-the-fly by assignment. This was my original plan for the string_type as well, turns out this is not working with GCC 8, so I had to go for returning string_type instances instead.

Are we sure that we want to be limited by GCC 8 bugs? I understand it's not a super-old release, but considering that stdlib is still experimental, we can afford to be a bit more nimble in this sense.

@awvwgk
Copy link
Member

awvwgk commented Feb 15, 2021

This means the current implementation allows to append at the end of a string list by:

call list%insert(huge(0), "value")

@epagone
Copy link

epagone commented Feb 15, 2021

Accessing an out-of-bounds element is indeed interpreted as an empty
string, but you cannot insert a string in, say, position 100, if there are
only 3 strings filled. It would become string no. 4.

Understood, thanks. I like this solution better than the previous one, but I still think that this opens the door to hard to find bugs when we do not enforce a strict check on bounds.

@ivan-pi
Copy link
Member

ivan-pi commented Feb 15, 2021

This means the current implementation allows to append at the end of a string list by:

call list%insert(huge(0), "value")

According to this answer (what is the biggest array size for double precision in Fortran 90?):

The Fortran language standards don't define a limit to the size of arrays that a program can (attempt to) declare or allocate. In practice you may find that your compiler limits the total number of elements in an array to 2^31-1 or 2^63-1, depending on whether your default integer size is 32- or 64-bits. You may find that the maximum size of any dimension of an array is also limited to the same values.

The actual array size seems to be limited by the fact, the size of the array in bytes must fit in a size_t C variable (assuming the OS is written in C). So it might be possible to go past the range of huge(0) if overloaded versions of insert for larger integer indexes are provided.

Of course the amount of available RAM is the second limitation which enters into consideration, particularly when the stringlist_type attempts to reallocate new memory.

@awvwgk
Copy link
Member

awvwgk commented Feb 15, 2021

I just want to note that using list_end can be inconvenient in some situations, i.e. when creating an array of indices to perform operations on a string list type, I can't mix prepending (0) and appending (list_end) in the same array, which would force me to use huge(0) instead of list_end in such a cases.

A cyclic indexing system could address this problem

  • 0, -len(list)-1: before the first element of the list
  • 1, -len(list): first element of the list
  • len(list), -2: last element of the list
  • len(list)+1, -1: after the last element of the list

@ivan-pi
Copy link
Member

ivan-pi commented Feb 15, 2021

Should the insert and append provide an optional stat argument to query for allocation errors?

(Alternatively one could pursue an impure logical function interface for these two methods, where .true. or .false. is returned depending on whether the operation was successful or not. Not sure if this is very useful though.)

@epagone
Copy link

epagone commented Feb 15, 2021

Should the insert and append provide an optional stat argument to query for allocation errors?

Thank you. IMO this is the best option for its flexibility.

@ivan-pi
Copy link
Member

ivan-pi commented Feb 15, 2021

A cyclic indexing system could address this problem

  • 0, -len(list)-1: before the first element of the list
  • 1, -len(list): first element of the list
  • len(list), -2: last element of the list
  • len(list)+1, -1: after the last element of the list

Would this match the Python semantics as mentioned in #268 (comment) by @certik?

@awvwgk
Copy link
Member

awvwgk commented Feb 15, 2021

I made a mistake in the comment above, sorry about that, ignore the list there, I made some clearer tables below.

Python is using a zero based indexing in the forward direction and one based indexing in the backwards direction. This approach does not allow to access beyond the first item with the forward index and also you cannot access after the last element with the backwards index:

head A B ... Y Z end
0 1 ... n–2 n–1 n
–(n+1) –n 1–n ... –2 –1

For Fortran we would use a one based forward index and maybe also a one based backwards index, but we will have one index leftover in this approach, which is zero:

head A B ... Y Z end
1 2 ... n–1 n n+1
–(n+1) –n 1–n ... –2 –1

The cyclic indexing I wanted to propose was using all possible indices with:

head A B ... Y Z end
0 1 2 ... n–1 n n+1
–(n+2) –(n+1) –n ... –3 –2 –1

The one based indexing for both forward and backwards direction looks more intuitive, the only question is what do we do with the zero in this approach?

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Feb 16, 2021 via email

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Feb 16, 2021 via email

@ivan-pi
Copy link
Member

ivan-pi commented Mar 5, 2021

I just want to note that using list_end can be inconvenient in some situations, i.e. when creating an array of indices to perform operations on a string list type, I can't mix prepending (0) and appending (list_end) in the same array, which would force me to use huge(0) instead of list_end in such a cases.

Could list_end be given an integer member or method, which would return the correct integer position? This way we could avoid the need for a cyclic indexing system. But it would imply some type of array reference structure internal to the stringlist_type module, which would maintain an array of list_end indexes, and a stringlist_type member method to recover the correct list index objext:

type(stringlist_type) :: list

associate(end => list%end()) ! list_end points to a private stringlist_index_t instance

! append to end of the list
call list%insert(end,"one")
call list%insert(end,"two")
call list%insert(end,"three")

! prepend using an integer position index (in this case head)
call list%insert(0,"zero")

! Print values "one", "two", "three"
do i = 2, end%as_int() ! ideally, this would be a protected member component, i.e. `end%idx`
   print *, list%get(i)
end do

end associate

Edit: I have devised an integer list as proof of concept. It appears to work correctly with Intel Fortran, but with gfortran (gcc version 10.1.0), the associate construct does not work correctly. The code can be found at: https://gist.github.com/ivan-pi/e2181b4335a1d93cbeb3d422c6732cfa

Returned results
(myenv) ipribec@ipribec-T530:~/fortran$ ifort -warn all integer_list_mod.f90 
(myenv) ipribec@ipribec-T530:~/fortran$ ./a.out
           3
           4
           4
           1           2           3           4           5           6
           7           8
(myenv) ipribec@ipribec-T530:~/fortran$ gfortran -Wall integer_list_mod.f90 
(myenv) ipribec@ipribec-T530:~/fortran$ ./a.out
           3
           4
           4

call append_stringlist%insert( list_after_end, slist )
end function append_stringlist

function append_stringarray( list, sarray )

This comment was marked as resolved.

@awvwgk
Copy link
Member

awvwgk commented Mar 13, 2021

With the string_type merged, can this patch make use of the newly available functionality?

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Mar 22, 2021 via email

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Mar 22, 2021 via email

@aman-godara

This comment has been minimized.

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Mar 31, 2021 via email

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Mar 31, 2021 via email

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Mar 31, 2021 via email

@awvwgk awvwgk added the waiting for OP This patch requires action from the OP label Apr 17, 2021
@awvwgk
Copy link
Member

awvwgk commented May 31, 2021

What is the current status of this patch? I marked it as waiting for changes, but maybe this is incorrect and we should rather look for more reviewers. @arjenmarkus, what do you think?

@arjenmarkus
Copy link
Member Author

arjenmarkus commented May 31, 2021 via email

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Jun 4, 2021 via email

@Jim-215-Fisher
Copy link
Contributor

Jim-215-Fisher commented Jun 29, 2021

This is a bit of confusing. We have string_type and strings in stdlib, now we have another string_list_new. Can we combine them all together?

@aman-godara
Copy link
Member

This might help: #399 (comment).

else
do i = idxn, list%size
inew = i + number
call move_alloc( list%string(i)%value, list%string(inew)%value )
Copy link
Member

@aman-godara aman-godara Jun 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move_alloc(FROM, TO)
@arjenmarkus if I have understood it correctly what happens if there is already a string at index inew (TO)?
If it gets overwritten we will lose what was stored at index inew.

for example:
list = ["1", "2", "3", "4", "5", "6", "7", "8"]
and something like this executes:

do i = 3, 8
    inew = i + 3
    call move_alloc(list%string(i)%value , list%string(inew)%value)

Then I will get
["1", "2", _, _, _, _, _, _, "3", "4", "5"]
_ represents that it is deallocated.

@aman-godara aman-godara mentioned this pull request Jul 21, 2021
10 tasks
@milancurcic
Copy link
Member

Closed in favor of #470

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting for OP This patch requires action from the OP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants