Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String list #269

Closed
wants to merge 4 commits into from
Closed

String list #269

wants to merge 4 commits into from

Conversation

arjenmarkus
Copy link
Member

Here is a very preliminary implementation of a modules for handling lists of strings (see issue #268). The implementation is based on a derived type that holds an array of separate strings, but the suggestion made by @esterjo of using a single long string and an indexing array is well worth considering. Therefore consider this as a proposal for the behaviour and the interface.

@jvdp1 jvdp1 mentioned this pull request Dec 18, 2020
Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice progress. Here are a few minor comments.
It seems that the position of the elements starts at 0 until n-1? Did I understand it correctly?

@milancurcic
Copy link
Member

I agree with @jvdp1 regarding t_string -> string_type and t_stringlist -> stringlist_type.

The module should be called stdlib_stringlist (like the source file).

Would stringarray not be a more appropriate name for this derived type?

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Dec 21, 2020 via email

@jvdp1
Copy link
Member

jvdp1 commented Dec 21, 2020

And I see why there are build failures: no documentation yet. Well, I will work on that.

It is partly due to that. There is also the issue that gfortran is called now gfortran-10 in MacOS. This PR #270 solves this issue. So you might want to rebase your branch to pass this action.

@milancurcic
Copy link
Member

Just one remark: "initialise" is quite acceptable according to a dictionary I checked (even though it is UK spelling).

It is, it's why I scolded the ReviewDog. :) I think we can just ignore it.

This commit introduces a very preliminary version of a module to handle lists of strings (that is: a structure containing strings of varying lengths)
Added a function to sort the list of strings (implementation inspired by the alternative set-up of a long string with indices)
@arjenmarkus
Copy link
Member Author

arjenmarkus commented Dec 22, 2020 via email

Add documentation of the interface for stdlib_stringlist.
Add several subroutines and functions. Note, however, that it is not quite complete (see the head of stdlib_stringliust.f90 for some details)
procedure :: index => index_of_string
procedure :: index_sub => index_of_substring
procedure :: delete => delete_strings
procedure :: range => range_list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First quick comment: shouldn't a final procedure (that potential calls the method destroy) be present?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could actually be a dummy - allocatables get cleaned up automatically.

Copy link
Member

@awvwgk awvwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked over the API documentations and left a few comments on the API design. Mainly the special positions list_head and list_end are inconsistent in my opinion.

Comment on lines +44 to +45
The special position `list_head` represents the position *before* the first element. Likewise,
the special position `list_end` represents the position of the *last* element. You can
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels inconsistent to me. list_head represents an element outside of the list, while list_end represents an element in the list. I would prefer either having both special indices refer to elements inside or outside the list, but not mixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One surprising consequence of the current choice of list_head (0) and list_end (-1) is that appending with insert is impossible:

call list%insert(list_end + 1, "last item")

Because it will evaluate to list_head and prepend instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is that ... I hope to solve this with special types rather than special values.


#### Result value

The result is either the index of the string in the list or -1 if the string was not found
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is -1 chosen here? The intrinsics index/scan/verify return 0 on “failure” rather than -1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, 0 is more consistent. Just corrected that


#### Class

Assignment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Assignment
Operator

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The standard makes a difference between assignment and operator ... I can live with either, though.

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Jan 4, 2021 via email

@arjenmarkus
Copy link
Member Author

I have been thinking about the use of integers to indicate the head and the end of the list. Clearly this is a bad design choice, as there is no way to make it unambiguous. My current idea is to use instead a special derived type, something along the lines:

type list_special_index_type
logical :: head = .true.
integer :: offset = 0
end type
with appropriate operations + and - for defining an offset. Then there can be no confusion about the meaning.

(Of course, a bit more work, but that is limited :)).

@epagone
Copy link

epagone commented Jan 6, 2021

I have been thinking about the use of integers to indicate the head and the end of the list. Clearly this is a bad design choice, as there is no way to make it unambiguous. My current idea is to use instead a special derived type, something along the lines:

type list_special_index_type
logical :: head = .true.
integer :: offset = 0
end type
with appropriate operations + and - for defining an offset. Then there can be no confusion about the meaning.

(Of course, a bit more work, but that is limited :)).

I really like this solution: it seems to address at once most of the shortcomings that I have identified. I had a very limited amount of time to play with it but the current implementation seems to work already very well.

The aspects that I did not like much.

  • As already commented in the related issue, I believe that the sorting method should not be part of this PR but it should be included in the stdlib sorting module and stdlib_stringlist should depend on it.
  • I would prefer to encapsulate into the derived type the control variables (e.g. list_head, list_end in the current implementation, head and offset in the future one) to not "pollute" the user namespace. It is a bit more verbose but I think it's a better choice.
  • I do not understand fully the need to hard code the initial_size = 20 parameter to a somewhat arbitrary value that also "pollutes" the user namespace (see previous point).
  • I personally believe that the only way to let the user add an element to an already "initialised" list should be with the append method. Any other attempt to add new elements "out of bounds" should return an error. Quite strict, I know, but I think it's safer.

Thanks @arjenmarkus for doing this!

@ivan-pi
Copy link
Member

ivan-pi commented Jan 10, 2021

@arjenmarkus I suspect you initiated this project in an old branch. The stdlib_ascii module is the old version you already modified (and is now in the master branch). The solution would be to rebase against the master branch.

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Jan 11, 2021 via email

@milancurcic
Copy link
Member

Closing in favor of #311.

@milancurcic milancurcic closed this Feb 5, 2021
@ivan-pi ivan-pi mentioned this pull request Feb 15, 2021
@milancurcic milancurcic deleted the string-list branch March 5, 2021 15:56
@aman-godara aman-godara mentioned this pull request Jul 23, 2021
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants