Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend stdlib_ascii module for handling character variables #310

Merged
merged 8 commits into from
Feb 15, 2021

Conversation

awvwgk
Copy link
Member

@awvwgk awvwgk commented Feb 4, 2021

This PR implements some character handling routines discussed in #69, but purposely does not put them into the stdlib_string namespace. There is one little caveat with this implementation due to a namespace collision with the stdlib_ascii module, which might require reconsidering the naming of the functions implemented here.

  • extend to_lower function to work on character strings
  • extend to_upper function to work on character strings
  • implement to_title function
  • implement reverse function

The change in stdlib_ascii might be breaking ABI compatibility with the default branch.

@awvwgk awvwgk added the topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ... label Feb 4, 2021
@awvwgk awvwgk mentioned this pull request Feb 4, 2021
@milancurcic
Copy link
Member

Thank you, I think it's mostly a good addition, and much needed functions. A few questions and comments:

  1. Why did you choose stdlib_character over stdlib_string? To me, character means a single character, and these functions work on strings. So I have a preference for stdlib_string, although I don't think stdlib_character is terrible, only misleading.
  2. For to_lower and to_upper, I have a preference for just lower and upper (like Python).

@milancurcic
Copy link
Member

One more question: Why do you need interface definitions in the module?

@milancurcic milancurcic requested review from ivan-pi and jvdp1 and removed request for ivan-pi February 4, 2021 22:00
@awvwgk
Copy link
Member Author

awvwgk commented Feb 4, 2021

From the discussion in #69 I concluded that there will be potentially both, a string type and a module for handling intrinsic character variables. For the former the stdlib_string module should be reserved, therefore the latter might as well be in a separate module (I'm open for a better name).

@milancurcic
Copy link
Member

I just saw your post in #69. I like lowercase/uppercase/titlecase also. I understand the desire to have consistent naming with the ascii module, but I think deciding which names people like the best should take priority over consistency (and may be necessary due to name clash that you pointed out).

Considering planning for a string_type, I'd then prefer either of these two options, higher preference first:

  1. Have both string_type and procedures that work on intrinsics in the same module called stdlib_string;
  2. Have stdlib_string_type module for string_type and stdlib_string for the procedures that work on intrinsics.

@awvwgk
Copy link
Member Author

awvwgk commented Feb 4, 2021

One more question: Why do you need interface definitions in the module?

Mainly because I was trying to overload the to_upper / to_lower interface and get both single character manipulation and character string manipulation in one interface, which of course doesn't work. A future string type would potentially implement procedures under the same generic interface as well.

@awvwgk
Copy link
Member Author

awvwgk commented Feb 4, 2021

Maybe the implementation could be moved into the stdlib_ascii module as well and just replace the existing implementations of to_lower and to_upper instead.

@milancurcic
Copy link
Member

Maybe the implementation could be moved into the stdlib_ascii module as well and just replace the existing implementations of to_lower and to_upper instead.

I like this solution also.

@awvwgk awvwgk marked this pull request as draft February 4, 2021 22:15
- extend to_lower function to work on character strings
- extend to_upper function to work on character strings
- implement to_title function
- implement reverse function
@awvwgk awvwgk changed the title Add stdlib_character module for handling character variables Extend stdlib_ascii module for handling character variables Feb 4, 2021
@awvwgk
Copy link
Member Author

awvwgk commented Feb 4, 2021

Moved the implementation to stdlib_ascii which seems much cleaner than creating a new module. To keep the repo history clean of the intermediary state the commits have been squashed.

@awvwgk awvwgk marked this pull request as ready for review February 4, 2021 22:45
@ivan-pi
Copy link
Member

ivan-pi commented Feb 5, 2021

Thanks for fixing this. This was an initial blunder, the result of following the C routines to literally...

The char_to_upper and char_to_lower remain private to the module right? Do you think there is any benefit to just inline them? The public routines to_upper/to_lover cover the character(len=1) case anyways. For an array of fixed-length characters I hope an optimizing compiler would delete the calls to len and incrementing the loop iterator. In that case there is really no downside.

I adopted the naming to_lower/to_upper initially in conformance with the C and D standard libraries (besides for the underscore, which was added). I am not really fixated to the current naming. If there is community interest we can change these to lowercase/uppercase or whatever else people prefer.

I am still slightly confused by the logic in the to_title routine, but will come back to it today. The rest of the pull request is fine. 👍

@awvwgk
Copy link
Member Author

awvwgk commented Feb 5, 2021

I have to admit I was deliberately opening this pull request to start a discussion on the features mentioned in #69.

I found a few additional caveats of the current stdlib_ascii implementation (especially with respect to arrays of fixed size character variables), which I would like to discuss separately (I can open a new issue or propose a fix with a different (later) PR and we discuss there).

The char_to_upper and char_to_lower remain private to the module right? Do you think there is any benefit to just inline them?

I'm fully trusting the compiler to be capable of inlining a pure function in the same module scope. I can inline them of course, which will require some code duplication, since I'm currently able to reuse char_to_upper and char_to_lower in the to_title function.

@ivan-pi
Copy link
Member

ivan-pi commented Feb 5, 2021

With three functions, I agree it is cleaner this way. No need to inline.

With reverse I was for a second worried that it might be surprising to newcomers since it breaks the left alignment. Upon checking what Python allows, there is no reverse function, but the array slice syntax can be used:

>>> s = "abcd  "
>>> s[::-1] == "  dcba"
True

This is consistent with your proposed behavior, and honestly not a problem, once you are accustome to it.

@ivan-pi
Copy link
Member

ivan-pi commented Feb 5, 2021

I found a few additional caveats of the current stdlib_ascii implementation (especially with respect to arrays of fixed size character variables), which I would like to discuss separately (I can open a new issue or propose a fix with a different (later) PR and we discuss there).

PR works fine for me.

Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR. Here are some comments.

Copy link
Member

@ivan-pi ivan-pi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent of the to_title routine is clear to me now.

Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The links to the specs in the source code should be added. But it could be done at the end of the review

@awvwgk awvwgk requested review from milancurcic and jvdp1 February 10, 2021 16:36
Copy link
Member

@milancurcic milancurcic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I promoted the four public functions to elemental (it's free lunch) and added tests for elementalness.

I think it's good to go, thank you!

@awvwgk
Copy link
Member Author

awvwgk commented Feb 10, 2021

I promoted the four public functions to elemental (it's free lunch) and added tests for elementalness.

Thanks, I was planning to look into making most of the stdlib_ascii module elemental while collecting the missing specs in a later pull request. But we can already have it here since it is an easy change.

@ivan-pi
Copy link
Member

ivan-pi commented Feb 10, 2021

I promoted the four public functions to elemental (it's free lunch) and added tests for elementalness.

Thanks, I was planning to look into making most of the stdlib_ascii module elemental while collecting the missing specs in a later pull request. But we can already have it here since it is an easy change.

The issue of elemental vs pure for the character validation functions was already raised in the original issue: #11 (comment)
A downside was that if the procedures are elemental they can not be used as procedure pointers. On the other hand the pure procedures can be overloaded to work on rank-1, rank-2, or higher arrays.

If the procedures are made elemental, then the subroutine test_ascii_table at

subroutine test_ascii_table
needs to be refactored/removed.

@milancurcic
Copy link
Member

I see, I wasn't aware of this. It's not free lunch after all. I see 3 options going forward (don't have to be resolved in this PR).

  1. We choose one or the other, we should decide which use case (use these functions on arrays or use them as procedure pointers) is more common or desired by the community. I vote for elemental.

  2. We provide many specific procedures behind a generic name, that would work with arrays and as procedure pointers, as @ivan-pi suggested.

  3. We provide both elemental and non-elemental versions under separate names. Then the user code could import them as:

use stdlib_ascii, only: to_lower => to_lower_elem, to_upper => to_upper_elem

or similar.

Change from pure to elemental requires extra discussion, therefore
inclusion of this patch is deferred.
@awvwgk
Copy link
Member Author

awvwgk commented Feb 10, 2021

I reverted e670262, but left it in the history. We should certainly discuss this separately and implement whatever solution we find most suitable with a different patch.

@ivan-pi
Copy link
Member

ivan-pi commented Feb 10, 2021

From my own experience, I have rarely resorted to using arrays of single characters. Therefore I am not strongly invested in making these procedures elemental. (After all the equivalent C routines are not elemental either...).

On the other hand the usage case as procedure pointers is quite artificial. If one needs to switch between different operations, one could easily replace procedure pointers with a select case statement and an integer flag to perform the right operation.

@milancurcic
Copy link
Member

From my own experience, I have rarely resorted to using arrays of single characters.

Elemental here would apply to multi-char strings as well (the test that I included did). It could be useful to pass output of the upcoming (F202X) split to an elemental to_lower, to_upper, or reverse.

@ivan-pi
Copy link
Member

ivan-pi commented Feb 10, 2021

I agree that for multi-char strings the elemental makes sense. 👍

If I might suggest a different generalization, the character validation functions (is_alpha, is_blank, ...) could be modified to accept multi-char strings, but only test the first character. The intrinsic function iachar follows this type of behavior:

IACHAR(C) returns the code for the ASCII character in the first character position of C.

@awvwgk
Copy link
Member Author

awvwgk commented Feb 15, 2021

What do you think? Are we ready to merge this PR?

@milancurcic
Copy link
Member

Yes, @jvdp1 please merge when ready.

Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. I corrected some minor typos in specs and added hyperlinks in the code.
I'll commit them and merge this PR after a last check.

@jvdp1
Copy link
Member

jvdp1 commented Feb 15, 2021

All checks have passed! I'll merge it. thank you @awvwgk for this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ...
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants