Extend stdlib_ascii module for handling character variables #310

awvwgk · 2021-02-04T21:37:37Z

This PR implements some character handling routines discussed in #69, but purposely does not put them into the stdlib_string namespace. ~~There is one little caveat with this implementation due to a namespace collision with the stdlib_ascii module, which might require reconsidering the naming of the functions implemented here.~~

extend to_lower function to work on character strings
extend to_upper function to work on character strings
implement to_title function
implement reverse function

The change in stdlib_ascii might be breaking ABI compatibility with the default branch.

milancurcic · 2021-02-04T21:58:54Z

Thank you, I think it's mostly a good addition, and much needed functions. A few questions and comments:

Why did you choose stdlib_character over stdlib_string? To me, character means a single character, and these functions work on strings. So I have a preference for stdlib_string, although I don't think stdlib_character is terrible, only misleading.
For to_lower and to_upper, I have a preference for just lower and upper (like Python).

milancurcic · 2021-02-04T21:59:57Z

One more question: Why do you need interface definitions in the module?

awvwgk · 2021-02-04T22:04:02Z

From the discussion in #69 I concluded that there will be potentially both, a string type and a module for handling intrinsic character variables. For the former the stdlib_string module should be reserved, therefore the latter might as well be in a separate module (I'm open for a better name).

milancurcic · 2021-02-04T22:07:21Z

I just saw your post in #69. I like lowercase/uppercase/titlecase also. I understand the desire to have consistent naming with the ascii module, but I think deciding which names people like the best should take priority over consistency (and may be necessary due to name clash that you pointed out).

Considering planning for a string_type, I'd then prefer either of these two options, higher preference first:

Have both string_type and procedures that work on intrinsics in the same module called stdlib_string;
Have stdlib_string_type module for string_type and stdlib_string for the procedures that work on intrinsics.

awvwgk · 2021-02-04T22:09:58Z

One more question: Why do you need interface definitions in the module?

Mainly because I was trying to overload the to_upper / to_lower interface and get both single character manipulation and character string manipulation in one interface, which of course doesn't work. A future string type would potentially implement procedures under the same generic interface as well.

awvwgk · 2021-02-04T22:11:55Z

Maybe the implementation could be moved into the stdlib_ascii module as well and just replace the existing implementations of to_lower and to_upper instead.

milancurcic · 2021-02-04T22:14:47Z

Maybe the implementation could be moved into the stdlib_ascii module as well and just replace the existing implementations of to_lower and to_upper instead.

I like this solution also.

- extend to_lower function to work on character strings - extend to_upper function to work on character strings - implement to_title function - implement reverse function

awvwgk · 2021-02-04T22:45:42Z

Moved the implementation to stdlib_ascii which seems much cleaner than creating a new module. To keep the repo history clean of the intermediary state the commits have been squashed.

src/tests/ascii/test_ascii.f90

ivan-pi · 2021-02-05T14:00:32Z

Thanks for fixing this. This was an initial blunder, the result of following the C routines to literally...

The char_to_upper and char_to_lower remain private to the module right? Do you think there is any benefit to just inline them? The public routines to_upper/to_lover cover the character(len=1) case anyways. For an array of fixed-length characters I hope an optimizing compiler would delete the calls to len and incrementing the loop iterator. In that case there is really no downside.

I adopted the naming to_lower/to_upper initially in conformance with the C and D standard libraries (besides for the underscore, which was added). I am not really fixated to the current naming. If there is community interest we can change these to lowercase/uppercase or whatever else people prefer.

I am still slightly confused by the logic in the to_title routine, but will come back to it today. The rest of the pull request is fine. 👍

awvwgk · 2021-02-05T14:14:52Z

I have to admit I was deliberately opening this pull request to start a discussion on the features mentioned in #69.

I found a few additional caveats of the current stdlib_ascii implementation (especially with respect to arrays of fixed size character variables), which I would like to discuss separately (I can open a new issue or propose a fix with a different (later) PR and we discuss there).

The char_to_upper and char_to_lower remain private to the module right? Do you think there is any benefit to just inline them?

I'm fully trusting the compiler to be capable of inlining a pure function in the same module scope. I can inline them of course, which will require some code duplication, since I'm currently able to reuse char_to_upper and char_to_lower in the to_title function.

ivan-pi · 2021-02-05T14:24:53Z

With three functions, I agree it is cleaner this way. No need to inline.

With reverse I was for a second worried that it might be surprising to newcomers since it breaks the left alignment. Upon checking what Python allows, there is no reverse function, but the array slice syntax can be used:

>>> s = "abcd  "
>>> s[::-1] == "  dcba"
True

This is consistent with your proposed behavior, and honestly not a problem, once you are accustome to it.

ivan-pi · 2021-02-05T14:34:09Z

I found a few additional caveats of the current stdlib_ascii implementation (especially with respect to arrays of fixed size character variables), which I would like to discuss separately (I can open a new issue or propose a fix with a different (later) PR and we discuss there).

PR works fine for me.

jvdp1

Thank you for this PR. Here are some comments.

doc/specs/stdlib_ascii.md

src/tests/ascii/test_ascii.f90

ivan-pi

The intent of the to_title routine is clear to me now.

doc/specs/stdlib_ascii.md

jvdp1

The links to the specs in the source code should be added. But it could be done at the end of the review

src/stdlib_ascii.f90

Co-authored-by: Jeremie Vandenplas <[email protected]>

milancurcic

I promoted the four public functions to elemental (it's free lunch) and added tests for elementalness.

I think it's good to go, thank you!

awvwgk · 2021-02-10T17:56:02Z

I promoted the four public functions to elemental (it's free lunch) and added tests for elementalness.

Thanks, I was planning to look into making most of the stdlib_ascii module elemental while collecting the missing specs in a later pull request. But we can already have it here since it is an easy change.

ivan-pi · 2021-02-10T20:02:07Z

I promoted the four public functions to elemental (it's free lunch) and added tests for elementalness.

Thanks, I was planning to look into making most of the stdlib_ascii module elemental while collecting the missing specs in a later pull request. But we can already have it here since it is an easy change.

The issue of elemental vs pure for the character validation functions was already raised in the original issue: #11 (comment)
A downside was that if the procedures are elemental they can not be used as procedure pointers. On the other hand the pure procedures can be overloaded to work on rank-1, rank-2, or higher arrays.

If the procedures are made elemental, then the subroutine test_ascii_table at

stdlib/src/tests/ascii/test_ascii.f90

Line 487 in a1c911c

subroutine test_ascii_table

needs to be refactored/removed.

milancurcic · 2021-02-10T20:11:54Z

I see, I wasn't aware of this. It's not free lunch after all. I see 3 options going forward (don't have to be resolved in this PR).

We choose one or the other, we should decide which use case (use these functions on arrays or use them as procedure pointers) is more common or desired by the community. I vote for elemental.
We provide many specific procedures behind a generic name, that would work with arrays and as procedure pointers, as @ivan-pi suggested.
We provide both elemental and non-elemental versions under separate names. Then the user code could import them as:

use stdlib_ascii, only: to_lower => to_lower_elem, to_upper => to_upper_elem

or similar.

Change from pure to elemental requires extra discussion, therefore inclusion of this patch is deferred.

awvwgk · 2021-02-10T20:22:33Z

I reverted e670262, but left it in the history. We should certainly discuss this separately and implement whatever solution we find most suitable with a different patch.

ivan-pi · 2021-02-10T20:25:14Z

From my own experience, I have rarely resorted to using arrays of single characters. Therefore I am not strongly invested in making these procedures elemental. (After all the equivalent C routines are not elemental either...).

On the other hand the usage case as procedure pointers is quite artificial. If one needs to switch between different operations, one could easily replace procedure pointers with a select case statement and an integer flag to perform the right operation.

milancurcic · 2021-02-10T20:31:28Z

From my own experience, I have rarely resorted to using arrays of single characters.

Elemental here would apply to multi-char strings as well (the test that I included did). It could be useful to pass output of the upcoming (F202X) split to an elemental to_lower, to_upper, or reverse.

ivan-pi · 2021-02-10T20:52:07Z

I agree that for multi-char strings the elemental makes sense. 👍

If I might suggest a different generalization, the character validation functions (is_alpha, is_blank, ...) could be modified to accept multi-char strings, but only test the first character. The intrinsic function iachar follows this type of behavior:

IACHAR(C) returns the code for the ASCII character in the first character position of C.

awvwgk · 2021-02-15T15:09:48Z

What do you think? Are we ready to merge this PR?

milancurcic · 2021-02-15T16:08:57Z

Yes, @jvdp1 please merge when ready.

jvdp1

Sorry for the delay. I corrected some minor typos in specs and added hyperlinks in the code.
I'll commit them and merge this PR after a last check.

doc/specs/stdlib_ascii.md

src/stdlib_ascii.f90

jvdp1 · 2021-02-15T18:41:40Z

All checks have passed! I'll merge it. thank you @awvwgk for this PR.

awvwgk added the topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ... label Feb 4, 2021

awvwgk mentioned this pull request Feb 4, 2021

String handling routines #69

Open

milancurcic requested review from ivan-pi and jvdp1 and removed request for ivan-pi February 4, 2021 22:00

awvwgk marked this pull request as draft February 4, 2021 22:15

Extend stdlib_ascii module for handling character variables

6720abd

- extend to_lower function to work on character strings - extend to_upper function to work on character strings - implement to_title function - implement reverse function

awvwgk force-pushed the char-procedures branch from e17034b to 6720abd Compare February 4, 2021 22:41

awvwgk changed the title ~~Add stdlib_character module for handling character variables~~ Extend stdlib_ascii module for handling character variables Feb 4, 2021

awvwgk marked this pull request as ready for review February 4, 2021 22:45

ivan-pi reviewed Feb 5, 2021

View reviewed changes

src/tests/ascii/test_ascii.f90 Show resolved Hide resolved

ivan-pi reviewed Feb 5, 2021

View reviewed changes

src/tests/ascii/test_ascii.f90 Show resolved Hide resolved

jvdp1 reviewed Feb 5, 2021

View reviewed changes

ivan-pi approved these changes Feb 5, 2021

View reviewed changes

doc/specs/stdlib_ascii.md Outdated Show resolved Hide resolved

jvdp1 reviewed Feb 5, 2021

View reviewed changes

doc/specs/stdlib_ascii.md Outdated Show resolved Hide resolved

jvdp1 reviewed Feb 5, 2021

View reviewed changes

src/stdlib_ascii.f90 Show resolved Hide resolved

src/stdlib_ascii.f90 Show resolved Hide resolved

src/stdlib_ascii.f90 Show resolved Hide resolved

src/stdlib_ascii.f90 Show resolved Hide resolved

Apply restructring of specs from review

e47f028

Co-authored-by: Jeremie Vandenplas <[email protected]>

awvwgk added 3 commits February 5, 2021 23:39

Update documentation comments in stdlib_ascii

591678d

Fix link generation and code block issues with FORD

c92f414

Reword capitalization rules

59b195e

awvwgk requested review from milancurcic and jvdp1 February 10, 2021 16:36

promote public functions to elemental

e670262

milancurcic approved these changes Feb 10, 2021

View reviewed changes

Revert e670262

546fece

Change from pure to elemental requires extra discussion, therefore inclusion of this patch is deferred.

milancurcic mentioned this pull request Feb 13, 2021

Proposal for clamp #319

Closed

awvwgk mentioned this pull request Feb 14, 2021

Extend stdlib_ascii logical functions to character strings #321

Open

jvdp1 approved these changes Feb 15, 2021

View reviewed changes

Apply suggestions from code review

6e9574b

jvdp1 merged commit ae2e38c into fortran-lang:master Feb 15, 2021

awvwgk deleted the char-procedures branch February 15, 2021 18:43

awvwgk mentioned this pull request Mar 15, 2021

Added to_lower, to_upper, reverse and to_title function to stdlib_string_type.f90 file #346

Merged

ivan-pi mentioned this pull request Apr 25, 2021

Should reverse be in stdlib_string? #399

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extend stdlib_ascii module for handling character variables #310

Extend stdlib_ascii module for handling character variables #310

awvwgk commented Feb 4, 2021 •

edited

Loading

milancurcic commented Feb 4, 2021

milancurcic commented Feb 4, 2021

awvwgk commented Feb 4, 2021

milancurcic commented Feb 4, 2021

awvwgk commented Feb 4, 2021

awvwgk commented Feb 4, 2021 •

edited

Loading

milancurcic commented Feb 4, 2021

awvwgk commented Feb 4, 2021

ivan-pi commented Feb 5, 2021

awvwgk commented Feb 5, 2021

ivan-pi commented Feb 5, 2021

ivan-pi commented Feb 5, 2021

jvdp1 left a comment

ivan-pi left a comment

jvdp1 left a comment

milancurcic left a comment

awvwgk commented Feb 10, 2021

ivan-pi commented Feb 10, 2021

milancurcic commented Feb 10, 2021

awvwgk commented Feb 10, 2021

ivan-pi commented Feb 10, 2021 •

edited

Loading

milancurcic commented Feb 10, 2021

ivan-pi commented Feb 10, 2021

awvwgk commented Feb 15, 2021

milancurcic commented Feb 15, 2021

jvdp1 left a comment

jvdp1 commented Feb 15, 2021

Extend stdlib_ascii module for handling character variables #310

Extend stdlib_ascii module for handling character variables #310

Conversation

awvwgk commented Feb 4, 2021 • edited Loading

milancurcic commented Feb 4, 2021

milancurcic commented Feb 4, 2021

awvwgk commented Feb 4, 2021

milancurcic commented Feb 4, 2021

awvwgk commented Feb 4, 2021

awvwgk commented Feb 4, 2021 • edited Loading

milancurcic commented Feb 4, 2021

awvwgk commented Feb 4, 2021

ivan-pi commented Feb 5, 2021

awvwgk commented Feb 5, 2021

ivan-pi commented Feb 5, 2021

ivan-pi commented Feb 5, 2021

jvdp1 left a comment

Choose a reason for hiding this comment

ivan-pi left a comment

Choose a reason for hiding this comment

jvdp1 left a comment

Choose a reason for hiding this comment

milancurcic left a comment

Choose a reason for hiding this comment

awvwgk commented Feb 10, 2021

ivan-pi commented Feb 10, 2021

milancurcic commented Feb 10, 2021

awvwgk commented Feb 10, 2021

ivan-pi commented Feb 10, 2021 • edited Loading

milancurcic commented Feb 10, 2021

ivan-pi commented Feb 10, 2021

awvwgk commented Feb 15, 2021

milancurcic commented Feb 15, 2021

jvdp1 left a comment

Choose a reason for hiding this comment

jvdp1 commented Feb 15, 2021

awvwgk commented Feb 4, 2021 •

edited

Loading

awvwgk commented Feb 4, 2021 •

edited

Loading

ivan-pi commented Feb 10, 2021 •

edited

Loading