Add module for validating ASCII characters and upper/lower conversion #32

ivan-pi · 2019-12-22T00:41:59Z

This PR addresses #11 (a few open questions remain there).

Tested with both the gfortran and Intel Fortran compilers for the 'default' (ascii) character set.

The tests are essentially a port of those at https://github.com/dlang/phobos/blob/master/std/ascii.d (hopefully not a licensing issue?).

certik · 2019-12-22T14:08:08Z

Would you mind rebasing on top of the latest master to pick up the CI tests?

certik · 2019-12-22T14:41:22Z

src/tests/ascii/test_ascii.f90

+program test_ascii
+
+    use stdlib_experimental_error, only: assert
+    use stdlib_experimental_ascii


Can you please explicitly import the symbols that are being used? I think we should follow that approach of "explicit imports", as in Python.

(The reason I noticed this is that I came here to see what the public API actually is. That is a bit hard to see immediately from the module itself, because there is no single public line; instead, quite a few symbols are individually decorated with public.)

I think we should have private declared at the top of all modules, and below one or more lists of public :: ...

@zbeekman, I agree, it makes it very easy to see what the public symbols are.

certik · 2019-12-22T14:54:57Z

So I think this looks really nice. The public API seems nicely done, the naming convention I think is good.

Will the is_* functions work with utf8 strings?

I assume to_upper and to_lower will only work on the ASCII parts of a UTF-8 string.
Python calls these just upper and lower methods (they work with Unicode). In C, they are called toupper and tolower. Finally in C++, Boost has to_upper and to_lower. There is also a nice table here: #11 (comment)

I think using to_upper is readable.

certik · 2019-12-22T15:00:30Z

Python also has an isupper method. Should we also implement is_upper and is_lower?

certik · 2019-12-22T15:04:08Z

General question: should we maintain stdlib_ascii and stdlib_unicode, or should we simply have stdlib_string, and all methods would automatically work with utf8?

I think getting Unicode working (as in Python) will take some work, so we can start with stdlib_ascii, then in another PR we can see how many of these functions extend to Unicode and either create stdlib_unicode or merge the two. Since this is all in experimental, we can do that.

Things like to_upper just need to be extended to support Unicode (utf8), see the Python documentation. Here is an example in Python:

>>> name = "Ondřej Čertík"
>>> name
'Ondřej Čertík'
>>> name.upper()
'ONDŘEJ ČERTÍK'
>>> name.lower()
'ondřej čertík'

You can see that upper and lower work with Unicode characters (utf8).
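For contrast, here is a minimal Python sketch of what an ASCII-only conversion would do to the same name. The function name ascii_to_upper is hypothetical and is not part of this PR or of Python; it only illustrates the assumption that letters outside a-z are passed through unchanged.

```python
def ascii_to_upper(text: str) -> str:
    """Uppercase only the ASCII letters a-z; leave every other character unchanged.

    Hypothetical helper for illustration, not an API from this PR.
    """
    offset = ord("a") - ord("A")  # distance between the two cases in ASCII (32)
    return "".join(
        chr(ord(ch) - offset) if "a" <= ch <= "z" else ch
        for ch in text
    )


name = "Ondřej Čertík"
print(ascii_to_upper(name))  # ř and í stay lowercase: they are not ASCII
print(name.upper())          # Unicode-aware: every letter is uppercased
```

Under this assumption the non-ASCII letters keep their original case, which is the difference a separate Unicode-aware routine would have to cover.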

ivan-pi · 2019-12-22T21:08:10Z

I think we should have separate modules for ASCII and Unicode characters. In fact, using only the intrinsic Fortran character functions (achar and iachar), it is not possible to find, say, the uppercase forms of Slavic letters such as č, š, ž... I think it will be necessary to interface with C to achieve Unicode support. A second issue is that some preprocessing will be necessary, as not all Fortran compilers support the extended Unicode character set.

The current ascii module already contains the is_upper and is_lower functions. At the moment all functions in the ascii module are limited to working on single characters. They are meant as support functions for a separate string module.
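To illustrate the single-character design, here is a rough Python sketch of code-point range checks. It is an analogy only and not the Fortran implementation in this PR: the Python functions is_upper, is_lower and count_upper below are hypothetical stand-ins, and ord plays the role that iachar would play in Fortran.

```python
def is_upper(ch: str) -> bool:
    """True if ch is a single ASCII uppercase letter (A-Z)."""
    return len(ch) == 1 and 0x41 <= ord(ch) <= 0x5A


def is_lower(ch: str) -> bool:
    """True if ch is a single ASCII lowercase letter (a-z)."""
    return len(ch) == 1 and 0x61 <= ord(ch) <= 0x7A


def count_upper(text: str) -> int:
    """String-level helper built on the character predicate (hypothetical)."""
    return sum(is_upper(ch) for ch in text)


print(is_upper("A"), is_upper("a"))   # ASCII letters are classified by range
print(is_upper("Č"), "Č".isupper())   # range check rejects Č; Python's Unicode-aware check accepts it
print(count_upper("Hello World"))
```

A string module could then be layered on top of such predicates, as count_upper does here, without the character-level routines needing to know about strings.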

I will rebase and import explicitly the public functions for the test driver asap.

certik · 2019-12-22T21:39:06Z

@ivan-pi yes, if it is not possible to merge unicode with ascii, then we need two modules.

certik · 2019-12-22T21:58:36Z

The other thing is that, since you used https://github.com/dlang/phobos/blob/434429f273d0359744b6d3ba9db36d3bef1c7593/std/ascii.d as the original, we have to cite their license.

Overall this looks good to me. It would be nice to get some more reviews on this before we merge. @marshallward, @jacobwilliams, @milancurcic do you have any feedback on the API here?

milancurcic

Great, thanks @ivan-pi!

Interesting format for comments (some documentation generator?) but fine with me.

Good to merge IMO.

certik · 2019-12-24T06:23:25Z

@milancurcic thanks for the review.

@ivan-pi would you mind updating whitechar in stdlib_experimental_io to use your new functionality please? That should save some code.

dev-zero · 2020-04-24T10:28:40Z

Since I've been pointed here, this project might be interesting: https://github.com/lemire/fastvalidate-utf-8, although I don't see how that could be implemented in Fortran given the missing inline assembly.

certik · 2020-04-24T16:17:41Z

@dev-zero thanks! I made a comment in #11 (comment).

Added module for validating ASCII characters and upper/lower characte…

fc0ce9f

…r conversion

ivan-pi changed the title ~~Added module for validating ASCII characters and upper/lower conversion~~ Add module for validating ASCII characters and upper/lower conversion Dec 22, 2019

certik mentioned this pull request Dec 22, 2019

Test every PR and master on Linux #33

Merged

Add comments to test_ascii_table

243d147

certik reviewed Dec 22, 2019

Explicitly imported public functions and constants in tests

0c64516

Merge https://github.com/fortran-lang/stdlib into ascii

c1a51e2

milancurcic self-requested a review December 24, 2019 01:32

milancurcic approved these changes Dec 24, 2019

certik merged commit 8ef2f81 into fortran-lang:master Dec 24, 2019

certik mentioned this pull request Dec 30, 2019

improve cmake build #51

Merged

jvdp1 mentioned this pull request Apr 24, 2020

What should be part of stdlib? #1

Open

ivan-pi mentioned this pull request Apr 24, 2020

Proposal for ascii #11

Open

jvdp1 mentioned this pull request Nov 17, 2024

promote ascii functions to elemental #886

Merged
