This repository contains Fortran routines aiming to reproduce the functionality found in <ctype.h>
header of the C standard library or std.ascii namespace from the D runtime library.
Currently, this is just my personal testing ground that should later become a pull request to the recent Fortran stdlib project (see also issue #11 and my previous pull requests #32, and #49).
TODO:
- Unicode support (or wide characters in general)
git clone https://github.com/ivan-pi/fortran-ascii
cd fortran-ascii
A Makefile is provided to build the following executables via make <target>
, where target is one of:
test_ascii
(default target): test routines for the ASCII functionsbenchmark_f90
: benchmarking the Fortran routinesbenchmark_cpp
: benchmark the C++ routinesgenerate_characters
: generates character strings used by the benchmarksgenerate_ascii_table
: generates the ASCII lookup table (not relevant anymore)print_table
: print the lookup table defined infortran_ascii_bits.f90
To set the compiler and flags use:
make FC=gfortran-9 FFLAGS="-O3 -march=native" benchmark_f90
One of the main goals of this repository is to test different implementations of the character routines (see sections below). This can be done very smoothly using submodules, which separate the function interfaces from the actual implementations.
The Makefile uses the SM
variable to link with a specific submodule:
make SM=bit benchmark_f90
Try switching between pure
, bit
, selectcase
, and cctype
, and comparing the performance.
A shell script is available to run benchmarks for all four implementations and a reference in C++:
./run_benchmarks.sh
The results are given as processed characters per second and are exported as txt files. If you switch the compiler run make clean
between trials.
All Fortran routines accept a default character(len=1)
dummy variable and return a logical value. This differs from the C routines which accept and return an integer value.
Purpose | Fortran | C/C++ | D | Python* |
---|---|---|---|---|
Checks if a character is alphabetic | is_alpha |
isalpha |
isAlpha |
isalpha |
Checks if a character is alphanumeric | is_alphanum |
isalnum |
isAlphaNum |
isalnum |
Checks if a character is in the ASCII character set | is_ascii |
/ | isASCII |
isascii |
Checks if a character is a control character | is_control |
iscntrl |
isControl |
|
Checks if a character is a digit | is_digit |
isdigit |
isDigit |
isdigit |
Checks if a character is a octal character | is_octal_digit |
/ | isOctalDigit |
|
Checks if a character is a hexadecimal character | is_hex_digit |
isxdigit |
isHexDigit |
|
Checks if a character is a punctuation character | is_punctuation |
ispunct |
isPunctuation |
|
Checks if a character is a graphical character | is_graphical |
isgraph |
isGraphical |
|
Checks if a character is a printing character | is_printable |
isprint |
isPrintable |
isprintable |
Checks if a character is an uppercase character | is_upper |
isupper |
isUpper |
isupper |
Checks if a character is lowercase | is_lower |
islower |
isLower |
islower |
Checks if a character is a whitespace character | is_white |
isspace |
isWhite |
isspace |
Checks if a character is a blank character (space or tab) | is_blank |
isblank |
/ |
These routines accept a character(len=1)
argument and return the same character with swapped case.
Purpose | Fortran | C/C++ | D | Python* |
---|---|---|---|---|
Converts a character to lowercase | to_lower |
tolower |
toLower |
|
Converts a character to uppercase | to_upper |
toupper |
toUpper |
In this approach we use comparison operators and the default character collating sequence (if applicable) to validate the characters.
Example for is_alphanum
:
pure logical function is_alphanum(c)
character(len=1), intent(in) :: c !! The character to test.
is_alphanum = (c >= '0' .and. c <= '9') .or. (c >= 'a' .and. c <= 'z') &
.or. (c >= 'A' .and. c <= 'Z')
end function
For characters with no guarantee on position in the collating sequence we use the iachar
intrinsic to check the position in the ASCII character set.
Example for is_control
:
pure logical function is_control(c)
character(len=1), intent(in) :: c !! The character to test.
integer :: ic
ic = iachar(c)
is_control = ic < z'20' .or. ic == z'7F'
end function
Instead of comparison operators, we can also use the select case construct and iachar
intrinsic to verify the character belongs to a specific set. This approach can be found in the Dan Nagle's character_functions
module.
Example for is_alphanum
:
pure logical function is_alphanum(c)
character(len=1), intent(in) :: c !! The character to test.
select case(iachar(c))
case (48:57,65:90,97:122) ! A .. Z, 0 .. 9, a .. z
is_alphanum = .true.
case default
is_alphanum = .false.
end select
end function
Example for is_control
:
pure logical function is_control(c)
character(len=1), intent(in) :: c !! The character to test.
select case(iachar(c))
case (0:31,127) ! contral chars, delete
is_control = .true.
case default
is_control = .false.
end select
end function
As noted on the C character classification Wikipedia page:
...the character classification routines are not written as comparison tests. In most C libraries, they are written as static table lookups instead of macros or functions.
This approach can be mimicked also in Fortran. Since there are 13 functions (excluding is_ascii
) we require an integer with a storage size of at least 13 bits to encode the various properties. (Perhaps even 11-bits suffice.) Here we simply use the int16
from the iso_fortran_env
module. On architectures with word sizes smaller than 8 bits, it might be benefical to use the result of selected_int_kind(4)
; this covers integers in the range from -10^4 to 10^4 (exclusive), thereby including the largest necessary mask 2^12 = 4096. First we create the bitmasks for the different character properties by left-shifting the value one:
integer(i16), parameter :: m_upper = shiftl(1_i16,0)
integer(i16), parameter :: m_lower = shiftl(1_i16,1)
integer(i16), parameter :: m_alpha = shiftl(1_i16,2)
integer(i16), parameter :: m_digit = shiftl(1_i16,3)
integer(i16), parameter :: m_hex_digit = shiftl(1_i16,4)
integer(i16), parameter :: m_octal_digit = shiftl(1_i16,5)
integer(i16), parameter :: m_space = shiftl(1_i16,6)
integer(i16), parameter :: m_printable = shiftl(1_i16,7)
integer(i16), parameter :: m_graphical = ior(ior(m_alpha,m_digit),shiftl(1_i16,10)) ! alnum|punct
integer(i16), parameter :: m_control = shiftl(1_i16,9)
integer(i16), parameter :: m_punctuation = shiftl(1_i16,10)
integer(i16), parameter :: m_alphanum = ior(m_alpha,m_digit)
integer(i16), parameter :: m_blank = shiftl(1_i16,12)
The mask for alphanumerical characters is built using the bitwise logical or of the digit and alphabetical masks. Similarly, the mask for graphical characters is built from the masks for alphabetical characters, digits, and punctuation.
Next, we populate a look-up table, assigning the ASCII characters their respective masks:
Click here to see the complete lookup table
integer(i16), parameter :: table(-128:127) = [ integer(i16) ::&
! The first 128 entries are all 0.
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,&
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,&
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,&
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,&
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,&
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,&
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,&
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,&
m_control,& ! null
m_control,& ! ^A
m_control,& ! ^B
m_control,& ! ^C
m_control,& ! ^D
m_control,& ! ^E
m_control,& ! ^F
m_control,& ! ^G
m_control,& ! ^H
ior(ior(m_space,m_control),m_blank),& ! tab
ior(m_space,m_control),& ! LF
ior(m_space,m_control),& ! ^K
ior(m_space,m_control),& ! FF
ior(m_space,m_control),& ! ^M
m_control,& ! ^N
m_control,& ! ^O
m_control,& ! ^P
m_control,& ! ^Q
m_control,& ! ^R
m_control,& ! ^S
m_control,& ! ^T
m_control,& ! ^U
m_control,& ! ^V
m_control,& ! ^W
m_control,& ! ^X
m_control,& ! ^Y
m_control,& ! ^Z
m_control,& ! esc
m_control,& ! ^\
m_control,& ! ^]
m_control,& ! ^^
m_control,& ! ^_
ior(ior(m_space,m_printable),m_blank),& ! space
ior(m_punctuation,m_printable),& ! !
ior(m_punctuation,m_printable),& ! "
ior(m_punctuation,m_printable),& ! #
ior(m_punctuation,m_printable),& ! $
ior(m_punctuation,m_printable),& ! %
ior(m_punctuation,m_printable),& ! &
ior(m_punctuation,m_printable),& ! '
ior(m_punctuation,m_printable),& ! (
ior(m_punctuation,m_printable),& ! )
ior(m_punctuation,m_printable),& ! *
ior(m_punctuation,m_printable),& ! +
ior(m_punctuation,m_printable),& ! ,
ior(m_punctuation,m_printable),& ! -
ior(m_punctuation,m_printable),& ! .
ior(m_punctuation,m_printable),& ! /
ior(ior(ior(m_digit,m_hex_digit),m_printable),m_octal_digit),& ! 0
ior(ior(ior(m_digit,m_hex_digit),m_printable),m_octal_digit),& ! 1
ior(ior(ior(m_digit,m_hex_digit),m_printable),m_octal_digit),& ! 2
ior(ior(ior(m_digit,m_hex_digit),m_printable),m_octal_digit),& ! 3
ior(ior(ior(m_digit,m_hex_digit),m_printable),m_octal_digit),& ! 4
ior(ior(ior(m_digit,m_hex_digit),m_printable),m_octal_digit),& ! 5
ior(ior(ior(m_digit,m_hex_digit),m_printable),m_octal_digit),& ! 6
ior(ior(ior(m_digit,m_hex_digit),m_printable),m_octal_digit),& ! 7
ior(ior(m_digit,m_hex_digit),m_printable),& ! 8
ior(ior(m_digit,m_hex_digit),m_printable),& ! 9
ior(m_punctuation,m_printable),& ! :
ior(m_punctuation,m_printable),& ! ;
ior(m_punctuation,m_printable),& ! <
ior(m_punctuation,m_printable),& ! =
ior(m_punctuation,m_printable),& ! >
ior(m_punctuation,m_printable),& ! ?
ior(m_punctuation,m_printable),& ! !
ior(ior(ior(m_alpha,m_upper),m_hex_digit),m_printable),& ! A
ior(ior(ior(m_alpha,m_upper),m_hex_digit),m_printable),& ! B
ior(ior(ior(m_alpha,m_upper),m_hex_digit),m_printable),& ! C
ior(ior(ior(m_alpha,m_upper),m_hex_digit),m_printable),& ! D
ior(ior(ior(m_alpha,m_upper),m_hex_digit),m_printable),& ! E
ior(ior(ior(m_alpha,m_upper),m_hex_digit),m_printable),& ! F
ior(ior(m_alpha,m_upper),m_printable),& ! G
ior(ior(m_alpha,m_upper),m_printable),& ! H
ior(ior(m_alpha,m_upper),m_printable),& ! I
ior(ior(m_alpha,m_upper),m_printable),& ! J
ior(ior(m_alpha,m_upper),m_printable),& ! K
ior(ior(m_alpha,m_upper),m_printable),& ! L
ior(ior(m_alpha,m_upper),m_printable),& ! M
ior(ior(m_alpha,m_upper),m_printable),& ! N
ior(ior(m_alpha,m_upper),m_printable),& ! O
ior(ior(m_alpha,m_upper),m_printable),& ! P
ior(ior(m_alpha,m_upper),m_printable),& ! Q
ior(ior(m_alpha,m_upper),m_printable),& ! R
ior(ior(m_alpha,m_upper),m_printable),& ! S
ior(ior(m_alpha,m_upper),m_printable),& ! T
ior(ior(m_alpha,m_upper),m_printable),& ! U
ior(ior(m_alpha,m_upper),m_printable),& ! V
ior(ior(m_alpha,m_upper),m_printable),& ! W
ior(ior(m_alpha,m_upper),m_printable),& ! X
ior(ior(m_alpha,m_upper),m_printable),& ! Y
ior(ior(m_alpha,m_upper),m_printable),& ! Z
ior(m_punctuation,m_printable),& ! [
ior(m_punctuation,m_printable),& ! \
ior(m_punctuation,m_printable),& ! ]
ior(m_punctuation,m_printable),& ! ^
ior(m_punctuation,m_printable),& ! _
ior(m_punctuation,m_printable),& ! `
ior(ior(ior(m_alpha,m_lower),m_hex_digit),m_printable),& ! a
ior(ior(ior(m_alpha,m_lower),m_hex_digit),m_printable),& ! b
ior(ior(ior(m_alpha,m_lower),m_hex_digit),m_printable),& ! c
ior(ior(ior(m_alpha,m_lower),m_hex_digit),m_printable),& ! d
ior(ior(ior(m_alpha,m_lower),m_hex_digit),m_printable),& ! e
ior(ior(ior(m_alpha,m_lower),m_hex_digit),m_printable),& ! f
ior(ior(m_alpha,m_lower),m_printable),& ! g
ior(ior(m_alpha,m_lower),m_printable),& ! h
ior(ior(m_alpha,m_lower),m_printable),& ! i
ior(ior(m_alpha,m_lower),m_printable),& ! j
ior(ior(m_alpha,m_lower),m_printable),& ! k
ior(ior(m_alpha,m_lower),m_printable),& ! l
ior(ior(m_alpha,m_lower),m_printable),& ! m
ior(ior(m_alpha,m_lower),m_printable),& ! n
ior(ior(m_alpha,m_lower),m_printable),& ! o
ior(ior(m_alpha,m_lower),m_printable),& ! p
ior(ior(m_alpha,m_lower),m_printable),& ! q
ior(ior(m_alpha,m_lower),m_printable),& ! r
ior(ior(m_alpha,m_lower),m_printable),& ! s
ior(ior(m_alpha,m_lower),m_printable),& ! t
ior(ior(m_alpha,m_lower),m_printable),& ! u
ior(ior(m_alpha,m_lower),m_printable),& ! v
ior(ior(m_alpha,m_lower),m_printable),& ! w
ior(ior(m_alpha,m_lower),m_printable),& ! x
ior(ior(m_alpha,m_lower),m_printable),& ! y
ior(ior(m_alpha,m_lower),m_printable),& ! x
ior(m_punctuation,m_printable),& ! {
ior(m_punctuation,m_printable),& ! |
ior(m_punctuation,m_printable),& ! }
ior(m_punctuation,m_printable),& ! ~
m_control] ! del (0x7f)
The reason why the first 128 characters are filled with zeros will be explained later. To make sure things worked correctly, we can write a short program to display the characters and their binary property masks:
do i = 0, 127
c = achar(i)
if (is_printable(c)) then
write(*,'(I3,3X,A,3X,I4,3X,B0.16)') i, c, table(i), table(i)
end if
end do
Click here to see the result.
32 4288 0001000011000000
33 ! 1152 0000010010000000
34 " 1152 0000010010000000
35 # 1152 0000010010000000
36 $ 1152 0000010010000000
37 % 1152 0000010010000000
38 & 1152 0000010010000000
39 ' 1152 0000010010000000
40 ( 1152 0000010010000000
41 ) 1152 0000010010000000
42 * 1152 0000010010000000
43 + 1152 0000010010000000
44 , 1152 0000010010000000
45 - 1152 0000010010000000
46 . 1152 0000010010000000
47 / 1152 0000010010000000
48 0 184 0000000010111000
49 1 184 0000000010111000
50 2 184 0000000010111000
51 3 184 0000000010111000
52 4 184 0000000010111000
53 5 184 0000000010111000
54 6 184 0000000010111000
55 7 184 0000000010111000
56 8 152 0000000010011000
57 9 152 0000000010011000
58 : 1152 0000010010000000
59 ; 1152 0000010010000000
60 < 1152 0000010010000000
61 = 1152 0000010010000000
62 > 1152 0000010010000000
63 ? 1152 0000010010000000
64 @ 1152 0000010010000000
65 A 149 0000000010010101
66 B 149 0000000010010101
67 C 149 0000000010010101
68 D 149 0000000010010101
69 E 149 0000000010010101
70 F 149 0000000010010101
71 G 133 0000000010000101
72 H 133 0000000010000101
73 I 133 0000000010000101
74 J 133 0000000010000101
75 K 133 0000000010000101
76 L 133 0000000010000101
77 M 133 0000000010000101
78 N 133 0000000010000101
79 O 133 0000000010000101
80 P 133 0000000010000101
81 Q 133 0000000010000101
82 R 133 0000000010000101
83 S 133 0000000010000101
84 T 133 0000000010000101
85 U 133 0000000010000101
86 V 133 0000000010000101
87 W 133 0000000010000101
88 X 133 0000000010000101
89 Y 133 0000000010000101
90 Z 133 0000000010000101
91 [ 1152 0000010010000000
92 \ 1152 0000010010000000
93 ] 1152 0000010010000000
94 ^ 1152 0000010010000000
95 _ 1152 0000010010000000
96 ` 1152 0000010010000000
97 a 150 0000000010010110
98 b 150 0000000010010110
99 c 150 0000000010010110
100 d 150 0000000010010110
101 e 150 0000000010010110
102 f 150 0000000010010110
103 g 134 0000000010000110
104 h 134 0000000010000110
105 i 134 0000000010000110
106 j 134 0000000010000110
107 k 134 0000000010000110
108 l 134 0000000010000110
109 m 134 0000000010000110
110 n 134 0000000010000110
111 o 134 0000000010000110
112 p 134 0000000010000110
113 q 134 0000000010000110
114 r 134 0000000010000110
115 s 134 0000000010000110
116 t 134 0000000010000110
117 u 134 0000000010000110
118 v 134 0000000010000110
119 w 134 0000000010000110
120 x 134 0000000010000110
121 y 134 0000000010000110
122 z 134 0000000010000110
123 { 1152 0000010010000000
124 | 1152 0000010010000000
125 } 1152 0000010010000000
126 ~ 1152 0000010010000000
We can easily see how characters with the same set of properties (alpha,digit,upper,lower,...) share the same bitmask.
For the actual validation functions we use the iachar
intrinsic to find the correct value in the lookup table and call the btest
intrinsic to verify if the bit encoding the target property is set.
Example for is_alpha
:
pure logical function is_alpha(c)
character(len=1), intent(in) :: c !! The character to test.
is_alpha = btest(table(iachar(c,i8)),2)
end function
A second approach is to apply a bitwise logical and and check the result is not zero. (See this blog post by Pierre de Buyl to brush up on Fortran bitmasks.)
Example for is_control
:
pure logical function is_control(c)
character(len=1), intent(in) :: c !! The character to test.
is_control = iand(table(iachar(c,i8)),m_control) /= 0
end function
For the functions is_graphical
and is_alphanum
(these properties are encoded via multiple bits) we can either follow the second approach or test multiple bits (e.g. for is_alphanum
we need to test the bits at positions 2 and 3 with a logical .or.
inbetween).
At the time of writing, I am still not sure how to deal correctly with any non-ASCII characters, which might still fit into the size of the character(len=1)
type (typically, this is one byte). The way this seems to be dealt with in the C standard library is by casting the character as an unsigned short
integer taking on 8-bits (values from 0 to 255) and using a lookup table with 256 values. Since Fortran does not provide unsigned integers, the approach I've adopted is to use custom array bounds from -128 to 127, which fit the range of values for the integer(int8)
type. Hence, the zeros in the first 128 table elements. To make this work as desired, we also need to provide int8
(from the iso_fortran_env
module) as the kind parameter in calls to the intrinsic iachar(c[,kind])
.
For compilers such as gfortran and ifort, the final executable is typically linked to the C standard library (see this discussion). This means we can easily borrow the functionality from C (apart from the function for octal digits, which is not part of the C standard library).
First, we create a module that defines the interfaces of the functions in <ctype.h>
:
module cctype
use iso_c_binding, only: c_int
implicit none
public
interface
pure integer(c_int) function isalnum(c) bind(c,name='isalnum')
import c_int
integer(c_int), intent(in), value :: c
end function isalnum
pure integer(c_int) function iscntrl(c) bind(c,name='iscntrl')
import c_int
integer(c_int), intent(in), value :: c
end function iscntrl
! ... remaining functions ...
end interface
end module
Then we use the iachar
intrinsic to convert Fortran characters to C integers and pass them to C routines. The return value of the C routines is an integer different from zero (i.e., true
) for characters which have the tested property, and zero (i.e. false
) otherwise.
Example for is_alphanum
:
pure logical function is_alphanum(c)
character(len=1), intent(in) :: c !! The character to test.
is_alphanum = isalnum(iachar(c,c_int)) /= 0
end function
Example for is_control
:
pure logical function is_control(c)
character(len=1), intent(in) :: c !! The character to test.
is_control = iscntrl(iachar(c,c_int)) /= 0
end function
The results shown below are from a ThinkPad T530 with an Intel(R) Core(TM) i5-3320M 2.6 GHz CPU and 16GB of RAM (DDR3 SODIMM). The operating system is Ubuntu 16.04 64-bit.
The benchmark drivers (for both gfortran and g++) were compiled with -O3 -mtune=native
. The first column is the number of characters tested. The remaining columns give the number of processed characters per second. Each row is the average of 10 loop iterations.
# is_control is_printable is_white is_blank is_graphical is_punctuation is_alphanum is_alpha is_upper is_lower is_digit is_hex_digit
1000 0.295902E+09 0.345805E+09 0.107078E+09 0.139196E+09 0.346957E+09 0.160805E+09 0.154152E+09 0.297965E+09 0.347174E+09 0.347379E+09 0.297876E+09 0.211689E+09
10000 0.299134E+09 0.341720E+09 0.135319E+09 0.136704E+09 0.348956E+09 0.134952E+09 0.114378E+09 0.299184E+09 0.342490E+09 0.349018E+09 0.269007E+09 0.208021E+09
100000 0.426288E+09 0.494526E+09 0.197466E+09 0.205922E+09 0.477580E+09 0.199789E+09 0.161736E+09 0.441021E+09 0.491093E+09 0.477734E+09 0.435060E+09 0.264156E+09
1000000 0.432747E+09 0.516781E+09 0.206718E+09 0.203574E+09 0.510438E+09 0.202444E+09 0.169083E+09 0.453219E+09 0.516010E+09 0.529410E+09 0.444434E+09 0.256150E+09
10000000 0.443961E+09 0.521427E+09 0.188798E+09 0.200465E+09 0.520107E+09 0.202299E+09 0.168897E+09 0.445520E+09 0.520562E+09 0.518847E+09 0.447540E+09 0.255309E+09
100000000 0.444802E+09 0.521159E+09 0.205292E+09 0.203831E+09 0.521019E+09 0.201788E+09 0.167931E+09 0.445627E+09 0.519019E+09 0.521419E+09 0.445376E+09 0.255544E+09
# is_control is_printable is_white is_blank is_graphical is_punctuation is_alphanum is_alpha is_upper is_lower is_digit is_hex_digit
1000 0.382307E+09 0.437963E+09 0.342021E+09 0.438020E+09 0.439967E+09 0.176053E+09 0.234456E+09 0.247788E+09 0.439734E+09 0.439967E+09 0.439657E+09 0.225154E+09
10000 0.386439E+09 0.441638E+09 0.343540E+09 0.441647E+09 0.441677E+09 0.155772E+09 0.160909E+09 0.217044E+09 0.470183E+09 0.470172E+09 0.470174E+09 0.193561E+09
100000 0.399968E+09 0.450582E+09 0.334573E+09 0.394983E+09 0.461833E+09 0.152483E+09 0.143694E+09 0.203813E+09 0.438749E+09 0.400575E+09 0.437472E+09 0.188699E+09
1000000 0.390082E+09 0.445382E+09 0.352295E+09 0.436648E+09 0.450134E+09 0.162312E+09 0.144433E+09 0.189585E+09 0.450995E+09 0.451086E+09 0.451696E+09 0.187176E+09
10000000 0.390598E+09 0.449703E+09 0.334224E+09 0.440343E+09 0.444530E+09 0.160974E+09 0.144977E+09 0.208038E+09 0.451376E+09 0.449494E+09 0.448684E+09 0.185860E+09
100000000 0.394643E+09 0.447510E+09 0.349625E+09 0.447777E+09 0.448016E+09 0.161371E+09 0.144631E+09 0.206839E+09 0.448119E+09 0.444974E+09 0.448962E+09 0.186707E+09
# is_control is_printable is_white is_blank is_graphical is_punctuation is_alphanum is_alpha is_upper is_lower is_digit is_hex_digit
1000 0.159434E+09 0.226357E+09 0.226413E+09 0.227082E+09 0.227123E+09 0.226963E+09 0.198602E+09 0.198791E+09 0.226968E+09 0.227082E+09 0.226989E+09 0.227087E+09
10000 0.216196E+09 0.224825E+09 0.227966E+09 0.227971E+09 0.222979E+09 0.197703E+09 0.207068E+09 0.191618E+09 0.234079E+09 0.210028E+09 0.227929E+09 0.223578E+09
100000 0.227998E+09 0.333144E+09 0.397760E+09 0.420600E+09 0.428791E+09 0.440839E+09 0.381704E+09 0.392492E+09 0.445905E+09 0.437860E+09 0.454628E+09 0.430326E+09
1000000 0.445413E+09 0.444731E+09 0.450100E+09 0.434676E+09 0.445038E+09 0.446458E+09 0.389616E+09 0.383579E+09 0.445309E+09 0.450493E+09 0.433688E+09 0.448359E+09
10000000 0.447539E+09 0.450530E+09 0.447429E+09 0.433857E+09 0.436260E+09 0.443248E+09 0.382014E+09 0.390311E+09 0.447822E+09 0.445711E+09 0.445183E+09 0.446293E+09
100000000 0.446386E+09 0.447392E+09 0.447972E+09 0.446720E+09 0.446444E+09 0.447610E+09 0.390032E+09 0.391572E+09 0.446561E+09 0.446939E+09 0.449459E+09 0.447393E+09
# is_control is_printable is_white is_blank is_graphical is_punctuation is_alphanum is_alpha is_upper is_lower is_digit is_hex_digit
1000 0.235388E+09 0.256089E+09 0.256680E+09 0.205491E+09 0.256812E+09 0.256746E+09 0.256239E+09 0.256779E+09 0.256779E+09 0.256792E+09 0.237113E+09 0.256739E+09
10000 0.237803E+09 0.257646E+09 0.257664E+09 0.206148E+09 0.252265E+09 0.251348E+09 0.274277E+09 0.274288E+09 0.274290E+09 0.274284E+09 0.245206E+09 0.274271E+09
100000 0.243029E+09 0.254922E+09 0.263502E+09 0.212183E+09 0.223243E+09 0.255979E+09 0.250611E+09 0.247153E+09 0.259765E+09 0.247556E+09 0.244033E+09 0.268073E+09
1000000 0.231861E+09 0.256933E+09 0.260220E+09 0.208580E+09 0.257341E+09 0.258544E+09 0.262324E+09 0.257480E+09 0.257221E+09 0.260155E+09 0.240084E+09 0.259270E+09
10000000 0.234693E+09 0.236757E+09 0.252244E+09 0.204679E+09 0.254001E+09 0.252421E+09 0.258623E+09 0.256673E+09 0.257663E+09 0.254969E+09 0.229814E+09 0.257418E+09
100000000 0.236562E+09 0.250049E+09 0.253881E+09 0.206114E+09 0.254522E+09 0.255598E+09 0.257089E+09 0.256059E+09 0.254120E+09 0.254236E+09 0.234923E+09 0.255699E+09
Order of the functions is the same as in the Fortran benchmark.
1000 3.225806e+08 3.333333e+08 3.333333e+08 3.333333e+08 3.333333e+08 3.448276e+08 3.333333e+08 3.448276e+08 3.333333e+08 3.448276e+08 3.333333e+08 3.448276e+08
10000 3.484321e+08 3.012048e+08 3.236246e+08 2.421308e+08 2.570694e+08 3.333333e+08 3.472222e+08 2.739726e+08 2.673797e+08 2.659574e+08 2.666667e+08 2.890173e+08
100000 3.177629e+08 3.374958e+08 3.475843e+08 4.500450e+08 5.470460e+08 5.293806e+08 5.232862e+08 5.382131e+08 5.387931e+08 5.370569e+08 5.485464e+08 4.803074e+08
1000000 5.211591e+08 5.010773e+08 5.291845e+08 4.804920e+08 5.140860e+08 5.289046e+08 4.984299e+08 5.322263e+08 5.224115e+08 4.893565e+08 5.028916e+08 4.997751e+08
10000000 5.109314e+08 5.150391e+08 5.115875e+08 5.165743e+08 4.614355e+08 5.073206e+08 5.086082e+08 5.132680e+08 5.149437e+08 5.151718e+08 5.159000e+08 5.156792e+08
100000000 5.122814e+08 5.084099e+08 5.114697e+08 5.138601e+08 5.061753e+08 5.119169e+08 5.123977e+08 5.072774e+08 5.130184e+08 5.082052e+08 5.099015e+08 5.066947e+08
We need atleast 1000000 characters to reach full speed, otherwise the timings are influenced by function call overheads...
Also note the large differences between various functions. The is_punctuation
and is_alphanum
functions contain the most complicated if logical sentences.
On the other hand, the results in C++ show fairly equal behavior across all functions, hinting at the implementation based on a lookup table.
Based on the measurements above we use the results for 10000000 characters to compare the different implementation approaches.
There is no clear cut winner. The lookup table based approach shows consistent timings across all functions, but 10 to 20 % slower than C++. The "direct" approach (the one currently used in Fortran stdlib) is able to compete with C++ on the simpler routines is_printable
, is_graphical
, is_upper
, and is_lower
, for which the characters form single blocks in the ASCII character set. The select case approach delivers performance similar to the lookup table on some functions, but performs poorly for the functions where the characters are split in multiple blocks in the ASCII character set. Last, the routines interfacing with the C library show consistent behavior across functions (again, lookup table?), but the performance appears to be halved vs C++ (overhead from dereferencing two function calls?).