Faster to_lower/to_upper implementations #703
An FYI: I compared some case-conversion implementations and got some unexpected speed results. The scores for each compiler are relative to a plain assignment, and the times are in seconds.
So this is your method, except that with some compilers I found that `wp=>int8` sped things up:

```fortran
pure function upper7(str) Result(string)
   use, intrinsic :: iso_fortran_env, only : wp=>int8
   Character(*), Intent(In) :: str
   Character(LEN(str)) :: string
   integer(kind=wp), parameter :: ADE_A = iachar('a'), ADE_Z = iachar('z')
   integer(kind=wp), parameter :: CASE_DIFF = iachar('a')-iachar('A')
   Integer(kind=wp) :: ADE_char
   Integer :: i
   do i = 1, len(str)
      ADE_char = iachar(str(i:i),wp) ! ASCII Decimal Equivalent
      if (ADE_char >= ADE_A .and. ADE_char <= ADE_Z) ADE_char = ADE_char - CASE_DIFF
      string(i:i) = achar(ADE_char)
   enddo
end function upper7
```

The biggest surprise to me with variants of yours was that initially just doing `string=str` Another surprise was that number 8, which uses a SELECT CASE, was a top performer with gfortran and pretty dismal with ifort. It would be interesting to see timing differences with crayftn, nagfor, nvfortran, ...
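The issue asks about both directions, but only the upper-case routine is shown above. A lower-case counterpart of upper7 could follow the same 1-byte ADE pattern. The sketch below makes two assumptions. First, the name `lower7` and the module name are hypothetical. Second, the input is 7-bit ASCII, because codes above 127 do not fit in an `int8`.

```fortran
module lower7_sketch
  ! Sketch only: a lower-case counterpart of upper7 (lower7 is a hypothetical
  ! name), using the same 1-byte ASCII Decimal Equivalent (ADE) arithmetic.
  ! Assumes 7-bit ASCII input; codes above 127 do not fit in an int8 and
  ! their handling is processor dependent.
  use, intrinsic :: iso_fortran_env, only: wp => int8
  implicit none
  private
  public :: lower7
contains
  pure function lower7(str) result(string)
    character(*), intent(in) :: str
    character(len(str)) :: string
    integer(kind=wp), parameter :: ADE_UA = iachar('A'), ADE_UZ = iachar('Z')
    integer(kind=wp), parameter :: CASE_DIFF = iachar('a') - iachar('A')
    integer(kind=wp) :: ADE_char
    integer :: i
    do i = 1, len(str)
      ADE_char = iachar(str(i:i), wp)   ! ASCII Decimal Equivalent as int8
      ! shift only 'A'..'Z' to 'a'..'z'; every other character passes through
      if (ADE_char >= ADE_UA .and. ADE_char <= ADE_UZ) ADE_char = ADE_char + CASE_DIFF
      string(i:i) = achar(ADE_char)
    end do
  end function lower7
end module lower7_sketch
```

The only differences from upper7 are the letter range tested and the sign of `CASE_DIFF`. Any speed difference between the two directions should therefore come from the compiler, not the algorithm.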
This is very interesting! When I was working on fast string-to-double conversion, I also found that using 1-byte integers gave me a boost in performance. Would you mind sharing your benchmark fpm project? I can test on Windows with ifort19, ifort23 and ifx23, and on Linux with ifort19 and gfortran. I'm trying to get nvfortran working, but I'm not there yet.
Nothing fancy. The most interesting thing to me is the difference in performance between compilers and compiler options for DO CONCURRENT, SELECT CASE, a basic assignment, ADE integers versus CHARACTER, .... One observation is that some compilers that are often the fastest for numeric calculations do very poorly with I/O. So what might seem like a great algorithm developed with one compiler can be very bad with another. For whatever reason, the one you used does not seem to be horrible with any compiler I tested, and it is either at the top or near it.

`wget http://urbanjost.altervista.org/REMOVE/time_case.tgz`

PS: not all compilers actually run the DO CONCURRENT in parallel, and some require special compiler options to do so.
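When comparing a DO CONCURRENT variant that may actually run in parallel, the choice of clock matters. The reproducer further down uses `cpu_time`. This sketch assumes, without having verified it for every compiler, that `cpu_time` can report processor time summed over threads, whereas wall-clock time from `system_clock` would not. All names here (`wall_timer_sketch`, `case_func`, `wall_seconds`, `time_calls`) are hypothetical.

```fortran
module wall_timer_sketch
  ! Sketch only (hypothetical names): time repeated calls of a case-conversion
  ! function with SYSTEM_CLOCK, i.e. wall-clock time rather than CPU time.
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: case_func, wall_seconds, time_calls

  abstract interface
    ! Not declared PURE, which makes it harder for the compiler to hoist the
    ! call out of the timing loop (not a guarantee).
    function case_func(str) result(out)
      character(len=*), intent(in) :: str
      character(len=len(str)) :: out
    end function case_func
  end interface
contains
  function wall_seconds() result(t)
    real(real64) :: t
    integer(int64) :: ticks, rate
    call system_clock(ticks, rate)
    if (rate > 0) then
      t = real(ticks, real64) / real(rate, real64)
    else
      t = 0.0_real64   ! no clock available on this processor
    end if
  end function wall_seconds

  function time_calls(f, str, calls) result(seconds)
    procedure(case_func) :: f
    character(len=*), intent(in) :: str
    integer, intent(in) :: calls
    real(real64) :: seconds
    character(len=len(str)) :: res
    integer(int64) :: checksum
    real(real64) :: t0
    integer :: i
    checksum = 0
    t0 = wall_seconds()
    do i = 1, calls
      res = f(str)
      ! consume the result so the loop is less likely to be optimized away
      if (len(res) > 0) checksum = checksum + iachar(res(1:1))
    end do
    seconds = wall_seconds() - t0
    if (checksum < 0) print *, 'unexpected checksum'   ! keeps checksum live
  end function time_calls
end module wall_timer_sketch
```

A PURE function such as upper7 can be passed as the actual argument, for example `print *, time_calls(upper7, 'some text', 1000000)`. A non-PURE dummy interface accepts PURE actual procedures.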
Here is a full comparison using your benchmark with the compilers I have at hand. I increased the loop count to 10000000, because otherwise some would report an elapsed time of 0.
Just looking at reasonable contenders using ifort, ifx and gfortran with optimization on Linux, upper7 (your case with the int8 kind) was a clear winner for consistency and performance. Several others outperformed it on a particular compiler or in some cases (very long strings, a string needing no change, a string needing all characters changed). Even so, it is a clear winner across the compilers tested. The only one substantially beating it is the DO CONCURRENT version when run in parallel on strings much longer than typical text lines. upper7 is no worse than 60 percent of the benchmark speed (a simple assignment) and is the most consistent, so it looks good if something portable across compilers is desired.

The rather interesting findings are these:

- What is a top performer on one compiler can be many times slower with another, for very short, simple procedures.
- Using ADE representations is usually much faster, even all these years after CHARACTER variables were introduced.
- Some bugs with arrays of function pointers across compilers made it easier to just do brute-force tests. A more sophisticated test exists, but it runs into a lot of unrelated compiler issues.

It would be of particular interest to compare the stdlib version (upper10) and your proposed replacement (upper7) with other compilers. The stdlib version performed significantly worse with ifort, which is being supplanted by ifx.
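For contrast with the ADE-integer approach, here is a CHARACTER-only variant that looks each character up in an alphabet string with the `INDEX` intrinsic. It is a sketch to make the "ADE integers versus CHARACTER" comparison concrete. The name `upper_index` is hypothetical, and it is not claimed to be the stdlib implementation.

```fortran
module upper_index_sketch
  ! Sketch only (hypothetical name): CHARACTER-based upper-casing via INDEX
  ! lookup, with no integer (ADE) arithmetic at all.
  implicit none
  private
  public :: upper_index
  character(len=*), parameter :: lower_set = 'abcdefghijklmnopqrstuvwxyz'
  character(len=*), parameter :: upper_set = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
contains
  pure function upper_index(str) result(string)
    character(len=*), intent(in) :: str
    character(len=len(str)) :: string
    integer :: i, k
    do i = 1, len(str)
      k = index(lower_set, str(i:i))   ! position in 'a'..'z', or 0 if absent
      if (k > 0) then
        string(i:i) = upper_set(k:k)
      else
        string(i:i) = str(i:i)
      end if
    end do
  end function upper_index
end module upper_index_sketch
```

Each character costs a search of up to 26 positions here, versus two integer comparisons in upper7. That difference is one plausible reason the ADE form tends to win, though the timings above are the real evidence.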
I got my hands on nvfortran 23 in WSL, following the install instructions at https://developer.nvidia.com/hpc-sdk-downloads, and reran the benchmark. I didn't check for any special compiler directives and just used `-O3` as for the others. The results are in the same Google spreadsheet. I have to check what happens if I offload the computation to the GPU... and I think the DO CONCURRENT implementation can be modified. Results change a bit from one run to another, but upper7 does come out as quite a robust approach. With nvfortran added, upper7 with DO CONCURRENT gives slightly better but close results. I'm betting more on randomness than on an actual improvement, and I'm not sure it gives any advantage for such short tests.
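One possible way to modify the DO CONCURRENT version is sketched below. It is an assumption, not something benchmarked in this thread. The idea is to remove the scalar temporary that every iteration writes. The sketch computes the shift with `MERGE` and tests the letter range with `LGE`/`LLE`, which compare using the ASCII collating sequence. The name `upper_dc_merge` is hypothetical.

```fortran
module upper_merge_sketch
  ! Sketch only (hypothetical name): a DO CONCURRENT upper-casing loop with no
  ! scalar temporary, so each iteration touches only its own character.
  implicit none
  private
  public :: upper_dc_merge
contains
  pure function upper_dc_merge(str) result(string)
    character(len=*), intent(in) :: str
    character(len=len(str)) :: string
    integer, parameter :: CASE_DIFF = iachar('a') - iachar('A')
    integer :: i
    do concurrent (i = 1:len(str))
      ! subtract CASE_DIFF only when the character is in 'a'..'z'
      string(i:i) = achar(iachar(str(i:i)) - &
                          merge(CASE_DIFF, 0, lge(str(i:i), 'a') .and. lle(str(i:i), 'z')))
    end do
  end function upper_dc_merge
end module upper_merge_sketch
```

Whether a given compiler vectorizes or parallelizes this better than upper1 in the reproducer below would have to be measured. `MERGE` only selects between two integers here, so no out-of-range `ACHAR` argument is ever produced.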
The upper6-style tests and the code itself indicate it is not particularly sensitive to input being

One of the oddest things found in exercising the variants is finding how

For instance, using `gfortran-11 -O3 xx.f90` generates nearly identical speeds

But with one compiler, testing with more complex cases shows a continuing
<details><summary>Example reproducer code</summary>

```fortran
module M_upper
   use, intrinsic :: iso_fortran_env, only: stdout => output_unit, stderr => error_unit, stdin => input_unit
   use, intrinsic :: iso_fortran_env, only: int8, int16, int32, int64, real32, real64, real128
   use, intrinsic :: iso_fortran_env, only: compiler_version, compiler_options
   implicit none
   private
   public :: say_hello, random_string, upper1, upper2, upper3
contains
   subroutine say_hello
      print *, "Hello, time_case!"
      print '(4a)', &
         'This file was compiled by ', &
         compiler_version(), &
         ' using the options ', &
         compiler_options()
   end subroutine say_hello

   pure function upper1(str) result(string)
      character(*), intent(in) :: str
      character(len(str)) :: string
      integer :: i
      integer(kind=int8), parameter :: ade_a = iachar('a'), ade_z = iachar('z')
      integer(kind=int8), parameter :: diff = iachar('A', kind=int8) - iachar('a', kind=int8)
      integer(kind=int8) :: ade_char
      do concurrent(i=1:len(str))
         ade_char = iachar(str(i:i), int8)
         if (ade_char >= ade_a .and. ade_char <= ade_z) ade_char = ade_char + diff
         string(i:i) = achar(ade_char)
      end do
      if (len(str) .eq. 0) string = str
   end function upper1

   pure function upper2(str) result(string)
      character(*), intent(in) :: str
      character(len(str)) :: string
      integer :: i
      integer(kind=int8), parameter :: diff = iachar('A', kind=int8) - iachar('a', kind=int8)
      do concurrent(i=1:len(str))
         select case (str(i:i))
         case ('a':'z'); string(i:i) = achar(iachar(str(i:i), kind=int8) + diff)
         case default; string(i:i) = str(i:i)
         end select
      end do
      if (len(str) .eq. 0) string = str
   end function upper2

   pure function upper3(str) result(string)
      character(*), intent(in) :: str
      character(len(str)) :: string
      integer :: i
      integer(kind=int8) :: ch
      integer(kind=int8), parameter :: diff = iachar('A', kind=int8) - iachar('a', kind=int8)
      integer(kind=int8), parameter :: ade_a = iachar('a'), ade_z = iachar('z')
      do concurrent(i=1:len(str))
         ch = iachar(str(i:i), kind=int8)
         select case (ch)
         case (ade_a:ade_z); string(i:i) = achar(ch + diff)
         case default; string(i:i) = str(i:i)
         end select
      end do
      if (len(str) .eq. 0) string = str
   end function upper3

   function random_string(chars, length) result(out)
      character(len=*), intent(in) :: chars
      integer, intent(in) :: length
      character(len=:), allocatable :: out
      real :: x
      integer :: ilen
      integer :: which
      integer :: i
      ilen = len(chars)
      out = ''
      if (ilen .gt. 0) then
         do i = 1, length
            call random_number(x)
            which = nint(real(ilen - 1)*x) + 1
            out = out//chars(which:which)
         end do
      end if
   end function random_string
end module M_upper

program main
   use, intrinsic :: iso_fortran_env, only: real64
   use M_upper, only: upper1, upper2, upper3, random_string, say_hello
   implicit none
   !! define an abstract template defining the procedures
   abstract interface
      function func (str)
         character(len=*), intent (in) :: str
         character(len=len(str)) :: func
      end function func
   end interface
   !! define a pointer of the abstract type
   procedure (func), pointer :: f_ptr => null()
   call say_hello()
   f_ptr => upper1; call timeit('upper1')
   f_ptr => upper2; call timeit('upper2')
   f_ptr => upper3; call timeit('upper3')
contains
   subroutine timeit(name)
      character(len=*),intent(in) :: name
      character(len=:),allocatable :: str, in
      integer,parameter :: calls=1000000
      integer :: i, j
      real(kind=real64) :: time_start, time_finish
      character(len=*),parameter :: gen='(*(g0,1x))'
      ! use some random calls to help prevent the compiler from optimizing away loops
      do j=1,3
         call setin(j,in)
         call cpu_time(time_start)
         do i = 1, calls
            str = f_ptr(in)
            if(i.eq.calls/2) call setin(j,in)
         end do
         call cpu_time(time_finish)
         print gen, name, time_finish - time_start, ' ', str
      enddo
   end subroutine timeit

   subroutine setin(j,dat)
      character(len=:),intent(out),allocatable :: dat
      integer,intent(in) :: j
      select case(j)
      case(1) ;dat = random_string('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', 70)
      case(2) ;dat = random_string('abcdefghijklmnopqrstuvwxyz', 70)
      case(3) ;dat = random_string('ABCDEFGHIJKLMNOPQRSTUVWXYZ', 70)
      case default;dat = random_string('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', 70)
      end select
   end subroutine setin
end program main
```

</details>
Yes, this is not the first time I have been disappointed with SELECT CASE. I do like it for high-level branching, where in some cases the code is easier to read, but for low-level computations I avoid it at all costs. A colleague and I once tested it as a replacement for a long list of IF statements in an intensive computation, and the results were just disappointing. So yes, I only use it for non-critical code. I took your last test and tried it out:
So nvfortran and gfortran give consistent speeds, but with ifort/ifx the difference can be a factor of 50x. That is bad enough for me to look at the generated code and bring it up on the Intel Fortran forum. I saw this in the past when SELECT CASE was a new feature for a lot of compilers, but for such a simple SELECT CASE it is especially horrible. Thanks for the tests. upper1, being the one you proposed, holds up once again as a good general choice.

If anyone can compare the stdlib procedure against the one proposed here with nagfor, crayftn, ..., it would be very informative.

For stdlib in general, it would be nice to have a CI/CD interface to macOS, Windows, OpenBSD, Linux, ... and various compilers. Then if you just pushed something set up to run `fpm test --compiler $NAME --profile release`, it would run on a lot of platforms.

I have a .github directory at https://github.com/urbanjost/easy that was my attempt at that. It works with:

- ifort on Linux
- gfortran on macOS
- gfortran on Linux
- gfortran on several MS Windows configurations, which can easily run different gfortran versions as well

I am not a CI/CD whiz on GitHub, but it should be able to do nvfortran too. It seems like there could be an empty GitHub repository that runs a bunch of environments just using `fpm test` whenever you push a mini fpm project to it, so the scripts would not have to change. A simple case like this shows how huge the variation in performance of the same code can be.
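For the `fpm test` workflow described above, a test only needs to stop with a nonzero status on failure for a CI run to fail. Here is a minimal sketch of an exhaustive correctness check over all 128 ASCII codes. It could sit in a test program and be called once per variant. The names (`case_check_sketch`, `check_upper`, `case_func`) are hypothetical, and the check assumes the intended behavior is to change only 'a'..'z'.

```fortran
module case_check_sketch
  ! Sketch only (hypothetical names): check a to_upper candidate against every
  ! ASCII code 0..127; only 'a'..'z' may change. ERROR STOP gives a nonzero exit
  ! status, which is what makes `fpm test` report a failure.
  implicit none
  private
  public :: case_func, check_upper

  abstract interface
    function case_func(str) result(out)
      character(len=*), intent(in) :: str
      character(len=len(str)) :: out
    end function case_func
  end interface
contains
  subroutine check_upper(f, name)
    procedure(case_func) :: f
    character(len=*), intent(in) :: name
    character(len=128) :: all_ascii, expected, got
    integer :: code
    do code = 0, 127
      all_ascii(code+1:code+1) = achar(code)
      if (code >= iachar('a') .and. code <= iachar('z')) then
        expected(code+1:code+1) = achar(code - (iachar('a') - iachar('A')))
      else
        expected(code+1:code+1) = achar(code)
      end if
    end do
    got = f(all_ascii)
    do code = 0, 127
      if (got(code+1:code+1) /= expected(code+1:code+1)) then
        print '(a,a,a,i0)', 'FAIL: ', name, ' differs at ASCII code ', code
        error stop 1
      end if
    end do
    if (len(f('')) /= 0) then
      print '(a,a)', 'FAIL: ', name, ' mishandles an empty string'
      error stop 1
    end if
    print '(a,a)', 'PASS: ', name
  end subroutine check_upper
end module case_check_sketch
```

A test program could then simply `call check_upper(upper1, 'upper1')` for each candidate. The stdlib routine and a proposed replacement can be checked the same way to confirm they return the same output before timing them.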
After looking at the implementation for converting strings between lower and upper case, I found a method that might be faster than the one currently implemented in stdlib and that should return the same output.
I published the method here: https://stackoverflow.com/questions/10759375/how-can-i-write-a-to-upper-or-to-lower-function-in-f90/75838039#75838039 (see the last answer, `to_lower_2` and `to_upper_2`).
I'm thinking about opening a PR to contribute these implementations. Any opinions?