addition of stdlib_experimental_stats mean #124

jvdp1 · 2020-01-23T23:24:33Z

follow-up of #119 and after merging #123 (See @certik comments).

In addition to #119, CMakefiles and Makefiles.manual were modified to support fypp. Therefore, the auto-generated .f90 files are not present in this PR.

Issue: I could not modify the CI for Windows properly. I would need help there.

certik · 2020-01-23T23:48:00Z

This makes it pass on Windows: jvdp1#5

certik · 2020-01-23T23:51:18Z

Now there is another problem. The CI suggests the file stdlib_experimental_stats_mean.f90 takes over 4.5 minutes to compile (https://github.com/jvdp1/stdlib/pull/5/checks?check_run_id=406216313):

Thu, 23 Jan 2020 23:43:13 GMT [ 28%] Building Fortran object src/CMakeFiles/fortran_stdlib.dir/stdlib_experimental_stats_mean.f90.o
Thu, 23 Jan 2020 23:47:30 GMT [ 31%] Linking Fortran static library libfortran_stdlib.a

Which in my opinion is a big problem. Although perhaps this is only in Release mode.

I tested on my desktop in Debug mode, and things take under 2s to compile:

$ time make -j
[  2%] Generating stdlib_experimental_stats_mean.f90
[  5%] Generating stdlib_experimental_stats.f90
Scanning dependencies of target fortran_stdlib
[  8%] Building Fortran object src/CMakeFiles/fortran_stdlib.dir/stdlib_experimental_error.f90.o
[ 11%] Building Fortran object src/CMakeFiles/fortran_stdlib.dir/stdlib_experimental_ascii.f90.o
...
[100%] Built target test_optval
[100%] Built target test_ascii
[100%] Built target test_mean

real	0m1.862s
user	0m3.014s
sys	0m0.644s

In Release mode it took 5.6s. So I think that's fine.

Update: I use GFortran 7.4.0. At the CI, this GFortran 7 on Linux takes 10s to compile. GFortran 8 takes 1m 26s and GFortran 9 takes 4m 25s. The timing for GFortran 7 is ok, but versions 8 and 9 are not acceptable in my opinion. But I don't know if this is a problem with GFortran, or with our code.

@zbeekman what do you think of these timings? Something is wrong, every new major version of GFortran got much slower than the previous one.

jvdp1 · 2020-01-24T07:20:17Z

@certik : Here are the times on my desktop:
When the files includes arrays with up to 15 dimensions:

$ gfortran --version
GNU Fortran (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
...
$ time make -j
[  5%] Generating stdlib_experimental_stats_mean.f90
[  5%] Generating stdlib_experimental_stats.f90
.....
[ 97%] Linking Fortran executable test_mean
[100%] Linking Fortran executable test_mean_f03
[100%] Built target test_mean
[100%] Built target test_mean_f03

real	0m21,621s
user	0m23,165s
sys	0m1,692s

When all loops for real arrays are replaced by sum(x) and sum(x, dim):

$ time make -j
[  5%] Generating stdlib_experimental_stats.f90
...
[100%] Built target test_mean_f03

real	0m16,390s
user	0m18,122s
sys	0m1,502s

See Actions of branch _stat_mean_dev_2 for details with this branch. The tests with GFortran 9 took still >3 minutes.

When the files includes arrays with up to 7 dimensions (with the same compiler):

$ time make -j
[  5%] Generating stdlib_experimental_stats_mean.f90
[  5%] Generating stdlib_experimental_stats.f90
....
[ 97%] Linking Fortran executable test_ascii
[100%] Linking Fortran executable test_mean
[100%] Built target test_ascii
[100%] Built target test_mean

real	0m3,312s
user	0m4,973s
sys	0m1,250s

So,

With CMake, I am far from the times we saw on Github. (Note: when using the Makefile.manual, GFortran 9 took over 2 minutes to compile the version with up to 15 dimensions)
Replacing all loops for real arrays does not help much.
The difference between GFortran 7 and Gfortran (8 or 9) can partially be explained by the fact that GFortran supports only arrays up to 7 dimensions, while GFortran 8 and 9 support arrays with dimentions up to 15.
It is strange that GFortran 9 requires >2 times the time of GFortran 8 since they are based on the same code.

jvdp1 · 2020-01-24T10:46:17Z

@certik @zbeekman

In the branch stat_mean_dev_3, I replaced all loops by using sum(x), or sum(real(x,dp)). Suprisingly to me, sum(real(x,dp)) does not create a double-precision temporary array of the same dimensions as x. It was the main reason I used loops instead of sum.

Since this reason is not valid (at least with GFortran 9; please explain me why ;)), I propose to merge this branch "stat_mean_dev_3" into this PR.

Regarding compile times of stat_mean_dev_3 on my desktop (GFortran 9):

$ time make -j
[  5%] Generating stdlib_experimental_stats_mean.f90
[  5%] Generating stdlib_experimental_stats.f90
...
[100%] Built target test_mean_f03

real	0m10,240s
user	0m11,773s
sys	0m1,729s

So there is a speedup of 2 (on my desktop) in comparison to this PR that includes all loops for real and integer arrays.

From Github Actions, Build and compile times are 6s for GFortran7, 1m 15s for GFortran 8, and 1m 51s for GFortran 9. These times are better than the initial 4 minutes, but I can't still understand why GFortran 8 and 9 need more time on CI than on my own desktop.

certik · 2020-01-24T21:21:53Z

The other concerning thing is if a single function such as mean adds 4min to our build, then once we have say 50 such functions, we are talking of a build time over 3h. That does not scale well.

As a practical work around, we can have a CMake option for development, which would restrict dimensions to 3. That would allow us to develop with a reasonable compile time.

@milancurcic what do you think we should do here?

milancurcic · 2020-01-24T21:29:28Z

More generally, can we have a build parameter to set the maximum rank to build? This would have a low default values for debug and release modes, and user could override it manually if they needed higher ranks? Something along the lines of:

cmake .. -DCMAKE_MAXIMUM_RANK=8 # overrides default maximum rank

jvdp1 · 2020-01-24T21:31:40Z

As a practical work around, we can have a CMake option for development, which would restrict dimensions to 3.

For GFortran 7, the number of dimensions restricticted to 7, and the compile time is 6-7 seconds on Github Actions. Would 7 dimensions be an option for you @certik?
Or is it possible that CMake recognizes that the tests are running on Github Actions, and that it limits the number of dimensions to 4? (Since it is a preprocessed loop starting at 3, going to 4 would be good).
When a user would compile the library, CMake would see that it is not a Github Action, and will compile stdlib accordingly to its environment (i.e., up to 7 or 15 dimensions; which is quite quick on my desktop).

certik · 2020-01-24T21:33:41Z

Yes, I am fine with any number of dimensions (including 7) as long as it is fast. If 7 can be made fast, then let's do it. Otherwise let's do less.

jvdp1 · 2020-01-24T23:26:40Z

@certik @milancurcic : I implemented the proposed idea. So, for the CI, the maximum dimension for auto-generation of the .f90 files is limited to 5 (by using cmake -DCMAKE_MAXIMUM_RANK=5). If the flag CMAKE_MAXIMUM_RANK is not defined, then CMake will test if the compiler supports 7 or 15 dimensions.

Issue: Something seems broken for CI on Linux when installing GFortran:

E: Failed to fetch https://packages.microsoft.com/repos/microsoft-ubuntu-bionic-prod/dists/bionic/main/binary-amd64/Packages.bz2 File has unexpected size (89974 != 89668). Mirror sync in progress? [IP: 40.76.35.62 443]
Hashes of expected file:
- Filesize:89668 [weak]
- SHA512:239b3775157309c1f1ff14624d2d0044c5a0c57f6bf9a431f628de1c9f19c91e107c4bc78e1d12bce44121e5866fdf67328e6482151b6edd3456f8c541dc5739
- SHA256:d737bf9c5418556ef337f43f0124a3db6e79c9f7209deb140ea87644169168b1
- SHA1:de9e09d12abdb967a62954611af31655b3a8b5d9 [weak]
- MD5Sum:9e9321f39e46dccc97d928ec9bffc87a [weak]
Release file created at: Thu, 23 Jan 2020 17:58:24 +0000
E: Some index files failed to download. They have been ignored, or old ones used instead.
##[error]Process completed with exit code 100.

Any idea how to solve this?

certik · 2020-01-24T23:43:01Z

Thanks for implementing it. Yes, the Linux image sometimes has this problem and last time I think it took about a day to resolve. I am hoping it will be faster this time.

jvdp1 · 2020-01-24T23:55:36Z

Thanks for implementing it. Yes, the Linux image sometimes has this problem and last time I think it took about a day to resolve. I am hoping it will be faster this time.

OK. We will see when it will resolve. At least for macos and GFortran 9 the compile time reduced from >3 minutes to 7s. I guess we can expect similar times for Linux.

jvdp1 · 2020-01-25T09:33:12Z

I limited the number of dimensions to 4 (with cmake .. -DCMAKE_MAXIMUM_RANK=4). All cheks passed, and compile times are <10s.
I think that all comments in #119 and in this PR were considered. Any other comments after an additional review?

ivan-pi · 2020-01-25T12:12:10Z

src/stdlib_experimental_stats_mean.fypp

+     case default
+       call error_stop("ERROR (mean): wrong dimension")
+    end select


A minor nitpick related to the discussion in #76:

the error message would be more informative if it were to say something like "wrong dimension, value of the dummy variable dim=dim exceeds rank of the dummy array x, rank(x)=rank(x)", where the italics were replaced with the run-time values

We need to wait for the numeric to string conversion functions to achieve this easily.

I agree with you. Ideally it should be check by the compiler (like for sum and other intrinsic functions).
I will keep it in mind for a next improvement. It could be also related to #121 .

certik

This looks good to me to merge. It compiles quickly with the proper cmake options, all CI tests pass and the code is clean enough. Further improvements can be made with subsequent PRs. So +1 to merge as is, as far as I am concerned.

milancurcic

Thanks @jvdp1!

addition of stdlib_experimental_stats mean

7d9b916

jvdp1 requested review from certik, milancurcic and ivan-pi January 23, 2020 23:24

jvdp1 mentioned this pull request Jan 23, 2020

Addition of stdlib_experimental_stats function mean #119

Closed

certik and others added 3 commits January 24, 2020 00:56

Install fypp on Windows (#5)

c7a6070

stat_mean_dev_2: replacing loops by sum for real arrays

e586bff

Merge branch 'stat_mean_dev_1' into stat_mean_dev_2

4005374

jvdp1 added 2 commits January 24, 2020 09:23

stat_mean_dev_3: replace all loops by the function sum

8d9e475

stat_mean_dev_3: add kind=int64 in function size

dedc284

jvdp1 added 2 commits January 25, 2020 00:13

add CMAKE_MAXIMUM_RANK

26f23a5

Removed loops and add CMAKE_MAXIMUM_RANK

bf41273

stat_mean_dev_1: limited to 4 dimensions

c8bf3c1

ivan-pi reviewed Jan 25, 2020

View reviewed changes

certik approved these changes Jan 25, 2020

View reviewed changes

milancurcic approved these changes Jan 27, 2020

View reviewed changes

milancurcic merged commit 078b400 into fortran-lang:master Jan 27, 2020

jvdp1 deleted the stat_mean_dev_1 branch January 27, 2020 17:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

addition of stdlib_experimental_stats mean #124

addition of stdlib_experimental_stats mean #124

jvdp1 commented Jan 23, 2020 •

edited

Loading

certik commented Jan 23, 2020

certik commented Jan 23, 2020 •

edited

Loading

jvdp1 commented Jan 24, 2020 •

edited

Loading

jvdp1 commented Jan 24, 2020 •

edited

Loading

certik commented Jan 24, 2020

milancurcic commented Jan 24, 2020

jvdp1 commented Jan 24, 2020

certik commented Jan 24, 2020

jvdp1 commented Jan 24, 2020

certik commented Jan 24, 2020

jvdp1 commented Jan 24, 2020

jvdp1 commented Jan 25, 2020

ivan-pi Jan 25, 2020 •

edited

Loading

jvdp1 Jan 25, 2020 •

edited

Loading

certik left a comment

milancurcic left a comment

addition of stdlib_experimental_stats mean #124

addition of stdlib_experimental_stats mean #124

Conversation

jvdp1 commented Jan 23, 2020 • edited Loading

certik commented Jan 23, 2020

certik commented Jan 23, 2020 • edited Loading

jvdp1 commented Jan 24, 2020 • edited Loading

jvdp1 commented Jan 24, 2020 • edited Loading

certik commented Jan 24, 2020

milancurcic commented Jan 24, 2020

jvdp1 commented Jan 24, 2020

certik commented Jan 24, 2020

jvdp1 commented Jan 24, 2020

certik commented Jan 24, 2020

jvdp1 commented Jan 24, 2020

jvdp1 commented Jan 25, 2020

ivan-pi Jan 25, 2020 • edited Loading

Choose a reason for hiding this comment

jvdp1 Jan 25, 2020 • edited Loading

Choose a reason for hiding this comment

certik left a comment

Choose a reason for hiding this comment

milancurcic left a comment

Choose a reason for hiding this comment

jvdp1 commented Jan 23, 2020 •

edited

Loading

certik commented Jan 23, 2020 •

edited

Loading

jvdp1 commented Jan 24, 2020 •

edited

Loading

jvdp1 commented Jan 24, 2020 •

edited

Loading

ivan-pi Jan 25, 2020 •

edited

Loading

jvdp1 Jan 25, 2020 •

edited

Loading