[STDLIB_STATS] need to upgrade `stdlib_stats` codes about compilation efficiency #438

zoziha · 2021-06-16T10:59:22Z

Overview: Compilation time is too long.

When compiling, I found that stdlib_stats uses a lot of computer resources, especially RAM. This is related to the high-dimensional matrix dimensions defined in stdlib_stats, and it greatly reduces the build efficiency of stdlib and increases its overall compilation time.

It took my computer (CPU: Intel i5 8250U) more than two hours to compile stdlib completely.

When RANK=15, the compiled size of stdlib reached 747 MB.

I took a quick look at the source code and thought that there might be a better way to replace the polymorphic interface with such a large number of multi-dimensional array arguments.
(see high-dimensional matrix dimensions)
(see RANK)

My understanding is: Rethink, need to be more flexible.

The length within a single dimension defined by Fortran can theoretically be infinitely expanded, but the number of dimensions needs to be manually defined by the user.
In the future, we will also build a large number of functions that use matrices. The current implementation of stdlib_stats is unreasonable and not adaptable, and needs to be improved (see stdlib_stats_moment.fypp).

stdlib_stats presets several basic dimensions to form a polymorphic interface, and sets multiple judgments (see condition judgments) on the number of processing dimensions, resulting in a decrease in compilation speed and an increase in compilation load.

#281
#283

My solution is: Set up a matrix parser, or use a single-dimensional matrix algorithm.

If an operation does not need to communicate across the different dimensions, we can achieve the same effect with only a one-dimensional column vector and hand the dimension-specific handling to the user, which would improve the versatility and flexibility of stdlib.
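As a rough illustration of the one-dimensional idea (not how stdlib_stats is implemented), the sketch below reduces along one dimension of an array that is stored as a flat, column-major vector plus a shape. It is written in Python only for brevity; the function name `mean_along_dim` and its arguments are hypothetical.

```python
from math import prod


def mean_along_dim(flat, shape, dim):
    """Mean over the 1-based dimension `dim` of a column-major flat array.

    Hypothetical helper: one routine serves every rank because the rank
    only enters through `shape`, never through the argument's type.
    The result is returned flat, in column-major order of the reduced shape.
    """
    if len(flat) != prod(shape):
        raise ValueError("flat length does not match shape")
    if not 1 <= dim <= len(shape):
        raise ValueError("dim out of range")
    inner = prod(shape[:dim - 1])  # stride of `dim` (faster-varying elements)
    n = shape[dim - 1]             # extent of the reduced dimension
    outer = prod(shape[dim:])      # number of slower-varying blocks
    result = []
    for o in range(outer):
        for i in range(inner):
            base = o * inner * n + i
            total = sum(flat[base + k * inner] for k in range(n))
            result.append(total / n)
    return result


# A 2x3 matrix [[1, 3, 5], [2, 4, 6]] stored column by column.
a = [1, 2, 3, 4, 5, 6]
print(mean_along_dim(a, (2, 3), 1))  # column means
print(mean_along_dim(a, (2, 3), 2))  # row means
```

The trade-off is that the caller has to carry the shape and interpret the flat result, which is the "hand it to the user" part of the proposal.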

Or we use the wiki solution in stdlib to set up a matrix parser and transform it when necessary to meet the polymorphic needs of multi-dimensional arrays.

I have seen another library, and its solution is also good: muesli!

I don't know much more about stdlib_stats, so there may be limitations of my idea. However, I think the multi-dimensional array polymorphic interface in stdlib_stats needs to be improved.
I hope this starts a discussion. Thank you all! 😍

jvdp1 · 2021-06-16T16:07:48Z

Thank you for trying stdlib. This issue was often raised in the past and is mentioned in the README (see the section Build with CMake). A solution is to limit the number of ranks to, e.g., 7.
I recognize that it could be highlighted more in the README.

The aim of stdlib_stats is to provide procedures related to descriptive statistics for arrays (e.g., for computing means, variances, standard deviations, and moments of array elements), similar to what is available in Matlab, Julia, etc.

The API of the functions in stdlib_stats is the same as (or at least really similar to) that of the intrinsics, e.g., sum. As such, the procedures in stdlib_stats could be considered extensions of the intrinsic sum, IMO.

Due to the "complexity" of the API of sum (i.e., it supports arrays (from rank 1 to 15) of all types of integer, real, and complex, an argument dim, and a (scalar or array) mask), and due to the lack of generics in Fortran, the number of functions generated for a single generic procedure (e.g., mean) is quite large. fypp was quite helpful for generating all the needed functions.
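To get a feel for how the number of generated functions scales, here is a small back-of-the-envelope sketch in Python. The kind counts and the numbers of mask and dim variants are assumptions made up for illustration, not the actual figures from the stdlib sources; `count_specific_procedures` is a hypothetical helper.

```python
def count_specific_procedures(max_rank, type_kinds, mask_variants=3, dim_variants=2):
    """Number of specific procedures behind one generic name.

    Assumes one procedure per (rank, type/kind, mask variant, dim variant)
    combination. All counts are illustrative, not stdlib's real figures.
    """
    kinds_total = sum(type_kinds.values())
    return max_rank * kinds_total * mask_variants * dim_variants


# Hypothetical number of kinds per intrinsic type (assumed).
ASSUMED_KINDS = {"integer": 4, "real": 3, "complex": 3}

for max_rank in (4, 7, 15):
    print(max_rank, count_specific_procedures(max_rank, ASSUMED_KINDS))
```

Under these assumed counts the total grows linearly with the maximum rank (240 procedures at rank 4 versus 900 at rank 15), and that is for a single generic name such as mean.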

I don't think that a solution like the one proposed by muesli would be appropriate for this, because the aim was to provide procedures for Fortran arrays (but I may be wrong; at least that is how I find stdlib_stats useful for my daily work), and not for a derived type, e.g., one provided by stdlib. I am not sure I understand the two other solutions.

Anyway, I agree that compilation of this part can be an issue, and one that could grow with the introduction of new functions in stdlib_stats that have an API similar to mean (I have at least 3 more in mind).

zoziha · 2021-06-16T16:29:35Z

Thanks, I understand. I don't know much about C/C++. Is it possible to implement generics through the interfaces between these languages and Fortran? As far as I know, Fortran comes with functions such as real(integer, kind) whose arguments are all integers but which can return different floating-point precisions, which is difficult to achieve in Fortran's existing syntax.
Is it possible for Fortran to perfect this generic programming in some form in the future?😁

epagone · 2021-06-16T16:49:05Z

Hi @zoziha, concerning your last question, you might want to have a look here 😉

ghost · 2021-06-21T01:02:23Z

I think -DCMAKE_MAXIMUM_RANK should be 4 by default, that means stdlib will work by default for almost everybody. This is specially useful for new users who are not familiar with stdlib.

jvdp1 · 2021-06-21T19:50:05Z

> I think -DCMAKE_MAXIMUM_RANK should be 4 by default, that means stdlib will work by default for almost everybody. This is specially useful for new users who are not familiar with stdlib.

This is indeed a good idea. I am for it. @awvwgk @milancurcic @ivan-pi what is your opinion about making -DCMAKE_MAXIMUM_RANK=4 the default value?

awvwgk · 2021-06-21T20:14:24Z

I think that just because we can doesn't mean we have to compile with full rank support, especially since the stats modules get quite compilation intensive for no good reason. I'm usually compiling with 4 anyway, sometimes with 7 if I make system-wide installations, but I have yet to exceed rank 4 in any actual application of stdlib. The CMake template for stdlib also reduces the max rank by default. A max rank of 4 sounds like a more sensible default.

I'm still looking forward to packaging stdlib once we start putting a version on it. There, a higher maximum rank than 4 might be much more relevant, because end users can't recompile if they depend on a binary distribution.

awvwgk · 2021-09-18T09:27:44Z

Resolved by changing the default maximum rank in the CMake build files

zoziha added the bug Something isn't working label Jun 16, 2021

zoziha changed the title ~~[STDLIB_STATS] need to upgrade stdlib_stats codes~~ [STDLIB_STATS] need to upgrade stdlib_stats codes about compilation efficiency Jun 16, 2021

ivan-pi mentioned this issue Jun 21, 2021

[FPM] add fpm support #437

Closed

zoziha mentioned this issue Jun 23, 2021

First implementation of real-valued linspace. #420

Merged

awvwgk added build: cmake Issue with stdlib's CMake build files documentation Improvements or additions to documentation labels Sep 18, 2021

awvwgk closed this as completed Sep 18, 2021

zoziha mentioned this issue Oct 1, 2021

[stdlib_math] add is_close routines. #488

Merged
