bitset count performance #724

PierUgit · 2023-07-01T13:43:44Z

In the bitsets module, the bit_count_large() function is scanning bit by bit with btest() in the main loop:

        bit_count = 0
        do block_ = 1_bits_kind, size(self % blocks, kind=bits_kind) - 1
            do pos = 0, block_size-1
                if ( btest( self % blocks(block_), pos ) ) &
                    bit_count = bit_count + 1
            end do
        end do

Using chunks of blocks and the fact that btest() is elemental, one can get significant performance improvement (on a similar implementation I obtain a 5x speed-up witch a chunk size of 1024):

        bit_count = 0
        do block_ = 1_bits_kind, size(self % blocks, kind=bits_kind) - 1, chunk
            block__ = min(block_+chunk-1,size(self % blocks, kind=bits_kind))
            do pos = 0, block_size-1
                bit_count = bit_count + count( btest(self % blocks(block_:block__), pos ) )
            end do
        end do

(edit) of course, this makes sense only for a large number of bits

The text was updated successfully, but these errors were encountered:

Euler-37 · 2023-07-06T09:09:51Z

fortran intrinsic function popcnt looks better. https://gcc.gnu.org/onlinedocs/gfortran/POPCNT.html

PierUgit · 2023-07-06T10:14:42Z

Looks like, but then why is it not used in bit_count_large()?

jvdp1 · 2023-12-25T22:38:16Z

If I remember well, bitset_large was developed as an externsion of bitset_64. For bitset_64, popcnt cannot be used because some bits may not be used. The solution with btest() is therefore legitimate. The extension of bit_count for bitset_large from bitset_64 was straighfoward.

However, I agree that @PierUgit 's solution or using popcnt in this loop would be a more efficient implementation.

@PierUgit , do you still have such an implementation (i recognize that my answer is quite late!)? If so, would you like to open a PR with your solution/optimization?

PierUgit · 2024-01-03T09:38:27Z

@jvdp1 I don't have a code ready in Github, I did some tests independently on a code of mine... To be honest I have first to learn how to use fpm and build the stdlib before being able to submit PRs here...

jvdp1 · 2024-01-03T12:16:26Z

@jvdp1 I don't have a code ready in Github, I did some tests independently on a code of mine...

Is it ok for you if I submit a PR and ask you for reviewing/testing it?

To be honest I have first to learn how to use fpm and build the stdlib before being able to submit PRs here...

I usually use CMake for building stdlib. However, I agree that the doc is not clear enough for building stdlib with fpm. I believe that @jalvesz will submit soon a PR with more detailed information for building stdlib with fpm. Your review of this proposal (if you want of course) could be useful too.

PierUgit · 2024-01-03T12:56:18Z

It was actually much easier than expected! I could build stdlib with fpm, modify the code, and add a test case (this routine was not tested). I will submit a PR soon.

I did it from the stdlib-fpm branch, though, and it looks quite behind the master branch...

jvdp1 · 2024-01-03T14:11:18Z

It was actually much easier than expected! I could build stdlib with fpm, modify the code, and add a test case (this routine was not tested). I will submit a PR soon.

I did it from the stdlib-fpm branch, though, and it looks quite behind the master branch...

Actually development should be always from the master branch. The branch stdlib-fpm is just generated for the users. It contains all .f90 generated from the .fypp files, because fpm does not support the fypp preprocessor yet. But it seems fine for your PR.

PierUgit · 2024-01-03T14:59:26Z

Probably I should close the PR and restart from the master branch...

jvdp1 · 2024-01-03T15:32:45Z

Probably I should close the PR and restart from the master branch...

Ok. Indeed, probably better and easier.

jalvesz · 2024-01-03T21:11:18Z

Regarding the build with fpm, what I do and want to include as info in the readme is the following:

source ./ci/fpm-deployment.sh
cd stdlib-fpm/
fpm test --profile release

The fpm-deployment.sh scrit takes care of processing the files and creating the stdlib-fpm which contains only .f90

BTW, in my PR regarding str2num I included a change to this script as we realized that the .fypp files in the test folder were not processed.

PierUgit mentioned this issue Jan 3, 2024

Issue #724 - bitset count performance #755

Closed

PierUgit mentioned this issue Jan 3, 2024

Replaced btest() by popcnt() in bit_count_large() #756

Merged

jvdp1 closed this as completed Jan 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bitset count performance #724

bitset count performance #724

PierUgit commented Jul 1, 2023 •

edited

Loading

Euler-37 commented Jul 6, 2023

PierUgit commented Jul 6, 2023

jvdp1 commented Dec 25, 2023

PierUgit commented Jan 3, 2024

jvdp1 commented Jan 3, 2024

PierUgit commented Jan 3, 2024

jvdp1 commented Jan 3, 2024

PierUgit commented Jan 3, 2024

jvdp1 commented Jan 3, 2024

jalvesz commented Jan 3, 2024

bitset count performance #724

bitset count performance #724

Comments

PierUgit commented Jul 1, 2023 • edited Loading

Euler-37 commented Jul 6, 2023

PierUgit commented Jul 6, 2023

jvdp1 commented Dec 25, 2023

PierUgit commented Jan 3, 2024

jvdp1 commented Jan 3, 2024

PierUgit commented Jan 3, 2024

jvdp1 commented Jan 3, 2024

PierUgit commented Jan 3, 2024

jvdp1 commented Jan 3, 2024

jalvesz commented Jan 3, 2024

PierUgit commented Jul 1, 2023 •

edited

Loading