Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bitset count performance #724

Closed
PierUgit opened this issue Jul 1, 2023 · 10 comments
Closed

bitset count performance #724

PierUgit opened this issue Jul 1, 2023 · 10 comments

Comments

@PierUgit
Copy link
Contributor

PierUgit commented Jul 1, 2023

In the bitsets module, the bit_count_large() function is scanning bit by bit with btest() in the main loop:

        bit_count = 0
        do block_ = 1_bits_kind, size(self % blocks, kind=bits_kind) - 1
            do pos = 0, block_size-1
                if ( btest( self % blocks(block_), pos ) ) &
                    bit_count = bit_count + 1
            end do
        end do

Using chunks of blocks and the fact that btest() is elemental, one can get significant performance improvement (on a similar implementation I obtain a 5x speed-up witch a chunk size of 1024):

        bit_count = 0
        do block_ = 1_bits_kind, size(self % blocks, kind=bits_kind) - 1, chunk
            block__ = min(block_+chunk-1,size(self % blocks, kind=bits_kind))
            do pos = 0, block_size-1
                bit_count = bit_count + count( btest(self % blocks(block_:block__), pos ) )
            end do
        end do

(edit) of course, this makes sense only for a large number of bits

@Euler-37
Copy link
Contributor

Euler-37 commented Jul 6, 2023

fortran intrinsic function popcnt looks better. https://gcc.gnu.org/onlinedocs/gfortran/POPCNT.html

@PierUgit
Copy link
Contributor Author

PierUgit commented Jul 6, 2023

Looks like, but then why is it not used in bit_count_large()?

@jvdp1
Copy link
Member

jvdp1 commented Dec 25, 2023

If I remember well, bitset_large was developed as an externsion of bitset_64. For bitset_64, popcnt cannot be used because some bits may not be used. The solution with btest() is therefore legitimate. The extension of bit_count for bitset_large from bitset_64 was straighfoward.

However, I agree that @PierUgit 's solution or using popcnt in this loop would be a more efficient implementation.

@PierUgit , do you still have such an implementation (i recognize that my answer is quite late!)? If so, would you like to open a PR with your solution/optimization?

@PierUgit
Copy link
Contributor Author

PierUgit commented Jan 3, 2024

@jvdp1 I don't have a code ready in Github, I did some tests independently on a code of mine... To be honest I have first to learn how to use fpm and build the stdlib before being able to submit PRs here...

@jvdp1
Copy link
Member

jvdp1 commented Jan 3, 2024

@jvdp1 I don't have a code ready in Github, I did some tests independently on a code of mine...

Is it ok for you if I submit a PR and ask you for reviewing/testing it?

To be honest I have first to learn how to use fpm and build the stdlib before being able to submit PRs here...

I usually use CMake for building stdlib. However, I agree that the doc is not clear enough for building stdlib with fpm. I believe that @jalvesz will submit soon a PR with more detailed information for building stdlib with fpm. Your review of this proposal (if you want of course) could be useful too.

@PierUgit
Copy link
Contributor Author

PierUgit commented Jan 3, 2024

It was actually much easier than expected! I could build stdlib with fpm, modify the code, and add a test case (this routine was not tested). I will submit a PR soon.

I did it from the stdlib-fpm branch, though, and it looks quite behind the master branch...

@jvdp1
Copy link
Member

jvdp1 commented Jan 3, 2024

It was actually much easier than expected! I could build stdlib with fpm, modify the code, and add a test case (this routine was not tested). I will submit a PR soon.

I did it from the stdlib-fpm branch, though, and it looks quite behind the master branch...

Actually development should be always from the master branch. The branch stdlib-fpm is just generated for the users. It contains all .f90 generated from the .fypp files, because fpm does not support the fypp preprocessor yet. But it seems fine for your PR.

@PierUgit
Copy link
Contributor Author

PierUgit commented Jan 3, 2024

Probably I should close the PR and restart from the master branch...

@jvdp1
Copy link
Member

jvdp1 commented Jan 3, 2024

Probably I should close the PR and restart from the master branch...

Ok. Indeed, probably better and easier.

@jalvesz
Copy link
Contributor

jalvesz commented Jan 3, 2024

Regarding the build with fpm, what I do and want to include as info in the readme is the following:

source ./ci/fpm-deployment.sh
cd stdlib-fpm/
fpm test --profile release

The fpm-deployment.sh scrit takes care of processing the files and creating the stdlib-fpm which contains only .f90

BTW, in my PR regarding str2num I included a change to this script as we realized that the .fypp files in the test folder were not processed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants