Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API for a bitset data type #221

Closed
wclodius2 opened this issue Jul 18, 2020 · 66 comments
Closed

API for a bitset data type #221

wclodius2 opened this issue Jul 18, 2020 · 66 comments
Labels
implementation Implementation in experimental and submission of a PR

Comments

@wclodius2
Copy link
Contributor

Both C++ and Java define a bitset type, so I propose that the standard
library also have a module implementing a bitset type. My first draft
for a public API for that module is as follows.

The module name is STDLIB_BITSETS.

The module exports one constant, BITS_KIND, to be used as a KIND
value in addressing bits in the bitset. This will initially be
INT32.

The module exports one derived type, BITSETS, with a plural name so
that users can name their scalar realizations as BITSET. The type
will contain two private components: BITS a scalar integer of kind
BITS_KIND giving the number of bits in a bitset entity, and a rank
one allocatable integer array BLOCK of private kind BLOCK_KIND,
that will be INT32, to be used in holding the bit values. The type
has an ASSIGNMENT(=) operation defined.

BITSETS will have the following procedures: ALL, AND, AND_NOT,
ANY_BIT, BIT_COUNT, BITS, CLEAR, EOR, EQUIV, EXTRACT,
FLIP, INIT, INPUT, MAX_SET_BIT, NONE, OR, OUTPUT,
PRINT_BITSET, READ_BITSET, SET, TEST, and VALUE. Some of
these procedures will be overloaded:

interface clear
    module procedure clear_bit, clear_range
end interface clear

interface flip
    module procedure flip_bit, flip_range
end interface flip

interface init
    module procedure init_copy, init_zero
end interface init

interface set
    module procedure set_bit, set_range
end interface set

interface print_bitset
    module procedure print_bitset_string, print_bitset_unit
end interface print_bitset

interface read_bitset
    module procedure read_bitset_string, read_bitset_unit
end interface read_bitset

The API for the individual procedures is listed below. The following
aspects of the API may be controversial:

  1. How the range is defined for CLEAR_RANGE, FLIP_RANGE, and
    SET_RANGE. I have roughly followed Java's bitset here.

  2. That the first argument is modified in AND, AND_NOT, EOR, and
    OR. That is how it is done in the Java bitset, but I can see
    having a function that returns the result rather than a subroutine
    that modifies the first argument.

  3. That the I/O procedures INPUT, OUTPUT, PRINT_BITSET, and
    READ_BITSET have STATUS flags.

  4. That the procedures often ignore user specification of positions
    outside the range 0 to BITS-1. An alternative is to make the
    procedures impure and invoking ERROR STOP. Another option is to
    add a STATUS argument, but I am reluctant to do that for a simple
    user error.

  5. The names of some of the procedures could be more similar to the
    names of Fortran's bit intrinsics.

subroutine init_copy(set, aset, bits)
: Creates the bitset SET, of size BITS if present, otherwise of the
size of ASET. All bits in the range of ASET are copied from ASET. If
BITS is present and larger than the size of ASET then all additional
bits are zero.

subroutine init_zero(set, bits)
: Creates the bitset, SET, of size BITS, with all bits initialized to
zero. BITS must be non-negative.

elemental function all_bits( set )
: Returns .TRUE. if all bits in SET are 1, .FALSE. otherwise.

elemental subroutine and(set1, set2)
: Sets the bits in SET1 to the bitwise AND of the original bits in
SET1 and SET2. If SET1 has fewer bits than SET2 then the
additional bits in SET2 are ignored. If SET1 has more bits than
SET2, then the absent SET2 bits are treated as if present with
zero value.

elemental subroutine and_not(set1, set2)
: Sets the bits in SET1 to the bitwise and of the original bits in
SET1 with the bitwise negation of SET2. If SET1 has fewer bits
than SET2 then the additional bits in SET2 are ignored. If SET1
has more bits, then the absent SET2 bits are treated as if present
with zero value.

elemental function any_bit(set)
: Returns .TRUE. if any bit in SET is 1, .FALSE. otherwise

elemental function bit_count(set)
: Returns the number of non-zero bits in SET.

elemental function bits(set)
: Returns the number of bit positions in SET.

elemental subroutine clear_bit(set, pos)
: Sets to zero the POS position in SET. If POS is less than zero
or greater than BITS(SET)-1 it is ignored.

pure subroutine clear_range(set, start_pos, stop_pos)
: Sets to zero all bits from the START_POS to STOP_POS positions in
SET. If STOP_POS < START_POS then no bits are modified. Positions
outside the range 0 to BITS(SET)-1 are ignored.

elemental subroutine eor(set1, set2)
: Sets the bits in SET1 to the bitwise EOR of the original bits in
SET1 and SET2. If SET1 has fewer bits than SET2 then the
additional bits in SET2 are ignored. If SET1 has more bits than
SET2, then the absent SET2 bits are treated as if present with
zero value.

elemental function equiv(set1, set2)
: Returns .TRUE. if all bits in SET1 and SET2 have the same
value, .FALSE. otherwise. If the sets differ in size a value true
will be returned if and only if the sets are equivalent in the
overlapping range, and all bits outside the overlapping range are
zero.

pure function extract(set, start_pos, stop_pos)
: Creates a new bitset from a range, START_POS to STOP_POS, in
bitset SET.

elemental subroutine flip_bit(set, pos)
: Flips the value at the POS position in SET, provided the
position is valid. If POS is less than 0 or greater than
BITS(SET)-1, then no value is changed.

pure subroutine flip_range(set, start_pos, stop_pos)
: Flips all valid bits from the START_POS to STOP_POS positions in
SET. If STOP_POS < START_POS no bits are flipped. Positions less
than 0 or greater than BITS(SET)-1 are ignored.

subroutine input(unit, set, status)
: Reads the components of the bitset, SET, from the logical unit,
UNIT, assuming that the components were written using OUTPUT.

elemental function max_set_bit( set )
: Returns the maximum position with a set bit. If no bit is set
returns -1.

elemental function none(set)
: Returns .TRUE. if none of the bits in SET have the value 1.

elemental subroutine or(set1, set2)
: Sets the bits in SET1 to the bitwise OR of the original bits in
SET1 and SET2. If SET1 has fewer bits than SET2 then the
additional bits in SET2 are ignored. If SET1 has more bits than
SET2, then the absent SET2 bits are treated as if present with
zero value.

subroutine output(unit, set, status)
: Writes the components of the bitset, SET, to the logical unit,
UNIT, in a unformatted sequence compatible with INPUT.

subroutine print_bitset_string(string, set)
: Writes a BITSETS literal to the allocatable default character
STRING, representing the individual bit values in the bitsets,
SET.

subroutine print_bitset_unit(unit, set, status, advance)
: Writes a bitsets literal to the logical unit, UNIT, representing
the individual bit values in the bitsets, SET. If STATUS is not
present and an error occurs then processing stops with an error
message. If STATUS is present then it has the error code SUCCESS
if no error occurs, has the value ALLOC_FAULT if failure is due to
the allocation of a temporary and, has the value WRITE_FAULT if an
error occurs in the write to the unit. By default or if ADVANCE is
present with the value 'YES', advancing output is used. If ADVANCE
is present with the value 'NO', then the current record is not
advanced by the write.

subroutine read_bitset_string(string, set, status)
: Uses the bitsets literal in the default character STRING, to
define the bitset, SET. The literal may be preceded by an an
arbitrary sequence of blank characters. If STATUS is not present
then an error results in the sending an error message to ERROR_UNIT
and the termination of the program. If STATUS is present then it has
the error code SUCCESS if no error occurs, the value
INVALID_STRING if the sequence of characters after an initial
sequence of blanks is not a BITSETS literal, the value
INVALID_ARRAY_SIZE if the literal's bit size is too large to be
represented by the bit size integer kind, the value ALLOC_FAULT if
allocation of SET failed for the specified BITSIZE, or
INVALID_INTEGER if the HEX literal constant is too large to be
represented by a bit size binary integer. If STATUS is present with
the value SUCCESS then SET is defined, otherwise it is not
defined.

subroutine read_bitset_unit(unit, set, status)
: Uses the bitsets literal at the current position in the formatted
file with logical unit, UNIT, to define the bitset, SET. The
literal may be preceded by an an arbitrary sequence of blank
characters. If STATUS is not present then an error results in the
sending an error message to ERROR_UNIT and the termination of the
program. If STATUS is present then it has the error code SUCCESS
if no error occurs, the value INVALID_STRING if the sequence of
characters after an initial sequence of blanks is not a BITSETS
literal, the value INVALID_ARRAY_SIZE if the literal's bitsize is
too large to be represented by the bitsize integer kind, the value
ALLOC_FAULT if allocation of SET failed for the specified bitsize,
or INVALID_INTEGER if the HEX literal constant is too large to be
represented by a bitsize binary integer. If STATUS is present with
the value SUCCESS then SET is defined, otherwise it is not
defined.

elemental subroutine set_bit(set, pos)
: Sets the value at the POS position in SET, provided the position
is valid. If the position is less than 0 or greater than BITS(SET)-1
then SET is unchanged.

pure subroutine set_range(set, start_pos, stop_pos)
: Sets all valid bits to 1 from the START_POS to the STOP_POS
positions in SET. If STOP_POS < START_POS no bits are
changed. Positions outside the range 0 to BITS(SET)-1 are ignored.

elemental function test(set, pos)
: Returns .TRUE. if the POS position is set, .FALSE.
otherwise. If POS is negative or greater than BITS(SET) - 1 the
result is .FALSE..

elemental function value(set, pos)
: Returns 1 if the POS position is set, 0 otherwise. If POS is
negative or greater than BITS(SET) - 1 the result is 0.

@jvdp1
Copy link
Member

jvdp1 commented Jul 18, 2020

Thank you. Interesting proposition.
I work quite often at the bit-level to spare memory (I must store large tables with values only equal to 0,1,2, or 3). Therefore, I wrote something similar but not as complete.

Could it be that some procedures are available outside the DT? E.g. The olderFortran Standard does not provide a pop_count intrinsic function. Therefore, such a function outside the DT could be useful.

The module exports one constant, BITS_KIND, to be used as a KIND
value in addressing bits in the bitset. This will initially be
INT32.

I guess that it could be easily extended to other kinds with fypp.

@wclodius2
Copy link
Contributor Author

wclodius2 commented Jul 18, 2020 via email

@jvdp1
Copy link
Member

jvdp1 commented Jul 21, 2020

Maybe I am misunderstanding what you are saying, but I think you are misunderstanding what I am saying.

Indeed, I misundertood you. Sorry and thank you for the explanations.

On overall I am fine with the API.

@wclodius2
Copy link
Contributor Author

wclodius2 commented Jul 21, 2020 via email

@Romendakil
Copy link

Personally I never saw the need for a bitmaps type, however, it was an ever and ever recurring topic on c.l.f., so I am pretty sure it would be used if available. So you clearly have my endorsement. But I am pretty new here and just started to read the messages since 2 weeks or so. Your proposal for the API seems very good to me.

@milancurcic
Copy link
Member

I'm not the target audience for this, but I don't object to it being part of stdlib. I think it's in scope and the API is clean and clear.

@certik @aradi @arjenmarkus @MarDiehl do you mind offering your feedback?

@arjenmarkus
Copy link
Member

arjenmarkus commented Jul 23, 2020 via email

@arjenmarkus
Copy link
Member

I have looked at the proposal and while I don't use bitsets myself, I can see that there are applications for them in Fortran. The specifications seem complete, but perhaps a bit overcomplete. Here are my detailed remarks:

  • In the introduction you mention ALL, but later it is called ALL_BITS
  • I think a name like ANY_BITS is more consistent
  • Is a function like NONE really needed? You can get the same effect with .NOT. ANY
  • Exclusive OR appears as .XOR., not as .EOR. - perhaps use "XOR"?
  • Equivalent appears as .EQV. - I therefore suggest EQV
  • There is a max_set_bit function, but not a first_set_bit function - does that not have any use?
  • The description of the formatted input/output routines does not include the actual format for the bitsets. Only a vage mention is made of HEX. Is it really useful to only support hexadecimal input/output? Why not a sequence of 0 's and 1's? It would be much easier to specify in my opinion.
  • The elemental routines AND, AND_NOT etc. require some further explanation: if SET2 is smaller than SET1, then some convention must be used to deal with the extra bits in SET1. Am I correct in assuming that the design is such that SET2 is (at least conceptually) extended with 0 bits? An alternative could be to extend it (again conceptually) in such a way that the extra bits in SET1 are NOT changed.

Just a few remarks and suggestions :).

@wclodius2
Copy link
Contributor Author

wclodius2 commented Jul 23, 2020 via email

@arjenmarkus
Copy link
Member

arjenmarkus commented Jul 24, 2020 via email

@wclodius2
Copy link
Contributor Author

wclodius2 commented Jul 24, 2020 via email

@MarDiehl
Copy link
Contributor

MarDiehl commented Jul 24, 2020

I am really inclined to make the BITSETS literal a string of zeros and ones. Its intuitive and if people want to save memory they should use the binary representation.

I have never used BITSETS, but using a string of zeros and ones seems odd to me. Why not use an array (of variable size) of logicals/booleans?

@wclodius2
Copy link
Contributor Author

wclodius2 commented Jul 24, 2020 via email

@ivan-pi
Copy link
Member

ivan-pi commented Jul 25, 2020

Speaking of BITSET literals, I don't see a reason why arrays of integers could not be used:

use stdlib_bitsets, only: bitsets, assignment(=)
type(bitsets) :: b

b = [1,0,1,0,0,0,0,0] ! (overloaded assignment for assumed size arrays)
b = "10100000"        ! (overloaded assignment for character literals)

But personally, I find using a string more attractive. Using logical literals .true./.false. is too verbose.

In the potential use case, where you would need to convert an array of zeros and ones (or a logical mask) to a string, you could probably do so using something along the lines of:

character(len=:), allocatable :: bstr
integer, allocatable :: ib(:), dc(:)
integer :: i
dc = [(i,i=1,20)]
allocate(ib,mold=dc)
where (mod(dc,2)==0)
  ib = 1
else where
  ib = 0
end where

allocate(character(len=size(ib)) :: bstr)
write(bstr,'(*(I1))') ib

write(*,'(*(I1))') dc
write(*,'(*(I1))') ib
write(*,'(A)'), bstr

which produces the output:

12345678
01010101
01010101

I haven't looked up the details on boz constants, so I don't know if this could be perhaps also allowed:

b = b'10100000'
b = z'A0'

@wclodius2
Copy link
Contributor Author

wclodius2 commented Jul 25, 2020 via email

@ivan-pi
Copy link
Member

ivan-pi commented Jul 26, 2020

My response was partially referring to the comment by @MarDiehl. I understand that the internal representation is different.

My understanding of literals though was:

A literal is a source code representation of a fixed value. They are represented directly in the code without any computation.

With the currently proposed API, how do I initialize a bitset? Like this?

type(bitsets) :: bitset
integer :: stat
call read_bitset_string("1111111",bitset,stat)

@wclodius2
Copy link
Contributor Author

wclodius2 commented Jul 26, 2020 via email

@MarDiehl
Copy link
Contributor

@wclodius2: thanks for the clarification

I think with overloading = the following operations would be user friendly

type(bitset) :: b
b = '101'
b = [.True.,.False.,.True.]
b = [1,0,1]
b = 101

The skeleton code would then look like

module bitset_m                                                                                     
                                                                                                    
 implicit none                                                                                      
                                                                                                    
 type, public :: bitset                                                                             
 end type bitset                                                                                    
                                                                                                    
 interface assignment (=)                                                                           
   module procedure assign_str                                                                      
   module procedure assign_int                                                                      
   module procedure assign_int_array                                                                
   module procedure assign_bool_array                                                               
 end interface assignment (=)                                                                       
                                                                                                    
contains                                                                                            
                                                                                                    
pure subroutine assign_str(self,str)                                                                
  type(bitset), intent(out)    :: self                                                              
  character(len=*), intent(in) :: str                                                               
end subroutine                                                                                      
                                                                                                    
pure subroutine assign_int(self,i)                                                                  
  type(bitset), intent(out) :: self                                                                 
  integer,      intent(in)  :: i                                                                    
end subroutine                                                                                      
                                                                                                    
pure subroutine assign_int_array(self,i)                                                            
  type(bitset),         intent(out) :: self                                                         
  integer, dimension(:),intent(in)  :: i                                                            
end subroutine                                                                                      
                                                                                                    
pure subroutine assign_bool_array(self,b)                                                           
  type(bitset),         intent(out) :: self                                                         
  logical, dimension(:),intent(in)  :: b                                                            
end subroutine                                                                                      
                                                                                                    
end module bitset_m 

@MarDiehl
Copy link
Contributor

I personally would prefer object oriented code:

type(bitset) :: b
call b%init(size=7) ! length of 7 bits
call b%from_formatted('file.txt') ! filename
call b%from_formatted(fh) ! filehandle

@wclodius2
Copy link
Contributor Author

wclodius2 commented Jul 26, 2020 via email

@wclodius2
Copy link
Contributor Author

wclodius2 commented Jul 26, 2020 via email

@ivan-pi
Copy link
Member

ivan-pi commented Jul 27, 2020

What does b=101 mean?.Do I treat it as a 3 bit bitset of 1,0, and 1 or a 32 bit bitset whose bits are those of the underlying INTEGER(INT32). For a long character string how do I ensure that the user hasn’t accidentally dropped or added a character. That is why I am considering having the syntax for a bit string be S#B# here the first # is the length of the bitset, and the second # is a string of ‘0’s and ‘1’s. What do I do if the user enters O instead of 0? What if I am passed b=[1,0,2] or b=[1,0,-1]?

In case of b = 101 it would have to be the underlying integer(int32) representation. This would allow usage of binary, octal, and hexadecimal boz constants. Upon more thought, I would avoid initialization using integer arrays altogether.

I see many similarities in this discussion to the C++ std::bitset. They allow initialization from either integers or character strings (zeros and ones, or a custom pair of characters). An exception is raised for invalid arguments in the constructor. Interestingly, they do not overload the assignment operator (apart from creating copies).

Seeing that the C++ bitset uses a template for the size, perhaps a solution using parametrized derived types would be also of interest.

  integer, parameter :: bits_kind = int32
  integer, parameter :: block_kind = int32

  type :: bitsets(b)
    integer, len :: b
    integer(bits_kind) :: bits = b
    integer(block_kind) :: block(b/storage_size(block_kind)+1)
  end type

The downsides are that since block is not allocatable anymore, enlarging a bitset means declaring a second (larger) instance; and the compiler support might not be mature enough.

@ivan-pi
Copy link
Member

ivan-pi commented Jul 27, 2020

I have found out the C++ ecosystem also has a dynamic bitset: https://www.boost.org/doc/libs/1_35_0/libs/dynamic_bitset/dynamic_bitset.html

@ivan-pi
Copy link
Member

ivan-pi commented Jul 29, 2020

This would allow usage of binary, octal, and hexadecimal boz constants.

Upon reading a recent comp.lang.fortran discusssion, I came to realize that boz literals can only be used in data statements and intrinsic procedures, meaning these constants need to be passed through the conversion function int:

open(newunit=unit)
write(unit,*) int(b'10110101')
call init(set=bs,bits=6)
call read_bitset(unit=unit,set=bs) ! bs contains '110101'
close(unit)

@wclodius2
Copy link
Contributor Author

Right now I feel stymied on this issue for several reasons:

  1. It is hard to write much of the code without a decision on how to handle and propagate errors. Issue Handling and propagating errors in stdlib #224 could use a wider participation so that a consensus on that issue could be reached.
  2. People are suggesting changes in the API without commenting as to whether they would support the BITSET type with or without the changes.
  3. Most of the suggested changes involve additional ways to initialize a BITSET object, that I feel should have low priority. Collectively they will add a burden to the API and testing that will provide only incremental functionality.
  4. The level of participation in this issue is small, so that even if all the participants in this issue supported the API it is not clear that the BITSET type has sufficient support to be worth adding to the library.

@wclodius2
Copy link
Contributor Author

wclodius2 commented Aug 1, 2020 via email

@certik
Copy link
Member

certik commented Aug 1, 2020

@wclodius2 the reason why there is relatively low traffic here (although 8 participants is not bad!) is that I think most people (myself included) haven't needed this in Fortran. So I am fine to include it into stdlib if there is interest, but I will let others who actually use this to lead this particular effort.

@Romendakil
Copy link

Is it much effort to include a base construction? If not, why not include it and see whether there is feedback in order to decide to extend it?

@wclodius2
Copy link
Contributor Author

wclodius2 commented Aug 1, 2020 via email

@ivan-pi
Copy link
Member

ivan-pi commented Sep 7, 2020

Regarding the abstract bitset_t, I think it's suitable for declaring polymorphic bitset variables such as:

class(bitset_t), allocatable :: foo ! could be bitset_64 or bitset_large

In my own codes I have someties prepended abstract, as in abstract_bitset_t, but here this would be unnecessarily verbose. Good documentation should lead users to the right type for their application.

The second usage of the abstract type name will occur in subroutine declaration blocks:

subroutine some_bitset_operation(b)
  class(bitset_t), intent(in) :: b
  select type(b)
    type is (bitset_64)
      ! do something
    type is (bitset_large)
      ! do something else
  end select
end function

Concerning bitset_64 I wonder if dropping the underscore would be better? This would make it similar to the constants from iso_fortran_env, i.e int64, real64 (I realize this comparison is not totally correct as these are type constants and not the actual types). I understand the underscore is there for consistency with bitset_large, however bitset64 seems easier to use in practice.

From the linguistic point of view, putting the adjective first - large_bitset, is more logical than bitset_large. In Boost they also put the adjective first (see dynamic_bitset). Perhaps a pair of adjective-derived names such as small_bitset for the one with 64 bits, and large_bitset for the variable one, would communicate the purpose of the derived types. Users can always rename upon import to use whatever they want, i.e. use stdlib_bitset, bitset64 => small_bitset. A problem however arises with this kind of naming if other fixed-size bitsets are introduced (with 32 or 128 bits).

@ivan-pi
Copy link
Member

ivan-pi commented Sep 7, 2020

I also agree with the answer to the second question by @arjenmarkus. Since the two sub-types will likely appear in different usage cases, I would initially go for an explicit conversion function. If I have understood correctly, the bitset_large can have variable size? For this type I would still expect the binary operators to work irrespective of the number of bits held internally. I would however (for now) avoid mixed binary operators between bitset_64 and bitset_large.

@wclodius2
Copy link
Contributor Author

wclodius2 commented Sep 7, 2020 via email

@arjenmarkus
Copy link
Member

arjenmarkus commented Sep 8, 2020 via email

@arjenmarkus
Copy link
Member

arjenmarkus commented Sep 8, 2020 via email

@wclodius2
Copy link
Contributor Author

@arjenmarkus I have also never tried parameterized derived types. Support is more widespread than I expected, http://fortranwiki.org/fortran/show/Fortran+2003+status, but it has always sounded more quirky than I like.

@wclodius2
Copy link
Contributor Author

In thinking further a bitset type is about as close as you can come to a type suited for parameterized derived types, if there is one that is suited for PDTs. I am going to experiment a little with using PDTs for bitsets.

@wclodius2
Copy link
Contributor Author

After experimenting I have found a problem with gfortran 10.2's implementation of PDTs for which I cannot find a workaround. Also I had hoped to be able to declare bitsets with their bitesize initialized to zero and ifort's error messages seem to be indicate that that would be non-standard. So I have given up on using PDTs for bitsets.

@ivan-pi
Copy link
Member

ivan-pi commented Sep 9, 2020

In thinking further a bitset type is about as close as you can come to a type suited for parameterized derived types, if there is one that is suited for PDTs. I am going to experiment a little with using PDTs for bitsets.

A PDT could bring bitsets very close to the C++ std:bitset. I've also done some testing, reaching the same conclusion - compiler support is not good enough.

With ifort 19.1 I am able to compile the following type hierarchy:

module stdlib_abstract_bitset

  implicit none
  private

  public :: bitset_t

  type, abstract :: bitset_t
  end type

end module

module stdlib_small_bitset

  use stdlib_abstract_bitset, only: bitset_t
  use iso_fortran_env, only: int32, int64
  implicit none

  integer, parameter :: block_kind = int32
  integer, parameter :: bits_per_block = storage_size(block_kind)

  type, extends(bitset_t) :: small_bitset(len)
    integer, len :: len = 0
    integer(block_kind) :: m_bits(ceiling(real(len)/real(bits_per_block)))
  end type

contains
...
end module

The part that started to bother me was in functions similar to the example from @arjenmarkus:

  function small_bitset_and(a,b) result(new)
    type(small_bitset(*)), intent(in) :: a
    type(small_bitset(a%len)), intent(in) :: b
    type(small_bitset(a%len)) :: new 
    integer :: i, nblocks

    nblocks = size(a%m_bits)
    do i = 1, nblocks
        new%m_bits(i) = iand(b%m_bits(i),a%m_bits(i))
    end do
  end function

Assuming I now bind this function to the operator .and. and use it as follows:

  type(small_bitset(64)) :: a, b, c
  type(small_bitset(32)) :: d

  c = a .and. b ! valid
  d = a .and. b ! expected it to be invalid, but compiler produces no error message

the value of d%len after the assignment becomes 64!

@wclodius2
Copy link
Contributor Author

I now have an implementation of bitset_t, bitset_64, and bitset_large that passes what I consider to be extensive testing. Before I consider doing a PR for it I would like to have opinions on the following questions:

  1. Should I change bitset_t to bitset_type?
  2. Should I add an _t or _type to bitset_64 or bitset_large?
  3. Currently I assume without checking that for the binary ops and, and_not, or, xor, ==, /=, <, <=, >, and >= that the arguments have the same size. Should I enforce requiring the same size by adding a branch to error stop?

@wclodius2
Copy link
Contributor Author

@ivan-pi before I even consider using PDTs I need to be able to compile with both of the two most widely used compilers ifort and gfortran, and preferably versions of both compilers more than a couple of years old. Gfortran will not let me combine inheritance and PDTs, giving unrelated error messages. Ifort will accept combining inheritance and PDTs, but will not let me initialize at declaration the arrays holding the bits, i.e. will not accept code such as

module stdlib_abstract_bitset

  implicit none
  private

  public :: bitset_t

  type, abstract :: bitset_t
  end type

 end module

module stdlib_small_bitset

  use stdlib_abstract_bitset, only: bitset_t
  use iso_fortran_env, only: int32, int64
  implicit none

  integer, parameter :: block_kind = int32
  integer, parameter :: bits_per_block = storage_size(block_kind)

 type, extends(bitset_t) :: small_bitset(len)
    integer, len :: len = 0
    integer(block_kind) :: m_bits(ceiling(real(len)/real(bits_per_block)))=0 ! note the initialization
  end type

contains
...
end module

If I have to explicitly initialize the arrays I see no advantage to PDTs.

@FortranFan
Copy link

@wclodius2 wrote:

..
type, extends(bitset_t) :: small_bitset(len)
integer, len :: len = 0
integer(block_kind) :: m_bits(ceiling(real(len)/real(bits_per_block)))=0 ! note the initialization
end type
..
.. If I have to explicitly initialize the arrays I see no advantage to PDTs.

From what I understand, it's the Fortran standard that disallows the initialization shown above, Intel Fortran implementation is simply trying to conform.

But now, I don't understand the comment, "If I have to explicitly initialize the arrays I see no advantage to PDTs."

What is the alternate "design" option with bitset_large? The ALLOCATABLE attribute for the array component of the derived type toward "bitset"? If so, that too will need to be explicitly initialized.

@wclodius2
Copy link
Contributor Author

wclodius2 commented Sep 20, 2020

FWIW I was tired when I wrote that. I know it is the Fortran standard that forbids it, Ifort's error messages are clear on that. I just think it was an unfortunate decision for the standard to forbid it. It has an obvious interpretation, that looks to me very implementable. It is equivalent to character strings, by default, always being blank padded. I know that an allocatable array would also require an explicit initialization. I also know that users are human, and will forget to perform an initialization. I also know that trying to access unallocated memory is more likely to generate a useful error message, than accessing allocated, but uninitialized, memory. All of this is moot though, since my implementation uses inheritance, gfortran will not let me combine inheritance and PDTs, and I consider compilation by gfortran critical.

@ivan-pi
Copy link
Member

ivan-pi commented Sep 25, 2020

@ivan-pi before I even consider using PDTs I need to be able to compile with both of the two most widely used compilers ifort and gfortran, and preferably versions of both compilers more than a couple of years old. Gfortran will not let me combine inheritance and PDTs, giving unrelated error messages.

That is a major drawback indeed. The example I produced above could only be compiled with ifort. It is a shame the compiler support is not mature enough, as I think this would be a compelling use case for PDTs. Maybe writing a post about this on the Intel Fortran Forum or comp.lang.fortran could help grab some attention.

@wclodius2
Copy link
Contributor Author

I am now ready to do a PR if there is demand. Is there anyone ready to do a review of a PR with the following files?
stdlib_bitsets.f9 - 2131 lines, the module
stdlib_bitset_64.f90 - 1347 lines, a submodule implementing a bitset_64 type
stdlib_bitset_large.f90 - 1580 lines, a submodule implementing a bitset_large type

stdlib_bitsets.md - 1975 lines, a markdown document documenting the code

I also have two test programs that may or may not need review:
test_stdlib_bitset_64.f90 - 745 lines
test_stdlib_bitset_large.f90 - 1477 has lines

@jvdp1
Copy link
Member

jvdp1 commented Sep 28, 2020

I am now ready to do a PR if there is demand. Is there anyone ready to do a review of a PR with the following files?
stdlib_bitsets.f9 - 2131 lines, the module
stdlib_bitset_64.f90 - 1347 lines, a submodule implementing a bitset_64 type
stdlib_bitset_large.f90 - 1580 lines, a submodule implementing a bitset_large type

stdlib_bitsets.md - 1975 lines, a markdown document documenting the code

I also have two test programs that may or may not need review:
test_stdlib_bitset_64.f90 - 745 lines
test_stdlib_bitset_large.f90 - 1477 has lines

Thank you for these contributions. I think you can submit the PR.
I will be a reviewer ;)

@ivan-pi
Copy link
Member

ivan-pi commented Sep 28, 2020

Thank you for the large effort! I would also be happy to review the PR . (At latest I will find the necessary time next weekend.)

@wclodius2
Copy link
Contributor Author

I have now created a PR. The code is in my branch https://github.com/wclodius2/stdlib/tree/bitsets . I made the branch from my logger branch which may have been a mistake as it is showing the changes not only for the bitsests, but also for the logger. Let me know if I need to change that, and how to change that if necessary. FWIW I have done I think a reasonable job of better matching the undocumented aspects of the styles for the source code and markdown document, so I think it will require fewer changes than the logger branch, even though it is more than twice as large.

@jvdp1
Copy link
Member

jvdp1 commented Sep 28, 2020

I have now created a PR. The code is in my branch https://github.com/wclodius2/stdlib/tree/bitsets . I made the branch from my logger branch which may have been a mistake as it is showing the changes not only for the bitsests, but also for the logger. Let me know if I need to change that, and how to change that if necessary.

Thank you @wclodius2 for the PR #236 . Indeed, it would be good to remove the logger code from #236. But I am not sure how to remove the logger code from this branch. @certik @milancurcic Any idea?

@milancurcic
Copy link
Member

@wclodius2 I think the best way to do it is to:

  1. Close Bitsets #236
  2. Create a new branch that is even with master
  3. Add bitsets code to the new branch
  4. Open a new PR

But somebody more experienced with git and GitHub may know a better way.

@wclodius2
Copy link
Contributor Author

The PR seems to have broken the build process. I can see two possible causes:

  1. Testing with gfortran 10.2 is not sufficient. I will try testing with ifort 17 to see if that finds a problem.

  2. I failed to properly update a CMakeLists.txt or Makefile.manual file.

@wclodius2
Copy link
Contributor Author

Checking the error reports it looks like one problem is having error stop in pure procedures. I will fix that in the declarations of stdlib_bitsets.f90 and the actual code of stdlib_bitset_64.f90 and stdlib_bitset_large.f90.

@wclodius2
Copy link
Contributor Author

I now have the code passing the build process. Ifort also caught a problem with an uninitialized block in bitset_64, that I suspect gfortran was initializing to zero, as intended, but not specified. However now the document builder (FORD?) is failing with an internal error on macOS, but not ubuntu. FWIW the error message is
"GitHub Actions has encountered an internal error when running your job."

@wclodius2
Copy link
Contributor Author

I have created a PR for a branch called bitsets3. The PR had one one that failed on ubuntu, out of six attempts, three on ubuntu and three on macOS..

@wclodius2
Copy link
Contributor Author

At this point @jvdp1 has performed an extensive review of the code, and @14NGiestas has commented on some parts of it It would be useful if someone else also extensively reviewed the code.

@ivan-pi ivan-pi added the implementation Implementation in experimental and submission of a PR label Nov 23, 2020
@14NGiestas
Copy link
Member

The bitsets module was merged a while ago in #239 I believe the corresponding API discussion can be closed.

@milancurcic
Copy link
Member

Agreed, thanks @14NGiestas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
implementation Implementation in experimental and submission of a PR
Projects
None yet
Development

No branches or pull requests