Function to pad string with zeros #394

ivan-pi · 2021-04-23T13:43:51Z

Description

A function to pad a string with zeros or another symbol to a given width could be helpful in many cases, especially together with the to_string() function for integer to string conversion.

Currently available methods to achieve this are using concatenation

width = 32
str = "hello"
padded_str = repeat('0',width-len(str)')//str

and perhaps also internal file I/O

write(buffer,'(A32)', blank='zero') "hello"   ! doesn't work for some reason

I propose to add the following functions

! left - padding
function padl(str,width,symbol) result(padded_str)
  character(len=*), intent(in) :: str
  integer, intent(in) :: width
  character(len=1), intent(in), optional :: symbol
  character(len=max(len(str),width) :: padded_str
end function
! right - padding
function padr(str,width,symbol) result(padded_str)
  character(len=*), intent(in) :: str
  integer, intent(in) :: width
  character(len=1), intent(in), optional :: symbol
  character(len=max(len(str),width) :: padded_str
end function

The original string is returned if width < len(str). The default padding symbol could be either the blank symbol ' ' or '0'. Special treatment of a leading sign symbol '+'/'-' might be also desirable.

Prior art

Python: zfill
(Another method to achieve padding in Python is using the string class method .format())
Julia: lpad, rpad
MATLAB: pad

The text was updated successfully, but these errors were encountered:

milancurcic · 2021-04-23T16:06:50Z

I use this often to left-pad integers with zeros when enumerating filenames, but using the edit descriptors in format.

Do you think a variant of padl and padr that accepts integers would be useful in addition to the ones you proposed?

res = padl(23, 4) ! returns "0023"

Compare it to:

res = padl(to_string(23), 4) ! returns "0023"

(Edit: fixed the snippet above)

I like the proposed names. Rarely there's an opportunity to introduce such short but clear names.

ivan-pi · 2021-04-23T16:30:46Z

I use this often to left-pad integers with zeros when enumerating filenames, but using the edit descriptors in format.

That's my use case too. Currently I resort to the pattern:

integer :: istep,unit
character(len=32) :: step
write(step,'(I0.8)') istep
open(newunit=unit,file="output"//trim(step)//".txt")

Note your second call should be res = padl(to_string(23),4).

Concerning the name, I thought that padl/padr is more in line with the adjustl/adjustr Fortran intrinsic functions than the prefixed versions like Julia has.

I'm not yet sure if making zero the default padding symbol is the right thing to do. But I would certainly support introducing an integer to zero-padded string function. I use this in practically every code. Maybe borrowing the Python zfill name specifically for this purpose?

function zfill(x,w) result(res)
  integer, intent(in) :: x
  integer, intent(in) :: width
  character(len=:), allocatable :: res
end function

integer :: unit
open(newunit=unit,"output"//zfill(23,4)//".txt)    ! opens "output0023.txt"

milancurcic · 2021-04-23T16:40:34Z

Indeed, with the second snippet corrected, the difference is less drastic.

I usually like a separate function with a specific purpose (e.g. zfill), although the same thing can be accomplished with padl with not that much more verbosity. So now I'm split between the two approaches.

ivan-pi · 2021-04-23T18:08:52Z

Maybe instead of leaving the concatenation to the consumer, we can make an even more specific function like

function filename(name,num,extension) result(res)
character(len=*), intent(in) :: name
integer, intent(in) :: num, width
character(len=*), intent(in) :: extension
character(len=:), allocatable :: res
end function

where we would allow some primitive type of integer formatting (e.g. using something like {:08d} in Python). But at this point, I wonder why not just go back to the string buffer

character(len=14) :: filename
write(filename,'(A,I0.8,A)') "myfile", 23, ".txt"

So I think a general zero-padding function will give greater flexibility.

Irrespective of the default padding symbol, if we overload pad to work for integers, you could use padl(23,4,'0') which is pretty short.

My preferences:

If the default padding character is '0', we also overload it to work for integers, meaning padl(23,4) is equivalent to padl(23,4,'0').
If the default padding character is ' ', we introduce a separate function (called zfill,zpad,padz, enumz,zstr or something similar) specific to the file enumeration scenario.

So now we need some arguments to support/reject the blank symbol.

cc @LKedward @awvwgk

awvwgk · 2021-04-23T19:15:38Z

Maybe instead of leaving the concatenation to the consumer, we can make an even more specific function like

I'm using something similar for unit testing round-tripping in an IO library of mine where I need some temporary file names rather than scratch units in this particular context:
https://github.com/grimme-lab/mctc-lib/blob/5603d588ac12fd20b6643527d7cf68fe8c05e070/test/test_read.f90#L420-L429
https://github.com/grimme-lab/mctc-lib/blob/5603d588ac12fd20b6643527d7cf68fe8c05e070/test/test_write.f90#L218-L227

Writing a small helper is especially easy in this scenario, because you are padding the integer printout to a fixed length and therefore know exactly the required size of the buffer.

As a first start, we could wrap internal file IO as function similar to to_string but allow the user to specify the format, which gives at least access to all intrinsic formatters:

character(len=:), allocatable :: str
str = to_string(42)  ! str == "42"
str = to_string(42, '(i0.4)')  ! str == "0042"
str = to_string(42, '(z0.4)')  ! str == "002A"

Using to_string(42, '(i0)') will be slower than to_string(42) because of the internal file IO in such an approach. We might have to parse the formatter as well to avoid overflows for to_string(42, '(i0.1000)') or similar.

urbanjost · 2021-04-24T01:14:54Z

lenset, stretch, and atleast in M_strings, which duplicate some functionality for historical purposes are used for slightly different use cases, as shown in the examples in the man-pages in the link above.

There are a couple of other routines like uniq(3f) in M_io at the same github site that are related to adding a numeric suffix to a name, although I usually just use an internal WRITE() to add a numeric suffix in a loop when I want to overwrite like

 
CHARACTER(LEN=256) :: FILENAME
INTEGER :: I
I=12
WRITE(FILENAME,'("PREFIX.",i0.3)')i
WRITE(*,*)'OPEN FILE ',trim(FILENAME)
I=1000
WRITE(FILENAME,'("PREFIX.",i0.3)')i
WRITE(*,*)'OPEN FILE ',trim(FILENAME)
end

shows, with newer features like "i0" lets you get leading zeros but still get a good name if you go over the expected range of numbers.

Just some Fortran-based prior art with some variants from what was discussed that were/are needed at various times (put them all together with the intent of doing something like stdlib, but hoping efforts like this will do the work for me!).
where the internal write with a format gives nice leading zeros

ivan-pi · 2021-04-26T15:57:31Z

I made a small demonstration of padding without the optional padding symbol.

module pad_mod
  implicit none
contains
  function padl(str,width) result(res)
    character(len=*), intent(in) :: str
    integer, intent(in) :: width
    character(len=max(len_trim(str),width)) :: res
    res = str
    res = adjustr(res)
  end function
  function padr(str,width) result(res)
    character(len=*), intent(in) :: str
    integer, intent(in) :: width
    character(len=max(len_trim(str),width)) :: res
    res = str
  end function
end module

program test_pad
  use pad_mod
  implicit none
  print *, "Hello"
  print *, padl("Hello",10)
  print *, padr("Hello",12), len(padr("Hello",12))
end program

Output of the program:

$ gfortran pad_mod.f90
$ ./a.out
 Hello
      Hello
 Hello                 12

I'm not really sure what is the purpose of padr with blanks. It is effectively the opposite of trim.

ivan-pi · 2021-04-26T16:01:58Z

Depending on what our goals are, we could simply introduce pad (left-pad) as the opposite of trim. Just throwing it out on the table.

Right padding would be achieved with adjustl(pad(str,width)) (symmetric with respect to how left-trimming can be currently performed using trim(adjustl(str))).

ivan-pi · 2021-04-26T16:11:12Z

I also found one previous Fortran implementation of zfill here (GPL Licensed).

aman-godara · 2021-06-19T19:25:46Z

This is what was proposed in the proposal for GSoC. This was designed with an aim to provide as much functionalities as possible to a user. It might vary from current state of agreement but let me still put it here.

Signature: pad(string, width, pad_with(Optional) = “ ”)
Output: a new string_type object

Function has 2 versions: padl and padr (like adjustl and adjustr)
If the width required after the execution of the function is greater than
len(string) then a new string_type object with one side (either left or right)
padded with pad_with character sequence is returned, otherwise a deep
copy of input string is returned. Note that the length of pad_with must be
greater than 0 and can be greater than 1.
If pad_with has length greater than 1 then there are 2 possibilities of
padding: padding from left to right or padding from right to left. Take a look
at the example to understand this better:

padl(“pad this”, width = 15, pad_with = “!@#”):

1). “!@#!@#!pad this” (padding from left to right)
2). “#!@#!@#pad this” (padding from right to left)
Function will provide both the options to users. Note that pad is different
from adjustr and adjustl also because adjust* provide an inherent trimming in them
whereas pad function doesn’t.

I think it is better to keep size of pad_with to 1 at this moment.
If we agree to truncate the output (if needed) so that it is strictly of the asked length, then IMO it shouldn't be named as pad.

Can we also divide trim into 2: triml and trimr?: but then it would become inconsistent with trim intrinsic ultimately we will have to rename the function.
I think adjustl and adjustr make use of something like padr(triml( string ), len(string)) and padl(trimr( string ), len(string)) respectively where optional argument pad_with is ' ' (1 space).

ivan-pi · 2021-06-21T11:25:17Z

I think it is better to keep size of pad_with to 1 at this moment.

I agree. If we can reach agreement on the behavior of padding strings where len(pad_with) > 1 is true, we can always add them later in a backward compatible way.

If we agree to truncate the output (if needed) so that it is strictly of the asked length, then IMO it shouldn't be named as pad.

For fixed length strings truncation will happen upon assignment, while allocatable strings will be re-allocated to the result length. We cannot change these rules. But I don't see any issue with the case when len(pad_with) == 1 is true.

I think adjustl and adjustr make use of something like padr(triml( string ), len(string)) and padl(trimr( string ), len(string)) respectively where optional argument pad_with is ' ' (1 space).

I guess these are equal in practice, but at the compiler level the implementation for the intrinsic functions might work slightly differently. The big difference is of course that adjustl and adjustr are functions used when working with fixed-length (or already allocated) strings, and the result length is determined implicitly from the actual string size. They don't offer the possibility to modify the length of the resulting string (used as an expression or before assignment occurs).

Can we also divide trim into 2: triml and trimr?: but then it would become inconsistent with trim intrinsic ultimately we will have to rename the function.

Regarding trim, you can already achieve trimming on both sides:

trim(string): right trim
trim(adjustl(string)): left trim

One way to condense the interfaces would be to extend trim with an optional argument:

function trim(string,left) result(res)
  character(len=*), intent(in) :: string
  logical, intent(in), optional :: left ! .false. by default
  integer, parameter :: reslen = len_trim(adjustl(string))
  character(len=reslen) :: res

Then we would would have:

trim(string): right trim
trim(string,left=.true.): left trim

which actually takes three more characters than the currently available method and also does not read sequentially compared to what really happens. If you leave out the keyword like trim(string,.true.) to make it shorter it is just confusing for the person reading the code. Separate functions (i.e. triml/trimr) are a better solution then. For the time being I would just avoid trim until someone opens a dedicated issue about it.

aman-godara · 2021-06-21T13:19:19Z

I wasn't able to explain myself in the best possible manner, excuse me for that.

By truncate I was referring to the case when a user gives something like this:

padl("    This string is of length 34   ", width = 20)

Then the output " This string is of length 34 " (length 34) is better than the output " This string is o" (truncated output of asked length 20) for pad function. If one decides to return latter then it shouldn't be named as pad IMO.

If we run the padl function as implemented in this comment on this same input, output will be " This string is of length 34" (length 30) which is also performing a truncation but without any loss of information.

ivan-pi · 2021-06-21T13:29:17Z

Yes, I believe we should not do a "hard truncation", but rather follow the behavior in other languages:

Julia:

Stringify s and pad the resulting string on the left with p to make it n characters (code points) long. If s is already n characters long, an equal string is returned. Pad with spaces by default."
MATLAB

pad(str,numberOfCharacters) adds space characters so the strings in newStr have the length specified by numberOfCharacters. If any strings in str have more characters than numberOfCharacters, then pad does not modify them.

ivan-pi · 2021-07-23T10:15:04Z

@milancurcic, now that #441 added padl should we still pursue a specific function for zero-filling integers or should we settle on a general format function like #444 that requires parsing a format string?

Using the currently available tools a string like "000123" could be created using:

character(len=6) :: zstr
zstr = padl(to_string(123),6,'0')

compared with the intrinsic internal I/O

write(zstr,'(I0.6)') 123

Of course the major difference is the stdlib functions can be inlined, e.g. unit = open("my_file"//padl(...)), freeing the programmer from the need to declare a character buffer.

Personally, I'd still be interested in having a zfill function like Python does.

ivan-pi added easy Difficulty level is easy and good for starting into this project good first issue Good for newcomers idea Proposition of an idea and opening an issue to discuss it labels Apr 26, 2021

aman-godara mentioned this issue Jun 22, 2021

implemented pad function #441

Merged

4 tasks

awvwgk added the topic: strings String processing label Sep 18, 2021

This comment was marked as spam.

Sign in to view

ecasglez mentioned this issue Dec 22, 2022

New zfill function to left-pad a string with zeros #689

Merged

14NGiestas linked a pull request Dec 22, 2022 that will close this issue

New zfill function to left-pad a string with zeros #689

Merged

milancurcic closed this as completed in #689 Mar 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Function to pad string with zeros #394

Function to pad string with zeros #394

ivan-pi commented Apr 23, 2021 •

edited

Loading

milancurcic commented Apr 23, 2021 •

edited

Loading

ivan-pi commented Apr 23, 2021

milancurcic commented Apr 23, 2021

ivan-pi commented Apr 23, 2021 •

edited

Loading

awvwgk commented Apr 23, 2021 •

edited

Loading

urbanjost commented Apr 24, 2021

ivan-pi commented Apr 26, 2021

ivan-pi commented Apr 26, 2021 •

edited

Loading

ivan-pi commented Apr 26, 2021 •

edited

Loading

aman-godara commented Jun 19, 2021 •

edited

Loading

ivan-pi commented Jun 21, 2021 •

edited

Loading

aman-godara commented Jun 21, 2021 •

edited

Loading

ivan-pi commented Jun 21, 2021

ivan-pi commented Jul 23, 2021 •

edited

Loading

This comment was marked as spam.

Function to pad string with zeros #394

Function to pad string with zeros #394

Comments

ivan-pi commented Apr 23, 2021 • edited Loading

milancurcic commented Apr 23, 2021 • edited Loading

ivan-pi commented Apr 23, 2021

milancurcic commented Apr 23, 2021

ivan-pi commented Apr 23, 2021 • edited Loading

awvwgk commented Apr 23, 2021 • edited Loading

urbanjost commented Apr 24, 2021

ivan-pi commented Apr 26, 2021

ivan-pi commented Apr 26, 2021 • edited Loading

ivan-pi commented Apr 26, 2021 • edited Loading

aman-godara commented Jun 19, 2021 • edited Loading

ivan-pi commented Jun 21, 2021 • edited Loading

aman-godara commented Jun 21, 2021 • edited Loading

ivan-pi commented Jun 21, 2021

ivan-pi commented Jul 23, 2021 • edited Loading

This comment was marked as spam.

ivan-pi commented Apr 23, 2021 •

edited

Loading

milancurcic commented Apr 23, 2021 •

edited

Loading

ivan-pi commented Apr 23, 2021 •

edited

Loading

awvwgk commented Apr 23, 2021 •

edited

Loading

ivan-pi commented Apr 26, 2021 •

edited

Loading

ivan-pi commented Apr 26, 2021 •

edited

Loading

aman-godara commented Jun 19, 2021 •

edited

Loading

ivan-pi commented Jun 21, 2021 •

edited

Loading

aman-godara commented Jun 21, 2021 •

edited

Loading

ivan-pi commented Jul 23, 2021 •

edited

Loading