Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for clamp #319

Closed
milancurcic opened this issue Feb 12, 2021 · 23 comments · Fixed by #355
Closed

Proposal for clamp #319

milancurcic opened this issue Feb 12, 2021 · 23 comments · Fixed by #355
Assignees
Labels
easy Difficulty level is easy and good for starting into this project good first issue Good for newcomers idea Proposition of an idea and opening an issue to discuss it

Comments

@milancurcic
Copy link
Member

milancurcic commented Feb 12, 2021

Intro

clamp(x, xmin, xmax) is a function that clamps (or, clips) an input scalar or array x so that the result is within given bounds xmin and xmax. For example:

res = clamp(0.9. 0.2, 0.8) ! res is 0.8
res = clamp(0.7. 0.2, 0.8) ! res is 0.7
res = clamp(0.1, 0.2, 0.8) ! res is 0.2

This function is common in standard libraries. I've used it occasionally in Fortran, and I use it often in Python. It provides an easy way to ensure that a variable with a finite (non-NaN, non-Inf) value stays within bounds.

Name

Python has numpy.clip, while C++, Julia, and Rust all have clamp. So clamp is more common. Both clamp and clip are equally meaningful to me. I have slight preference for clip because it's shorter and has a "sharper" sound to it.

What inspired me to open this proposal is reading that clamp was stabilized in Rust 1.50.0.

API

elemental real function clamp(x, xmin, xmax) result(res)
  real, intent(in) :: x ! input value to clip
  real, intent(in) :: xmin ! lower bound
  real, intent(in) :: xmax ! higher bound

Alternative names for the bounds could be low and high.

Implementation

There are a few options that I know of, each of which produce different assembly instructions, and performance depending on optimization levels, array sizes, and whether the clamping is in a loop or over a whole array in a single call.

A min/max clamp

This is the simplest implementation I could think of:

elemental real function clamp(x, xmin, xmax) result(res)
  real, intent(in) :: x, xmin, xmax
  res = min(max(x, xmin), xmax)
end function clamp

There are two intrinsic function calls here, but maybe a good compiler will optimize one away. I don't know.

An if/then/else clamp

elemental real function clamp(x, xmin, xmax) result(res)
  real, intent(in) :: x, xmin, xmax
  if (x < xmin) then
    res = xmin
  else if (x > xmax) then
    res = xmax
  else
    res = x
  end if
end function clamp

A branchless clamp

This avoids branching but introduces a floating-point error (~epsilon(x)) for some elements, even if x is not clipped.

elemental real function clamp(x, xmin, xmax) result(res)
  ! A branchless clamp.
  ! Credit: Mark Ransom (https://stackoverflow.com/a/7424900/827297)
  real, intent(in) :: x, xmin, xmax
  res = (x + xmin + abs(x - xmin)) / 2
  res = (res + xmax - abs(res - xmax)) / 2
end function clamp

What module would this go to?

I don't think we have an appropriate existing module yet, but stdlib_numeric or stdlib_math sound meaningful to me.

@milancurcic milancurcic added idea Proposition of an idea and opening an issue to discuss it easy Difficulty level is easy and good for starting into this project labels Feb 12, 2021
@ivan-pi
Copy link
Member

ivan-pi commented Feb 13, 2021

Do you think there will be much use for passing arrays to the xmin or xmax arguments?

If not, I would propose a fourth implementation option:

A where clamp

The procedures are auto-generated for each rank and type (real, double, ...)

interface clamp
  module procedure clamp_rank0
  module procudure clamp_rank1
  module procedure clamp_rank2
  ...
end interface
function clamp_rank0(x,xmin,xmax)
! ... use min max or ifthen/else version
end function

function clamp_rank1(x,xmin,xmax) result(res)
  real, intent(in) :: x(:)
  real, intent(in) :: xmin, xmax
  real :: res(size(x))
  where (x < xmin)
    res = xmin
  elsewhere (x > xmax)
    res = xmax
  elsewhere
   res = x
  end where
end function

The same logic applies to where/elsewhere logic applies to higher ranks.

@ivan-pi
Copy link
Member

ivan-pi commented Feb 13, 2021

Concerning name I also have a personal preference for clip. On the other hand consistency with C++, Julia, and Rust does have some weight. I would also vote in favor of stdlib_numeric vs stdlib_math.

@milancurcic
Copy link
Member Author

Is the usefulness of a where clamp to avoid it being elemental, like in #310 (comment)?

If yes, is it so that clamp could be passed as a procedure pointer? If yes, it makes me think--how do you do it with a generic? I thought that generic can't be used as a procedure pointer because you need to specify the procedure interface in the list of dummy arguments. Or are there other benefits to non-elemental procedures?

@milancurcic
Copy link
Member Author

milancurcic commented Feb 13, 2021

Do you think there will be much use for passing arrays to the xmin or xmax arguments?

I don't think much but maybe some. I used it a few times in Python for 2-d arrays to clamp only within a radius and to do a masked clamp. There are alternative ways to do that though.

But this could also be provided through specific functions that define xmin and xmax as arrays. So if we chose to do a generic implementation instead of elemental, then allowing xmin and xmax to be arrays would only double the number of specific functions.

@ivan-pi
Copy link
Member

ivan-pi commented Feb 13, 2021

Is the usefulness of a where clamp to avoid it being elemental, like in #310 (comment)?

No, in this context I was only playing with the idea of a generic but non-elemental interface to forbid usage such as:

res = clamp(0.1,[0.2,0.3,0.4],0.5)

which if the implementation were min(max(x, xmin), xmax) would return the 3-by-1 vector [0.2,0.3,0.4].

I haven't used the clamp function before to be able to say is it useful or not. Just throwing the concern out for discussion.

@milancurcic
Copy link
Member Author

I see, that didn't occur to me. I can't think of an use for it. I agree that we should somehow forbid it or at least discourage it. I see a few solutions:

  • Use the generics and the where clamp to do it at compile time (your suggestion)
  • Check that xmin and xmax are scalar at run-time and stop if they are not
  • Leave unchecked and document it in user docs

@ivan-pi
Copy link
Member

ivan-pi commented Feb 14, 2021

I would prefer to avoid run-time checking. This would make the procedures impure so I don't see any benefit. Having the routine return without warning also make no sense.

This leaves us with the first and third bullet points, meaning either pure procedures for different ranks with a generic interface, or the elemental version (either min/max or the if-then-else versions). Actually it seems like a great experiment to learn about potential performance differences. 👍

Am I right that according to the specified behavior positive Inf values are mapped to xmax, while -Inf is mapped to xmin? NaN values however are preserved?

@milancurcic
Copy link
Member Author

milancurcic commented Feb 14, 2021

I would prefer to avoid run-time checking. This would make the procedures impure so I don't see any benefit. Having the routine return without warning also make no sense.

Yes, I would too. We can error stop from a pure a procedure (F2018) but it'd still need a check which would hinder performance.

@milancurcic
Copy link
Member Author

My heart prefers the elemental approach (it's one of my favorite features of Fortran) but my head prefers the generic where clamp approach (compile-time enforcing of argument shape). Let's discuss it more broadly on the upcoming call.

@ivan-pi
Copy link
Member

ivan-pi commented Feb 15, 2021

My heart prefers the elemental approach (it's one of my favorite features of Fortran) but my head prefers the generic where clamp approach (compile-time enforcing of argument shape). Let's discuss it more broadly on the upcoming call.

I wonder if this is something the standard should address - being able to explicitly enforce that elemental affects only certain variables (perhaps through an attribute such as nonelemental or scalar).

@art-rasa
Copy link

Hello all,
I was just thinking if there would be any need for a procedure that would scale the value of an input parameter into an output value defined by min/max parameters. This could be used, for example, scaling an input scalar (or array) into a scaled value describing some physical quantity, such as temperature between defined limits (say -50 ℃ and 100 ℃).

@MarDiehl
Copy link
Contributor

MarDiehl commented Feb 25, 2021

My current implementation with optional upper and lower bounds and 'hint' to the user if lower>upper (inspired by numpy). Probably error stop would be better than NaN, but I'm not clear about performance implications.

real(pReal) pure elemental function math_clip(a, left, right)

  real(pReal), intent(in) :: a
  real(pReal), intent(in), optional :: left, right

  math_clip = a
  if (present(left))  math_clip = max(left,math_clip)
  if (present(right)) math_clip = min(right,math_clip)
  if (present(left) .and. present(right)) &
    math_clip = merge (IEEE_value(1.0_pReal,IEEE_quiet_NaN),math_clip, left>right

end function math_clip

@milancurcic
Copy link
Member Author

@art-rasa Yes, I think so, I've done this many times in the past (home cooked) and is very common in machine learning. Can you propose it in a separate issue? Ideally with an API, name suggestion, and example implementation.

@milancurcic
Copy link
Member Author

milancurcic commented Mar 3, 2021

After discussing the elemental vs. where-clamp on the monthly call, I'm now more convinced that the elemental approach is just fine. The thing is, a where-clamp is a not so pretty hack to constrain how the user can invoke the function, and we're assuming that no user will want to use it exactly in the way that we would be preventing. So I'd rather have a more liberal and elegant solution, and point this out to the user in the docs.

@arjenmarkus
Copy link
Member

arjenmarkus commented Mar 4, 2021 via email

@milancurcic milancurcic added the good first issue Good for newcomers label Mar 11, 2021
@aman-godara
Copy link
Member

aman-godara commented Mar 11, 2021

What if user doesn't know for sure that the 2nd argument (xmin) is <= 3rd argument (xmax) but knows that this is the clip range in which the output should be?
for example:
x = 9
xmin= 11
xmax = 10
answer expected by user = 10
I don't have specific example of an application where user might need to do it, but it occurred to me as a possibility.

@MarDiehl
Copy link
Contributor

What if user doesn't know for sure that the 2nd argument (xmin) is <= 3rd argument (xmax) but knows that this is the clip range in which he/she might the output to be?
for example:
x = 9
xmin= 11
xmax = 10
answer expected by user = 10
I don't have specific example of an application where user might need to do it, but it occurred to me as a possibility.

I would say that this is invalid use of the function, the user would simply need to ensure that this does not happen. It's not an uncommon situation that things are forbidden (square root of a negative real number, division by zero, exceeding the range of a certain datatype, ...).
To me, the more interesting question is how to handle this. Numpy does not check, probably for performance reasons. My implementation exits with error stop because I think that the time I spend on debugging is much more valuable then the time my computer spends on doing calculations.

@aman-godara
Copy link
Member

aman-godara commented Mar 12, 2021

OK, let's leave it on user to provide smaller number as xmin and larger number as xmax.

I liked the suggestions @MarDiehl has given above:

My current implementation with optional upper and lower bounds and 'hint' to the user if lower>upper

I think this provides user more enhanced feature in one function i.e. a user can give one of the two optional argument as None indicating that he/she doesn't want to use one of the two clips (xmin and xmax). If None is passed in both then it doesn't make any sense (we can raise a warning or return the input as the output)

@MarDiehl can you please explain what this line does?

if (present(left) .and. present(right)) &
math_clip = merge (IEEE_value(1.0_pReal,IEEE_quiet_NaN),math_clip, left>right

To me, the more interesting question is how to handle this.

I thought of doing it by making a comparator function (func1) which gives -1, 0 or 1 as the output.
If we multiply func1(x, xmin) with func1(x, xmax) and the answer is 1, then we know that the output will NOT be x.
Otherwise output will be x.
Please also let me know if you are thinking of any other implementation than this.
OR we can just set xmin as min(xmin, xmax) and xmax as max(xmin, xmax)

@milancurcic
Copy link
Member Author

To me, the more interesting question is how to handle this. Numpy does not check, probably for performance reasons. My implementation exits with error stop because I think that the time I spend on debugging is much more valuable then the time my computer spends on doing calculations.

I think it's important to have both--allow it to error stop in debug mode, and don't check in release mode. Tiny numerical functions like this should be HPC-useful. We can do this with a preprocessor.

@aman-godara
Copy link
Member

Hey!, how can I contribute now to get this issue closed?

@milancurcic
Copy link
Member Author

Okay, here are my suggestions:

  1. Name the function clip.
  2. This will go to a new module, so decide on a module name. I think it either stdlib_math or stdlib_numeric are appropriate, and I have slight preference for stdlib_math.
  3. clip should work on integers and reals of all kinds (int8, int16, int32, int64, real32, real64, real128), so there will be 7 specific functions, and one generic name. Specific functions will be generated by the fypp preprocessor. For example, take a look at stdlib_stats.fypp. Similarly, the source file for this module will be stdlib_math.fypp (instead of .f90).
  4. Pick an implementation, I prefer the min/max one.
  5. For now, just go for the simplest, smallest PR that implements this correctly. We can look at run-time checking in debug mode like @MarDiehl does in his implementation later.
  6. Have tests for both valid and invalid inputs.
  7. Implement the specification doc (docstrings in code), for one function it should be straightforward.
  8. Make sure the new module builds and tests out using both CMake and Makefile builds.

@aman-godara

This comment has been minimized.

@awvwgk
Copy link
Member

awvwgk commented Mar 20, 2021

@aman-godara Feel free to submit a patch for this issue.

@awvwgk awvwgk linked a pull request Mar 23, 2021 that will close this issue
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
easy Difficulty level is easy and good for starting into this project good first issue Good for newcomers idea Proposition of an idea and opening an issue to discuss it
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants