Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for quadrature #112

Closed
nshaffer opened this issue Jan 18, 2020 · 18 comments
Closed

Proposal for quadrature #112

nshaffer opened this issue Jan 18, 2020 · 18 comments
Labels
implementation Implementation in experimental and submission of a PR topic: mathematics linear algebra, sparse matrices, special functions, FFT, random numbers, statistics, ...

Comments

@nshaffer
Copy link
Contributor

Overview

Let's get numerical integration into stdlib. As a baseline, here's what I'd want in a general-purpose integration module.

  • Interfaces to QUADPACK routines, both simple wrappers and a more modern interface a la SciPy's scipy.integrate.quad.
  • Routines to integrate arrays as if they are tabulated function values. At a minimum, we'd want trapezoid rule and Simpson's rule.
  • Some niche or less well-known rules that are still useful. Examples might be endpoint-corrected discrete Fourier transforms (see Numerical Recipes Sec. 13.9) and double-exponential rules.

The idea would be to provide easy-to-use and reasonably general numerical integration routines that cover 95% of what people need. Some things which are out of scope in my opinion:

  • ODE integration
  • Monte Carlo integration
  • Basis representations of abstract integral operators

I envision having two modules: one being a light f90 wrapper for QUADPACK which expert users can use, the other being the main module with a nicer interface to QUADPACK's functionality, as well as array-integrating routines.

Integrating functions

I've already written wrappers and tests for "non-expert" QUADPACK routines (link). All these do is create explicit interfaces and give names to magic numbers, e.g., error codes and integrand weight selectors. It will need a little refactoring to port to stdlib, but I am pretty confident that my module quadpack_interface uses all the right types, kinds, and intents to agree with the F77 routines. No need to duplicate that effort here.

From a modern programmer's perspective, QUADPACK has some unattractive qualities:

  • Explicitly passed workspace arrays with sizes requirements that are documented but not enforced in code. These mattered in the days of tight memory constraints, but nowadays, we can relieve the caller of this burden.
  • Out-parameters that are not usually useful should be made optional.
  • Technical-detail in-parameters should be made optional with sensible default values.
  • The integrand function must be a scalar function of one variable. This can make it clumsy to integrate functions with additional parameters, and doing multiple integrals by nested QUADPACK calls gets cumbersome.
  • Routines for weighted integrands can be greatly simplified and clarified. For instance, qaws implements weight functions that remove algebraic/logarithmic singularities at the integration endpoints. The type of singularity is controlled by an integer argument, whose value affects the meaning of other arguments.

Solving these issues should be the motivating goal of a module that delivers high-level integration functions that call down to the appropriate QUADPACK subroutines, internally handling the magic numbers, workspace array allocation, and translating return codes to proper runtime errors if necessary. I have some design sketches that I'll talk about in a later post.

Integrating arrays

Integrating arrays is a simpler task. We just need to decide how featureful we want to be with this. For reference, NumPy/SciPy provide

  • numpy.trapz: trapezoidal rule with either equidistant or arbitrary abscissas.
  • scipy.simps: Simpson's rule with either equidistant or arbitrary abscissas.
  • scipy.romb: Romberg integration, which is restricted to equidistant abscissas whose length is 2**k + 1, e.g., 1, 3, 5, 9, 17...
  • scipy.cumtrapz: cumulative trapezoidal rule with optional initial value

We should have all these, except maybe Romberg integration. I don't think I have ever used it "for real", but maybe other people can make a case for keeping it. To this basic set of routines, I'd also like to add companion functions to trapz and simps that return the weights for a particular array of abscissas. This is useful for integrating many different arrays that represent functions tabulated on a common grid. Once you have the weights from, e.g., w = trapz_weights(x), then computing the integral is as simple as integral = dot_product(y, w).

Details

  • Are there any licensing show-stoppers with QUADPACK? You always wonder with these Netlib libraries.
  • We need to decide (elsewhere) what conventions to adopt for wrapping F77 codes. Relevant aspects here are
    • translating real and double precision to appropriate real kinds
    • explicit interfaces for external procedure
    • managing workspace arrays (Module-level or subprogram-level? With or without save?)
@certik
Copy link
Member

certik commented Jan 19, 2020

I would suggest to concentrate on the API. We can use QUADPACK as an implementation, or we can use something else; either way I would consider that secondary. The API however, is essential.

Here is an implementation of Gaussian quadrature that I have been using in all my codes: https://github.com/certik/hfsolver/blob/b4c50c1979fb7e468b1852b144ba756f5a51788d/src/quadrature.f90, it returns the points x and weights w, and the user does the actual integration using sum(f(x)*w).

@ivan-pi
Copy link
Member

ivan-pi commented Jan 20, 2020

Are there any licensing show-stoppers with QUADPACK? You always wonder with these Netlib libraries.

Scipy is using the QUADPACK routines and is itself under the BSD license, so it should be possible.
https://github.com/scipy/scipy/tree/v1.4.1/scipy/integrate/quadpack

There is a great Python package with tons of references for different quadratures: https://github.com/nschloe/quadpy

For generating classic Gaussian quadratures there are a lot of legacy codes available, unfortunately none of them have permissive licenses:

Given the large palette of Gaussian quadratures perhaps they would fit better into a separate package, but simple rules like the trapezoidal and Simpson methods could be part of stdlib.

@nshaffer
Copy link
Contributor Author

nshaffer commented Jan 20, 2020

@ivan-pi Thanks for the research! I am in the process of writing up my "vision" for this module, but I want to respond to one specific point. I think the huge variety of existing implementations should not stop us from defining canonical interfaces. For instance, we should not provide routines that are explicitly Gauss-Kronrod integrators or Clenshaw-Curtis integrators, or Bill-and-Ted integrators. We should have routines for "integrate f(x) from a to b, with both a and b finite" and "integrate w(x)*f(x) from a to infinity with w(x)=sin(c*x)". Then specific implementations can decide how to map the different cases to actual quadrature rules. Obviously, we will have to provide reference implementations, but that's OK.

On the other hand, I agree that fancy quadrature routines are a good candidate for splitting off into a separate package once stdlib matures. At that point, we can open the doors to specializations like "clenshaw_curtis_quadrature.f90" and "gauss_kronrod_quadrature.f90".

@nshaffer
Copy link
Contributor Author

Here's a first cut at some interfaces.

Trapezoidal rule for arrays

Integrate arrays with trapezoidal rule, either with implicit abscissas (assumed equally spaced) or explicit abscissas

interface trapz
    module procedure trapz_dx
    module procedure trapz_x
end interface trapz

...

contains

...

function trapz_dx(y, dx) result(integral)
    real, dimension(:), intent(in) :: y
    real, intent(in) :: dx
    real :: integral
    ...
end function trapz_dx

function trapz_x(y, x) result(integral)
    real, dimension(:), intent(in) :: y
    real, dimension(size(y)), intent(in) :: x
    real :: integral
    ...
end function trapz_x

Trapezoidal weights for given abscissas

function trapz_weights(x) result(w)
    real, dimension(:), intent(in) :: x
    real, dimension(size(x)) :: w
    ...
end function trapz_weights

Simpson's rule for arrays

Integrating arrays with Simpson's rule should look the same as the trapezoidal rule. The only catch is the need to do something for the case of even-length arrays. There are three approaches to choose from:

  1. Disallow even-length arrays (throw a runtime error).
  2. Do Simpson's rule for an odd-length section of the array and then patch on a different rule for the remainder (e.g, Simpson's 3/8 rule).
  3. Use an alternative O(h⁴) rule on the whole array.

Option 1 is the simplest but also most restrictive. Option 2 is the most flexible, but a good API probably requires that all the simps_* routines take a third optional argument that gives the user some control over how the patching occurs. Option 3 is simple but may not count as "Simpson's rule" from a purist's perspective.

Adaptive integration of functions

Adaptive integration of functions can be implemented in many, many ways. This is where it's most useful to specify a fairly general API and let specific implementations decide what algorithms to use. I think adaptive 1-D integration should let users request

  • infinite upper and/or lower integration bounds
  • weight functions
  • interior breakpoints
  • absolute and/or relative error tolerance
  • recursion limit, or some equivalent way to let the integrator "bail out" of a badly behaved integral
  • estimate of the error in computed integral
  • diagnostic information (e.g., a return code and and/or a message)

An actual implementation does not need to honor all requests literally. For instance, if an integrator detects that the integrand decays rapidly as x->infinity, it should be allowed to truncate at a "big enough" x that it thinks will satisfy the requested tolerance. Or, if the user specifies a weight function, it should not be required that the implementation actually use that information in deciding what specific quadrature rule to use.

Here are some examples of what I think adaptive integration calls should look like (not being careful about real kinds for now)

! Integrate f(x) from x=-1 to 1 with given absolute and relative error tolerance
integral = quad(f, -1.0, 1.0, abstol=1e-12, reltol=1e-6)

! Integrate f(x) from x=5 to x=12, with breakpoints at x=6.0 and 9.2
integral = quad(f, 5.0, 12.0, points=[6.0, 9.2])

! Integrate f(x) from x=0 to infinity with weight function sin(8x)
integral = quad(f, 0.0, +inf, weight=sin_weight(8.0))

! Integrate f(x) from x=-infinity to infinity with weight function 1/(x-4),
! putting an error bound in the variable "delta"
integral = quad(f, -inf, +inf, weight=pole_weight(4.0), errbnd=delta)

In all cases, f is a function with the interface

function f(x)
    real, intent(in) :: x
    real :: f
end function

This is the simplest thing to start with and can be changed depending on the findings in #117.

There are two further non-obvious things illustrated: weights and infinite integration limits.

Weighted integrands

I suggest that weights be specified as instances of class(weight_t), whose extensions allow for parameterization of the weight. For instance, the definition of sin_weight might be

type, extends(weight_t) :: sin_weight(k)
    ! w(x) = sin(omega*x)
    integer, kind :: k
    real(k) :: omega
end type sin_weight

and the definition of pole_weight might be

type, extends(weight_t) :: pole_weight(k)
    ! w(x) = 1/(x-c)
    integer, kind :: k
    real(k) :: c
end type pole_weight

There are lots of other weight functions one could consider including, e.g., the weights corresponding to the classical orthogonal polynomials. Making each one a distinct type makes it easier to implement weight-specific quadrature rules.

Infinite integration bounds

"Infinity" is a non-standard concept outside the IEEE modules. I opened some discussion on different mechanisms for representing infinite values in #118. Some consensus needs to be reached there before we can decide what do to for (semi-)infinite integration domains here.

Non-adaptive integration of functions

The approach @certik shared for non-adaptive quadrature looks good to me. The strategy of returning the abscissas and weights make a lot of sense. Not only does it avoid recalculation for the case of doing many similar integrals, but it also allows more flexibility in the integrand function. I think we should also offer corresponding all-in-one functions that generate the weights and abscissas internally, akin to having both trapz and trapz_weights.

One case that doesn't work well with the "explicit summation" approach is when the caller wants a rule for infinite or semi-infinite domains. This is because there is no standard way to refer to "infinity" (again, see #118).

@certik
Copy link
Member

certik commented Jan 27, 2020

I think generally this API looks good to me.

Regarding your last point, I agree we should do both --- returning the points and weights as well as have a higher level API that hides it from the user and instead requests the user to provide a callback.

Regarding returning points and weights for an infinite domain, I've used exactly the same API as for a finite domain. Here is how it works:

https://github.com/certik/hfsolver/blob/b4c50c1979fb7e468b1852b144ba756f5a51788d/src/quad_laguerre.f90#L11

I only implemented a few orders. It returns the points on an infinite interval [0, oo).

@fiolj
Copy link
Contributor

fiolj commented Jan 27, 2020

Hi, I've just come across this proposal. I've been working on trying to produce something similar with some of the methods I've been in contact. My idea is/was to make an interface similar to numpy/scipy in a modern fortran, with good documentation and examples. The idea is that the project is open and driven by a community, but at this point is just starting with my vision of it. So, If possible, I'd rather merge/contribute in this ongoing project with a shared vision.

Currently, for integration I implemented:

  • A version of trapz, simps for functions and data-values, with a single interface.
  • A version of integrators using quadpack routines.
  • A version of tanh-sinh (double exponential) integration
  • A version of simpson adaptive integrator
    All integrators, work for real and complex functions, for functions f(x) and f(x,args) (a single interface)

The project is at https://github.com/numericfor/numfor with documentation at https://numericfor.github.io/numfor/

Of course, working alone, I've made some decision about some of the topics that are being discussed here (API, treatment of infinite, etc).
If something of it is useful, I'd be glad to help to integrate it to stdlib and work with the community to adapt it to a consensus API.
Regards,
Juan

@certik
Copy link
Member

certik commented Jan 27, 2020

Hi @fiolj, welcome! Yes, that would be awesome if you could contribute. Indeed, the main value of stdlib is that we agree on the API as a community. The actual implementation is straightforward, as we have all done that already or can relatively easily do.

If you want to help, go ahead and help us reach an agreement regarding APIs and feel free to propose everything that you already implemented. You can follow the WORKFLOW.

@fiolj
Copy link
Contributor

fiolj commented Jan 28, 2020

Thanks @certik, I agree with all points by @nshaffer, the above API seems sensible.

I would also add that may be useful to integrate not only real but also complex functions

function f(x)
  real, intent(in) :: x
  complex :: f
end function

Other decisions (infinite limits, f(x, *args)) depend on other issues.

Regarding the implementation, besides wrapping, I've been doing some refactoring of the QUADPACK routines. Probably there are some things to modify but I could help to put them in the preferred way starting from the version already in
https://github.com/numericfor/numfor/tree/master/src/integrate/quadpack

@certik
Copy link
Member

certik commented Jan 28, 2020

@fiolj thank you. Looking forward!

@nshaffer it looks like you will get an agreement on the API, so it seems the next step is to create an implementation and then we can discuss the details once you send a PR.

@nshaffer
Copy link
Contributor Author

@fiolj That is an impressive effort! I don't have the time right now to look carefully through the repo, but it looks like there are good ideas in there I haven't considered yet. Is it the case that the real "meat" of your computational routines lives inside cpp include files? Do you think those include files will be easy to reuse with a different preprocessor, or are they written in a very cpp-reliant way?

@certik I will get the ball rolling with a PR that includes the simplest things. For features we're still working out (e.g., adaptive quadrature), is it OK to have stub functions that just illustrate the proposed API?

@certik
Copy link
Member

certik commented Jan 29, 2020

For features we're still working out (e.g., adaptive quadrature), is it OK to have stub functions that just illustrate the proposed API?

I think so. It's about getting the ball rolling.

@fiolj
Copy link
Contributor

fiolj commented Jan 29, 2020

@nshaffer I was thinking on starting a project just like this...
I think that we can use my reformatted quadpack routines as a starting point if we agree to it. I didn't want to depend on a different program and just used cpp but using fypp will make the implementation really much easier, and I think it will be quite straightforward the change. Though I haven't tried to use it yet.
It would be good if you can start with the proposal for the API

@fiolj
Copy link
Contributor

fiolj commented Feb 4, 2020

Sorry about the noise, but rethinking the API for trapz (and may be simps) may be it would make sense to implement those functions for arrays of rank larger than one, like Numpy trapz.
Looking to the implementation of stats_mean, I think we could do something similar.
@nshaffer is working in an implementation, thoughts on this?

I was thinking that may be we could use something like this for the interface:

#:include "common.fypp"
#:set RANKS = range(1, MAXRANK + 1)
#:set RC_KINDS_TYPES = REAL_KINDS_TYPES + CMPLX_KINDS_TYPES

    interface trapz
    #:for k1, t1 in RC_KINDS_TYPES
      #:for rank in RANKS
        #:set RName = rname("trapz_dx",rank, t1, k1)
        pure module function ${RName}$(y, dx) result(res)
            ${t1}$, intent(in) :: y${ranksuffix(rank)}$
            real(${k1}$), intent(in) :: dx
            ${t1}$ :: res${reduced_shape('x', rank, 'dim')}$
        end function ${RName}$
      #:endfor
    #:endfor

    end interface trapz

end module stdlib_experimental_quadrature

and, similarly for the implementation:

#:set RANKS = range(1, MAXRANK + 1)
#:set RC_KINDS_TYPES = REAL_KINDS_TYPES + CMPLX_KINDS_TYPES

  #:for k1, t1 in RC_KINDS_TYPES
    #:for rank in RANKS
      #:set RName = rname("trapz_dx",rank, t1, k1)
      pure module function ${RName}$(y, dx, dim) result(res)
          ${t1}$, dimension(:), intent(in) :: y
          real(${k1}$), intent(in) :: dx
          ${t1}$ :: res${reduced_shape('y', rank, 'dim')}$
          integer :: L
          L = size(y)
          if (dim >= 1 .and. dim <= ${rank}$) then
              res = dx*(sum(y(2:L-1), dim) + 0.5_${k1}$*(y(1) + y(L)))
          else
              call error_stop("ERROR (trapz): wrong dimension")
          end if       
      end function ${RName}$
    #:endfor
  #:endfor

  #:for k1, t1 in RC_KINDS_TYPES
    #:for rank in RANKS
      #:set RName = rname("trapz_all_dx",rank, t1, k1)
      pure module function ${RName}$(y, dx) result(res)
          ${t1}$, dimension(:), intent(in) :: y
          real(${k1}$), intent(in) :: dx
          ${t1}$ :: res${reduced_shape('y', rank, 'dim')}$
          integer :: L
          L = size(y)
          res = dx*(sum(y(2:L-1)) + 0.5_${k1}$*(y(1) + y(L)))
      end function ${RName}$
    #:endfor
  #:endfor

@nshaffer
Copy link
Contributor Author

nshaffer commented Feb 4, 2020

@fiolj I've been thinking about that too. I support trapz (and simps) being rank-generic. In turn, trapz_weights should also be rank-generic, I would think. The use case is obvious.

In general, though, I think we should be judicious about which routines we make rank-generic, at least for now. Implementing generic ranks makes the source very difficult to read. One also has a great many more tests to write (in principle).

We are still in early stages of developing, trying different APIs and such. I think it is not a good use of our time and effort to add generic ranks to all routines that could conceivably take in a multi-dimensional array (but which cannot be made elemental). Again, for trapz and friends, the use case is obvious.

Even so, in my initial PR, I intend to only show the 1-d API. The code will be easy to read and review. With that in place, the subsequent PR (yours, if you like) to implement generic ranks will be easier to understand compared with doing it all in one shot.

@fiolj
Copy link
Contributor

fiolj commented Feb 5, 2020

Even so, in my initial PR, I intend to only show the 1-d API. The code will be easy to read and review.

I agree with that strategy. I just didn't think on doing trapz and simps rank-generic at the time we discussed the interface, and wanted to open it for discussion.

With that in place, the subsequent PR (yours, if you like) to implement generic ranks will be easier to understand compared with doing it all in one shot.

I still am not sure what is the best strategy for collaboration on medium-sizes branches like this. I don't know if it is possilbe/desirable to update two forks in parallell and make intermediate PRs between those, or just make several PRs with small pieces of code to the main repository.

@nshaffer
Copy link
Contributor Author

nshaffer commented Feb 7, 2020

Thinking on it some more, there are some API questions we need to resolve before implementing generic kinds for trapz and such. Before getting into the details, I am thinking that the most general API for, say, trapz would be

result = trapz(y, dx, dim [, mask])
result = trapz(y, dx [, mask])
result = trapz(y, x, dim [, mask])
result = trapz(y, x [, mask])

  • If dim is present, the rank of the result is one less than the rank of y.
  • The argument dx is either a scalar or, if rank(y) >= 2, then dx may instead be a 1-d array with size(dx) == rank(y).
  • The argument x has the same shape as y.

With that established, here are two questions I'd like opinions on:

  • When dim is absent what should be the result? Most intrisic reductions (sum, maxval, etc.) produce a scalar when no dim is specified. In the context of integration, that means performing a multi-dimensional integral. To me this is the natural thing to do, but I am open to other ideas.
  • What should mask do? In the intrinsic reduction functions, it's obvious. Here, I can think of three reasonable approaches:
    1. Treat elements of y where the mask is .false. as if they were zero. This parallels how sum behaves, but I do not think it would be useful.
    2. Use the mask to indicate the integration domain. For instance, if mask = [.true., .true., .false., .true., .true. true.], then trapz(y, x, mask) would be equivalent to trapz(y(1:2), x(1:2)) + trapz(y(4:6), x(4:6)). This is more useful, I think, but it has the potential to get complicated in higher dimensions.
    3. Don't support masked integration.

@fiolj
Copy link
Contributor

fiolj commented Feb 20, 2020

I like your API and implementation as it is. Regarding your two questions:

  • When dim is absent, Numpy integrates along the last dimension. In this case I like your idea of multidimensional integration and returning a scalar better.

  • On the use of mask:

What should mask do? In the intrinsic reduction functions, it's obvious. Here, I can think of three reasonable approaches:

1. Treat elements of `y` where the mask is `.false.` as if they were zero. This parallels how `sum` behaves, but I do not think it would be useful.

2. Use the mask to indicate the integration domain. For instance, if `mask = [.true., .true., .false., .true., .true. true.]`, then `trapz(y, x, mask)` would be equivalent to `trapz(y(1:2), x(1:2)) + trapz(y(4:6), x(4:6))`. This is more useful, I think, but it has the potential to get complicated in higher dimensions.

3. Don't support masked integration.

I think the combinations of option 2 are too many to make something universally useful and we, at least at this stage, keep it simple. I would not support masked integration now.

@jvdp1 jvdp1 added implementation Implementation in experimental and submission of a PR topic: mathematics linear algebra, sparse matrices, special functions, FFT, random numbers, statistics, ... labels Mar 1, 2020
@milancurcic
Copy link
Member

Closing this issue as implemented, with follow up active threads being #277 and #313.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
implementation Implementation in experimental and submission of a PR topic: mathematics linear algebra, sparse matrices, special functions, FFT, random numbers, statistics, ...
Projects
None yet
Development

No branches or pull requests

6 participants