Issue/Question about the output of `lstsq` #817

loiseaujc · 2024-05-16T12:30:50Z

Hej,

I'm really happy that the linalg module is on its way to being production-ready. I have already played a bit with the latest additions in v0.6.0 and observed something an undesirable (in my opinion) behaviour regarding the output of lstsq. Below is a MWE.

program main
  use iso_fortran_env, only: output_unit
  use stdlib_kinds, only: dp
  use stdlib_linalg, only: lstsq, solve_lstsq
  implicit none

  integer :: m, n
  real(dp), allocatable :: A(:, :), b(:), x(:), x_lstsq(:)

  ! Create the problem.
  m = 10 ; n = 5
  allocate(A(m, n)) ; call random_number(A)
  allocate(x(n)) ; call random_number(x)
  b = matmul(A, x)
 
  ! Solve the least-squares problem.
  x_lstsq = lstsq(A, b)

  write(output_unit, *) shape(A), shape(b), shape(x), shape(x_lstsq)
  write(output_unit, *) "Norm of the error :", norm2(x_lstsq(1:n)-x)

end program main

When running this code, the vector x_lstsq computed by lstsq is of size m rather than being of size n. I know that, by default, *gelsd takes A and b as input and overwrites the first n entries of b with the solution x. I believe this is quite misleading and can potentially cause issues if the user is directly using solve_lstsq. Calling solve_lstsq with x being an n-vector instead of an m-vector actually raises an error since the following test then fails

       if (lda<1 .or. n<1 .or. ldb<1 .or. ldb/=m .or. ldx/=m) then
          err0 = linalg_state_type(this,LINALG_VALUE_ERROR,'invalid sizes: a=',[lda,n], &
                                                           'b=',[ldb,nrhs],' x=',[ldx,nrhsx])
          call linalg_error_handling(err0,err)
          if (present(rank)) rank = 0
          return
       end if

in e.g. stdlib_linalg_s_solve_lstsq_one. Are there any reasons why we should keep the output vector as an m-vector rather than an n-vector ? From a mathematical and practical point of view, I would expect the following behaviour:

lstsq takes an m x n matrix A and an m-vector b as entry and returns an n-vector.
solve_lstsq takes as arguments m x n matrix A, an m-vector b and an n-vector x.

The text was updated successfully, but these errors were encountered:

jalvesz · 2024-05-16T16:25:39Z

Seems like the issue is here:

stdlib/src/stdlib_linalg_least_squares.fypp

Line 138 in 42182b0

allocate(x,mold=b)

x should not be allocated using mold=b but by extraction of size(A,dim=2), and if rank(b)>1 then also the size of b in the second dimension.

@perazz what are your thoughts?

perazz · 2024-05-16T17:40:48Z

Sorry I hadn't thought about this. *GELSD require that x is sized at least as b, because they take the rhs [n] on input and return x[m] on output.

I totally agree that this is misleading. So, there is no way to avoid an additional allocation, in order to return an appropriately sized x. The only thing we can do to save an allocation is maybe in the subroutine interface, to let the user provide a larger x, and only return the appropriate part in that case.

loiseaujc · 2024-05-16T19:05:22Z

Awesome. I'll try it out as soon as it's merged. Closing the issue. Thx.

perazz mentioned this issue May 16, 2024

lstsq: return correct array size #818

Merged

loiseaujc closed this as completed May 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue/Question about the output of `lstsq` #817

Issue/Question about the output of `lstsq` #817

loiseaujc commented May 16, 2024

jalvesz commented May 16, 2024 •

edited

Loading

perazz commented May 16, 2024

loiseaujc commented May 16, 2024

Issue/Question about the output of lstsq #817

Issue/Question about the output of lstsq #817

Comments

loiseaujc commented May 16, 2024

jalvesz commented May 16, 2024 • edited Loading

perazz commented May 16, 2024

loiseaujc commented May 16, 2024

Issue/Question about the output of `lstsq` #817

Issue/Question about the output of `lstsq` #817

jalvesz commented May 16, 2024 •

edited

Loading