Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue/Question about the output of lstsq #817

Closed
loiseaujc opened this issue May 16, 2024 · 3 comments · Fixed by #818
Closed

Issue/Question about the output of lstsq #817

loiseaujc opened this issue May 16, 2024 · 3 comments · Fixed by #818

Comments

@loiseaujc
Copy link

Hej,

I'm really happy that the linalg module is on its way to being production-ready. I have already played a bit with the latest additions in v0.6.0 and observed something an undesirable (in my opinion) behaviour regarding the output of lstsq. Below is a MWE.

program main
  use iso_fortran_env, only: output_unit
  use stdlib_kinds, only: dp
  use stdlib_linalg, only: lstsq, solve_lstsq
  implicit none

  integer :: m, n
  real(dp), allocatable :: A(:, :), b(:), x(:), x_lstsq(:)

  ! Create the problem.
  m = 10 ; n = 5
  allocate(A(m, n)) ; call random_number(A)
  allocate(x(n)) ; call random_number(x)
  b = matmul(A, x)
 
  ! Solve the least-squares problem.
  x_lstsq = lstsq(A, b)

  write(output_unit, *) shape(A), shape(b), shape(x), shape(x_lstsq)
  write(output_unit, *) "Norm of the error :", norm2(x_lstsq(1:n)-x)

end program main

When running this code, the vector x_lstsq computed by lstsq is of size m rather than being of size n. I know that, by default, *gelsd takes A and b as input and overwrites the first n entries of b with the solution x. I believe this is quite misleading and can potentially cause issues if the user is directly using solve_lstsq. Calling solve_lstsq with x being an n-vector instead of an m-vector actually raises an error since the following test then fails

       if (lda<1 .or. n<1 .or. ldb<1 .or. ldb/=m .or. ldx/=m) then
          err0 = linalg_state_type(this,LINALG_VALUE_ERROR,'invalid sizes: a=',[lda,n], &
                                                           'b=',[ldb,nrhs],' x=',[ldx,nrhsx])
          call linalg_error_handling(err0,err)
          if (present(rank)) rank = 0
          return
       end if

in e.g. stdlib_linalg_s_solve_lstsq_one. Are there any reasons why we should keep the output vector as an m-vector rather than an n-vector ? From a mathematical and practical point of view, I would expect the following behaviour:

  • lstsq takes an m x n matrix A and an m-vector b as entry and returns an n-vector.
  • solve_lstsq takes as arguments m x n matrix A, an m-vector b and an n-vector x.
@jalvesz
Copy link
Contributor

jalvesz commented May 16, 2024

Seems like the issue is here:

allocate(x,mold=b)

x should not be allocated using mold=b but by extraction of size(A,dim=2), and if rank(b)>1 then also the size of b in the second dimension.

@perazz what are your thoughts?

@perazz
Copy link
Member

perazz commented May 16, 2024

Sorry I hadn't thought about this. *GELSD require that x is sized at least as b, because they take the rhs [n] on input and return x[m] on output.

I totally agree that this is misleading. So, there is no way to avoid an additional allocation, in order to return an appropriately sized x. The only thing we can do to save an allocation is maybe in the subroutine interface, to let the user provide a larger x, and only return the appropriate part in that case.

@loiseaujc
Copy link
Author

Awesome. I'll try it out as soon as it's merged. Closing the issue. Thx.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants