ENH - check `datafit + penalty` compatibility with solver #137

PABannier · 2022-12-10T18:03:11Z

A quick proof-of-concept of a function that checks if the combination (solver, datafit, penalty) is supported. Currently we have some edge cases where one can pass ProxNewton solver with L0_5 penalty without any error being raised.

Pros of this design: the validation rules are centralized, and validating a (solver, datafit, penalty) triple is a one-liner in glm_fit.
Cons: we have to update the rules as we enhance the capabilities of the solvers.

All in all, I think it is very valuable to have more verbose errors when fitting estimators (e.g. Ali Rahimi initially passed the combination Quadratic, L2_3, ProxNewton, which cannot be optimized at the time of writing).

Closes #101
Closes #90
Closes #109

PABannier · 2022-12-10T18:07:20Z

With this PR, the errors are more verbose:

In [1]: from skglm.estimators import GeneralizedLinearEstimator
           from skglm.penalties import L0_5
           from skglm.datafits import Quadratic, Logistic
           from skglm.solvers import ProxNewton, AndersonCD
           import numpy as np

In [2]: X = np.random.normal(0, 1, (30, 50))
           y = np.random.normal(0, 1, (30,))

In [3]: clf = GeneralizedLinearEstimator(Quadratic(), L0_5(1.), ProxNewton())

In [4]: clf.fit(X, y)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 clf.fit(X, y)

File ~/Documents/skglm/skglm/estimators.py:241, in GeneralizedLinearEstimator.fit(self, X, y)
    238 self.datafit = self.datafit if self.datafit else Quadratic()
    239 self.solver = self.solver if self.solver else AndersonCD()
--> 241 return _glm_fit(X, y, self, self.datafit, self.penalty, self.solver)

File ~/Documents/skglm/skglm/estimators.py:29, in _glm_fit(X, y, model, datafit, penalty, solver)
     27 is_classif = isinstance(datafit, (Logistic, QuadraticSVC))
     28 fit_intercept = solver.fit_intercept
---> 29 validate_solver(solver, datafit, penalty)
     31 if is_classif:
     32     check_classification_targets(y)

File ~/Documents/skglm/skglm/utils/dispatcher.py:21, in validate_solver(solver, datafit, penalty)
      6 """Ensure the solver is suited for the `datafit` + `penalty` problem.
      7
      8 Parameters
   (...)
     17     Penalty.
     18 """
     19 if (isinstance(solver, ProxNewton)
     20     and not set(("raw_grad", "raw_hessian")) <= set(dir(datafit))):
---> 21     raise Exception(
     22         f"ProwNewton cannot optimize {datafit.__class__.__name__}, since `raw_grad`"
     23         " and `raw_hessian` are not implemented.")
     24 if ("ws_strategy" in dir(solver) and solver.ws_strategy == "subdiff"
     25     and isinstance(penalty, (L0_5, L2_3))):
     26     raise Exception(
     27         "ws_strategy=`subdiff` is not available for Lp penalties (p < 1). "
     28         "Set ws_strategy to `fixpoint`.")

Exception: ProwNewton cannot optimize Quadratic, since `raw_grad` and `raw_hessian` are not implemented.

mathurinm · 2022-12-12T09:09:49Z

Looks nice @PABannier, this will definitely improve UX!

From an API point of view, shouldn't this check be delegated to each solver? This way we don't have one big function, but rather Solver.validate(datafit, penalty), in the spirit of what @Badr-MOUFAD implemented here: https://github.com/scikit-learn-contrib/skglm/blob/main/skglm/experimental/pdcd_ws.py#L201

Such functions could also take care of the initialization (e.g. stepsize computation) which is done on a solver basis. WDYT?
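A minimal sketch of the per-solver design proposed above. All class names and the `required_datafit_attrs` attribute are hypothetical and chosen for illustration; this is not skglm's actual code. Only the `raw_grad` / `raw_hessian` requirement is taken from the traceback earlier in this thread.

```python
class ToyQuadratic:
    """Hypothetical datafit that lacks `raw_grad` and `raw_hessian`."""

    def value(self, y, w, Xw):
        return sum((yi - xwi) ** 2 for yi, xwi in zip(y, Xw)) / (2 * len(y))


class ToyLogistic:
    """Hypothetical datafit exposing the two methods a Newton-type solver needs."""

    def raw_grad(self, y, Xw):
        raise NotImplementedError  # body omitted: only the attribute matters here

    def raw_hessian(self, y, Xw):
        raise NotImplementedError  # body omitted: only the attribute matters here


class ToyProxNewton:
    """Hypothetical solver that declares and checks its own requirements."""

    # Assumption: each solver lists the datafit methods it relies on.
    required_datafit_attrs = ("raw_grad", "raw_hessian")

    def validate(self, datafit, penalty):
        # Collect every required method the datafit does not provide.
        missing = [name for name in self.required_datafit_attrs
                   if not hasattr(datafit, name)]
        if missing:
            raise AttributeError(
                f"{type(self).__name__} cannot optimize {type(datafit).__name__}: "
                f"missing {', '.join(missing)}."
            )
```

With this layout, `ToyProxNewton().validate(ToyLogistic(), penalty=None)` returns silently, while passing `ToyQuadratic()` raises an `AttributeError` that names the missing methods. Each solver owns its rules, so adding a solver does not require touching a shared dispatcher.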

PABannier · 2023-01-07T12:02:32Z

@mathurinm Yes I think it's cleaner, currently refining the POC.

mathurinm · 2023-06-20T14:28:10Z

This would be a nice addition if we can ship it in the 0.3 release @Badr-MOUFAD, given that we added a few datafits, penalties and solvers!

mathurinm · 2023-10-06T13:32:41Z

@Badr-MOUFAD the issue popped up in #188, do you have time to take this over? A simple check, at the beginning of each solver, that the datafit and penalty are supported (e.g. AndersonCD does not support the Gamma datafit).

Badr-MOUFAD · 2023-10-18T08:34:42Z

> @Badr-MOUFAD the issue popped up in #188, do you have time to take this over? A simple check, at the beginning of each solver, that the datafit and penalty are supported (e.g. AndersonCD does not support the Gamma datafit).

Sure, I will resume this PR.

mathurinm · 2023-10-24T08:35:01Z

Requires #191 to be implemented first to allow for better checks.

mathurinm · 2024-05-30T16:26:24Z

I’m +1 with having a _solve method and adding a run_checks argument to the solve method.
I feel it gives us more freedom to standardize the behavior and makes the solve method less verbose.
I don’t think the checks have a big overhead, though I didn’t verify that in practice.

Let's perform checks all the time for now, this will simplify our lives. WDYT?

The support of sparse data is covered in check_obj_solver_attr function, so I don’t think it hurts us to cover it in this PR

OK

I have no strong opinion about the names, I’m +1 with your proposed name @mathurinm

@QB3 any opinion?
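The `solve` / `_solve` split with a `run_checks` argument discussed above could be sketched as follows. The names `solve`, `_solve`, `run_checks`, `check_attrs`, `custom_checks` and `subdiff_distance` appear in this PR's discussion and commit messages, but their signatures here are assumptions, and every `Toy*` class and `*_required_attrs` attribute is hypothetical. This is an illustration of the pattern, not the merged implementation.

```python
def check_attrs(obj, solver, required_attrs):
    """Raise if `obj` lacks any attribute that `solver` relies on (assumed signature)."""
    missing = [name for name in required_attrs if not hasattr(obj, name)]
    if missing:
        raise AttributeError(
            f"{type(obj).__name__} is not compatible with {type(solver).__name__}: "
            f"missing attribute(s) {', '.join(missing)}."
        )


class ToyBaseSolver:
    """Hypothetical base class: `solve` validates, then delegates to `_solve`."""

    datafit_required_attrs = ()
    penalty_required_attrs = ()

    def solve(self, X, y, datafit, penalty, run_checks=True):
        if run_checks:
            check_attrs(datafit, self, self.datafit_required_attrs)
            check_attrs(penalty, self, self.penalty_required_attrs)
            self.custom_checks(X, y, datafit, penalty)
        return self._solve(X, y, datafit, penalty)

    def custom_checks(self, X, y, datafit, penalty):
        """Optional hook for rules that plain attribute checks cannot express."""

    def _solve(self, X, y, datafit, penalty):
        raise NotImplementedError


class ToyCD(ToyBaseSolver):
    """Hypothetical coordinate-descent solver with one solver-specific rule."""

    datafit_required_attrs = ("gradient_scalar",)
    penalty_required_attrs = ("prox_1d",)

    def __init__(self, ws_strategy="subdiff"):
        self.ws_strategy = ws_strategy

    def custom_checks(self, X, y, datafit, penalty):
        # The "subdiff" working-set strategy needs a subdifferential distance.
        if self.ws_strategy == "subdiff" and not hasattr(penalty, "subdiff_distance"):
            raise AttributeError(
                f"ws_strategy='subdiff' is not available for "
                f"{type(penalty).__name__}; set ws_strategy to 'fixpoint'."
            )

    def _solve(self, X, y, datafit, penalty):
        return "solved"  # placeholder for the actual numerical routine


class ToyDatafit:
    def gradient_scalar(self, X, y, w, Xw, j):
        return 0.0


class ToyLpPenalty:
    """Toy penalty with a prox but no `subdiff_distance`."""

    def prox_1d(self, value, stepsize, j):
        return value
```

Subclasses only implement `_solve` and, when needed, `custom_checks`, so the validation order is fixed in one place. Calling `solve(..., run_checks=False)` skips validation entirely, which is the escape hatch the `run_checks` argument is meant to provide.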

Badr-MOUFAD

LGTM, thanks everyone 💪

We should be careful when internalizing the jit-compilation of datafits and penalties, as jit-compilation undoes the datafit initialization.

I believe this PR brings several contributions to the API and touches many parts of the codebase. We had better merge it and tackle the aforementioned issue in a separate PR.

initial commit

0154215

PABannier changed the title ~~POC Add validation logic passing datafit, penalty and solver~~ POC Add validation logic when passing datafit, penalty and solver to _glm_fit Dec 10, 2022

PABannier marked this pull request as draft December 11, 2022 19:02

PABannier added 2 commits January 7, 2023 16:43

delegated to solve

3c70b85

call solver.validate

ee0d29c

mathurinm mentioned this pull request Oct 6, 2023

ENH - implement intercept update for Poisson and Gamma datafits #189

Merged

Badr-MOUFAD closed this in #189 Oct 18, 2023

mathurinm reopened this Oct 18, 2023

Badr-MOUFAD added 11 commits October 18, 2023 10:48

Merge branch 'main' of https://github.com/scikit-learn-contrib/skglm …

8831cd2

…into solver_dispatcher

add validate method to solvers

9abd149

implem validation logic

d20229a

implem attribute validation for solvers

310c572

validation PDCD_WS

1872e4e

fix trailing spaces

cd1ba1c

add docs to check_obj_solver_compatibility

4c9b4d4

add validation glm_fit

c3d01c4

fix Error logs

eacea14

fix prox solvers attribute names

573cb78

add initialize to required attributes

8dedf18

Badr-MOUFAD changed the title ~~POC Add validation logic when passing datafit, penalty and solver to _glm_fit~~ ENH - check datafit + penalty compatibility with solver Oct 18, 2023

Badr-MOUFAD added 2 commits October 18, 2023 15:18

add change to what's new

19fbe63

formatting & Fista validation

3b65445

Badr-MOUFAD requested a review from mathurinm October 18, 2023 13:47

Merge branch 'solver_dispatcher' of github.com:PABannier/skglm into s…

49661d7

…olver_dispatcher

mathurinm reviewed May 30, 2024

skglm/solvers/gram_cd.py

mathurinm reviewed May 30, 2024

skglm/solvers/gram_cd.py

change version because this is a large change

f3fae3e

mathurinm reviewed May 30, 2024

skglm/solvers/prox_newton.py

Merge branch 'main' into solver_dispatcher

ccd7cd7

Badr-MOUFAD added Work In Progress and removed Ready for review labels Jul 14, 2024

Badr-MOUFAD and others added 9 commits July 14, 2024 22:19

Merge remote-tracking branch 'PAB/solver_dispatcher' into solver_disp…

bb15aea

…atcher

Update skglm/solvers/gram_cd.py

c484aa4

Co-authored-by: mathurinm <[email protected]>

more on remarks

3b19672

check_obj_solver_attr ---> check_attrs

352aa3f

implement _solve and solve

6c3ddbc

forgotten PDCD_WS solver

de1f9ae

fix GramCD checks

1c1b258

linter happy & fix validation

bf19944

cleanups and comments

290879b

PABannier marked this pull request as ready for review July 15, 2024 08:18

Badr-MOUFAD added 7 commits July 15, 2024 10:32

more on docs

4f5781e

custom_compatibility_check ---> custom_checks

83f10d5

rm unimplemented methods

e2bcdcc

correct error message

3448e52

more on validation unit tests

b7b4627

update what's new

d626e36

handle subdiff_distance in custom checks

a02262b

Badr-MOUFAD approved these changes Jul 15, 2024

Badr-MOUFAD removed the Work In Progress label Jul 15, 2024

Badr-MOUFAD merged commit ed7bf2d into scikit-learn-contrib:main Jul 15, 2024
4 checks passed

Badr-MOUFAD mentioned this pull request Jul 15, 2024

ENH -jit-compile datafits and penalties inside solver #270

Open

3 tasks
