This repository was archived by the owner on Feb 28, 2025. It is now read-only.

input validation added to roc #230

Merged: 12 commits into ploomber:master on Jan 29, 2023

Conversation

yafimvo
Contributor

@yafimvo yafimvo commented Jan 22, 2023

Describe your changes

  1. Input validation added to roc
  2. Merged with the latest version (issue 146)
  3. Moved the ROC test resources (image baselines) to test_roc

plot.ROC.from_raw_data and plot.roc take two inputs, y_test and y_score, in the following formats:

y_test

"classes"

[0, 1, 2, 0, 1, ...]
['virginica', 'versicolor', 'virginica', 'setosa', ...]

"one-hot encoded classes"

[[0, 0, 1],
 [1, 0, 0]]

y_score

"scores"

[[0.1, 0.1, 0.8],
 [0.7, 0.15, 0.15]]

Issue ticket number and link

Closes #98

Checklist before requesting a review

  • I have performed a self-review of my code
  • I have added thorough tests (when necessary).
  • I have added the right documentation (when needed). Product update? If yes, write one line about this update.

📚 Documentation preview 📚: https://sklearn-evaluation--230.org.readthedocs.build/en/230/

Updated Validating Elbow Curve Input Model (ploomber#219)

* added parameterized tests (added to changelog; better changelog message; validating input model)
* conf.py edit
* updated validation for elbow curve
* conditional to check for python 3.7 (linting)
* changed conditional
* try-catch block, better skipif (lint)
* Empty commit
* skipif failing fix
* resolving name error
* fixed conditional (updates newsletter url; changelog and docs updated)
@coveralls

coveralls commented Jan 22, 2023

Pull Request Test Coverage Report for Build 4005213428

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 56 of 73 (76.71%) changed or added relevant lines in 3 files are covered.
  • 4 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.2%) to 94.014%

Changes missing coverage:

| File | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| src/sklearn_evaluation/grid/random_forest_classifier_grid.py | 6 | 10 | 60.0% |
| src/sklearn_evaluation/util.py | 11 | 16 | 68.75% |
| src/sklearn_evaluation/plot/roc.py | 39 | 47 | 82.98% |

Files with coverage reduction:

| File | New Missed Lines | % |
| --- | --- | --- |
| src/sklearn_evaluation/plot/roc.py | 1 | 94.97% |
| src/sklearn_evaluation/grid/random_forest_classifier_grid.py | 3 | 88.52% |

Totals:

| Change from base Build 4001524496 | -0.2% |
| --- | --- |
| Covered Lines | 2780 |
| Relevant Lines | 2957 |

💛 - Coveralls

@yafimvo yafimvo requested review from edublancas and idomic January 22, 2023 11:59
@idomic idomic requested a review from neelasha23 January 24, 2023 15:25
@idomic
Contributor

idomic commented Jan 24, 2023

@yafimvo please resolve conflicts

Contributor

@edublancas edublancas left a comment


found two issues:

[screenshot]

gives this error:

ValueError: classes [0 1] mismatch with the labels [0 1 2] found in the data

looks like it thinks it only has two classes, but it has three. but where is the 2 coming from?

to reproduce:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn_evaluation import plot
from sklearn import preprocessing

iris = load_iris()
X, y = iris.data, iris.target
y_ = iris.target_names[y]

random_state = np.random.RandomState(0)
n_samples, n_features = X.shape

X = np.concatenate([X, random_state.randn(n_samples, 200 * n_features)], axis=1)
(
    X_train,
    X_test,
    y_train,
    y_test,
) = train_test_split(X, y_, test_size=0.5, stratify=y, random_state=0)

classifier = LogisticRegression()
y_score = classifier.fit(X_train, y_train).predict_proba(X_test)

lb = preprocessing.LabelBinarizer()
lb.fit(y_test)
y_test_bin = lb.transform(y_test)

roc = plot.ROC.from_raw_data(y_test_bin, y_score)

also, when calling this (same code as above):

# try to break it by passing y_test_bin in y_score
roc = plot.ROC.from_raw_data(y_test, y_test_bin)

the error shows:

ValueError: Please check y_score values. 
Expected scores array-like. got: [[0 0 1]
 [0 1 0]
 [0 0 1]
 [1 0 0]
 [0 0 1]
 [1 0 0]
 [0 1 0]
 [1 0 0]
 [0 0 1]
 [0 1 0]
 [1 0 0]
[really long error]

I see that the error is displaying the whole input; however, it's very long. It'd be better to display just the first few characters:

probably something like this:

[screenshot]

@yafimvo
Contributor Author

yafimvo commented Jan 25, 2023

@edublancas

In one of our examples, we try to plot a ROC curve using DecisionTreeClassifier and, as a result, y_score is in an invalid format:

tree_score, forest_score = [
    est.fit(X_train, y_train).predict_proba(X_test)
    for est in [DecisionTreeClassifier(), RandomForestClassifier()]
]

Do we need to support it?

If not, we can change it to LogisticRegression and it will work fine.

_is_binary = is_binary(array)

if _is_binary:
    _is_1d_array = len(array.shape) == 1
Contributor


can we use switch case here for better readability?

Contributor Author


I agree on readability, but I don't see how a switch case would help here since there is a single if/else. I extracted the inner if/else into a method (_get_number_of_elements). What do you think?

Contributor


yes that's fine I guess. Maybe some short comments can also help.

@edublancas
Contributor

> In one of our examples we try to plot a roc curve using the DecisionTreeClassifier and as a result, y_score is in an invalid format

good catch. technically speaking, this is in the right format since these are valid scores (they are floats, and 0.0 and 1.0 are valid values). the "issue" here is the nature of the decision tree model, which tends to produce these types of scores.

I think let's change it to logistic regression so the scores are more meaningful
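The contrast between the two models can be seen directly. A minimal sketch (using iris, as in the repro above; exact numbers depend on the split):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree memorizes the training data, so its leaves are pure and
# predict_proba collapses to hard 0.0/1.0 values.
tree_score = DecisionTreeClassifier(random_state=0).fit(X_train, y_train).predict_proba(X_test)

# Logistic regression produces graded probabilities instead.
lr_score = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)

print(np.unique(tree_score))  # typically only 0.0 and 1.0
print(lr_score.round(3)[:3])  # intermediate values
```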

@yafimvo yafimvo requested a review from edublancas January 26, 2023 14:41
@idomic
Contributor

idomic commented Jan 26, 2023

> I think let's change it to logistic regression so the scores are more meaningful

In this specific case for ROC, or the whole guide? Should we open a different issue for that?

@edublancas @yafimvo Also what else is missing here?

@yafimvo
Contributor Author

yafimvo commented Jan 26, 2023

@idomic
I changed the example only for roc.

Ready for review


@image_comparison(baseline_images=["roc_add_roc"])
def test_roc_add_to_roc(roc_values):
    fpr1, tpr1 = roc_values
Contributor


Why do we need this? It seems like those values get overwritten?
I see similar stuff in other test cases as well.

Contributor

@idomic idomic Jan 26, 2023


If we can parameterize the inputs I think we should do it as the test seems pretty much the same for the roc add (might not be possible due to the image comparison)

Contributor Author


Good point. This was from the previous code, where the roc tests were scattered across different places. I moved all the roc resources from conftest.py to test_roc.py and parameterized the "short" values that are easy to use. The more complex values I kept as fixtures.

Contributor

@idomic idomic left a comment


looks good, just added a comment on the tests

@idomic idomic merged commit 401e2fb into ploomber:master Jan 29, 2023
@idomic
Contributor

idomic commented Jan 29, 2023

Nice job @yafimvo !

Development

Successfully merging this pull request may close these issues.

Multiclass ROC curve
5 participants