Conversation
Updated Validating Elbow Curve Input Model (ploomber#219)
* added parameterized tests, added to changelog, better changelog message, validating input model
* conf.py edit
* updated validation for elbow curve
* conditional to check for python 3.7, linting
* changed conditional
* try-catch block, better skipif, lint
* Empty commit
* skipif failing fix
* resolving name error
* fixed conditional, updates newsletter url, changelog and docs updated
Pull Request Test Coverage Report for Build 4005213428

Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

💛 - Coveralls
@yafimvo please resolve conflicts
…into 98_roc_input_validation
Found two issues.

The first one gives this error:

```
ValueError: classes [0 1] mismatch with the labels [0 1 2] found in the data
```

Looks like it thinks it only has two classes, but it has three. Where is the 2 coming from?

To reproduce:
```python
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from sklearn_evaluation import plot

iris = load_iris()
X, y = iris.data, iris.target
y_ = iris.target_names[y]

# add noisy features to make the problem harder
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.concatenate([X, random_state.randn(n_samples, 200 * n_features)], axis=1)

(
    X_train,
    X_test,
    y_train,
    y_test,
) = train_test_split(X, y_, test_size=0.5, stratify=y, random_state=0)

classifier = LogisticRegression()
y_score = classifier.fit(X_train, y_train).predict_proba(X_test)

# one-hot encode the test labels
lb = preprocessing.LabelBinarizer()
lb.fit(y_test)
y_test_bin = lb.transform(y_test)

roc = plot.ROC.from_raw_data(y_test_bin, y_score)
```
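One guess at where the mismatch could come from (illustrative only, not the library's actual validation code): `np.unique` on a one-hot matrix only ever sees the values 0 and 1, while recovering labels from its columns yields three classes:

```python
import numpy as np

# a tiny one-hot matrix with three classes
y_onehot = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
print(np.unique(y_onehot))                 # [0 1]   -> looks like two classes
print(np.unique(y_onehot.argmax(axis=1)))  # [0 1 2] -> the actual three labels
```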
Also, when calling this (same code as above):

```python
# try to break it by passing y_test_bin in y_score
roc = plot.ROC.from_raw_data(y_test, y_test_bin)
```
the error shows:

```
ValueError: Please check y_score values.
Expected scores array-like. got: [[0 0 1]
 [0 1 0]
 [0 0 1]
 [1 0 0]
 [0 0 1]
 [1 0 0]
 [0 1 0]
 [1 0 0]
 [0 0 1]
 [0 1 0]
 [1 0 0]
[really long error]
```
I see that the error is displaying the whole input; however, it's very long. It'd be better to display just the first few characters, probably something like this:
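(a rough sketch of the idea; the `_MAX_PREVIEW_CHARS` constant and `_preview` helper are made-up names, not existing code)

```python
# sketch: truncate the offending value before embedding it in the error message
_MAX_PREVIEW_CHARS = 80  # hypothetical limit on how much of the input to show

def _preview(value):
    text = str(value)
    if len(text) <= _MAX_PREVIEW_CHARS:
        return text
    return text[:_MAX_PREVIEW_CHARS] + "..."

raise ValueError(
    f"Please check y_score values. Expected scores array-like, got: {_preview(y_score)}"
)
```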
…into 98_roc_input_validation
In one of our examples we try to plot a ROC curve using the
Do we need to support it? If not, we can change it to
src/sklearn_evaluation/plot/roc.py (Outdated)

```python
_is_binary = is_binary(array)

if _is_binary:
    _is_1d_array = len(array.shape) == 1
```
Can we use a switch case here for better readability?
I agree with the readability concern, but I don't see how a switch case would help here since there is one if/else. I extracted the inner if/else to a method (`_get_number_of_elements`). What do you think?
yes that's fine I guess. Maybe some short comments can also help.
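For context, a sketch of the extraction being discussed; only the name `_get_number_of_elements` appears in this thread, so the body below is a guess:

```python
def _get_number_of_elements(array):
    # guessed behavior: 1D arrays carry one label per sample,
    # 2D arrays carry one column per class
    if len(array.shape) == 1:
        return 1
    return array.shape[1]
```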
Good catch. Technically speaking, this is in the right format since these are valid scores (they are floats, and 0.0 and 1.0 are valid values). The "issue" here is the nature of the decision tree model, which tends to produce these types of scores. I think let's change it to logistic regression so the scores are more meaningful.
…into 98_roc_input_validation
In this specific case for ROC, or the whole guide? Let's open a different issue for that? @edublancas @yafimvo Also, what else is missing here?
@idomic Ready for review
tests/test_roc.py (Outdated)

```python
@image_comparison(baseline_images=["roc_add_roc"])
def test_roc_add_to_roc(roc_values):
    fpr1, tpr1 = roc_values
```
Why do we need this? It seems like those values get overwritten? I see similar stuff in other test cases as well.
If we can parameterize the inputs, I think we should do it, as the tests seem pretty much the same for the ROC add (might not be possible due to the image comparison).
Good point. This was from the previous code where ROC tests were scattered in different places. I moved all ROC resources from `conftest.py` to `test_roc.py` and parameterized the "short" values that are easy to use. The more complex values I kept as fixtures.
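For illustration, a minimal sketch of the kind of parameterization discussed here (values and test name are made up):

```python
import numpy as np
import pytest

# made-up fpr/tpr pairs; the real values live in test_roc.py
@pytest.mark.parametrize(
    "fpr, tpr",
    [
        (np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.75, 1.0])),
        (np.array([0.0, 0.25, 1.0]), np.array([0.0, 0.9, 1.0])),
    ],
)
def test_roc_values(fpr, tpr):
    # placeholder assertion; the real tests call the plotting API
    assert fpr.shape == tpr.shape
```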
Looks good, just added a comment on the tests.
Nice job @yafimvo!
Describe your changes

- `test_roc`: ROC tests consolidated and parameterized
- `plot.ROC.from_raw_data` and `plot.roc` take 2 inputs, `y_test` and `y_score`, in the following formats (usage sketch below):
  - `y_test`: "classes" or "one-hot encoded classes"
  - `y_score`: "scores"
Issue ticket number and link
Closes #98
Checklist before requesting a review
📚 Documentation preview 📚: https://sklearn-evaluation--230.org.readthedocs.build/en/230/