This repository was archived by the owner on Feb 28, 2025. It is now read-only.

Multiclass ROC curve #98

Closed
idomic opened this issue Dec 21, 2022 · 18 comments · Fixed by #230


@idomic
Contributor

idomic commented Dec 21, 2022

ROC curve is commonly used to compare the performance of models. It is usually used in binary classification, but it can also be used in multiclass classification using averaging methods.

Running plot.roc(y_test, prediction1) on more than 2 classes fails; we should add support for it:

ValueError: multiclass format is not supported

I tried it for the ROC curve, but this issue also applies to the PR curve (plot.precision_recall(y_test, prediction1)).
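For reference, a minimal sketch that reproduces the failure on any 3-class dataset (iris here; the classifier choice is arbitrary):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn_evaluation import plot

# three classes: 0, 1, 2
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)
prediction1 = model.predict(X_test)  # 1-D array of predicted class labels

# raises ValueError: multiclass format is not supported
plot.roc(y_test, prediction1)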

@idomic
Contributor Author

idomic commented Dec 21, 2022

This also affects report = ce.make_report(), given that ROC/PR are embedded in the report.

@edublancas
Contributor

we support multi-class for both (https://sklearn-evaluation.readthedocs.io/en/latest/api/plot.html#sklearn_evaluation.plot.precision_recall)

I remember implementing it, but the docs don't have examples. so either I really never implemented it or the input format you have is incorrect.

in any case, we should address the "multiclass format is not supported" error.

can you paste the full traceback?

@idomic
Contributor Author

idomic commented Dec 21, 2022

Full stack trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [79], line 1
----> 1 plot.roc(y_test, prediction1)

File ~/opt/miniconda3/lib/python3.9/site-packages/sklearn_evaluation/telemetry.py:38, in SKLearnEvaluationLogger.log.<locals>.wrapper.<locals>.inner(*args, **kwargs)
     35     metadata['exception'] = str(e)
     36     telemetry.log_api(
     37         'sklearn-evaluation-error', metadata=metadata)
---> 38     raise e
     40 return result

File ~/opt/miniconda3/lib/python3.9/site-packages/sklearn_evaluation/telemetry.py:33, in SKLearnEvaluationLogger.log.<locals>.wrapper.<locals>.inner(*args, **kwargs)
     30 telemetry.log_api('sklearn-evaluation', metadata=metadata)
     32 try:
---> 33     result = func(*args, **kwargs)
     34 except Exception as e:
     35     metadata['exception'] = str(e)

File ~/opt/miniconda3/lib/python3.9/site-packages/sklearn_evaluation/plot/roc.py:65, in roc(y_true, y_score, ax)
     63 else:
     64     if y_score_is_vector:
---> 65         _roc(y_true, y_score, ax)
     66     else:
     67         _roc(y_true, y_score[:, 1], ax)

File ~/opt/miniconda3/lib/python3.9/site-packages/sklearn_evaluation/plot/roc.py:94, in _roc(y_true, y_score, ax)
     74 """
     75 Plot ROC curve for binary classification.
     76 
   (...)
     90 
     91 """
     92 # check dimensions
---> 94 fpr, tpr, _ = roc_curve(y_true, y_score)
     95 roc_auc = auc(fpr, tpr)
     97 ax.plot(fpr, tpr, label=('ROC curve (area = {0:0.2f})'.format(roc_auc)))

File ~/opt/miniconda3/lib/python3.9/site-packages/sklearn/metrics/_ranking.py:981, in roc_curve(y_true, y_score, pos_label, sample_weight, drop_intermediate)
    892 def roc_curve(
    893     y_true, y_score, *, pos_label=None, sample_weight=None, drop_intermediate=True
    894 ):
    895     """Compute Receiver operating characteristic (ROC).
    896 
    897     Note: this implementation is restricted to the binary classification task.
   (...)
    979 
    980     """
--> 981     fps, tps, thresholds = _binary_clf_curve(
    982         y_true, y_score, pos_label=pos_label, sample_weight=sample_weight
    983     )
    985     # Attempt to drop thresholds corresponding to points in between and
    986     # collinear with other points. These are always suboptimal and do not
    987     # appear on a plotted ROC curve (and thus do not affect the AUC).
   (...)
    992     # but does not drop more complicated cases like fps = [1, 3, 7],
    993     # tps = [1, 2, 4]; there is no harm in keeping too many thresholds.
    994     if drop_intermediate and len(fps) > 2:

File ~/opt/miniconda3/lib/python3.9/site-packages/sklearn/metrics/_ranking.py:740, in _binary_clf_curve(y_true, y_score, pos_label, sample_weight)
    738 y_type = type_of_target(y_true, input_name="y_true")
    739 if not (y_type == "binary" or (y_type == "multiclass" and pos_label is not None)):
--> 740     raise ValueError("{0} format is not supported".format(y_type))
    742 check_consistent_length(y_true, y_score, sample_weight)
    743 y_true = column_or_1d(y_true)

ValueError: multiclass format is not supported

@idomic
Contributor Author

idomic commented Dec 21, 2022

these were the inputs:
[screenshot of the y_test and prediction1 inputs]

@idomic
Contributor Author

idomic commented Dec 21, 2022

Regardless of the error, there should be a concrete usage example in the user guide (better docs).

@edublancas
Contributor

edublancas commented Dec 21, 2022

yep, we're missing an example here.

the problem is the format, y_test should be one-hot encoded:

[0, 0, 1]
[1, 0, 0]
etc

so we need both an example, and to one-hot encode the user's input in case they pass one like yours (and possibly show a warning saying we did the one-hot encoding implicitly).
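For illustration, a minimal sketch of that implicit conversion using sklearn's label_binarize (the helper name and warning text are hypothetical):

import warnings

import numpy as np
from sklearn.preprocessing import label_binarize

def maybe_one_hot(y_test):
    """One-hot encode y_test if it arrives as a 1-D vector of class labels."""
    y_test = np.asarray(y_test)
    if y_test.ndim == 1 and len(np.unique(y_test)) > 2:
        warnings.warn("y_test was one-hot encoded implicitly")
        return label_binarize(y_test, classes=np.unique(y_test))
    return y_test

print(maybe_one_hot([0, 2, 1, 0]))
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [1 0 0]]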

@idomic
Contributor Author

idomic commented Jan 12, 2023

This user story has 3 action items:

  1. Check this feature is actually implemented.
  2. Make sure it’s well documented and has examples for the user.
  3. Make the error more informative (probably something about the format of the user's input).

@yafimvo
Contributor

yafimvo commented Jan 15, 2023

@edublancas I'm not sure I completely understood this.

the problem is the format, y_test should be one-hot encoded:

I tried to one-hot encode y_test, but since is_row_vector(y_score) returns True for the given value (prediction1), it sets n_classes to 2 and tries to use sklearn.metrics.roc_curve with a multi-class input (that's where the original error comes from: ValueError: multiclass format is not supported).

I tried to skip these checks by hardcoding some variables, but eventually it fails here (y_score is prediction1, which is a one-dimensional array):

fpr_, tpr_, _ = roc_curve(y_true_bin[:, i], y_score[:, i])

Based on this, I tried to one-hot encode the prediction1 array, and it returned the following result, which seems correct.

[screenshot of the resulting ROC plot]

Please let me know your thoughts on this
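For context, once both arrays are one-hot encoded, the per-class roc_curve call quoted above goes through; a runnable sketch with toy labels:

import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_pred = np.array([0, 1, 2, 2, 0, 0, 2, 2])

classes = np.unique(y_true)
y_true_bin = label_binarize(y_true, classes=classes)
y_score = label_binarize(y_pred, classes=classes)  # one-hot encoded predictions

# one-vs-rest ROC per class, as in the line quoted above
for i, cls in enumerate(classes):
    fpr_, tpr_, _ = roc_curve(y_true_bin[:, i], y_score[:, i])
    print(f"class {cls}: AUC = {auc(fpr_, tpr_):.2f}")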

@edublancas
Contributor

please open a PR so I can run your code

@yafimvo
Contributor

yafimvo commented Jan 16, 2023

@edublancas I just used the same example @idomic used, but with a one-hot encoded prediction array.

current example

import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn import tree
from sklearn_evaluation import plot

# load the penguins dataset and drop rows with missing values
df = sns.load_dataset('penguins')
df.dropna(inplace=True)

# encode the target: three classes
Y = df.species.map({'Adelie': 0, 'Chinstrap': 1, 'Gentoo': 2})
df.drop('species', inplace=True, axis=1)

# encode the categorical features
se = pd.get_dummies(df['sex'], drop_first=True)
df = pd.concat([df, se], axis=1)
df.drop('sex', axis=1, inplace=True)
df['island'] = LabelEncoder().fit_transform(df['island'])

X = df
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=40)

dtc = tree.DecisionTreeClassifier()
dt_model = dtc.fit(X_train, y_train)
predictions = dt_model.predict(X_test)

one-hot encode the predictions and run plot.roc:

a = predictions  # 1-D array of predicted class labels (0, 1, 2)
predictions_one_hot = np.zeros((a.size, a.max() + 1))
predictions_one_hot[np.arange(a.size), a] = 1

plot.ROC.from_raw_data(y_test, predictions_one_hot)

I'm not sure how to implement this in the code, since I don't know when (and if) I should one-hot encode the input.
I could compare the inputs and return some suggestions, or, if we get this ValueError from roc, encode the input and re-run the function, but I don't think that's the best approach.

@edublancas
Contributor

Ok, so what we want is for this format:

[0, 1, 2, 0, 1, ...]

this one:

[[0, 0, 1],
 [1, 0, 0]]

and this one (I didn't remember this one) - these are output probabilities per class:

[[0.1, 0.1, 0.8],
 [0.7, 0.15, 0.15]]

to produce the same output.

If I understand correctly, the first one breaks because it falls under is_row_vector. So it looks like we need another condition to distinguish this format. I think we should check if it's a row vector AND it only has 1 and 0 as unique values (or True/False).

let me know if this clarifies things @yafimvo
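A sketch of that condition (the helper name is hypothetical):

import numpy as np

def is_binary_row_vector(y):
    """True only if y is 1-D and contains nothing but 0/1 (or True/False)."""
    y = np.asarray(y)
    return y.ndim == 1 and set(np.unique(y)) <= {0, 1}

print(is_binary_row_vector([0, 1, 1, 0]))     # True
print(is_binary_row_vector([0, 1, 2, 0, 1]))  # False -> multiclass labels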

@yafimvo
Contributor

yafimvo commented Jan 16, 2023

I think we should check if it's a row vector AND it only has 1 and 0 as unique values (or True/False)

@edublancas yes, I wasn't sure if this check is valid or not.

@edublancas
Contributor

I see you built a one-hot encoding function, and I just found sklearn has one (which I think is the same one we have in our codebase, label_binarize). check out this example: https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html - let's use that one as it probably handles all sorts of edge cases. this will allow us to convert between the first format and the second one, and also handle cases where the labels are strings

I think we should check if it's a row vector AND it only has 1 and 0 as unique values (or True/False)

then, we can change this check to a more generic version that verifies whether the number of unique values is 2
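For example, label_binarize covers both cases, including string labels (a quick sketch):

from sklearn.preprocessing import label_binarize

# integer class labels -> one-hot
print(label_binarize([0, 1, 2, 0], classes=[0, 1, 2]))
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]
#  [1 0 0]]

# string labels work the same way
print(label_binarize(['Adelie', 'Gentoo', 'Chinstrap'],
                     classes=['Adelie', 'Chinstrap', 'Gentoo']))
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]]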

@edublancas
Contributor

let's also document in the docstring that all three formats are valid. I think if we put a print statement in the examples, the output will show up in the docs; this way it will be even clearer for users that we support all formats

@edublancas
Contributor

I just realized I made a mistake when describing the issue. ROC takes two inputs: y_test, and y_scores.

For y_test, we should accept (let's call this format "classes"):

[0, 1, 2, 0, 1, ...]

and (let's call this format "one-hot encoded classes")

[[0, 0, 1],
 [1, 0, 0]]

However, for y_score, the only valid format is (let's call this "scores"):

[[0.1, 0.1, 0.8],
 [0.7, 0.15, 0.15]]

since ROC needs the raw scores (0-1) for plotting.

I think we should also do some validation. For example, if the user passes "scores" to y_test, we should throw an error. and if the user passes "classes" or "one-hot encoded classes" as y_score, we should also throw an error (and tell the user that they can generate the scores with model.predict_proba).
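A sketch of that validation for the multiclass case (the function name and messages are hypothetical; assumes numeric inputs):

import numpy as np

def validate_roc_inputs(y_true, y_score):
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)

    # y_true must be "classes" or "one-hot encoded classes", never raw scores
    if not np.array_equal(y_true, y_true.astype(int)):
        raise ValueError('y_true looks like scores; pass class labels instead')

    # y_score must be raw scores (0-1), e.g. from model.predict_proba;
    # a 1-D vector or an integer-valued array means labels were passed
    if y_score.ndim == 1 or np.array_equal(y_score, y_score.astype(int)):
        raise ValueError('y_score looks like class labels; pass scores '
                         'instead (you can generate them with '
                         'model.predict_proba)')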

@yafimvo
Contributor

yafimvo commented Jan 18, 2023

@edublancas I pushed a PR yesterday that allows passing all these inputs without the errors you mentioned. Should I change it?

@edublancas
Contributor

Yes, please change. Passing classes or one-hot encoded classes as y_score to ROC is a methodological mistake - that's why we're seeing these weird ROC curves. The methods you implemented are still valuable; we can use them in other plots.

So please don't remove them, and ensure they're documented. For example, we can use them in the confusion matrix, since there we don't require scores but predictions (y_pred); if the user passes data in any of the three formats, we can convert it.

one question, how are you converting scores to the binary format? (which threshold are you using)

@yafimvo
Contributor

yafimvo commented Jan 22, 2023

@edublancas

one question, how are you converting scores to the binary format? (which threshold are you using)

I used sklearn.preprocessing.LabelBinarizer to binarize the scores by one-hot encoding in an OvR fashion.
I took it from here.
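For reference, the LabelBinarizer pattern from that scikit-learn example (a sketch with toy labels):

from sklearn.preprocessing import LabelBinarizer

label_binarizer = LabelBinarizer().fit(['Adelie', 'Chinstrap', 'Gentoo'])
print(label_binarizer.transform(['Gentoo', 'Adelie', 'Gentoo']))
# [[0 0 1]
#  [1 0 0]
#  [0 0 1]]
print(label_binarizer.classes_)  # ['Adelie' 'Chinstrap' 'Gentoo']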
