This repository was archived by the owner on Feb 28, 2025. It is now read-only.

Multiclass ROC curve #98

Closed
idomic opened this issue Dec 21, 2022 · 18 comments · Fixed by #230


@idomic
Contributor

idomic commented Dec 21, 2022

ROC curve is commonly used to compare the performance of models. It is usually used in binary classification, but it can also be used in multiclass classification using averaging methods.

Running plot.roc(y_test, prediction1) on more than 2 classes fails; we should add support for it:

ValueError: multiclass format is not supported

I tried it for the ROC curve, but this issue also applies to the PR curve (plot.precision_recall(y_test, prediction1)).
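For reference, a minimal sketch that reproduces the failure on any 3-class dataset (iris here; the classifier choice is arbitrary):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn_evaluation import plot

# three classes: 0, 1, 2
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)
prediction1 = model.predict(X_test)  # 1-D array of predicted class labels

# raises ValueError: multiclass format is not supported
plot.roc(y_test, prediction1)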

@idomic
Contributor Author

idomic commented Dec 21, 2022

This also affects report = ce.make_report(), given that ROC/PR are embedded in the report.

@edublancas
Contributor

we support multi-class for both (https://sklearn-evaluation.readthedocs.io/en/latest/api/plot.html#sklearn_evaluation.plot.precision_recall)

I remember implementing it, but the docs don't have examples. so either I really never implemented it or the input format you have is incorrect.

in any case, we should address the "multiclass format is not supported" error.

can you paste the full traceback?

@idomic
Contributor Author

idomic commented Dec 21, 2022

Full stack trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [79], line 1
----> 1 plot.roc(y_test, prediction1)

File ~/opt/miniconda3/lib/python3.9/site-packages/sklearn_evaluation/telemetry.py:38, in SKLearnEvaluationLogger.log.<locals>.wrapper.<locals>.inner(*args, **kwargs)
     35     metadata['exception'] = str(e)
     36     telemetry.log_api(
     37         'sklearn-evaluation-error', metadata=metadata)
---> 38     raise e
     40 return result

File ~/opt/miniconda3/lib/python3.9/site-packages/sklearn_evaluation/telemetry.py:33, in SKLearnEvaluationLogger.log.<locals>.wrapper.<locals>.inner(*args, **kwargs)
     30 telemetry.log_api('sklearn-evaluation', metadata=metadata)
     32 try:
---> 33     result = func(*args, **kwargs)
     34 except Exception as e:
     35     metadata['exception'] = str(e)

File ~/opt/miniconda3/lib/python3.9/site-packages/sklearn_evaluation/plot/roc.py:65, in roc(y_true, y_score, ax)
     63 else:
     64     if y_score_is_vector:
---> 65         _roc(y_true, y_score, ax)
     66     else:
     67         _roc(y_true, y_score[:, 1], ax)

File ~/opt/miniconda3/lib/python3.9/site-packages/sklearn_evaluation/plot/roc.py:94, in _roc(y_true, y_score, ax)
     74 """
     75 Plot ROC curve for binary classification.
     76 
   (...)
     90 
     91 """
     92 # check dimensions
---> 94 fpr, tpr, _ = roc_curve(y_true, y_score)
     95 roc_auc = auc(fpr, tpr)
     97 ax.plot(fpr, tpr, label=('ROC curve (area = {0:0.2f})'.format(roc_auc)))

File ~/opt/miniconda3/lib/python3.9/site-packages/sklearn/metrics/_ranking.py:981, in roc_curve(y_true, y_score, pos_label, sample_weight, drop_intermediate)
    892 def roc_curve(
    893     y_true, y_score, *, pos_label=None, sample_weight=None, drop_intermediate=True
    894 ):
    895     """Compute Receiver operating characteristic (ROC).
    896 
    897     Note: this implementation is restricted to the binary classification task.
   (...)
    979 
    980     """
--> 981     fps, tps, thresholds = _binary_clf_curve(
    982         y_true, y_score, pos_label=pos_label, sample_weight=sample_weight
    983     )
    985     # Attempt to drop thresholds corresponding to points in between and
    986     # collinear with other points. These are always suboptimal and do not
    987     # appear on a plotted ROC curve (and thus do not affect the AUC).
   (...)
    992     # but does not drop more complicated cases like fps = [1, 3, 7],
    993     # tps = [1, 2, 4]; there is no harm in keeping too many thresholds.
    994     if drop_intermediate and len(fps) > 2:

File ~/opt/miniconda3/lib/python3.9/site-packages/sklearn/metrics/_ranking.py:740, in _binary_clf_curve(y_true, y_score, pos_label, sample_weight)
    738 y_type = type_of_target(y_true, input_name="y_true")
    739 if not (y_type == "binary" or (y_type == "multiclass" and pos_label is not None)):
--> 740     raise ValueError("{0} format is not supported".format(y_type))
    742 check_consistent_length(y_true, y_score, sample_weight)
    743 y_true = column_or_1d(y_true)

ValueError: multiclass format is not supported

@idomic
Contributor Author

idomic commented Dec 21, 2022

these were the inputs:
[screenshot of the y_test and prediction1 inputs]

@idomic
Contributor Author

idomic commented Dec 21, 2022

Regardless of the error, there should be a concrete usage example in the user guide (better docs).

@edublancas
Contributor

edublancas commented Dec 21, 2022

yep, we're missing an example here.

the problem is the format, y_test should be one-hot encoded:

[0, 0, 1]
[1, 0, 0]
etc

so we need both an example, and to one-hot encode the user's input in case they pass one like yours (and possibly show a warning saying we did the one-hot encoding implicitly).
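For illustration, a minimal sketch of that implicit conversion using sklearn's label_binarize (the helper name and warning text are hypothetical):

import warnings

import numpy as np
from sklearn.preprocessing import label_binarize

def maybe_one_hot(y_test):
    """One-hot encode y_test if it arrives as a 1-D vector of class labels."""
    y_test = np.asarray(y_test)
    if y_test.ndim == 1 and len(np.unique(y_test)) > 2:
        warnings.warn("y_test was one-hot encoded implicitly")
        return label_binarize(y_test, classes=np.unique(y_test))
    return y_test

print(maybe_one_hot([0, 2, 1, 0]))
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [1 0 0]]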

@idomic
Contributor Author

idomic commented Jan 12, 2023

This user story has 3 action items:

  1. Check this feature is actually implemented.
  2. Make sure it’s well documented and has examples for the user.
  3. Make the error more informative (probably something about the format of the user's input).

@yafimvo
Contributor

yafimvo commented Jan 15, 2023

@edublancas I'm not sure I completely understood this.

the problem is the format, y_test should be one-hot encoded:

I tried to one-hot encode y_test, but since is_row_vector(y_score) returns True for the given value (prediction1), it sets n_classes to 2 and tries to use sklearn.metrics.roc_curve with a multi-class input (that's where the original error comes from: ValueError: multiclass format is not supported).

I tried to skip these checks by hardcoding some variables, but eventually it fails here (y_score is prediction1, which is a one-dimensional array):

fpr_, tpr_, _ = roc_curve(y_true_bin[:, i], y_score[:, i])

Based on this, I tried to one-hot encode the prediction1 array, and it returned the following result, which seems correct.

[screenshot of the resulting ROC plot]

Please let me know your thoughts on this
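For context, once both arrays are one-hot encoded, the per-class roc_curve call quoted above goes through; a runnable sketch with toy labels:

import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_pred = np.array([0, 1, 2, 2, 0, 0, 2, 2])

classes = np.unique(y_true)
y_true_bin = label_binarize(y_true, classes=classes)
y_score = label_binarize(y_pred, classes=classes)  # one-hot encoded predictions

# one-vs-rest ROC per class, as in the line quoted above
for i, cls in enumerate(classes):
    fpr_, tpr_, _ = roc_curve(y_true_bin[:, i], y_score[:, i])
    print(f"class {cls}: AUC = {auc(fpr_, tpr_):.2f}")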

@edublancas
Contributor

please open a PR so I can run your code

@yafimvo
Contributor

yafimvo commented Jan 16, 2023

@edublancas I just used the same example @idomic used, but with a one-hot encoded prediction array.

current example

import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn import tree
from sklearn_evaluation import plot

# load the penguins dataset and drop rows with missing values
df = sns.load_dataset('penguins')
df.dropna(inplace=True)

# encode the target: three classes
Y = df.species.map({'Adelie': 0, 'Chinstrap': 1, 'Gentoo': 2})
df.drop('species', inplace=True, axis=1)

# encode the categorical features
se = pd.get_dummies(df['sex'], drop_first=True)
df = pd.concat([df, se], axis=1)
df.drop('sex', axis=1, inplace=True)
df['island'] = LabelEncoder().fit_transform(df['island'])

X = df
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=40)

dtc = tree.DecisionTreeClassifier()
dt_model = dtc.fit(X_train, y_train)
predictions = dt_model.predict(X_test)

one-hot encode the predictions and run plot.roc:

a = predictions  # 1-D array of predicted class labels (0, 1, 2)
predictions_one_hot = np.zeros((a.size, a.max() + 1))
predictions_one_hot[np.arange(a.size), a] = 1

plot.ROC.from_raw_data(y_test, predictions_one_hot)

I'm not sure how to implement this in the code, since I don't know when (and if) I should one-hot encode the input.
I could compare the inputs and return some suggestions, or, if we get this ValueError from roc, encode the input and re-run the function, but I don't think that's the best approach.

@edublancas
Contributor

Ok, so what we want is for this format:

[0, 1, 2, 0, 1, ...]

this one:

[[0, 0, 1],
 [1, 0, 0]]

and this one (I didn't remember this one) - these are output probabilities per class:

[[0.1, 0.1, 0.8],
 [0.7, 0.15, 0.15]]

to produce the same output.

If I understand correctly, the first one breaks because it falls under is_row_vector. So it looks like we need another condition to distinguish this format. I think we should check if it's a row vector AND it only has 1 and 0 as unique values (or True/False).

let me know if this clarifies things @yafimvo
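A sketch of that condition (the helper name is hypothetical):

import numpy as np

def is_binary_row_vector(y):
    """True only if y is 1-D and contains nothing but 0/1 (or True/False)."""
    y = np.asarray(y)
    return y.ndim == 1 and set(np.unique(y)) <= {0, 1}

print(is_binary_row_vector([0, 1, 1, 0]))     # True
print(is_binary_row_vector([0, 1, 2, 0, 1]))  # False -> multiclass labels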

@yafimvo
Contributor

yafimvo commented Jan 16, 2023

I think we should check if it's a row vector AND it only has 1 and 0 as unique values (or True/False)

@edublancas yes, I wasn't sure if this check is valid or not.

@edublancas
Contributor

I see you built a one-hot encoding function, and I just found sklearn has one (which I think is the same one we have in our codebase, label_binarize). check out this example: https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html - let's use that one as it probably handles all sorts of edge cases. this will allow us to convert between the first format and the second one, and also handle cases where the labels are strings

I think we should check if it's a row vector AND it only has 1 and 0 as unique values (or True/False)

then, we can change this check to a more generic version that verifies whether the number of unique values is 2
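For example, label_binarize covers both cases, including string labels (a quick sketch):

from sklearn.preprocessing import label_binarize

# integer class labels -> one-hot
print(label_binarize([0, 1, 2, 0], classes=[0, 1, 2]))
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]
#  [1 0 0]]

# string labels work the same way
print(label_binarize(['Adelie', 'Gentoo', 'Chinstrap'],
                     classes=['Adelie', 'Chinstrap', 'Gentoo']))
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]]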

@edublancas
Contributor

let's also document in the docstring that all three formats are valid. I think if we put a print statement in the examples, the output will show up in the docs; this way it will be even clearer for users that we support all formats

@edublancas
Contributor

I just realized I made a mistake when describing the issue. ROC takes two inputs: y_test, and y_scores.

For y_test, we should accept (let's call this format "classes"):

[0, 1, 2, 0, 1, ...]

and (let's call this format "one-hot encoded classes")

[[0, 0, 1],
 [1, 0, 0]]

However, for y_score, the only valid format is (let's call this "scores"):

[[0.1, 0.1, 0.8],
 [0.7, 0.15, 0.15]]

since ROC needs the raw scores (0-1) for plotting.

I think we should also do some validation. For example, if the user passes "scores" to y_test, we should throw an error. and if the user passes "classes" or "one-hot encoded classes" as y_score, we should also throw an error (and tell the user that they can generate the scores with model.predict_proba).
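A sketch of that validation for the multiclass case (the function name and messages are hypothetical; assumes numeric inputs):

import numpy as np

def validate_roc_inputs(y_true, y_score):
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)

    # y_true must be "classes" or "one-hot encoded classes", never raw scores
    if not np.array_equal(y_true, y_true.astype(int)):
        raise ValueError('y_true looks like scores; pass class labels instead')

    # y_score must be raw scores (0-1), e.g. from model.predict_proba;
    # a 1-D vector or an integer-valued array means labels were passed
    if y_score.ndim == 1 or np.array_equal(y_score, y_score.astype(int)):
        raise ValueError('y_score looks like class labels; pass scores '
                         'instead (you can generate them with '
                         'model.predict_proba)')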

@yafimvo
Contributor

yafimvo commented Jan 18, 2023

@edublancas I pushed a PR yesterday that allows passing all these inputs without the errors you mentioned. Should I change it?

@edublancas
Contributor

Yes, please change. Passing classes or one-hot encoded classes as y_score to ROC is a methodological mistake - that's why we're seeing these weird ROC curves. The methods you implemented are still valuable; we can use them in other plots.

So please don't remove them, and ensure they're documented. For example, we can use them in the confusion matrix, since there we don't require scores but predictions (y_pred); if the user passes data in any of the three formats, we can convert it.

one question, how are you converting scores to the binary format? (which threshold are you using)

@yafimvo
Contributor

yafimvo commented Jan 22, 2023

@edublancas

one question, how are you converting scores to the binary format? (which threshold are you using)

I used sklearn.preprocessing.LabelBinarizer to binarize the scores by one-hot encoding in an OvR fashion.
I took it from here.
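For reference, the LabelBinarizer pattern from that scikit-learn example (a sketch with toy labels):

from sklearn.preprocessing import LabelBinarizer

label_binarizer = LabelBinarizer().fit(['Adelie', 'Chinstrap', 'Gentoo'])
print(label_binarizer.transform(['Gentoo', 'Adelie', 'Gentoo']))
# [[0 0 1]
#  [1 0 0]
#  [0 0 1]]
print(label_binarizer.classes_)  # ['Adelie' 'Chinstrap' 'Gentoo']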
