Metrics

class datalib.metrics.CAPCurveDisplay(*, cumulative_gains, thresholds, positive_rate=None, gini=None, estimator_name=None, pos_label=None)[source]

Bases: _BinaryClassifierCurveDisplayMixin

Cumulative Accuracy Profile (CAP) curve visualization.

Parameters

cumulative_gains : ndarray

Cumulative gain at each threshold (percentage of class 1).

thresholds : ndarray

Increasing thresholds (percentage of examples) on the decision function used to compute the CAP curve.

positive_rate : ndarray, default=None

Rate of positive class examples, used to compute the perfect curve.

gini : float, default=None

Gini score. If None, the Gini score is not shown.

estimator_name : str, default=None

Name of estimator. If None, the estimator name is not shown.

pos_label : str or int, default=None

The class considered as the positive class when computing the CAP curve. By default, estimator.classes_[1] is considered as the positive class.

Attributes

line_ : matplotlib Artist

CAP curve.

ax_ : matplotlib Axes

Axes with CAP curve.

figure_ : matplotlib Figure

Figure containing the curve.

classmethod from_estimator(estimator, X, y, *, sample_weight=None, response_method='auto', pos_label=None, plot_random=False, plot_perfect=False, name=None, ax=None, **kwargs)[source]

Create a CAP Curve display from an estimator.

Parameters

estimator : estimator instance

Fitted classifier or a fitted Pipeline in which the last estimator is a classifier.

X : {array-like, sparse matrix} of shape (n_samples, n_features)

Input values.

y : array-like of shape (n_samples,)

Target values.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

response_method : {'predict_proba', 'decision_function', 'auto'}, default='auto'

Specifies whether to use predict_proba or decision_function as the target response. If set to 'auto', predict_proba is tried first; if it does not exist, decision_function is tried next.

pos_label : str or int, default=None

The class considered as the positive class when computing the CAP curve. By default, estimator.classes_[1] is considered as the positive class.

plot_random : bool, default=False

Whether to plot the random baseline curve.

plot_perfect : bool, default=False

Whether to plot the perfect baseline curve.

name : str, default=None

Name of CAP curve for labeling. If None, use the name of the estimator.

ax : matplotlib Axes, default=None

Axes object to plot on. If None, a new figure and axes are created.

**kwargs : dict

Keyword arguments to be passed to matplotlib’s plot.

Returns

display : CAPCurveDisplay

The CAP curve display.

Examples

>>> import matplotlib.pyplot as plt
>>> from datalib.metrics import CAPCurveDisplay
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.svm import SVC
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> clf = SVC(random_state=0).fit(X_train, y_train)
>>> CAPCurveDisplay.from_estimator(clf, X_test, y_test)
>>> plt.show()
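
The 'auto' behaviour of response_method described above can be sketched as a simple attribute lookup. This is an illustration only, not the library's code: resolve_response and the two toy classifiers are hypothetical names introduced here.

```python
def resolve_response(estimator, response_method="auto"):
    """Return the bound method used to score samples (hypothetical helper)."""
    if response_method == "auto":
        # Documented order: predict_proba first, then decision_function.
        candidates = ("predict_proba", "decision_function")
    else:
        candidates = (response_method,)
    for name in candidates:
        method = getattr(estimator, name, None)
        if callable(method):
            return method
    raise AttributeError(
        f"{type(estimator).__name__} has none of: {', '.join(candidates)}"
    )


class MarginOnly:
    """Toy classifier exposing only decision_function."""

    def decision_function(self, X):
        return [x - 0.5 for x in X]


class Probabilistic(MarginOnly):
    """Toy classifier exposing both response methods."""

    def predict_proba(self, X):
        return [[1 - x, x] for x in X]


chosen_margin = resolve_response(MarginOnly()).__name__
chosen_proba = resolve_response(Probabilistic()).__name__
print(chosen_margin, chosen_proba)
```

Under this reading, an SVC fitted without probability=True, as in the example above, would presumably be scored through decision_function.
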
classmethod from_predictions(y_true, y_score, *, sample_weight=None, pos_label=None, plot_random=False, plot_perfect=False, name=None, ax=None, **kwargs)[source]

Plot the CAP curve given the true labels and predicted scores.

Parameters

y_true : array-like of shape (n_samples,)

True labels.

y_score : array-like of shape (n_samples,)

Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

pos_label : str or int, default=None

The label of the positive class. When pos_label=None, if y_true is in {-1, 1} or {0, 1}, pos_label is set to 1, otherwise an error will be raised.

plot_random : bool, default=False

Whether to plot the random baseline curve.

plot_perfect : bool, default=False

Whether to plot the perfect baseline curve.

name : str, default=None

Name of CAP curve for labeling. If None, name will be set to “Classifier”.

ax : matplotlib Axes, default=None

Axes object to plot on. If None, a new figure and axes are created.

**kwargs : dict

Additional keyword arguments passed to matplotlib’s plot function.

Returns

display : CAPCurveDisplay

Object that stores computed values.

Examples

>>> import matplotlib.pyplot as plt
>>> from datalib.metrics import CAPCurveDisplay
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.svm import SVC
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> clf = SVC(random_state=0, probability=True).fit(X_train, y_train)
>>> y_pred = clf.predict_proba(X_test)[:, 1]
>>> CAPCurveDisplay.from_predictions(y_test, y_pred)
>>> plt.show()
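
The default rule documented for pos_label can be sketched as follows. infer_pos_label is a hypothetical helper written for illustration; the library may implement the check differently.

```python
def infer_pos_label(y_true, pos_label=None):
    """Hypothetical helper mirroring the documented default for pos_label."""
    if pos_label is not None:
        return pos_label
    labels = set(y_true)
    # Labels drawn from {0, 1} or {-1, 1} default to the positive class 1.
    if labels <= {0, 1} or labels <= {-1, 1}:
        return 1
    raise ValueError(
        f"y_true takes values in {sorted(labels, key=str)}; "
        "pass pos_label explicitly."
    )


print(infer_pos_label([0, 1, 1, 0]))
print(infer_pos_label(["good", "bad"], pos_label="bad"))
```

With string labels such as "good" and "bad", omitting pos_label raises a ValueError in this sketch, which matches the documented "otherwise an error will be raised".
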
plot(*, plot_random=False, plot_perfect=False, name=None, ax=None, **kwargs)[source]

Plot visualization. Extra keyword arguments will be passed to matplotlib’s plot.

Parameters

ax : matplotlib Axes, default=None

Axes object to plot on. If None, a new figure and axes are created.

name : str, default=None

Name of CAP curve for labeling. If None, use estimator_name if not None, otherwise no labeling is shown.

plot_random : bool, default=False

Whether to plot the random baseline curve.

plot_perfect : bool, default=False

Whether to plot the perfect baseline curve.

Returns

display : CAPCurveDisplay

Object that stores computed values.
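
The two baselines toggled by plot_random and plot_perfect follow the usual CAP conventions, sketched below. The helper names are hypothetical, and it is an assumption that the library draws the baselines this way: a random ranking captures positives in proportion to the population reviewed, while a perfect ranking captures all of them once a fraction equal to positive_rate has been reviewed.

```python
def random_cap(fraction):
    """Share of positives captured by a random ranking."""
    return fraction


def perfect_cap(fraction, positive_rate):
    """Share of positives captured by an ideal ranking."""
    if not 0 < positive_rate <= 1:
        raise ValueError("positive_rate must lie in (0, 1]")
    # Rises linearly until every positive is found, then stays at 1.
    return min(fraction / positive_rate, 1.0)


fractions = [0.0, 0.1, 0.2, 0.5, 1.0]
positive_rate = 0.2  # illustrative value: 20% of samples are positive
random_curve = [random_cap(f) for f in fractions]
perfect_curve = [perfect_cap(f, positive_rate) for f in fractions]
print(random_curve)
print(perfect_curve)
```
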

class datalib.metrics.DeliquencyDisplay(approval_rate, default_rate, optimal_rate, *, estimator_name=None, pos_label=None)[source]

Bases: _BinaryClassifierCurveDisplayMixin

Delinquency curve visualization. It is recommended to use from_estimator() or from_predictions() to create a DeliquencyDisplay. All parameters are stored as attributes.

Parameters

approval_rate : array-like of shape (n_samples,)

The relative percentage of the population approved.

default_rate : array-like of shape (n_samples,)

The default rate, i.e. the relative percentage of positives in the sample.

optimal_rate : array-like of shape (n_samples,)

The optimal default rate.

estimator_name : str, default=None

Name of estimator. If None, the estimator name is not shown.

pos_label : str or int, default=None

The positive class when computing the delinquency curve. By default, estimator.classes_[1] is considered as the positive class.

Attributes

line_ : matplotlib Artist

Delinquency curve.

ax_ : matplotlib Axes

Axes with delinquency curve.

figure_ : matplotlib Figure

Figure containing the curve.

See Also

delinquency_curve : Compute the curves needed for delinquency analysis.

DeliquencyDisplay.from_predictions : Plot delinquency curve using true labels and predicted probabilities.

DeliquencyDisplay.from_estimator : Plot delinquency curve using an estimator and data.

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LogisticRegression
>>> from datalib.metrics import delinquency_curve, DeliquencyDisplay
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=0)
>>> clf = LogisticRegression(random_state=0)
>>> clf.fit(X_train, y_train)
LogisticRegression(random_state=0)
>>> y_prob = clf.predict_proba(X_test)[:, 1]
>>> approval_rate, default_rate, optimal_rate = delinquency_curve(
...     y_test, y_prob)
>>> disp = DeliquencyDisplay(approval_rate, default_rate, optimal_rate)
>>> disp.plot()
<...>
classmethod from_estimator(estimator, X, y, *, pos_label=None, name=None, ref_line=True, ax=None, **kwargs)[source]

Plot delinquency curve using a binary classifier and data. A delinquency curve takes the outputs of a binary classifier and plots the default rate (y-axis) against the unique approval rates, i.e. fractions of the population approved (x-axis). Extra keyword arguments will be passed to matplotlib.pyplot.plot().

Parameters

estimator : estimator instance

Fitted classifier or a fitted Pipeline in which the last estimator is a classifier. The classifier must have a predict_proba method.

X : {array-like, sparse matrix} of shape (n_samples, n_features)

Input values.

y : array-like of shape (n_samples,)

Binary target values.

pos_label : str or int, default=None

The positive class when computing the delinquency curve. By default, estimator.classes_[1] is considered as the positive class.

name : str, default=None

Name for labeling curve. If None, the name of the estimator is used.

ref_line : bool, default=True

If True, also plots a reference line for the optimal default rate (optimal_rate), i.e. the curve of a perfect model.

ax : matplotlib Axes, default=None

Axes object to plot on. If None, a new figure and axes are created.

**kwargs : dict

Keyword arguments to be passed to matplotlib.pyplot.plot().

Returns

display : DeliquencyDisplay

Object that stores computed values.

See Also

DeliquencyDisplay.from_predictions : Plot delinquency curve using true labels and predicted probabilities.

Examples

>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LogisticRegression
>>> from datalib.metrics import DeliquencyDisplay
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=0)
>>> clf = LogisticRegression(random_state=0)
>>> clf.fit(X_train, y_train)
LogisticRegression(random_state=0)
>>> disp = DeliquencyDisplay.from_estimator(clf, X_test, y_test)
>>> plt.show()
classmethod from_predictions(y_true, y_prob, *, pos_label=None, name=None, ref_line=True, ax=None, **kwargs)[source]

Plot delinquency curve using true labels and predicted probabilities.

Parameters

y_true : array-like of shape (n_samples,)

True labels.

y_prob : array-like of shape (n_samples,)

The predicted probabilities of the positive class.

pos_label : str or int, default=None

The positive class when computing the delinquency curve. By default, estimator.classes_[1] is considered as the positive class.

name : str, default=None

Name for labeling curve.

ref_line : bool, default=True

If True, also plots a reference line for the optimal default rate (optimal_rate), i.e. the curve of a perfect model.

ax : matplotlib Axes, default=None

Axes object to plot on. If None, a new figure and axes are created.

**kwargs : dict

Keyword arguments to be passed to matplotlib.pyplot.plot().

Returns

display : DeliquencyDisplay

Object that stores computed values.

See Also

DeliquencyDisplay.from_estimator : Plot delinquency curve using an estimator and data.

Examples

>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LogisticRegression
>>> from datalib.metrics import DeliquencyDisplay
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=0)
>>> clf = LogisticRegression(random_state=0)
>>> clf.fit(X_train, y_train)
LogisticRegression(random_state=0)
>>> y_prob = clf.predict_proba(X_test)[:, 1]
>>> disp = DeliquencyDisplay.from_predictions(y_test, y_prob)
>>> plt.show()
plot(*, ax=None, name=None, ref_line=True, **kwargs)[source]

Plot visualization. Extra keyword arguments will be passed to matplotlib.pyplot.plot().

Parameters

ax : matplotlib Axes, default=None

Axes object to plot on. If None, a new figure and axes are created.

name : str, default=None

Name for labeling curve. If None, use estimator_name if not None, otherwise no labeling is shown.

ref_line : bool, default=True

If True, also plots a reference line for the optimal default rate (optimal_rate), i.e. the curve of a perfect model.

**kwargs : dict

Keyword arguments to be passed to matplotlib.pyplot.plot().

Returns

display : DeliquencyDisplay

Object that stores computed values.

datalib.metrics.cap_curve(y_true, y_score, sample_weight=None)[source]

Base function to calculate the Cumulative Accuracy Profile curve (CAP curve). The curve relates the share of positive samples captured to the percentage of the dataset covered by each successive cut-off threshold.

Parameters

y_true : ndarray of shape (n_samples,)

True targets of binary classification.

y_score : ndarray of shape (n_samples,)

Estimated probabilities or output of a model / decision function.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights. If None, all samples are given the same weight.

Returns

cumulative_gains : ndarray of shape (n_samples,)

Cumulative gain at each threshold (percentage of class 1).

thresholds : ndarray of shape (n_samples,)

Increasing thresholds (percentage of examples) on the decision function used to compute the CAP curve.

gini : float

The normalized Gini coefficient, calculated from the AUC.
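
The three return values can be illustrated with a small pure-Python sketch. It is not the library's implementation: cap_curve_sketch is a hypothetical name, sample weights are ignored, labels are assumed to be 0/1, and the Gini coefficient is assumed to follow the common identity gini = 2 * AUC - 1, where AUC is the area under the ROC curve.

```python
def cap_curve_sketch(y_true, y_score):
    """Return (cumulative_gains, thresholds, gini) for 0/1 labels."""
    n = len(y_true)
    n_pos = sum(y_true)
    if n_pos == 0 or n_pos == n:
        raise ValueError("y_true must contain both classes")

    # Rank samples from the highest to the lowest score.
    order = sorted(range(n), key=lambda i: y_score[i], reverse=True)

    cumulative_gains, thresholds = [], []
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += y_true[i]
        cumulative_gains.append(captured / n_pos)  # share of positives found
        thresholds.append(rank / n)                # share of samples covered

    # AUC as the probability that a random positive outscores a random
    # negative, counting ties as one half.
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    auc = wins / (len(pos) * len(neg))
    return cumulative_gains, thresholds, 2 * auc - 1


y_true = [1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.3, 0.1]
gains, thresholds, gini = cap_curve_sketch(y_true, y_score)
print(gains, thresholds, round(gini, 4))
```

In this toy example two of the five samples are positive, so the gains step up by 0.5 each time a positive is reached; the actual function may additionally handle ties, weights, or a leading (0, 0) point differently.
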

datalib.metrics.delinquency_curve(y_true, y_score, pos_label=None)[source]

Delinquency curve for a binary classification.

The delinquency curve shows how the default rate (the proportion of samples labelled pos_label) changes with the approval rate. It is typically plotted with the approval rate on the x-axis and the default rate on the y-axis. The curve is built by sorting the samples by score and computing the default rate for successively larger approved populations, starting with the best scores.

The delinquency curve is key in many actuarial operations, where understanding the share of misclassified cases at each approval level is vital.

Parameters

y_true : array-like of shape (n_samples,)

True binary labels. If labels are not either {-1, 1} or {0, 1}, then pos_label should be explicitly given.

y_score : array-like of shape (n_samples,)

Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).

pos_label : int or str, default=None

The label of the positive class.

Returns

approval_rate : array-like of shape (n_samples,)

Increasing approval rate (percentage of approved best scores) used to compute default_rate. Values lie in the interval (0, 1).

default_rate : array-like of shape (n_samples,)

Default rate for each approval rate: element i is the proportion of delinquents when approving approval_rate[i] of the population.

optimal_rate : array-like of shape (n_samples,)

Optimal default rates for a perfect model.
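
A minimal pure-Python sketch of the computation described above follows. It is an illustration under stated assumptions, not the library's code: delinquency_curve_sketch is a hypothetical name, labels are 0/1 with 1 meaning default, and a higher score is assumed to mean higher risk, so the "best" applicants are those with the lowest scores.

```python
def delinquency_curve_sketch(y_true, y_score):
    """Return (approval_rate, default_rate, optimal_rate) for 0/1 labels."""
    n = len(y_true)
    n_good = n - sum(y_true)  # number of non-defaulters

    # Approve applicants from the lowest to the highest risk score.
    order = sorted(range(n), key=lambda i: y_score[i])

    approval_rate, default_rate, optimal_rate = [], [], []
    defaults = 0
    for approved, i in enumerate(order, start=1):
        defaults += y_true[i]
        approval_rate.append(approved / n)
        default_rate.append(defaults / approved)
        # A perfect model approves every non-defaulter before any defaulter.
        optimal_rate.append(max(0, approved - n_good) / approved)
    return approval_rate, default_rate, optimal_rate


y_true = [0, 1, 0, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.7]
approval, default, optimal = delinquency_curve_sketch(y_true, y_score)
print(approval)
print(default)
print(optimal)
```

The sketch keeps the final point at an approval rate of 1.0, where both curves meet at the overall default rate; the documented interval (0, 1) suggests the actual function may treat the endpoints differently.
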