Metrics

class datalib.metrics.CAPCurveDisplay(*, cumulative_gains, thresholds, positive_rate=None, gini=None, estimator_name=None, pos_label=None)

Bases: _BinaryClassifierCurveDisplayMixin

CAP Curve visualization.

Parameters

cumulative_gains : ndarray: Cumulative gain at each threshold (percentage of class 1).
thresholds : ndarray: Increasing thresholds (percentage of examples) on the decision function used to compute the CAP curve.
positive_rate : ndarray, default=None: Rate of positive class examples, used to compute the perfect curve.
gini : float, default=None: Gini score. If None, the Gini score is not shown.
estimator_name : str, default=None: Name of estimator. If None, the estimator name is not shown.
pos_label : str or int, default=None: The class considered as the positive class when computing the CAP curve. By default, estimator.classes_[1] is considered as the positive class.

Attributes

line_ : matplotlib Artist: CAP curve.
ax_ : matplotlib Axes: Axes with CAP curve.
figure_ : matplotlib Figure: Figure containing the curve.

classmethod from_estimator(estimator, X, y, *, sample_weight=None, response_method='auto', pos_label=None, plot_random=False, plot_perfect=False, name=None, ax=None, **kwargs)

Create a CAP Curve display from an estimator.

Parameters

estimator : estimator instance: Fitted classifier or a fitted Pipeline in which the last estimator is a classifier.
X : {array-like, sparse matrix} of shape (n_samples, n_features): Input values.
y : array-like of shape (n_samples,): Target values.
sample_weight : array-like of shape (n_samples,), default=None: Sample weights.
response_method : {'predict_proba', 'decision_function', 'auto'}, default='auto': Specifies whether to use predict_proba or decision_function as the target response. If set to 'auto', predict_proba is tried first; if it does not exist, decision_function is tried next.
pos_label : str or int, default=None: The class considered as the positive class when computing the CAP curve. By default, estimator.classes_[1] is considered as the positive class.
plot_random : bool, default=False: Whether to plot the random baseline curve.
plot_perfect : bool, default=False: Whether to plot the perfect baseline curve.
name : str, default=None: Name of the CAP curve for labeling. If None, the name of the estimator is used.
ax : matplotlib axes, default=None: Axes object to plot on. If None, a new figure and axes are created.
**kwargs : dict: Keyword arguments to be passed to matplotlib's plot.
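
The 'auto' lookup order described for response_method can be pictured with a small stand-alone sketch. The helper name resolve_response and the toy OnlyDecision class are hypothetical and exist only for illustration. This is not datalib's code, and the library may handle details such as selecting the positive-class column of predict_proba differently.

```python
def resolve_response(estimator, X, response_method="auto"):
    """Hypothetical helper: pick a scoring method the way 'auto' is described."""
    if response_method == "auto":
        # 'auto' tries predict_proba first, then decision_function.
        candidates = ("predict_proba", "decision_function")
    else:
        candidates = (response_method,)
    for method_name in candidates:
        method = getattr(estimator, method_name, None)
        if callable(method):
            # Return the name too, so the caller knows which method was used.
            return method_name, method(X)
    raise AttributeError(
        f"{type(estimator).__name__} has none of the methods {candidates}"
    )


class OnlyDecision:
    """Toy estimator exposing decision_function but not predict_proba."""

    def decision_function(self, X):
        return [sum(row) for row in X]


used, scores = resolve_response(OnlyDecision(), [[1.0, -2.0], [0.5, 0.5]])
print(used, scores)
```

Because the toy estimator has no predict_proba, the lookup falls through to decision_function, which mirrors the fallback described above.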

Returns

display : CAPCurveDisplay: The CAP curve display.

Examples

>>> import matplotlib.pyplot as plt
>>> from datalib.metrics import CAPCurveDisplay
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.svm import SVC
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> clf = SVC(random_state=0).fit(X_train, y_train)
>>> CAPCurveDisplay.from_estimator(clf, X_test, y_test)
>>> plt.show()

classmethod from_predictions(y_true, y_score, *, sample_weight=None, pos_label=None, plot_random=False, plot_perfect=False, name=None, ax=None, **kwargs)

Plot CAP curve given the true and predicted score.

Parameters

y_true : array-like of shape (n_samples,): True labels.
y_score : array-like of shape (n_samples,): Target scores. These can be probability estimates of the positive class, confidence values, or a non-thresholded measure of decisions (as returned by "decision_function" on some classifiers).
sample_weight : array-like of shape (n_samples,), default=None: Sample weights.
pos_label : str or int, default=None: The label of the positive class. When pos_label=None, if y_true is in {-1, 1} or {0, 1}, pos_label is set to 1; otherwise an error is raised.
plot_random : bool, default=False: Whether to plot the random baseline curve.
plot_perfect : bool, default=False: Whether to plot the perfect baseline curve.
name : str, default=None: Name of the CAP curve for labeling. If None, name will be set to "Classifier".
ax : matplotlib axes, default=None: Axes object to plot on. If None, a new figure and axes are created.
**kwargs : dict: Additional keyword arguments passed to matplotlib's plot function.
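
The pos_label=None rule stated above can be written out as a short check. The function name infer_pos_label is hypothetical, and this is only a sketch of the documented rule, not the code datalib actually runs.

```python
def infer_pos_label(y_true, pos_label=None):
    """Hypothetical helper mirroring the documented pos_label=None rule."""
    if pos_label is not None:
        # An explicit label always wins.
        return pos_label
    labels = set(y_true)
    # Labels drawn from {0, 1} or {-1, 1} default to a positive class of 1.
    if labels <= {0, 1} or labels <= {-1, 1}:
        return 1
    raise ValueError(
        f"y_true takes values in {sorted(labels, key=str)}; "
        "pass pos_label explicitly"
    )


print(infer_pos_label([0, 1, 1, 0]))
print(infer_pos_label(["bad", "good"], pos_label="bad"))
```

String or multi-valued labels without an explicit pos_label raise an error in this sketch, which is the situation the parameter description warns about.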

Returns

display : CAPCurveDisplay: Object that stores computed values.

Examples

>>> import matplotlib.pyplot as plt
>>> from datalib.metrics import CAPCurveDisplay
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.svm import SVC
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> clf = SVC(random_state=0, probability=True).fit(X_train, y_train)
>>> y_pred = clf.predict_proba(X_test)[:, 1]
>>> CAPCurveDisplay.from_predictions(y_test, y_pred)
>>> plt.show()

plot(*, plot_random=False, plot_perfect=False, name=None, ax=None, **kwargs)

Plot visualization. Extra keyword arguments will be passed to matplotlib's plot.

Parameters

ax : matplotlib axes, default=None: Axes object to plot on. If None, a new figure and axes are created.
name : str, default=None: Name of the CAP curve for labeling. If None, estimator_name is used when it is set; otherwise no label is shown.
plot_random : bool, default=False: Whether to plot the random baseline curve.
plot_perfect : bool, default=False: Whether to plot the perfect baseline curve.

Returns

display : CAPCurveDisplay: Object that stores computed values.
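
For readers unfamiliar with the two baselines toggled by plot_random and plot_perfect, the sketch below builds their vertices from the usual CAP definitions: the random baseline is the diagonal, and the perfect baseline captures all positives once a fraction positive_rate of the population has been covered. The function name cap_baselines is hypothetical, and how datalib draws these lines internally is an assumption.

```python
def cap_baselines(positive_rate):
    """Hypothetical helper: (x, y) vertices of the random and perfect baselines."""
    if not 0.0 < positive_rate <= 1.0:
        raise ValueError("positive_rate must lie in (0, 1]")
    # Random model: positives are captured in proportion to coverage.
    random_xy = ([0.0, 1.0], [0.0, 1.0])
    # Perfect model: every positive is ranked first, so the curve reaches
    # 1.0 at x = positive_rate and stays flat afterwards.
    perfect_xy = ([0.0, positive_rate, 1.0], [0.0, 1.0, 1.0])
    return random_xy, perfect_xy


random_xy, perfect_xy = cap_baselines(0.25)
print(random_xy)
print(perfect_xy)
```

Either pair of vertex lists can be passed directly to a matplotlib Axes.plot call as x and y.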

class datalib.metrics.DeliquencyDisplay(approval_rate, default_rate, optimal_rate, *, estimator_name=None, pos_label=None)

Bases: _BinaryClassifierCurveDisplayMixin

Delinquency curve visualization. It is recommended to use from_estimator() or from_predictions() to create a DeliquencyDisplay. All parameters are stored as attributes.

Parameters

approval_rate : array-like, shape (n_samples,): The relative percentage of the population approved.
default_rate : array-like, shape (n_samples,): The default rate, i.e. the relative percentage of positives in the sample.
optimal_rate : array-like, shape (n_samples,): The optimal default rate.
estimator_name : str, default=None: Name of estimator. If None, the estimator name is not shown.
pos_label : str or int, default=None: The positive class when computing the delinquency curve. By default, estimator.classes_[1] is considered as the positive class.

Attributes

line_ : matplotlib Artist: Delinquency curve.
ax_ : matplotlib Axes: Axes with delinquency curve.
figure_ : matplotlib Figure: Figure containing the curve.

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LogisticRegression
>>> from datalib.metrics import delinquency_curve, DeliquencyDisplay
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=0)
>>> clf = LogisticRegression(random_state=0)
>>> clf.fit(X_train, y_train)
LogisticRegression(random_state=0)
>>> y_prob = clf.predict_proba(X_test)[:, 1]
>>> approval_rate, default_rate, optimal_rate = delinquency_curve(
...     y_test, y_prob)
>>> disp = DeliquencyDisplay(approval_rate, default_rate, optimal_rate)
>>> disp.plot()
<...>

classmethod from_estimator(estimator, X, y, *, pos_label=None, name=None, ref_line=True, ax=None, **kwargs)

Plot delinquency curve using a binary classifier and data. A delinquency curve takes the predictions of a binary classifier and plots the default rate (y-axis) against the approval rate, i.e. the fraction of the population approved (x-axis). Extra keyword arguments will be passed to matplotlib.pyplot.plot().

Parameters

estimator : estimator instance: Fitted classifier or a fitted Pipeline in which the last estimator is a classifier. The classifier must have a predict_proba method.
X : {array-like, sparse matrix} of shape (n_samples, n_features): Input values.
y : array-like of shape (n_samples,): Binary target values.
pos_label : str or int, default=None: The positive class when computing the delinquency curve. By default, estimator.classes_[1] is considered as the positive class.
name : str, default=None: Name for labeling curve. If None, the name of the estimator is used.
ref_line : bool, default=True: If True, plots a reference line representing a perfectly calibrated classifier.
ax : matplotlib axes, default=None: Axes object to plot on. If None, a new figure and axes are created.
**kwargs : dict: Keyword arguments to be passed to matplotlib.pyplot.plot().

Returns

display : DeliquencyDisplay: Object that stores computed values.

Examples

>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LogisticRegression
>>> from datalib.metrics import DeliquencyDisplay
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=0)
>>> clf = LogisticRegression(random_state=0)
>>> clf.fit(X_train, y_train)
LogisticRegression(random_state=0)
>>> disp = DeliquencyDisplay.from_estimator(clf, X_test, y_test)
>>> plt.show()

classmethod from_predictions(y_true, y_prob, *, pos_label=None, name=None, ref_line=True, ax=None, **kwargs)

Plot delinquency curve using true labels and predicted probabilities.

Parameters

y_true : array-like of shape (n_samples,): True labels.
y_prob : array-like of shape (n_samples,): The predicted probabilities of the positive class.
pos_label : str or int, default=None: The positive class when computing the delinquency curve. By default, estimator.classes_[1] is considered as the positive class.
name : str, default=None: Name for labeling curve.
ref_line : bool, default=True: If True, plots a reference line representing a perfectly calibrated classifier.
ax : matplotlib axes, default=None: Axes object to plot on. If None, a new figure and axes are created.
**kwargs : dict: Keyword arguments to be passed to matplotlib.pyplot.plot().

Returns

display : DeliquencyDisplay: Object that stores computed values.

Examples

>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LogisticRegression
>>> from datalib.metrics import DeliquencyDisplay
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=0)
>>> clf = LogisticRegression(random_state=0)
>>> clf.fit(X_train, y_train)
LogisticRegression(random_state=0)
>>> y_prob = clf.predict_proba(X_test)[:, 1]
>>> disp = DeliquencyDisplay.from_predictions(y_test, y_prob)
>>> plt.show()

plot(*, ax=None, name=None, ref_line=True, **kwargs)

Plot visualization. Extra keyword arguments will be passed to matplotlib.pyplot.plot().

Parameters

ax : matplotlib axes, default=None: Axes object to plot on. If None, a new figure and axes are created.
name : str, default=None: Name for labeling curve. If None, estimator_name is used when it is set; otherwise no label is shown.
ref_line : bool, default=True: If True, plots a reference line representing a perfectly calibrated classifier.
**kwargs : dict: Keyword arguments to be passed to matplotlib.pyplot.plot().

Returns

display : DeliquencyDisplay: Object that stores computed values.

datalib.metrics.cap_curve(y_true, y_score, sample_weight=None)

Base function to compute the Cumulative Accuracy Profile (CAP) curve. The curve relates the cumulative share of positive samples captured to the percentage of the dataset covered by each successive cut-off threshold.

Parameters

y_true : ndarray of shape (n_samples,): True targets of binary classification.
y_score : ndarray of shape (n_samples,): Estimated probabilities or output of a model / decision function.
sample_weight : array-like of shape (n_samples,), default=None: Sample weights. If None, all samples are given the same weight.

Returns

cumulative_gains : ndarray of shape (n_samples,): Cumulative gain at each threshold (percentage of class 1).
thresholds : ndarray of shape (n_samples,): Increasing thresholds (percentage of examples) on the decision function used to compute the CAP curve.
gini : float: The normalized Gini coefficient, calculated from the AUC.
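
To make the three return values concrete, here is a minimal NumPy sketch of a CAP computation for 0/1 labels with both classes present. It is not datalib's implementation: the function name cap_curve_sketch is hypothetical, and the Gini value uses one common accuracy-ratio definition (area between the model curve and the diagonal, divided by the area between the perfect curve and the diagonal). Whether cap_curve uses exactly this formula is an assumption.

```python
import numpy as np


def cap_curve_sketch(y_true, y_score, sample_weight=None):
    """Illustrative CAP curve for 0/1 labels; not datalib's implementation."""
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    if sample_weight is None:
        weights = np.ones_like(y_true)
    else:
        weights = np.asarray(sample_weight, dtype=float)

    # Rank samples from the highest score to the lowest.
    order = np.argsort(-y_score, kind="stable")
    positives = (y_true * weights)[order]
    weights = weights[order]

    # Fraction of the population covered and of the positives captured.
    thresholds = np.cumsum(weights) / weights.sum()
    cumulative_gains = np.cumsum(positives) / positives.sum()

    # Trapezoidal area under the curve, starting from the origin.
    xs = np.concatenate(([0.0], thresholds))
    ys = np.concatenate(([0.0], cumulative_gains))
    area_model = np.sum(np.diff(xs) * (ys[1:] + ys[:-1]) / 2.0)

    # The perfect curve reaches 1 at x = positive_rate, so its area is
    # 1 - positive_rate / 2; the random diagonal has area 0.5.
    positive_rate = positives.sum() / weights.sum()
    area_perfect = 1.0 - positive_rate / 2.0
    gini = (area_model - 0.5) / (area_perfect - 0.5)
    return cumulative_gains, thresholds, float(gini)


gains, thresholds, gini = cap_curve_sketch([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(gains, thresholds, gini)
```

In this toy input one positive is ranked below a negative, so the curve sits between the diagonal and the perfect baseline and the Gini value falls strictly between 0 and 1.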

datalib.metrics.delinquency_curve(y_true, y_score, pos_label=None)

Delinquency curve for a binary classification.

The delinquency curve shows how the default rate (proportion of pos_label samples) changes with the approval rate. It is typically plotted with the approval rate on the x-axis and the default rate on the y-axis. The curve is created by sorting the samples by score and computing the default rate over successively larger portions of the population, starting with the best scores.

The delinquency curve is key in many actuarial operations, where understanding the relative percentage of misclassification at each approval level is vital.

Parameters

y_true : array-like, shape (n_samples,): True binary labels. If labels are not either {-1, 1} or {0, 1}, then pos_label should be explicitly given.
y_score : array-like, shape (n_samples,): Target scores. These can be probability estimates of the positive class, confidence values, or a non-thresholded measure of decisions (as returned by "decision_function" on some classifiers).
pos_label : int or str, default=None: The label of the positive class.

Returns

approval_rate : array-like, shape (n_samples,): Increasing approval rate (percentage of approved best scores) used to compute default_rate. Values lie in the interval (0, 1).
default_rate : array-like, shape (n_samples,): Default rates for the approval rates, such that element i is the proportion of delinquents when approving approval_rate[i] of the population.
optimal_rate : array-like, shape (n_samples,): Optimal default rates for a perfect model.
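
The sort-and-accumulate procedure described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not datalib's implementation: the name delinquency_curve_sketch is hypothetical, a lower score is assumed to mean a better (lower-risk) applicant, and the final point at an approval rate of 1.0 is included here even though the library may treat the endpoints differently.

```python
import numpy as np


def delinquency_curve_sketch(y_true, y_score, pos_label=1):
    """Illustrative delinquency curve; not datalib's implementation."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    is_default = (y_true == pos_label).astype(float)

    # Assumption: a lower score means lower risk, so sorting in ascending
    # order puts the best applicants first.
    order = np.argsort(y_score, kind="stable")
    n_approved = np.arange(1, len(y_true) + 1)

    # Share of the population approved after each additional applicant.
    approval_rate = n_approved / len(y_true)
    # Share of defaulters among those approved so far.
    default_rate = np.cumsum(is_default[order]) / n_approved
    # A perfect model approves every non-defaulter before any defaulter.
    optimal_rate = np.cumsum(np.sort(is_default)) / n_approved
    return approval_rate, default_rate, optimal_rate


approval, default, optimal = delinquency_curve_sketch(
    [0, 1, 0, 1], [0.1, 0.2, 0.35, 0.8]
)
print(approval, default, optimal)
```

The gap between default and optimal at a given approval rate shows how many more defaulters the model lets through than a perfect ranking would.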
