Metrics
Contents
- class datalib.metrics.CAPCurveDisplay(*, cumulative_gains, thresholds, positive_rate=None, gini=None, estimator_name=None, pos_label=None)
Bases: _BinaryClassifierCurveDisplayMixin
CAP Curve visualization.
Parameters
- cumulative_gains: ndarray
Cumulative gain at each threshold (percentage of class 1).
- thresholds: ndarray
Increasing thresholds (percentage of examples) on the decision function used to compute the CAP curve.
- positive_rate: ndarray, default=None
Rate of positive class examples, used to compute the perfect curve.
- gini: float, default=None
Gini score. If None, the Gini score is not shown.
- estimator_name: str, default=None
Name of estimator. If None, the estimator name is not shown.
- pos_label: str or int, default=None
The class considered as the positive class when computing the CAP curve. By default, estimator.classes_[1] is considered as the positive class.
Attributes
- line_: matplotlib Artist
CAP Curve.
- ax_: matplotlib Axes
Axes with CAP Curve.
- figure_: matplotlib Figure
Figure containing the curve.
- classmethod from_estimator(estimator, X, y, *, sample_weight=None, response_method='auto', pos_label=None, plot_random=False, plot_perfect=False, name=None, ax=None, **kwargs)
Create a CAP Curve display from an estimator.
Parameters
- estimator: estimator instance
Fitted classifier or a fitted Pipeline in which the last estimator is a classifier.
- X: {array-like, sparse matrix} of shape (n_samples, n_features)
Input values.
- y: array-like of shape (n_samples,)
Target values.
- sample_weight: array-like of shape (n_samples,), default=None
Sample weights.
- response_method: {'predict_proba', 'decision_function', 'auto'}, default='auto'
Specifies whether to use predict_proba or decision_function as the target response. If set to 'auto', predict_proba is tried first and, if it does not exist, decision_function is tried next.
- pos_label: str or int, default=None
The class considered as the positive class when computing the CAP curve. By default, estimator.classes_[1] is considered as the positive class.
- plot_random: bool, default=False
If True, plot the random baseline curve.
- plot_perfect: bool, default=False
If True, plot the perfect baseline curve.
- name: str, default=None
Name of CAP Curve for labeling. If None, use the name of the estimator.
- ax: matplotlib axes, default=None
Axes object to plot on. If None, a new figure and axes are created.
- **kwargs: dict
Keyword arguments to be passed to matplotlib's plot.
Returns
- display: CAPCurveDisplay
The CAP Curve display.
Examples
>>> import matplotlib.pyplot as plt
>>> from datalib.metrics import CAPCurveDisplay
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.svm import SVC
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> clf = SVC(random_state=0).fit(X_train, y_train)
>>> CAPCurveDisplay.from_estimator(clf, X_test, y_test)
>>> plt.show()
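The 'auto' behaviour of response_method can be pictured with a small, library-independent sketch. The helper resolve_response_method and the dummy classifier below are hypothetical illustrations, not part of datalib, and the actual implementation may differ.

```python
def resolve_response_method(estimator, response_method="auto"):
    """Return (name, bound method) for the requested response method.

    Hypothetical helper, not part of datalib. With "auto", predict_proba
    is tried first and decision_function second.
    """
    if response_method == "auto":
        candidates = ("predict_proba", "decision_function")
    else:
        candidates = (response_method,)
    for name in candidates:
        method = getattr(estimator, name, None)
        if callable(method):
            return name, method
    raise AttributeError(
        f"{type(estimator).__name__} has none of: {', '.join(candidates)}"
    )


class MarginOnlyClassifier:
    """Dummy stand-in for a classifier that exposes no predict_proba."""

    def decision_function(self, X):
        return [0.0 for _ in X]


name, method = resolve_response_method(MarginOnlyClassifier())
print(name)  # decision_function
```

Falling back in this order means a classifier that only exposes decision_function can still be scored without any extra arguments.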
- classmethod from_predictions(y_true, y_score, *, sample_weight=None, pos_label=None, plot_random=False, plot_perfect=False, name=None, ax=None, **kwargs)
Plot the CAP curve given the true labels and predicted scores.
Parameters
- y_true: array-like of shape (n_samples,)
True labels.
- y_score: array-like of shape (n_samples,)
Target scores. These can be probability estimates of the positive class, confidence values, or non-thresholded measures of decisions (as returned by decision_function on some classifiers).
- sample_weight: array-like of shape (n_samples,), default=None
Sample weights.
- pos_label: str or int, default=None
The label of the positive class. When pos_label=None, if y_true is in {-1, 1} or {0, 1}, pos_label is set to 1; otherwise an error is raised.
- plot_random: bool, default=False
If True, plot the random baseline curve.
- plot_perfect: bool, default=False
If True, plot the perfect baseline curve.
- name: str, default=None
Name of CAP curve for labeling. If None, name will be set to "Classifier".
- ax: matplotlib axes, default=None
Axes object to plot on. If None, a new figure and axes are created.
- **kwargs: dict
Additional keyword arguments passed to matplotlib's plot function.
Returns
- display: CAPCurveDisplay
Object that stores computed values.
Examples
>>> import matplotlib.pyplot as plt
>>> from datalib.metrics import CAPCurveDisplay
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.svm import SVC
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> clf = SVC(random_state=0, probability=True).fit(X_train, y_train)
>>> y_pred = clf.predict_proba(X_test)[:, 1]
>>> CAPCurveDisplay.from_predictions(y_test, y_pred)
>>> plt.show()
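The default rule for pos_label in from_predictions can be sketched in plain Python. The function infer_pos_label is a hypothetical name used only for illustration. It mirrors the documented rule (labels in {-1, 1} or {0, 1} default to 1, anything else needs an explicit pos_label) and is not the library's own code.

```python
def infer_pos_label(y_true, pos_label=None):
    """Return the positive label, following the documented default rule.

    Hypothetical helper, not part of datalib.
    """
    if pos_label is not None:
        return pos_label
    labels = set(y_true)
    # Labels drawn from {0, 1} or {-1, 1} have an unambiguous positive class.
    if labels <= {0, 1} or labels <= {-1, 1}:
        return 1
    raise ValueError(
        f"y_true takes values in {labels!r}; pass pos_label explicitly."
    )


print(infer_pos_label([0, 1, 1, 0]))                     # 1
print(infer_pos_label(["good", "bad"], pos_label="bad"))  # bad
```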
- plot(*, plot_random=False, plot_perfect=False, name=None, ax=None, **kwargs)
Plot visualization. Extra keyword arguments will be passed to matplotlib's plot.
Parameters
- ax: matplotlib axes, default=None
Axes object to plot on. If None, a new figure and axes are created.
- name: str, default=None
Name of CAP Curve for labeling. If None, use estimator_name if not None, otherwise no labeling is shown.
- plot_random: bool, default=False
If True, plot the random baseline curve.
- plot_perfect: bool, default=False
If True, plot the perfect baseline curve.
Returns
- display: CAPCurveDisplay
Object that stores computed values.
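As a rough illustration of what the plot_random and plot_perfect baselines represent, the sketch below builds both curves as lists of (x, y) points. It assumes the usual CAP conventions: the random model is the diagonal, and the perfect model captures every positive once a fraction positive_rate of the examples has been covered. The helper baseline_curves is hypothetical and not part of datalib.

```python
def baseline_curves(positive_rate):
    """Return (random_curve, perfect_curve) as lists of (x, y) points.

    x is the fraction of examples covered, y the fraction of positives
    captured. Hypothetical helper, not part of datalib.
    """
    if not 0 < positive_rate <= 1:
        raise ValueError("positive_rate must lie in (0, 1]")
    # A random ranking captures positives in proportion to coverage.
    random_curve = [(0.0, 0.0), (1.0, 1.0)]
    # A perfect ranking lists all positives first, then stays flat at 1.
    perfect_curve = [(0.0, 0.0), (positive_rate, 1.0), (1.0, 1.0)]
    return random_curve, perfect_curve


random_curve, perfect_curve = baseline_curves(0.25)
print(perfect_curve)  # [(0.0, 0.0), (0.25, 1.0), (1.0, 1.0)]
```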
- class datalib.metrics.DeliquencyDisplay(approval_rate, default_rate, optimal_rate, *, estimator_name=None, pos_label=None)
Bases: _BinaryClassifierCurveDisplayMixin
Delinquency curve visualization. It is recommended to use from_estimator() or from_predictions() to create a DeliquencyDisplay. All parameters are stored as attributes.
Parameters
- approval_rate: array-like, shape (n_samples,)
The relative percentage of the population approved.
- default_rate: array-like, shape (n_samples,)
The default rate, i.e. the relative percentage of positives in the sample.
- optimal_rate: array-like, shape (n_samples,)
The optimal default rate.
- estimator_name: str, default=None
Name of estimator. If None, the estimator name is not shown.
- pos_label: str or int, default=None
The positive class when computing the delinquency curve. By default, estimator.classes_[1] is considered as the positive class.
Attributes
See Also
delinquency_curve : The main method to calculate needed curves for the delinquency analysis.
DeliquencyDisplay.from_predictions : Plot delinquency curve using true labels and predicted probabilities.
DeliquencyDisplay.from_estimator : Plot delinquency curve using an estimator and data.
Examples
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LogisticRegression
>>> from datalib.metrics import delinquency_curve, DeliquencyDisplay
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=0)
>>> clf = LogisticRegression(random_state=0)
>>> clf.fit(X_train, y_train)
LogisticRegression(random_state=0)
>>> y_prob = clf.predict_proba(X_test)[:, 1]
>>> approval_rate, default_rate, optimal_rate = delinquency_curve(
...     y_test, y_prob)
>>> disp = DeliquencyDisplay(approval_rate, default_rate, optimal_rate)
>>> disp.plot()
<...>
- classmethod from_estimator(estimator, X, y, *, pos_label=None, name=None, ref_line=True, ax=None, **kwargs)
Plot delinquency curve using a binary classifier and data. A delinquency curve leverages inputs from a binary classifier and plots the default rates (y-axis) over unique approval rates, i.e. fractions of the population (x-axis). Extra keyword arguments will be passed to matplotlib.pyplot.plot().
Parameters
- estimator: estimator instance
Fitted classifier or a fitted Pipeline in which the last estimator is a classifier. The classifier must have a predict_proba method.
- X: {array-like, sparse matrix} of shape (n_samples, n_features)
Input values.
- y: array-like of shape (n_samples,)
Binary target values.
- pos_label: str or int, default=None
The positive class when computing the delinquency curve. By default, estimator.classes_[1] is considered as the positive class.
- name: str, default=None
Name for labeling curve. If None, the name of the estimator is used.
- ref_line: bool, default=True
If True, plots a reference line representing a perfectly calibrated classifier.
- ax: matplotlib axes, default=None
Axes object to plot on. If None, a new figure and axes are created.
- **kwargs: dict
Keyword arguments to be passed to matplotlib.pyplot.plot().
Returns
- display: DeliquencyDisplay
Object that stores computed values.
See Also
DeliquencyDisplay.from_predictions : Plot delinquency curve using true labels and predicted probabilities.
Examples
>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LogisticRegression
>>> from datalib.metrics import DeliquencyDisplay
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=0)
>>> clf = LogisticRegression(random_state=0)
>>> clf.fit(X_train, y_train)
LogisticRegression(random_state=0)
>>> disp = DeliquencyDisplay.from_estimator(clf, X_test, y_test)
>>> plt.show()
- classmethod from_predictions(y_true, y_prob, *, pos_label=None, name=None, ref_line=True, ax=None, **kwargs)
Plot delinquency curve using true labels and predicted probabilities.
Parameters
- y_true: array-like of shape (n_samples,)
True labels.
- y_prob: array-like of shape (n_samples,)
The predicted probabilities of the positive class.
- pos_label: str or int, default=None
The positive class when computing the delinquency curve. By default, estimator.classes_[1] is considered as the positive class.
- name: str, default=None
Name for labeling curve.
- ref_line: bool, default=True
If True, plots a reference line representing a perfectly calibrated classifier.
- ax: matplotlib axes, default=None
Axes object to plot on. If None, a new figure and axes are created.
- **kwargs: dict
Keyword arguments to be passed to matplotlib.pyplot.plot().
Returns
- display: DeliquencyDisplay
Object that stores computed values.
See Also
DeliquencyDisplay.from_estimator : Plot delinquency curve using an estimator and data.
Examples
>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LogisticRegression
>>> from datalib.metrics import DeliquencyDisplay
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=0)
>>> clf = LogisticRegression(random_state=0)
>>> clf.fit(X_train, y_train)
LogisticRegression(random_state=0)
>>> y_prob = clf.predict_proba(X_test)[:, 1]
>>> disp = DeliquencyDisplay.from_predictions(y_test, y_prob)
>>> plt.show()
- plot(*, ax=None, name=None, ref_line=True, **kwargs)
Plot visualization. Extra keyword arguments will be passed to matplotlib.pyplot.plot().
Parameters
- ax: matplotlib axes, default=None
Axes object to plot on. If None, a new figure and axes are created.
- name: str, default=None
Name for labeling curve. If None, use estimator_name if not None, otherwise no labeling is shown.
- ref_line: bool, default=True
If True, plots a reference line representing a perfectly calibrated classifier.
- **kwargs: dict
Keyword arguments to be passed to matplotlib.pyplot.plot().
Returns
- display: DeliquencyDisplay
Object that stores computed values.
- datalib.metrics.cap_curve(y_true, y_score, sample_weight=None)
Base method to calculate the Cumulative Accuracy Profile Curve (CAP Curve). This metric relates the rate of positive samples captured to the percentage of the dataset covered by each sequential cut-off threshold.
Parameters
- y_true: ndarray of shape (n_samples,)
True targets of binary classification.
- y_score: ndarray of shape (n_samples,)
Estimated probabilities or output of a model / decision function.
- sample_weight: array-like of shape (n_samples,), default=None
Sample weights. If None, all samples are given the same weight.
Returns
- cumulative_gains: ndarray of shape (n_samples,)
Cumulative gain at each threshold (percentage of class 1).
- thresholds: ndarray of shape (n_samples,)
Increasing thresholds (percentage of examples) on the decision function used to compute the CAP curve.
- gini: float
The normalized Gini coefficient, calculated from the AUC.
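The computation described above can be sketched in pure Python. This is an illustrative reconstruction under stated assumptions, not the library's implementation. It assumes that samples are ranked by decreasing score, that the positive class is labeled 1, and that the normalized Gini is taken as the accuracy ratio (area between the CAP curve and the diagonal, divided by the same area for a perfect model), which equals 2 * AUC - 1 when scores have no ties. The name cap_curve_sketch is hypothetical.

```python
def cap_curve_sketch(y_true, y_score, sample_weight=None):
    """Return (cumulative_gains, thresholds, gini) for binary labels in {0, 1}.

    Hypothetical sketch, not the datalib implementation.
    """
    n = len(y_true)
    if sample_weight is None:
        sample_weight = [1.0] * n
    total_weight = sum(sample_weight)
    total_pos = sum(w for y, w in zip(y_true, sample_weight) if y == 1)
    if not 0 < total_pos < total_weight:
        raise ValueError("y_true must contain both classes")

    # Rank samples from the highest score to the lowest (assumed ordering).
    order = sorted(range(n), key=lambda i: y_score[i], reverse=True)

    cumulative_gains, thresholds = [], []
    covered = captured = 0.0
    for i in order:
        covered += sample_weight[i]
        if y_true[i] == 1:
            captured += sample_weight[i]
        thresholds.append(covered / total_weight)    # share of examples
        cumulative_gains.append(captured / total_pos)  # share of positives

    # Trapezoidal area under the CAP curve, starting from the origin.
    area, prev_x, prev_y = 0.0, 0.0, 0.0
    for x, y in zip(thresholds, cumulative_gains):
        area += (x - prev_x) * (y + prev_y) / 2
        prev_x, prev_y = x, y

    positive_rate = total_pos / total_weight
    area_perfect = 1 - positive_rate / 2
    gini = (area - 0.5) / (area_perfect - 0.5)
    return cumulative_gains, thresholds, gini


gains, thresholds, gini = cap_curve_sketch([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
print(gains)       # [0.5, 0.5, 1.0, 1.0]
print(thresholds)  # [0.25, 0.5, 0.75, 1.0]
print(gini)        # 0.5
```

In this toy example three of the four positive/negative pairs are ranked correctly, so the ROC AUC is 0.75 and 2 * 0.75 - 1 gives the same Gini of 0.5.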
- datalib.metrics.delinquency_curve(y_true, y_score, pos_label=None)
Delinquency curve for a binary classification.
The delinquency curve shows how the default rate (proportion of pos_label samples) changes with different approval rates. The curve is typically plotted with the approval rate on the x-axis and the default rate on the y-axis. It is created by sorting the samples by score and calculating the default rate for successively larger populations, approving the best scores first.
The delinquency curve is key in many actuarial operations, where understanding the relative percentage of misclassifications at each approval level is vital.
Parameters
- y_true: array-like, shape (n_samples,)
True binary labels. If labels are not either {-1, 1} or {0, 1}, then pos_label should be explicitly given.
- y_score: array-like, shape (n_samples,)
Target scores. These can be probability estimates of the positive class, confidence values, or non-thresholded measures of decisions (as returned by decision_function on some classifiers).
- pos_label: int or str, default=None
The label of the positive class.
Returns
- approval_rate: array-like, shape (n_samples,)
Increasing approval rate (percentage of approved best scores) used to compute default_rate. It lies in the interval (0, 1).
- default_rate: array-like, shape (n_samples,)
Default rate values for the approval rates, such that element i is the proportion of delinquents when approving approval_rate[i] of the population.
- optimal_rate: array-like, shape (n_samples,)
Optimal default rates for a perfect model.
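The construction described above can be sketched in pure Python. This is an illustration under stated assumptions rather than the library's implementation. It assumes that the "best" scores are the lowest ones (lowest estimated risk of the positive class), that one point is emitted per sample, and that a perfect model approves every non-delinquent sample before any delinquent one. The name delinquency_curve_sketch is hypothetical.

```python
def delinquency_curve_sketch(y_true, y_score, pos_label=1):
    """Return (approval_rate, default_rate, optimal_rate) as lists.

    Hypothetical sketch, not the datalib implementation.
    """
    n = len(y_true)
    # Approve the best applicants first: lowest score = lowest assumed risk.
    order = sorted(range(n), key=lambda i: y_score[i])
    n_negative = sum(1 for y in y_true if y != pos_label)

    approval_rate, default_rate, optimal_rate = [], [], []
    defaults = 0
    for approved, i in enumerate(order, start=1):
        defaults += int(y_true[i] == pos_label)
        approval_rate.append(approved / n)
        default_rate.append(defaults / approved)
        # A perfect model only approves delinquents once negatives run out.
        optimal_rate.append(max(0, approved - n_negative) / approved)
    return approval_rate, default_rate, optimal_rate


approval, default, optimal = delinquency_curve_sketch(
    [1, 0, 0, 1], [0.2, 0.1, 0.8, 0.9]
)
print(approval)  # [0.25, 0.5, 0.75, 1.0]
```

At full approval both default_rate and optimal_rate reach the overall share of positives, since every sample has been approved by then.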