savings_score

empulse.metrics.savings_score(y_true, y_pred, *, baseline='zero_one', tp_cost=0.0, fp_cost=0.0, tn_cost=0.0, fn_cost=0.0, check_input=True)

Cost savings of a classifier compared to using a baseline.

The cost savings of a classifier is the cost the classifier saved over a baseline classification model. By default, a naive model is used as the baseline (predicting all ones or all zeros, whichever is cheaper). A score of 1 indicates a perfect model, 0 means the model is as good as the baseline model, and values smaller than 0 mean the model is worse than the baseline model.

Modified from costcla.metrics.savings_score.

See also

expected_savings_score : Expected savings of a classifier compared to using a naive algorithm.

cost_loss : Cost of a classifier.

Parameters:
y_true : 1D array-like, shape=(n_samples,)

Binary target values (‘positive’: 1, ‘negative’: 0).

y_pred : 1D array-like, shape=(n_samples,)

Predicted labels or calibrated probabilities. If the predictions are calibrated probabilities, the optimal decision threshold is calculated for each instance as [2]:

\[t^*_i = \frac{C_i(1|0) - C_i(0|0)}{C_i(1|0) - C_i(0|0) + C_i(0|1) - C_i(1|1)}\]

Note

The optimal decision threshold is only accurate when the probabilities are well-calibrated. See scikit-learn’s user guide for more information. A sketch using probabilities is shown under Examples.

baseline : 'zero_one' or 1D array-like, shape=(n_samples,), default='zero_one'

Predicted labels or calibrated probabilities of the baseline model. If 'zero_one', the baseline model is a naive model that predicts all zeros or all ones, whichever is better (a sketch with an explicit baseline is shown under Examples).

tp_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of true positives. If float, then all true positives have the same cost. If array-like, then it is the cost of each true positive classification.

fp_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of false positives. If float, then all false positives have the same cost. If array-like, then it is the cost of each false positive classification.

tn_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of true negatives. If float, then all true negatives have the same cost. If array-like, then it is the cost of each true negative classification.

fn_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of false negatives. If float, then all false negatives have the same cost. If array-like, then it is the cost of each false negative classification.

check_input : bool, default=True

Perform input validation. Turning it off improves performance, which is useful when using this metric as a loss function.

Returns:
score : float

Cost savings of a classifier compared to using a baseline.

Notes

The cost of each instance \(C_i\) is calculated as [1]:

\[C_i(s_i) = y_i \cdot (\hat y_i \cdot C_i(1|1) + (1 - \hat y_i) \cdot C_i(0|1)) + (1 - y_i) \cdot (\hat y_i \cdot C_i(1|0) + (1 - \hat y_i) \cdot C_i(0|0))\]

The savings over a naive model is calculated as:

\[\text{Savings} = 1 - \frac{\sum_{i=1}^N C_i(s_i)}{\min(\sum_{i=1}^N C_i(0), \sum_{i=1}^N C_i(1))}\]

The savings over a baseline model is calculated as:

\[\text{Savings} = 1 - \frac{\sum_{i=1}^N C_i(s_i)}{\sum_{i=1}^N C_i(s_i^*)}\]

where

  • \(y_i\) is the true label,

  • \(\hat y_i\) is the predicted label,

  • \(s_i^*\) is the baseline model’s prediction,

  • \(C_i(1|1)\) is the cost of a true positive tp_cost,

  • \(C_i(0|1)\) is the cost of a false negative fn_cost,

  • \(C_i(1|0)\) is the cost of a false positive fp_cost,

  • \(C_i(0|0)\) is the cost of a true negative tn_cost, and

  • \(N\) is the number of samples.

Code modified from costcla.metrics.cost_loss.
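
As a minimal sketch of how these formulas combine (plain NumPy, not part of the empulse API; manual_savings is a hypothetical helper written only for illustration):

import numpy as np

def manual_savings(y_true, y_pred, tp_cost=0.0, fp_cost=0.0, tn_cost=0.0, fn_cost=0.0):
    y = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_pred, dtype=float)
    # per-instance cost C_i(s_i), first formula above
    cost = y * (y_hat * tp_cost + (1 - y_hat) * fn_cost) \
        + (1 - y) * (y_hat * fp_cost + (1 - y_hat) * tn_cost)
    # naive baseline: all zeros or all ones, whichever is cheaper
    cost_all_zeros = np.sum(y * fn_cost + (1 - y) * tn_cost)
    cost_all_ones = np.sum(y * tp_cost + (1 - y) * fp_cost)
    return 1 - cost.sum() / min(cost_all_zeros, cost_all_ones)

On the arrays from the Examples section below, this sketch gives 1 - 3/6 = 0.5, matching savings_score.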

References

[1]

A. Correa Bahnsen, A. Stojanovic, D. Aouada, B. Ottersten, “Improving Credit Card Fraud Detection with Calibrated Probabilities”, in Proceedings of the Fourteenth SIAM International Conference on Data Mining, 677-685, 2014.

[2]

Höppner, S., Baesens, B., Verbeke, W., & Verdonck, T. (2022). Instance-dependent cost-sensitive learning for detecting transfer fraud. European Journal of Operational Research, 297(1), 291-300.

Examples

import numpy as np
from empulse.metrics import savings_score
y_pred = [0, 1, 0, 0]
y_true = [0, 1, 1, 0]
fp_cost = np.array([4, 1, 2, 2])  # cost of each false positive
fn_cost = np.array([1, 3, 3, 1])  # cost of each false negative
savings_score(y_true, y_pred, fp_cost=fp_cost, fn_cost=fn_cost)
# the model misses one positive (cost 3); the best naive model costs 6,
# so the savings are 1 - 3/6 = 0.5
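
Calibrated probabilities and an explicit baseline can be passed as well. A sketch continuing the example above (y_proba and baseline_pred are hypothetical inputs; the expected values follow from the formulas in the Notes):

# calibrated probabilities: with tp_cost = tn_cost = 0, the documented
# per-instance thresholds are t*_i = fp_cost / (fp_cost + fn_cost),
# i.e. [0.8, 0.25, 0.4, 0.67] for the costs above
y_proba = [0.1, 0.9, 0.5, 0.2]
savings_score(y_true, y_proba, fp_cost=fp_cost, fn_cost=fn_cost)
# each instance falls on the correct side of its threshold, so the
# total cost is 0 and the expected savings are 1.0

# explicit baseline: compare against another model's predicted labels
baseline_pred = [1, 1, 1, 0]
savings_score(y_true, y_pred, baseline=baseline_pred, fp_cost=fp_cost, fn_cost=fn_cost)
# the baseline incurs one false positive (cost 4) while y_pred costs 3,
# so the expected savings are 1 - 3/4 = 0.25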