savings_score
empulse.metrics.savings_score(y_true, y_pred, *, baseline='zero_one', tp_cost=0.0, fp_cost=0.0, tn_cost=0.0, fn_cost=0.0, check_input=True)
Cost savings of a classifier compared to using a baseline.
The cost savings of a classifier is the cost the classifier saved over a baseline classification model. By default, the baseline is a naive model that predicts all ones or all zeros, whichever is cheaper. A score of 1 indicates a perfect model, 0 indicates performance equal to the baseline model, and values below 0 indicate performance worse than the baseline model.
Modified from costcla.metrics.savings_score.
See also
expected_savings_score : Expected savings of a classifier compared to using a naive algorithm.
cost_loss : Cost of a classifier.
Parameters:
- y_true : 1D array-like, shape=(n_samples,)
Binary target values ('positive': 1, 'negative': 0).
- y_pred : 1D array-like, shape=(n_samples,)
Predicted labels or calibrated probabilities. If the predictions are calibrated probabilities, the optimal decision threshold is calculated for each instance as [2]:
\[t^*_i = \frac{C_i(1|0) - C_i(0|0)}{C_i(1|0) - C_i(0|0) + C_i(0|1) - C_i(1|1)}\]
Note
The optimal decision threshold is only accurate when the probabilities are well-calibrated. See scikit-learn's user guide for more information.
- baseline : 'zero_one' or 1D array-like, shape=(n_samples,), default='zero_one'
Predicted labels or calibrated probabilities of the baseline model. If 'zero_one', the baseline model is a naive model that predicts all zeros or all ones, whichever is better.
- tp_cost : float or array-like, shape=(n_samples,), default=0.0
Cost of true positives. If float, then all true positives have the same cost. If array-like, then it is the cost of each true positive classification.
- fp_cost : float or array-like, shape=(n_samples,), default=0.0
Cost of false positives. If float, then all false positives have the same cost. If array-like, then it is the cost of each false positive classification.
- tn_cost : float or array-like, shape=(n_samples,), default=0.0
Cost of true negatives. If float, then all true negatives have the same cost. If array-like, then it is the cost of each true negative classification.
- fn_cost : float or array-like, shape=(n_samples,), default=0.0
Cost of false negatives. If float, then all false negatives have the same cost. If array-like, then it is the cost of each false negative classification.
- check_input : bool, default=True
Perform input validation. Turning this off improves performance, which is useful when using this metric as a loss function.
- Returns:
- score : float
Cost savings of a classifier compared to using a baseline.
Notes
The cost of each instance \(C_i\) is calculated as [1]:
\[C_i(s_i) = y_i \cdot (\hat y_i \cdot C_i(1|1) + (1 - \hat y_i) \cdot C_i(0|1)) + (1 - y_i) \cdot (\hat y_i \cdot C_i(1|0) + (1 - \hat y_i) \cdot C_i(0|0))\]
Here \(C_i(\hat y \,|\, y)\) denotes the cost of predicting \(\hat y\) for instance \(i\) when the true label is \(y\).
The savings over a naive model is calculated as:
\[\text{Savings} = 1 - \frac{\sum_{i=1}^N C_i(s_i)}{\min(\sum_{i=1}^N C_i(0), \sum_{i=1}^N C_i(1))}\]
The savings over a baseline model is calculated as:
\[\text{Savings} = 1 - \frac{\sum_{i=1}^N C_i(s_i)}{\sum_{i=1}^N C_i(s_i^*)}\]
where
- \(N\) is the number of samples,
- \(y_i\) is the true label,
- \(\hat y_i\) is the predicted label,
- \(s_i^*\) is the baseline model's prediction for instance \(i\),
- \(C_i(1|1)\) is the cost of a true positive (tp_cost),
- \(C_i(0|1)\) is the cost of a false negative (fn_cost),
- \(C_i(1|0)\) is the cost of a false positive (fp_cost), and
- \(C_i(0|0)\) is the cost of a true negative (tn_cost).
Code modified from costcla.metrics.cost_loss.
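The per-instance cost and naive-baseline savings above can be sketched directly in NumPy. This is an illustrative re-derivation of the formulas, not empulse's implementation; `per_instance_cost` and `naive_savings` are hypothetical helper names:

```python
import numpy as np


def per_instance_cost(y_true, y_pred, tp_cost, fp_cost, tn_cost, fn_cost):
    """C_i = y*(yhat*TP + (1-yhat)*FN) + (1-y)*(yhat*FP + (1-yhat)*TN).

    Illustrative sketch of equation [1]; costs may be scalars or per-instance arrays.
    """
    y, yhat = np.asarray(y_true), np.asarray(y_pred)
    return y * (yhat * tp_cost + (1 - yhat) * fn_cost) + (1 - y) * (
        yhat * fp_cost + (1 - yhat) * tn_cost
    )


def naive_savings(y_true, y_pred, *, tp_cost=0.0, fp_cost=0.0, tn_cost=0.0, fn_cost=0.0):
    """Savings over the better of the all-zeros and all-ones naive models."""
    cost = per_instance_cost(y_true, y_pred, tp_cost, fp_cost, tn_cost, fn_cost).sum()
    zeros = np.zeros_like(np.asarray(y_true))
    ones = np.ones_like(zeros)
    cost_all_zero = per_instance_cost(y_true, zeros, tp_cost, fp_cost, tn_cost, fn_cost).sum()
    cost_all_one = per_instance_cost(y_true, ones, tp_cost, fp_cost, tn_cost, fn_cost).sum()
    return 1 - cost / min(cost_all_zero, cost_all_one)
```

With the costs from the Examples section below, this sketch reproduces the metric's value.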
References
[1] A. Correa Bahnsen, A. Stojanovic, D. Aouada, B. Ottersten, "Improving Credit Card Fraud Detection with Calibrated Probabilities", in Proceedings of the fourteenth SIAM International Conference on Data Mining, 677-685, 2014.
[2]Höppner, S., Baesens, B., Verbeke, W., & Verdonck, T. (2022). Instance-dependent cost-sensitive learning for detecting transfer fraud. European Journal of Operational Research, 297(1), 291-300.
Examples
>>> import numpy as np
>>> from empulse.metrics import savings_score
>>> y_pred = [0, 1, 0, 0]
>>> y_true = [0, 1, 1, 0]
>>> fp_cost = np.array([4, 1, 2, 2])
>>> fn_cost = np.array([1, 3, 3, 1])
>>> savings_score(y_true, y_pred, fp_cost=fp_cost, fn_cost=fn_cost)
0.5
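When `y_pred` holds calibrated probabilities, the per-instance threshold \(t^*_i\) from [2] can also be computed by hand. A minimal sketch; the function name and the strict `>` comparison are assumptions for illustration, not empulse's internal code:

```python
import numpy as np


def optimal_thresholds(tp_cost, fp_cost, tn_cost, fn_cost):
    """Per-instance optimal decision threshold t*_i (illustrative sketch).

    Cost notation is C(prediction | true label):
    C(1|0) = fp_cost, C(0|0) = tn_cost, C(0|1) = fn_cost, C(1|1) = tp_cost.
    """
    tp, fp, tn, fn = (np.asarray(c, dtype=float) for c in (tp_cost, fp_cost, tn_cost, fn_cost))
    return (fp - tn) / (fp - tn + fn - tp)


# Classify positive when the calibrated probability exceeds the instance's threshold.
proba = np.array([0.1, 0.8, 0.6, 0.3])       # hypothetical calibrated probabilities
fp_cost = np.array([4, 1, 2, 2])
fn_cost = np.array([1, 3, 3, 1])
t_star = optimal_thresholds(0.0, fp_cost, 0.0, fn_cost)   # fp/(fp+fn) when tp=tn=0
labels = (proba > t_star).astype(int)
```

Note that instances with identical probabilities can receive different labels, because the threshold varies with each instance's costs.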