expected_savings_score
empulse.metrics.expected_savings_score(y_true, y_proba, *, baseline='zero_one', tp_cost=0.0, fp_cost=0.0, tn_cost=0.0, fn_cost=0.0, check_input=True)
Expected savings of a classifier compared to a baseline.
The expected cost savings of a classifier is the expected cost the classifier saves over a baseline classification model. By default, a naive model is used as the baseline (predicting all ones or all zeros, whichever is better). A score of 1 indicates a perfect model, 0 a model as good as the baseline, and values below 0 a model worse than the baseline.
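To make the scale concrete, here is a minimal sketch; the labels and costs are made up for illustration, and the scores follow from the savings formulas in the Notes section below.

import numpy as np
from empulse.metrics import expected_savings_score

y_true = [0, 1, 1, 0]
fp_cost = np.array([4, 1, 2, 2])
fn_cost = np.array([1, 3, 3, 1])

# A perfect model incurs only tp_cost and tn_cost (both 0.0 by default),
# so it saves all of the baseline's cost -> score of 1.0.
expected_savings_score(y_true, [0.0, 1.0, 1.0, 0.0], fp_cost=fp_cost, fn_cost=fn_cost)
# 1.0

# A model matching the naive baseline (predicting all zeros here) saves
# nothing -> score of 0.0; costlier models score below 0.0.
expected_savings_score(y_true, [0.0, 0.0, 0.0, 0.0], fp_cost=fp_cost, fn_cost=fn_cost)
# 0.0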
See also

savings_score : Cost savings of a classifier compared to a baseline.
expected_cost_loss : Expected cost of a classifier.

- Parameters:
- y_true : 1D array-like, shape=(n_samples,)
Binary target values ('positive': 1, 'negative': 0).
- y_proba : 1D array-like, shape=(n_samples,)
Target probabilities; should lie between 0 and 1.
- baseline : {'zero_one', 'prior'} or 1D array-like, shape=(n_samples,), default='zero_one'
If 'zero_one', the baseline model is a naive model that predicts all zeros or all ones, depending on which is better.
If 'prior', the baseline model predicts the prior probability of the majority or minority class, depending on which is better.
If array-like, the target probabilities of the baseline model.
(All three options are demonstrated in the sketch after the Returns section.)
- tp_cost : float or array-like, shape=(n_samples,), default=0.0
Cost of true positives. If float, then all true positives have the same cost. If array-like, then it is the cost of each true positive classification.
- fp_cost : float or array-like, shape=(n_samples,), default=0.0
Cost of false positives. If float, then all false positives have the same cost. If array-like, then it is the cost of each false positive classification.
- tn_cost : float or array-like, shape=(n_samples,), default=0.0
Cost of true negatives. If float, then all true negatives have the same cost. If array-like, then it is the cost of each true negative classification.
- fn_cost : float or array-like, shape=(n_samples,), default=0.0
Cost of false negatives. If float, then all false negatives have the same cost. If array-like, then it is the cost of each false negative classification.
- check_input : bool, default=True
Perform input validation. Turning this off improves performance, which is useful when using this metric as a loss function.
- Returns:
- score : float
Expected savings of a classifier compared to a baseline.
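The three baseline options are sketched below, reusing the illustrative data from the Examples section; baseline_proba is a made-up stand-in for, say, the scores of a model currently in production.

import numpy as np
from empulse.metrics import expected_savings_score

y_true = [0, 1, 1, 0]
y_proba = [0.4, 0.8, 0.75, 0.1]
fp_cost = np.array([4, 1, 2, 2])
fn_cost = np.array([1, 3, 3, 1])

# Default: compare against the better of the all-zero and all-one naive models.
expected_savings_score(y_true, y_proba, baseline='zero_one',
                       fp_cost=fp_cost, fn_cost=fn_cost)

# Compare against a model that predicts the class prior.
expected_savings_score(y_true, y_proba, baseline='prior',
                       fp_cost=fp_cost, fn_cost=fn_cost)

# Compare against an arbitrary baseline model's predicted probabilities.
baseline_proba = np.array([0.5, 0.5, 0.6, 0.2])
expected_savings_score(y_true, y_proba, baseline=baseline_proba,
                       fp_cost=fp_cost, fn_cost=fn_cost)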
Notes
The expected cost of each instance \(\mathbb{E}[C_i]\) is calculated as [1]:

\[\mathbb{E}[C_i(s_i)] = y_i \cdot (s_i \cdot C_i(1|1) + (1 - s_i) \cdot C_i(0|1)) + (1 - y_i) \cdot (s_i \cdot C_i(1|0) + (1 - s_i) \cdot C_i(0|0))\]

The expected savings over a naive model is calculated as:

\[\text{Expected Savings} = 1 - \frac{\sum_{i=1}^N \mathbb{E}[C_i(s_i)]}{\min(\sum_{i=1}^N C_i(0), \sum_{i=1}^N C_i(1))}\]

The expected savings over a baseline model is calculated as:

\[\text{Expected Savings} = 1 - \frac{\sum_{i=1}^N \mathbb{E}[C_i(s_i)]}{\sum_{i=1}^N \mathbb{E}[C_i(s_i^*)]}\]

where
\(y_i\) is the true label,
\(s_i\) is the predicted probability of the classifier,
\(s_i^*\) is the predicted probability of the baseline model,
\(C_i(0)\) and \(C_i(1)\) are the costs of naively predicting 0 and 1 for instance \(i\),
\(C_i(1|1)\) is the cost of a true positive tp_cost,
\(C_i(1|0)\) is the cost of a false positive fp_cost,
\(C_i(0|1)\) is the cost of a false negative fn_cost,
\(C_i(0|0)\) is the cost of a true negative tn_cost, and
\(N\) is the number of samples.
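As a sanity check, the naive-baseline savings can be reproduced by hand from the equations above. A sketch, using the illustrative data from the Examples section and the default tp_cost = tn_cost = 0.0:

import numpy as np
from empulse.metrics import expected_savings_score

y = np.array([0, 1, 1, 0])
s = np.array([0.4, 0.8, 0.75, 0.1])
tp_cost, tn_cost = 0.0, 0.0
fp_cost = np.array([4, 1, 2, 2])
fn_cost = np.array([1, 3, 3, 1])

# Expected cost per instance, following the first equation above.
expected_cost = (y * (s * tp_cost + (1 - s) * fn_cost)
                 + (1 - y) * (s * fp_cost + (1 - s) * tn_cost))

# Naive baseline: the cheaper of always predicting 0 or always predicting 1.
cost_all_zero = np.sum(y * fn_cost + (1 - y) * tn_cost)  # sum of C_i(0)
cost_all_one = np.sum(y * tp_cost + (1 - y) * fp_cost)   # sum of C_i(1)
naive_cost = min(cost_all_zero, cost_all_one)

manual_savings = 1 - expected_cost.sum() / naive_cost
# Should match the library's result (up to floating-point error).
assert np.isclose(manual_savings,
                  expected_savings_score(y, s, fp_cost=fp_cost, fn_cost=fn_cost))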
References
[1]Höppner, S., Baesens, B., Verbeke, W., & Verdonck, T. (2022). Instance-dependent cost-sensitive learning for detecting transfer fraud. European Journal of Operational Research, 297(1), 291-300.
Examples
import numpy as np
from empulse.metrics import expected_savings_score

y_pred = [0.4, 0.8, 0.75, 0.1]
y_true = [0, 1, 1, 0]
fp_cost = np.array([4, 1, 2, 2])
fn_cost = np.array([1, 3, 3, 1])
expected_savings_score(y_true, y_pred, fp_cost=fp_cost, fn_cost=fn_cost)
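When the metric is evaluated many times on inputs that are already known to be well-formed, for instance as a loss inside a hyperparameter search, validation can be skipped; a sketch continuing the example above:

# Reuses y_true, y_pred, fp_cost and fn_cost from the example above.
# check_input=False skips input validation, trading safety for speed.
expected_savings_score(y_true, y_pred, fp_cost=fp_cost, fn_cost=fn_cost,
                       check_input=False)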