expected_savings_score#

empulse.metrics.expected_savings_score(y_true, y_proba, *, baseline='zero_one', tp_cost=0.0, fp_cost=0.0, tn_cost=0.0, fn_cost=0.0, check_input=True)[source]#

Expected savings of a classifier compared to a baseline.

The expected cost savings of a classifier is the expected cost the classifier saves over a baseline classification model. By default, a naive model is used (predicting all ones or all zeros, whichever is better). A score of 1 indicates a perfect model, 0 means the model performs as well as the baseline, and values below 0 indicate performance worse than the baseline model.

See also

savings_score : Cost savings of a classifier compared to a baseline.

expected_cost_loss : Expected cost of a classifier.

Parameters:
y_true : 1D array-like, shape=(n_samples,)

Binary target values (‘positive’: 1, ‘negative’: 0).

y_proba : 1D array-like, shape=(n_samples,)

Predicted probabilities of the positive class; values should lie between 0 and 1.

baseline : {‘zero_one’, ‘prior’} or 1D array-like, shape=(n_samples,), default=’zero_one’
  • If 'zero_one', the baseline model is a naive model that predicts all zeros or all ones depending on which is better.

  • If 'prior', the baseline model is a model that predicts the prior probability of the majority or minority class depending on which is better.

  • If array-like, target probabilities of the baseline model.

tp_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of true positives. If float, then all true positives have the same cost. If array-like, then it is the cost of each true positive classification.

fp_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of false positives. If float, then all false positives have the same cost. If array-like, then it is the cost of each false positive classification.

tn_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of true negatives. If float, then all true negatives have the same cost. If array-like, then it is the cost of each true negative classification.

fn_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of false negatives. If float, then all false negatives have the same cost. If array-like, then it is the cost of each false negative classification.

check_input : bool, default=True

Perform input validation. Turning this off improves performance, which is useful when using this metric as a loss function.

Returns:
score : float

Expected savings of a classifier compared to a baseline.

Notes

The expected cost of each instance \(\mathbb{E}[C_i]\) is calculated as [1]:

\[\mathbb{E}[C_i(s_i)] = y_i \cdot (s_i \cdot C_i(1|1) + (1 - s_i) \cdot C_i(0|1)) + (1 - y_i) \cdot (s_i \cdot C_i(1|0) + (1 - s_i) \cdot C_i(0|0))\]

The expected savings over a naive model is calculated as:

\[\text{Expected Savings} = 1 - \frac{\sum_{i=1}^N \mathbb{E}[C_i(s_i)]}{\min(\sum_{i=1}^N C_i(0), \sum_{i=1}^N C_i(1))}\]

The expected savings over a baseline model is calculated as:

\[\text{Expected Savings} = 1 - \frac{\sum_{i=1}^N \mathbb{E}[C_i(s_i)]}{\sum_{i=1}^N \mathbb{E}[C_i(s_i^*)]}\]

where

  • \(y_i\) is the true label,

  • \(s_i\) is the predicted probability of the positive class,

  • \(s_i^*\) is the baseline model’s predicted probability,

  • \(C_i(1|1)\) is the cost of a true positive tp_cost,

  • \(C_i(0|1)\) is the cost of a false negative fn_cost,

  • \(C_i(1|0)\) is the cost of a false positive fp_cost,

  • \(C_i(0|0)\) is the cost of a true negative tn_cost, and

  • \(N\) is the number of samples.
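The formulas above can be sketched directly in NumPy. This is a minimal illustration of the Notes, assuming the naive ‘zero_one’ baseline; it is not the empulse implementation itself:

```python
import numpy as np

def expected_savings_sketch(y_true, y_proba, tp_cost=0.0, fp_cost=0.0,
                            tn_cost=0.0, fn_cost=0.0):
    """Illustrative expected savings with the naive 'zero_one' baseline."""
    y = np.asarray(y_true, dtype=float)
    s = np.asarray(y_proba, dtype=float)
    # Broadcast scalar costs to per-sample arrays
    tp = np.broadcast_to(tp_cost, y.shape)
    fp = np.broadcast_to(fp_cost, y.shape)
    tn = np.broadcast_to(tn_cost, y.shape)
    fn = np.broadcast_to(fn_cost, y.shape)

    # E[C_i(s_i)] = y_i (s_i C(1|1) + (1-s_i) C(0|1))
    #             + (1-y_i) (s_i C(1|0) + (1-s_i) C(0|0))
    expected_cost = (y * (s * tp + (1 - s) * fn)
                     + (1 - y) * (s * fp + (1 - s) * tn))

    # Naive baseline: predict all zeros or all ones, whichever is cheaper
    cost_all_zeros = np.sum(y * fn + (1 - y) * tn)
    cost_all_ones = np.sum(y * tp + (1 - y) * fp)
    baseline_cost = min(cost_all_zeros, cost_all_ones)

    return 1 - expected_cost.sum() / baseline_cost
```

On the data from the Examples section below, this sketch yields 0.475, matching the formulas term by term.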

References

[1]

Höppner, S., Baesens, B., Verbeke, W., & Verdonck, T. (2022). Instance-dependent cost-sensitive learning for detecting transfer fraud. European Journal of Operational Research, 297(1), 291-300.

Examples

>>> import numpy as np
>>> from empulse.metrics import expected_savings_score
>>> y_proba = [0.4, 0.8, 0.75, 0.1]
>>> y_true = [0, 1, 1, 0]
>>> fp_cost = np.array([4, 1, 2, 2])
>>> fn_cost = np.array([1, 3, 3, 1])
>>> expected_savings_score(y_true, y_proba, fp_cost=fp_cost, fn_cost=fn_cost)
0.475