expected_cost_loss

empulse.metrics.expected_cost_loss(y_true, y_proba, *, tp_cost=0.0, fp_cost=0.0, tn_cost=0.0, fn_cost=0.0, normalize=False, check_input=True)

Expected cost of a classifier.

The expected cost of a classifier is the sum of the expected costs of each instance. This allows you to assign instance-specific costs (or benefits, in the case of negative costs) to each type of classification. For example, in a credit card fraud detection problem, the cost of a false negative (not detecting a fraudulent transaction) is higher than the cost of a false positive (flagging a non-fraudulent transaction as fraudulent).
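
A common instance-dependent cost structure in fraud detection sets the false negative cost to the transaction amount and the false positive cost to a fixed investigation fee. The sketch below illustrates this; the variable names and cost values are illustrative assumptions, not part of the empulse API:

>>> import numpy as np
>>> from empulse.metrics import expected_cost_loss
>>> amounts = np.array([50.0, 1200.0, 300.0])  # hypothetical transaction amounts
>>> y_true = [0, 1, 1]                         # 1 = fraudulent transaction
>>> y_proba = [0.1, 0.8, 0.3]                  # predicted fraud probabilities
>>> fn_cost = amounts                          # missed fraud costs the full amount
>>> fp_cost = np.full_like(amounts, 10.0)      # flagging costs a fixed fee
>>> loss = expected_cost_loss(y_true, y_proba, fp_cost=fp_cost, fn_cost=fn_cost)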

See also

cost_loss : Cost of a classifier.

expected_savings_score : Expected savings of a classifier compared to using a baseline.

Parameters:
y_true : 1D array-like, shape=(n_samples,)

Binary target values (‘positive’: 1, ‘negative’: 0).

y_proba : 1D array-like, shape=(n_samples,)

Target probabilities; should lie between 0 and 1.

tp_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of true positives. If float, then all true positives have the same cost. If array-like, then it is the cost of each true positive classification.

fp_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of false positives. If float, then all false positives have the same cost. If array-like, then it is the cost of each false positive classification.

tn_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of true negatives. If float, then all true negatives have the same cost. If array-like, then it is the cost of each true negative classification.

fn_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of false negatives. If float, then all false negatives have the same cost. If array-like, then it is the cost of each false negative classification.

normalize : bool, default=False

Normalize the cost by the number of samples. If True, return the average expected cost [3].

check_input : bool, default=True

Perform input validation. Turning validation off improves performance, which is useful when using this metric as a loss function.

Returns:
cost_loss : float

Expected cost of a classifier.

Notes

The expected cost of each instance \(\mathbb{E}[C_i]\) is calculated as [3]:

\[\mathbb{E}[C_i] = y_i \cdot (s_i \cdot C_i(1|1) + (1 - s_i) \cdot C_i(0|1)) + (1 - y_i) \cdot (s_i \cdot C_i(1|0) + (1 - s_i) \cdot C_i(0|0))\]

where

  • \(y_i\) is the true label,

  • \(s_i\) is the predicted probability,

  • \(C_i(1|1)\) is the cost of a true positive tp_cost,

  • \(C_i(0|1)\) is the cost of a false negative fn_cost,

  • \(C_i(1|0)\) is the cost of a false positive fp_cost, and

  • \(C_i(0|0)\) is the cost of a true negative tn_cost.

Code modified from costcla.metrics.cost_loss.
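
As a sanity check, the formula can be evaluated directly with NumPy (a minimal sketch of the arithmetic, not the library implementation); with the inputs from the Examples section below, it reproduces the documented total of 4.2 up to floating-point rounding:

>>> import numpy as np
>>> y = np.array([0, 1, 1, 0])          # true labels y_i
>>> s = np.array([0.2, 0.9, 0.1, 0.2])  # predicted probabilities s_i
>>> fp_cost = np.array([4, 1, 2, 2])    # C_i(1|0)
>>> fn_cost = np.array([1, 3, 3, 1])    # C_i(0|1)
>>> # tp_cost and tn_cost are zero here, so those terms vanish:
>>> costs = y * (1 - s) * fn_cost + (1 - y) * s * fp_cost
>>> round(float(costs.sum()), 10)
4.2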

References

[1] C. Elkan, “The Foundations of Cost-Sensitive Learning”, in Seventeenth International Joint Conference on Artificial Intelligence, 973-978, 2001.

[2] A. Correa Bahnsen, A. Stojanovic, D. Aouada, B. Ottersten, “Improving Credit Card Fraud Detection with Calibrated Probabilities”, in Proceedings of the Fourteenth SIAM International Conference on Data Mining, 677-685, 2014.

[3] S. Höppner, B. Baesens, W. Verbeke, T. Verdonck, “Instance-dependent cost-sensitive learning for detecting transfer fraud”, European Journal of Operational Research, 297(1), 291-300, 2022.

Examples

>>> import numpy as np
>>> from empulse.metrics import expected_cost_loss
>>> y_proba = [0.2, 0.9, 0.1, 0.2]
>>> y_true = [0, 1, 1, 0]
>>> fp_cost = np.array([4, 1, 2, 2])
>>> fn_cost = np.array([1, 3, 3, 1])
>>> expected_cost_loss(y_true, y_proba, fp_cost=fp_cost, fn_cost=fn_cost)
4.2
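
With normalize=True, the total is averaged over the samples; continuing the example above, this should give 4.2 / 4 = 1.05 (derived from the parameter description and the formula in the Notes; the exact printed float may differ in the last digits):

>>> expected_cost_loss(y_true, y_proba, fp_cost=fp_cost, fn_cost=fn_cost, normalize=True)
1.05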