cost_loss

empulse.metrics.cost_loss(y_true, y_pred, *, tp_cost=0.0, fp_cost=0.0, tn_cost=0.0, fn_cost=0.0, normalize=False, check_input=True)
Cost of a classifier.

The cost of a classifier is the sum of the costs of each instance. This allows you to assign instance-specific costs (or benefits, in the case of negative costs) to each type of classification. For example, in a credit card fraud detection problem, the cost of a false negative (not detecting a fraudulent transaction) is typically higher than the cost of a false positive (flagging a non-fraudulent transaction as fraudulent), as sketched below.
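As a sketch of that fraud scenario (the transaction amounts and handling cost below are invented for illustration), instance-dependent costs are passed as arrays:

import numpy as np
from empulse.metrics import cost_loss

y_true = np.array([0, 1, 1, 0])  # 1 marks a fraudulent transaction
y_pred = np.array([0, 1, 0, 0])  # predicted labels

# Missing a fraud (false negative) costs the transaction amount, while
# investigating a flagged transaction (false positive) has a flat
# handling cost; both numbers are hypothetical.
amounts = np.array([50.0, 200.0, 120.0, 80.0])
cost_loss(y_true, y_pred, fn_cost=amounts, fp_cost=5.0)  # 120.0: one missed fraud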
See also

expected_cost_loss : Expected cost of a classifier.
savings_score : Cost savings of a classifier compared to using a baseline.

Parameters:
y_true : 1D array-like, shape=(n_samples,)
    Binary target values (‘positive’: 1, ‘negative’: 0).
y_pred : 1D array-like, shape=(n_samples,)
    Predicted labels or calibrated probabilities. If the predictions are calibrated probabilities, the optimal decision threshold is calculated for each instance as [3]:

    \[t^*_i = \frac{C_i(1|0) - C_i(0|0)}{C_i(1|0) - C_i(0|0) + C_i(0|1) - C_i(1|1)}\]

    Note: The optimal decision threshold is only accurate when the probabilities are well-calibrated. See scikit-learn’s user guide on probability calibration for more information. A sketch of this instance-wise thresholding follows the Returns section.
tp_cost : float or array-like, shape=(n_samples,), default=0.0
    Cost of true positives. If float, then all true positives have the same cost. If array-like, then it is the cost of each true positive classification.

fp_cost : float or array-like, shape=(n_samples,), default=0.0
    Cost of false positives. If float, then all false positives have the same cost. If array-like, then it is the cost of each false positive classification.

tn_cost : float or array-like, shape=(n_samples,), default=0.0
    Cost of true negatives. If float, then all true negatives have the same cost. If array-like, then it is the cost of each true negative classification.

fn_cost : float or array-like, shape=(n_samples,), default=0.0
    Cost of false negatives. If float, then all false negatives have the same cost. If array-like, then it is the cost of each false negative classification.

normalize : bool, default=False
    Normalize the cost by the number of samples. If True, return the average cost.

check_input : bool, default=True
    Perform input validation. Turning this off improves performance, which is useful when using this metric as a loss function.
Returns:

cost_loss : float
    Cost of a classifier.
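As referenced in the y_pred description, the instance-wise thresholding can be sketched directly in NumPy. This is illustrative only, not the library’s internal code; in particular, the tie-breaking rule (> versus >=) is an assumption:

import numpy as np

def optimal_thresholds(fp_cost, tn_cost, fn_cost, tp_cost):
    # Per-instance threshold t*_i from the formula above, with
    # C_i(1|0) = fp_cost, C_i(0|0) = tn_cost, C_i(0|1) = fn_cost,
    # and C_i(1|1) = tp_cost.
    return (fp_cost - tn_cost) / (fp_cost - tn_cost + fn_cost - tp_cost)

y_proba = np.array([0.1, 0.8, 0.4, 0.3])  # calibrated probabilities
fp_cost = np.array([4.0, 1.0, 2.0, 2.0])
fn_cost = np.array([1.0, 3.0, 3.0, 1.0])
thresholds = optimal_thresholds(fp_cost, 0.0, fn_cost, 0.0)  # [0.8, 0.25, 0.4, 0.67]
y_pred = (y_proba > thresholds).astype(int)  # [0, 1, 0, 0]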
Notes

The cost of each instance \(C_i\) is calculated as [3]:

\[C_i = y_i \cdot (\hat y_i \cdot C_i(1|1) + (1 - \hat y_i) \cdot C_i(0|1)) + (1 - y_i) \cdot (\hat y_i \cdot C_i(1|0) + (1 - \hat y_i) \cdot C_i(0|0))\]

where \(y_i\) is the true label, \(\hat y_i\) is the predicted label, \(C_i(1|1)\) is the cost of a true positive (tp_cost), \(C_i(1|0)\) is the cost of a false positive (fp_cost), \(C_i(0|1)\) is the cost of a false negative (fn_cost), and \(C_i(0|0)\) is the cost of a true negative (tn_cost).
Code modified from costcla.metrics.cost_loss.
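The equation can be checked numerically; the sketch below mirrors the formula in the Notes rather than the library’s internal implementation:

import numpy as np

def instance_costs(y_true, y_pred, tp_cost, fp_cost, tn_cost, fn_cost):
    # Per-instance cost C_i as defined above, vectorized over samples.
    y = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_pred, dtype=float)
    return (y * (y_hat * tp_cost + (1 - y_hat) * fn_cost)
            + (1 - y) * (y_hat * fp_cost + (1 - y_hat) * tn_cost))

costs = instance_costs([0, 1, 1, 0], [0, 1, 0, 0], tp_cost=0.0,
                       fp_cost=np.array([4, 1, 2, 2]), tn_cost=0.0,
                       fn_cost=np.array([1, 3, 3, 1]))
costs.sum()  # 3.0, matching the Examples below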
References

[1] C. Elkan, “The Foundations of Cost-Sensitive Learning”, in Seventeenth International Joint Conference on Artificial Intelligence, 973-978, 2001.

[2] A. Correa Bahnsen, A. Stojanovic, D. Aouada, B. Ottersten, “Improving Credit Card Fraud Detection with Calibrated Probabilities”, in Proceedings of the Fourteenth SIAM International Conference on Data Mining, 677-685, 2014.
Examples
>>> import numpy as np
>>> from empulse.metrics import cost_loss
>>> y_pred = [0, 1, 0, 0]
>>> y_true = [0, 1, 1, 0]
>>> fp_cost = np.array([4, 1, 2, 2])
>>> fn_cost = np.array([1, 3, 3, 1])
>>> cost_loss(y_true, y_pred, fp_cost=fp_cost, fn_cost=fn_cost)
3.0
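Building on the example above, normalize=True divides the total by the number of samples, returning the average cost (here 3.0 over 4 samples):

>>> cost_loss(y_true, y_pred, fp_cost=fp_cost, fn_cost=fn_cost, normalize=True)
0.75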