expected_cost_loss

empulse.metrics.expected_cost_loss(y_true, y_proba, *, tp_cost=0.0, fp_cost=0.0, tn_cost=0.0, fn_cost=0.0, normalize=False, check_input=True)
Expected cost of a classifier.
The expected cost of a classifier is the sum of the expected costs of each instance. This allows you to assign instance-specific costs (or benefits, in the case of negative costs) to each type of classification. For example, in a credit card fraud detection problem, the cost of a false negative (not detecting a fraudulent transaction) is typically much higher than the cost of a false positive (flagging a legitimate transaction as fraudulent).
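As an illustration of instance-specific costs, one hypothetical fraud-detection setup (the array names and amounts below are assumptions, not part of the API) charges the full transaction amount for a missed fraud and a flat fee for each investigated alert:

```python
import numpy as np

# Hypothetical cost setup for fraud detection: a missed fraud case
# (false negative) costs the full transaction amount, while each
# investigated alert (false positive) costs a fixed handling fee.
amounts = np.array([50.0, 1200.0, 300.0])  # transaction amounts
fn_cost = amounts                          # instance-specific miss cost
fp_cost = np.full_like(amounts, 5.0)       # flat investigation fee
```

Arrays built this way can be passed directly as the fn_cost and fp_cost arguments.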
See also

cost_loss : Cost of a classifier.

expected_savings_score : Expected savings of a classifier compared to using a baseline.

Parameters:
- y_true : 1D array-like, shape=(n_samples,)
  Binary target values ('positive': 1, 'negative': 0).
- y_proba : 1D array-like, shape=(n_samples,)
  Target probabilities, should lie between 0 and 1.
- tp_cost : float or array-like, shape=(n_samples,), default=0.0
  Cost of true positives. If float, then all true positives have the same cost. If array-like, then it is the cost of each true positive classification.
- fp_cost : float or array-like, shape=(n_samples,), default=0.0
  Cost of false positives. If float, then all false positives have the same cost. If array-like, then it is the cost of each false positive classification.
- tn_cost : float or array-like, shape=(n_samples,), default=0.0
  Cost of true negatives. If float, then all true negatives have the same cost. If array-like, then it is the cost of each true negative classification.
- fn_cost : float or array-like, shape=(n_samples,), default=0.0
  Cost of false negatives. If float, then all false negatives have the same cost. If array-like, then it is the cost of each false negative classification.
- normalize : bool, default=False
  Normalize the cost by the number of samples. If True, return the average expected cost [3].
- check_input : bool, default=True
  Perform input validation. Turning this off improves performance, which is useful when using this metric as a loss function.
- Returns:
- cost_loss : float
  Expected cost of the classifier.
Notes
The expected cost of each instance \(\mathbb{E}[C_i]\) is calculated as [3]:
\[\mathbb{E}[C_i] = y_i \cdot (s_i \cdot C_i(1|1) + (1 - s_i) \cdot C_i(0|1)) + (1 - y_i) \cdot (s_i \cdot C_i(1|0) + (1 - s_i) \cdot C_i(0|0))\]

where

- \(y_i\) is the true label,
- \(s_i\) is the predicted probability,
- \(C_i(1|1)\) is the cost of a true positive (tp_cost),
- \(C_i(0|1)\) is the cost of a false negative (fn_cost),
- \(C_i(1|0)\) is the cost of a false positive (fp_cost), and
- \(C_i(0|0)\) is the cost of a true negative (tn_cost).
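The per-instance formula above can be sketched directly in NumPy. This is an illustrative re-implementation, not the library's actual code, and the helper name expected_cost is made up for this sketch; with check_input=True the real function also validates its arguments.

```python
import numpy as np

def expected_cost(y_true, y_proba, tp_cost=0.0, fp_cost=0.0,
                  tn_cost=0.0, fn_cost=0.0, normalize=False):
    """Illustrative sketch of the per-instance expected cost formula."""
    y = np.asarray(y_true, dtype=float)
    s = np.asarray(y_proba, dtype=float)
    # Positive instances: predicted positive with probability s_i (tp_cost),
    # predicted negative with probability 1 - s_i (fn_cost).
    # Negative instances: mirrored with fp_cost and tn_cost.
    costs = (y * (s * tp_cost + (1 - s) * fn_cost)
             + (1 - y) * (s * fp_cost + (1 - s) * tn_cost))
    return costs.mean() if normalize else costs.sum()

# Matches the doctest data: 0.2*4 + 0.1*3 + 0.9*3 + 0.2*2 ≈ 4.2
loss = expected_cost([0, 1, 1, 0], [0.2, 0.9, 0.1, 0.2],
                     fp_cost=np.array([4, 1, 2, 2]),
                     fn_cost=np.array([1, 3, 3, 1]))
print(loss)
```

With normalize=True the same sketch returns the average expected cost per sample instead of the sum.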
Code modified from costcla.metrics.cost_loss.
References
[1]C. Elkan, “The foundations of Cost-Sensitive Learning”, in Seventeenth International Joint Conference on Artificial Intelligence, 973-978, 2001.
[2]A. Correa Bahnsen, A. Stojanovic, D. Aouada, B. Ottersten, “Improving Credit Card Fraud Detection with Calibrated Probabilities”, in Proceedings of the fourteenth SIAM International Conference on Data Mining, 677-685, 2014.
Examples
>>> import numpy as np
>>> from empulse.metrics import expected_cost_loss
>>> y_proba = [0.2, 0.9, 0.1, 0.2]
>>> y_true = [0, 1, 1, 0]
>>> fp_cost = np.array([4, 1, 2, 2])
>>> fn_cost = np.array([1, 3, 3, 1])
>>> expected_cost_loss(y_true, y_proba, fp_cost=fp_cost, fn_cost=fn_cost)
4.2