expected_log_cost_loss

empulse.metrics.expected_log_cost_loss(y_true, y_proba, *, tp_cost=0.0, tn_cost=0.0, fn_cost=0.0, fp_cost=0.0, normalize=False, check_input=True)
Expected log cost of a classifier.
The expected log cost of a classifier is the sum of the expected log costs of each instance. This allows you to assign instance-specific costs (or benefits, in the case of negative costs) to each type of classification. For example, in a credit card fraud detection problem, the cost of a false negative (failing to detect a fraudulent transaction) is typically higher than the cost of a false positive (flagging a legitimate transaction as fraudulent).
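For example, a minimal sketch of instance-dependent costs in that fraud setting (the transaction amounts and flat review fee below are illustrative assumptions, not part of the API):

import numpy as np
from empulse.metrics import expected_log_cost_loss

amounts = np.array([50.0, 500.0, 120.0])  # hypothetical transaction amounts
y_true = [0, 1, 1]
y_proba = [0.2, 0.7, 0.4]
fn_cost = amounts            # missing a fraud costs the full transaction amount
fp_cost = np.full(3, 5.0)    # flagging a legitimate transaction costs a flat review fee
expected_log_cost_loss(y_true, y_proba, fn_cost=fn_cost, fp_cost=fp_cost)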
See also
expected_cost_loss : Expected cost of a classifier.

- Parameters:
  - y_true : 1D array-like, shape=(n_samples,)
    Binary target values (‘positive’: 1, ‘negative’: 0).
  - y_proba : 1D array-like, shape=(n_samples,)
    Target probabilities, should lie between 0 and 1.
  - tp_cost : float or array-like, shape=(n_samples,), default=0.0
    Cost of true positives. If float, then all true positives have the same cost. If array-like, then it is the cost of each true positive classification.
  - fp_cost : float or array-like, shape=(n_samples,), default=0.0
    Cost of false positives. If float, then all false positives have the same cost. If array-like, then it is the cost of each false positive classification.
  - tn_cost : float or array-like, shape=(n_samples,), default=0.0
    Cost of true negatives. If float, then all true negatives have the same cost. If array-like, then it is the cost of each true negative classification.
  - fn_cost : float or array-like, shape=(n_samples,), default=0.0
    Cost of false negatives. If float, then all false negatives have the same cost. If array-like, then it is the cost of each false negative classification.
  - normalize : bool, default=False
    Normalize the cost by the number of samples. If True, return the average expected log cost.
  - check_input : bool, default=True
    Perform input validation. Turning it off improves performance, which is useful when using this metric as a loss function.
- Returns:
  - log_expected_cost : float
    Log expected cost.
Notes
The expected log cost of each instance \(\mathbb{E}[C^l_i]\) is calculated as:
\[\mathbb{E}[C^l_i] = y_i \cdot (\log(s_i) \cdot C_i(1|1) + \log(1 - s_i) \cdot C_i(0|1)) + (1 - y_i) \cdot (\log(s_i) \cdot C_i(1|0) + \log(1 - s_i) \cdot C_i(0|0))\]

where

- \(y_i\) is the true label,
- \(s_i\) is the predicted probability,
- \(C_i(1|1)\) is the cost of a true positive (tp_cost),
- \(C_i(0|1)\) is the cost of a false negative (fn_cost),
- \(C_i(1|0)\) is the cost of a false positive (fp_cost), and
- \(C_i(0|0)\) is the cost of a true negative (tn_cost).
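The formula reads off directly in NumPy. The sketch below illustrates the equation above under the stated notation; it is not empulse’s actual implementation, and the function name is hypothetical:

import numpy as np

def expected_log_cost_sketch(y_true, y_proba, tp_cost=0.0, tn_cost=0.0,
                             fn_cost=0.0, fp_cost=0.0, normalize=False):
    # Hypothetical re-implementation of the formula above, for illustration only.
    y = np.asarray(y_true, dtype=float)
    s = np.asarray(y_proba, dtype=float)
    log_s = np.log(s)
    log_1ms = np.log1p(-s)  # log(1 - s), numerically stabler for small s
    # y_i = 1: predicting positive is a true positive, negative a false negative
    pos = y * (log_s * tp_cost + log_1ms * fn_cost)
    # y_i = 0: predicting positive is a false positive, negative a true negative
    neg = (1 - y) * (log_s * fp_cost + log_1ms * tn_cost)
    costs = pos + neg
    return costs.mean() if normalize else costs.sum()

Scalar and array-like costs both work here, since NumPy broadcasts a float across the per-instance terms.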
When tp_cost and tn_cost equal -1, and fp_cost and fn_cost equal 0, the expected log cost is equivalent to the log loss sklearn.metrics.log_loss (pass normalize=True to match sklearn’s default averaging).
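That equivalence can be checked directly; a minimal sketch, assuming scikit-learn is installed:

import numpy as np
from sklearn.metrics import log_loss
from empulse.metrics import expected_log_cost_loss

y_true = [0, 1, 1, 0]
y_proba = [0.1, 0.9, 0.8, 0.2]

# tp_cost = tn_cost = -1 and fp_cost = fn_cost = 0 recovers the averaged log loss
lhs = expected_log_cost_loss(y_true, y_proba, tp_cost=-1, tn_cost=-1, normalize=True)
rhs = log_loss(y_true, y_proba)
assert np.isclose(lhs, rhs)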
Examples
import numpy as np
from empulse.metrics import expected_log_cost_loss

y_proba = [0.1, 0.9, 0.8, 0.2]
y_true = [0, 1, 1, 0]
fp_cost = np.array([4, 1, 2, 2])
fn_cost = np.array([1, 3, 3, 1])
expected_log_cost_loss(y_true, y_proba, fp_cost=fp_cost, fn_cost=fn_cost)
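The same call with normalize=True returns the average expected log cost instead of the sum, per the normalize parameter above:

expected_log_cost_loss(y_true, y_proba, fp_cost=fp_cost, fn_cost=fn_cost, normalize=True)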