expected_log_cost_loss

empulse.metrics.expected_log_cost_loss(y_true, y_proba, *, tp_cost=0.0, tn_cost=0.0, fn_cost=0.0, fp_cost=0.0, normalize=False, check_input=True)

Expected log cost of a classifier.

The expected log cost of a classifier is the sum of the expected log costs of each instance. This allows you to assign instance-specific costs (or benefits, in the case of negative costs) to each type of classification. For example, in a credit card fraud detection problem, the cost of a false negative (not detecting a fraudulent transaction) is higher than the cost of a false positive (flagging a non-fraudulent transaction as fraudulent).

See also

expected_cost_loss : Expected cost of a classifier.

Parameters:
y_true : 1D array-like, shape=(n_samples,)

Binary target values (‘positive’: 1, ‘negative’: 0).

y_proba : 1D array-like, shape=(n_samples,)

Target probabilities; should lie between 0 and 1.

tp_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of true positives. If float, then all true positives have the same cost. If array-like, then it is the cost of each true positive classification.

fp_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of false positives. If float, then all false positives have the same cost. If array-like, then it is the cost of each false positive classification.

tn_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of true negatives. If float, then all true negatives have the same cost. If array-like, then it is the cost of each true negative classification.

fn_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of false negatives. If float, then all false negatives have the same cost. If array-like, then it is the cost of each false negative classification.

normalize : bool, default=False

Normalize the cost by the number of samples. If True, return the average expected log cost per sample.

check_input : bool, default=True

Perform input validation. Turning this off improves performance, which is useful when using this metric as a loss function.

Returns:
log_expected_cost : float

Log expected cost.

Notes

The expected log cost of each instance \(\mathbb{E}[C^l_i]\) is calculated as:

\[\mathbb{E}[C^l_i] = y_i \cdot (\log(s_i) \cdot C_i(1|1) + \log(1 - s_i) \cdot C_i(0|1)) + (1 - y_i) \cdot (\log(s_i) \cdot C_i(1|0) + \log(1 - s_i) \cdot C_i(0|0))\]

where

  • \(y_i\) is the true label,

  • \(s_i\) is the predicted probability,

  • \(C_i(1|1)\) is the cost of a true positive tp_cost,

  • \(C_i(0|1)\) is the cost of a false negative fn_cost,

  • \(C_i(1|0)\) is the cost of a false positive fp_cost, and

  • \(C_i(0|0)\) is the cost of a true negative tn_cost.
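
As a reference, here is a minimal NumPy sketch that transcribes this formula directly. expected_log_cost_reference is a hypothetical helper for illustration, not the library's implementation; it assumes probabilities lie strictly between 0 and 1 (no clipping):

import numpy as np

def expected_log_cost_reference(y_true, y_proba, tp_cost=0.0, tn_cost=0.0,
                                fn_cost=0.0, fp_cost=0.0, normalize=False):
    # Per-instance expected log cost, transcribed from the formula above.
    y = np.asarray(y_true, dtype=float)
    s = np.asarray(y_proba, dtype=float)
    cost = y * (np.log(s) * tp_cost + np.log(1 - s) * fn_cost) \
        + (1 - y) * (np.log(s) * fp_cost + np.log(1 - s) * tn_cost)
    # normalize=True returns the average over samples; otherwise the sum.
    return cost.mean() if normalize else cost.sum()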

When tp_cost and tn_cost equal -1, and fp_cost and fn_cost equal 0, the expected log cost is equivalent to the log loss sklearn.metrics.log_loss.
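
A quick sketch of this check, using normalize=True so both sides are averaged over samples (matching log_loss's default normalization):

import numpy as np
from sklearn.metrics import log_loss
from empulse.metrics import expected_log_cost_loss

y_true = [0, 1, 1, 0]
y_proba = [0.1, 0.9, 0.8, 0.2]

# With tp_cost = tn_cost = -1 and fp_cost = fn_cost = 0 (the defaults),
# the average expected log cost reduces to the log loss.
cost = expected_log_cost_loss(y_true, y_proba, tp_cost=-1.0, tn_cost=-1.0, normalize=True)
np.isclose(cost, log_loss(y_true, y_proba))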

Examples

>>> import numpy as np
>>> from empulse.metrics import expected_log_cost_loss
>>> y_proba = [0.1, 0.9, 0.8, 0.2]
>>> y_true = [0, 1, 1, 0]
>>> fp_cost = np.array([4, 1, 2, 2])
>>> fn_cost = np.array([1, 3, 3, 1])
>>> expected_log_cost_loss(y_true, y_proba, fp_cost=fp_cost, fn_cost=fn_cost)
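
To return the average expected log cost per sample instead of the sum, pass normalize=True:

>>> expected_log_cost_loss(y_true, y_proba, fp_cost=fp_cost, fn_cost=fn_cost, normalize=True)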