Metric
- class empulse.metrics.Metric(cost_matrix, strategy)
 Class to create a custom value/cost-sensitive metric.
The metric is defined by a cost matrix and a strategy for computing the metric. The cost matrix defines the costs and benefits associated with each type of prediction outcome (true positive, true negative, false positive, false negative). The strategy defines how to compute the metric based on the cost matrix.
Read more in the User Guide.
- Parameters:
 - cost_matrix: CostMatrix
 The cost matrix defining the costs and benefits associated with each type of prediction outcome.
 - strategy: MetricStrategy
 The strategy to use for computing the metric.
  - If MaxProfit, the metric computes the maximum profit that can be achieved by a classifier. The metric determines the optimal threshold that maximizes the profit. This metric supports the use of stochastic variables.
  - If Cost, the metric computes the expected cost loss of a classifier. This metric supports passing instance-dependent costs in the form of array-likes. This metric does not support stochastic variables.
  - If Savings, the metric computes the savings that can be achieved by a classifier over a naive classifier which always predicts 0 or 1 (whichever is better). This metric supports passing instance-dependent costs in the form of array-likes. This metric does not support stochastic variables.
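The difference between the Cost and Savings strategies can be illustrated with a small standalone sketch. It does not use empulse and is not the library's implementation: the function names `expected_cost` and `savings_over_naive`, the per-sample expected-cost formula, and the cost values are all assumptions chosen for illustration.

```python
def expected_cost(y_true, y_score, fp_cost, fn_cost, tp_cost=0.0, tn_cost=0.0):
    """Average expected cost, treating each score as P(positive) (assumed formula)."""
    total = 0.0
    for y, s in zip(y_true, y_score):
        if y == 1:
            # a true churner is caught with probability s, missed with 1 - s
            total += s * tp_cost + (1 - s) * fn_cost
        else:
            # a non-churner is wrongly targeted with probability s
            total += s * fp_cost + (1 - s) * tn_cost
    return total / len(y_true)


def savings_over_naive(y_true, y_score, fp_cost, fn_cost):
    """Fraction of the best naive classifier's cost that the model avoids."""
    n = len(y_true)
    cost_model = expected_cost(y_true, y_score, fp_cost, fn_cost)
    cost_all_negative = expected_cost(y_true, [0.0] * n, fp_cost, fn_cost)
    cost_all_positive = expected_cost(y_true, [1.0] * n, fp_cost, fn_cost)
    baseline = min(cost_all_negative, cost_all_positive)  # whichever naive rule is better
    if baseline == 0:
        raise ValueError("naive baseline has zero cost; savings are undefined")
    return 1 - cost_model / baseline


y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]

# hypothetical class-dependent costs
cost = expected_cost(y_true, y_proba, fp_cost=10, fn_cost=10)
saved = savings_over_naive(y_true, y_proba, fp_cost=10, fn_cost=10)
print(round(cost, 4), round(saved, 4))
```

A cost is a loss to be minimized, while savings are normalized against the better of the two naive classifiers, so a value near 1 means the model removes almost all of the baseline cost.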
- Attributes:
 - tp_benefit: sympy.Expr
 The benefit of a true positive. See add_tp_benefit for more details.
 - tn_benefit: sympy.Expr
 The benefit of a true negative. See add_tn_benefit for more details.
 - fp_benefit: sympy.Expr
 The benefit of a false positive. See add_fp_benefit for more details.
 - fn_benefit: sympy.Expr
 The benefit of a false negative. See add_fn_benefit for more details.
 - tp_cost: sympy.Expr
 The cost of a true positive. See add_tp_cost for more details.
 - tn_cost: sympy.Expr
 The cost of a true negative. See add_tn_cost for more details.
 - fp_cost: sympy.Expr
 The cost of a false positive. See add_fp_cost for more details.
 - fn_cost: sympy.Expr
 The cost of a false negative. See add_fn_cost for more details.
 - direction: Direction
 Whether the metric is to be maximized or minimized.
Examples
Reimplementing empc_score using the Metric class:

    import sympy as sp
    from empulse.metrics import Metric, MaxProfit, CostMatrix

    clv, d, f, alpha, beta = sp.symbols('clv d f alpha beta')  # define deterministic variables
    gamma = sp.stats.Beta('gamma', alpha, beta)  # define gamma to follow a Beta distribution

    cost_matrix = (
        CostMatrix()
        .add_tp_benefit(gamma * (clv - d - f))  # when churner accepts offer
        .add_tp_benefit((1 - gamma) * -f)  # when churner does not accept offer
        .add_fp_cost(d + f)  # when you send an offer to a non-churner
        .alias({'incentive_cost': 'd', 'contact_cost': 'f'})
    )
    empc_score = Metric(cost_matrix, MaxProfit())

    y_true = [1, 0, 1, 0, 1]
    y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]
    empc_score(y_true, y_proba, clv=100, incentive_cost=10, contact_cost=1, alpha=6, beta=14)

Reimplementing expected_cost_loss_churn using the Metric class:

    import sympy as sp
    from empulse.metrics import Metric, Cost, CostMatrix

    clv, delta, f, gamma = sp.symbols('clv delta f gamma')

    cost_matrix = (
        CostMatrix()
        .add_tp_benefit(gamma * (clv - delta * clv - f))  # when churner accepts offer
        .add_tp_benefit((1 - gamma) * -f)  # when churner does not accept offer
        .add_fp_cost(delta * clv + f)  # when you send an offer to a non-churner
        .alias({'incentive_fraction': 'delta', 'contact_cost': 'f', 'accept_rate': 'gamma'})
    )
    cost_loss = Metric(cost_matrix, Cost())

    y_true = [1, 0, 1, 0, 1]
    y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]
    cost_loss(
        y_true, y_proba, clv=100, incentive_fraction=0.05, contact_cost=1, accept_rate=0.3
    )

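To make the cost matrix in the expected_cost_loss_churn example concrete, the arithmetic below plugs the example's parameter values into its symbolic expressions by hand. It is plain Python rather than empulse code, and it assumes that the two add_tp_benefit calls accumulate into a single sum, which is how the example reads but is not confirmed here.

```python
# parameter values taken from the expected_cost_loss_churn example
clv = 100.0                # customer lifetime value
incentive_fraction = 0.05  # delta: incentive as a fraction of clv
contact_cost = 1.0         # f
accept_rate = 0.3          # gamma: probability a churner accepts the offer

# true positive: churner accepts the offer, or declines and only the contact cost is lost
# (assumes the two add_tp_benefit terms are summed)
tp_benefit = (
    accept_rate * (clv - incentive_fraction * clv - contact_cost)
    + (1 - accept_rate) * -contact_cost
)

# false positive: offer sent to a non-churner
fp_cost = incentive_fraction * clv + contact_cost

print(round(tp_benefit, 4), round(fp_cost, 4))  # 27.5 6.0
```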
- __call__(y_true, y_score, **parameters)
 Compute the metric score or loss.
The empulse.metrics.Metric.build method should be called before calling this method.
- Parameters:
 - y_true: array-like of shape (n_samples,)
 The ground truth labels.
 - y_score: array-like of shape (n_samples,)
 The predicted labels, probabilities, or decision scores (based on the chosen metric).
 - parameters: float or array-like of shape (n_samples,)
 The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, pass values for its distribution parameters instead. You can set the parameter values using either the symbol names or their aliases.
  - If float, the same value is used for all samples (class-dependent).
  - If array-like, the values are used for each sample (instance-dependent).
- Returns:
 - score: float
 The computed metric score or loss.
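The distinction between class-dependent (float) and instance-dependent (array-like) parameters can be sketched in plain Python. The helper `as_per_sample` and the cost values are hypothetical and only illustrate the broadcasting idea; they are not part of empulse.

```python
def as_per_sample(value, n_samples):
    """Expand a scalar to one value per sample, or validate an array-like."""
    if isinstance(value, (int, float)):
        return [float(value)] * n_samples  # class-dependent: same value everywhere
    values = [float(v) for v in value]     # instance-dependent: one value per sample
    if len(values) != n_samples:
        raise ValueError("array-like parameter must have length n_samples")
    return values


y_true = [1, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1]
n = len(y_true)

fn_cost = as_per_sample(10, n)               # float: every missed positive costs 10
fp_cost = as_per_sample([1, 2, 3, 4, 5], n)  # array-like: cost differs per sample

total_cost = 0.0
for i, (t, p) in enumerate(zip(y_true, y_pred)):
    if t == 0 and p == 1:
        total_cost += fp_cost[i]  # false positive at sample i
    elif t == 1 and p == 0:
        total_cost += fn_cost[i]  # false negative at sample i

print(total_cost)  # 12.0: one false positive (cost 2) plus one false negative (cost 10)
```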
- optimal_rate(y_true, y_score, **parameters)
 Compute the optimal predicted positive rate.
This is the fraction of observations that should be classified as positive to optimize the metric.
- Parameters:
 - y_true: array-like of shape (n_samples,)
 The ground truth labels.
 - y_score: array-like of shape (n_samples,)
 The predicted labels, probabilities, or decision scores (based on the chosen metric).
 - parameters: float or array-like of shape (n_samples,)
 The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, pass values for its distribution parameters instead. You can set the parameter values using either the symbol names or their aliases.
  - If float, the same value is used for all samples (class-dependent).
  - If array-like, the values are used for each sample (instance-dependent).
- Returns:
 - optimal_rate: float
 The optimal predicted positive rate.
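One way to picture an optimal predicted positive rate is to rank observations by score and find the top fraction that yields the highest profit. The sketch below does this in plain Python for deterministic, class-dependent values; `best_positive_rate` is a hypothetical helper, tied scores are not handled specially, and the library's own computation may differ.

```python
def best_positive_rate(y_true, y_score, tp_benefit, fp_cost):
    """Return (rate, average profit) for the most profitable top-k cut-off."""
    n = len(y_true)
    # rank samples from highest to lowest score
    order = sorted(range(n), key=lambda i: y_score[i], reverse=True)
    best_profit, best_k = 0.0, 0  # k = 0 means nobody is targeted
    profit = 0.0
    for k, i in enumerate(order, start=1):
        # targeting one more sample gains a benefit or incurs a cost
        profit += tp_benefit if y_true[i] == 1 else -fp_cost
        if profit > best_profit:
            best_profit, best_k = profit, k
    return best_k / n, best_profit / n


y_true = [1, 0, 1, 0, 1]
y_score = [0.9, 0.1, 0.8, 0.2, 0.7]

# hypothetical values: benefit per true positive and cost per false positive
rate, avg_profit = best_positive_rate(y_true, y_score, tp_benefit=27.5, fp_cost=6.0)
print(rate, avg_profit)  # 0.6 16.5
```

Here the three highest-scoring samples are all positives, so targeting 60% of the observations is best; including any further sample only adds false positive costs.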
- optimal_threshold(y_true, y_score, **parameters)
 Compute the optimal classification threshold(s).
This is the score threshold above which an observation should be classified as positive to optimize the metric. For instance-dependent costs and benefits, this returns an array of thresholds, one for each sample. For class-dependent costs and benefits, this returns a single threshold value.
- Parameters:
 - y_true: array-like of shape (n_samples,)
 The ground truth labels.
 - y_score: array-like of shape (n_samples,)
 The predicted labels, probabilities, or decision scores (based on the chosen metric).
 - parameters: float or array-like of shape (n_samples,)
 The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, pass values for its distribution parameters instead. You can set the parameter values using either the symbol names or their aliases.
  - If float, the same value is used for all samples (class-dependent).
  - If array-like, the values are used for each sample (instance-dependent).
- Returns:
 - optimal_threshold: float or NDArray of shape (n_samples,)
 The optimal classification threshold(s).
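To show why class-dependent values give one threshold while instance-dependent values give one per sample, the sketch below uses the textbook decision-theoretic threshold for calibrated probabilities: classify as positive when the expected payoff of doing so exceeds that of classifying as negative. This is an assumption for illustration only. The function `decision_threshold` is hypothetical, and empulse may derive its thresholds differently (for example empirically from the scores).

```python
def decision_threshold(tp_benefit, fp_cost, tn_benefit=0.0, fn_cost=0.0):
    """Probability above which predicting positive has the higher expected payoff.

    Predict positive when p * tp_benefit - (1 - p) * fp_cost
    exceeds (1 - p) * tn_benefit - p * fn_cost.
    """
    numerator = fp_cost + tn_benefit
    return numerator / (numerator + tp_benefit + fn_cost)


# class-dependent: one set of values, one threshold
single = decision_threshold(tp_benefit=27.5, fp_cost=6.0)

# instance-dependent: customer value differs per sample, so thresholds do too
clv = [50.0, 100.0, 200.0]  # hypothetical per-customer lifetime values
delta, f, gamma = 0.05, 1.0, 0.3
per_sample = []
for value in clv:
    tp_benefit = gamma * (value - delta * value - f) + (1 - gamma) * -f
    fp_cost = delta * value + f
    per_sample.append(decision_threshold(tp_benefit, fp_cost))

print(round(single, 4))
print([round(t, 4) for t in per_sample])
```

In this sketch more valuable customers get lower thresholds, because the benefit of retaining them grows faster than the cost of a wasted offer.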