Metric#
- class empulse.metrics.Metric(strategy)[source]#
Class to create a custom value/cost-sensitive metric.
The Metric class uses the builder pattern to create a custom metric. You start by specifying the kind of metric you want to compute and then add the terms that make up the metric. These terms come from the cost-benefit matrix of the classification problem. After you have added all the terms, call the build method to create the metric function. You can then call the metric function with the true labels and predicted probabilities to compute the metric value.
The costs and benefits are specified using sympy symbols or expressions. Stochastic variables are supported and can be specified using sympy.stats random variables. Make sure you pass the parameters of the stochastic variables as keyword arguments when calling the metric function. Stochastic variables are assumed to be independent of each other.
Read more in the User Guide.
- Parameters:
- strategy: MetricStrategy
The strategy to use for computing the metric.
If MaxProfit, the metric computes the maximum profit that can be achieved by a classifier. The metric determines the optimal threshold that maximizes the profit. This metric supports the use of stochastic variables.
If Cost, the metric computes the expected cost loss of a classifier. This metric supports passing instance-dependent costs in the form of array-likes. It does not support stochastic variables.
If Savings, the metric computes the savings that a classifier achieves over a naive classifier which always predicts 0 or 1 (whichever is better). This metric supports passing instance-dependent costs in the form of array-likes. It does not support stochastic variables (see the Savings sketch at the end of the Examples section below).
- Attributes:
- tp_benefit: sympy.Expr
The benefit of a true positive. See add_tp_benefit for more details.
- tn_benefit: sympy.Expr
The benefit of a true negative. See add_tn_benefit for more details.
- fp_benefit: sympy.Expr
The benefit of a false positive. See add_fp_benefit for more details.
- fn_benefit: sympy.Expr
The benefit of a false negative. See add_fn_benefit for more details.
- tp_cost: sympy.Expr
The cost of a true positive. See add_tp_cost for more details.
- tn_cost: sympy.Expr
The cost of a true negative. See add_tn_cost for more details.
- fp_cost: sympy.Expr
The cost of a false positive. See add_fp_cost for more details.
- fn_cost: sympy.Expr
The cost of a false negative. See add_fn_cost for more details.
- direction: Direction
Whether the metric is to be maximized or minimized.
Examples
Reimplementing empc_score using the Metric class.

import sympy as sp
from empulse.metrics import Metric, MaxProfit

clv, d, f, alpha, beta = sp.symbols('clv d f alpha beta')  # define deterministic variables
gamma = sp.stats.Beta('gamma', alpha, beta)  # define gamma to follow a Beta distribution

empc_score = (
    Metric(MaxProfit())
    .add_tp_benefit(gamma * (clv - d - f))  # when churner accepts offer
    .add_tp_benefit((1 - gamma) * -f)  # when churner does not accept offer
    .add_fp_cost(d + f)  # when you send an offer to a non-churner
    .alias({'incentive_cost': 'd', 'contact_cost': 'f'})
    .build()
)

y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]
empc_score(y_true, y_proba, clv=100, incentive_cost=10, contact_cost=1, alpha=6, beta=14)
Reimplementing expected_cost_loss_churn using the Metric class.

import sympy as sp
from empulse.metrics import Metric, Cost

clv, delta, f, gamma = sp.symbols('clv delta f gamma')

cost_loss = (
    Metric(Cost())
    .add_tp_benefit(gamma * (clv - delta * clv - f))  # when churner accepts offer
    .add_tp_benefit((1 - gamma) * -f)  # when churner does not accept offer
    .add_fp_cost(delta * clv + f)  # when you send an offer to a non-churner
    .alias({'incentive_fraction': 'delta', 'contact_cost': 'f', 'accept_rate': 'gamma'})
    .build()
)

y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]
cost_loss(
    y_true, y_proba, clv=100, incentive_fraction=0.05, contact_cost=1, accept_rate=0.3
)
Using the Metric class as a context manager (automatically builds after assembling the components). Also adding default values for some parameters.
import sympy as sp
from empulse.metrics import Metric, Cost

clv, delta, f, gamma = sp.symbols('clv delta f gamma')

with Metric(Cost()) as cost_loss:
    cost_loss.add_tp_benefit(gamma * (clv - delta * clv - f))
    cost_loss.add_tp_benefit((1 - gamma) * -f)
    cost_loss.add_fp_cost(delta * clv + f)
    cost_loss.alias('incentive_fraction', delta)
    cost_loss.alias('contact_cost', f)
    cost_loss.alias('accept_rate', gamma)
    cost_loss.set_default(incentive_fraction=0.05, contact_cost=1, accept_rate=0.3)

y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]
cost_loss(y_true, y_proba, clv=100)
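A minimal sketch of the Savings strategy with instance-dependent costs. This assumes Savings is importable from empulse.metrics alongside MaxProfit and Cost; the fp_cost and fn_cost symbol names and the cost values are chosen purely for illustration.

import sympy as sp
from empulse.metrics import Metric, Savings

fp_cost, fn_cost = sp.symbols('fp_cost fn_cost')  # per-sample misclassification costs

savings_score = (
    Metric(Savings())
    .add_fp_cost(fp_cost)  # cost of acting on a negative instance
    .add_fn_cost(fn_cost)  # cost of missing a positive instance
    .build()
)

y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]
# instance-dependent costs are passed as array-likes of shape (n_samples,)
savings_score(y_true, y_proba, fp_cost=[1, 2, 1, 2, 1], fn_cost=[5, 5, 10, 5, 5])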
- __call__(y_true, y_score, **parameters)[source]#
Compute the metric score or loss.
The empulse.metrics.Metric.build method should be called before calling this method.
- Parameters:
- y_true: array-like of shape (n_samples,)
The ground truth labels.
- y_score: array-like of shape (n_samples,)
The predicted labels, probabilities, or decision scores (based on the chosen metric).
- parameters: float or array-like of shape (n_samples,)
The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, pass values for its distribution parameters. You can set the parameter values using either the symbol names or their aliases.
If float, the same value is used for all samples (class-dependent).
If array-like, a separate value is used for each sample (instance-dependent), as shown in the sketch below.
- Returns:
- score: float
The computed metric score or loss.
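For example, reusing the cost_loss metric built in the class Examples above, the same parameter can be passed either as a scalar or per sample (the per-sample CLV values here are purely illustrative):

import numpy as np

y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]

# class-dependent: one CLV shared by all samples
cost_loss(y_true, y_proba, clv=100, incentive_fraction=0.05, contact_cost=1, accept_rate=0.3)

# instance-dependent: one CLV per sample
cost_loss(
    y_true, y_proba,
    clv=np.array([50, 150, 80, 200, 120]),
    incentive_fraction=0.05, contact_cost=1, accept_rate=0.3,
)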
- add_fn_benefit(term)[source]#
Add a term to the benefit of classifying a false negative.
- Parameters:
- term: sympy.Expr | str
The term to add to the benefit of classifying a false negative.
- Returns:
- Metric
- add_fn_cost(term)[source]#
Add a term to the cost of classifying a false negative.
- Parameters:
- term: sympy.Expr | str
The term to add to the cost of classifying a false negative.
- Returns:
- Metric
- add_fp_benefit(term)[source]#
Add a term to the benefit of classifying a false positive.
- Parameters:
- term: sympy.Expr | str
The term to add to the benefit of classifying a false positive.
- Returns:
- Metric
- add_fp_cost(term)[source]#
Add a term to the cost of classifying a false positive.
- Parameters:
- term: sympy.Expr | str
The term to add to the cost of classifying a false positive.
- Returns:
- Metric
- add_tn_benefit(term)[source]#
Add a term to the benefit of classifying a true negative.
- Parameters:
- term: sympy.Expr | str
The term to add to the benefit of classifying a true negative.
- Returns:
- Metric
- add_tn_cost(term)[source]#
Add a term to the cost of classifying a true negative.
- Parameters:
- term: sympy.Expr | str
The term to add to the cost of classifying a true negative.
- Returns:
- Metric
- add_tp_benefit(term)[source]#
Add a term to the benefit of classifying a true positive.
- Parameters:
- term: sympy.Expr | str
The term to add to the benefit of classifying a true positive.
- Returns:
- Metric
- add_tp_cost(term)[source]#
Add a term to the cost of classifying a true positive.
- Parameters:
- term: sympy.Expr | str
The term to add to the cost of classifying a true positive.
- Returns:
- Metric
- alias(alias, symbol=None)[source]#
Add an alias for a symbol.
- Parameters:
- alias: str | MutableMapping[str, sympy.Symbol | str]
The alias to add. If a MutableMapping (e.g., a dictionary) is passed, the keys are the aliases and the values are the symbols.
- symbol: sympy.Symbol, optional
The symbol to alias to.
- Returns:
- Metric
Examples
import sympy as sp
from empulse.metrics import Metric, Cost

clv, delta, f, gamma = sp.symbols('clv delta f gamma')

cost_loss = (
    Metric(Cost())
    .add_tp_benefit(gamma * (clv - delta * clv - f))  # when churner accepts offer
    .add_tp_benefit((1 - gamma) * -f)  # when churner does not accept offer
    .add_fp_cost(delta * clv + f)  # when you send an offer to a non-churner
    .alias({'incentive_fraction': 'delta', 'contact_cost': 'f', 'accept_rate': 'gamma'})
    .build()
)

y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]
cost_loss(
    y_true, y_proba, clv=100, incentive_fraction=0.05, contact_cost=1, accept_rate=0.3
)
- build()[source]#
Build the metric function.
This function should be called last after adding all the cost-benefit terms. After calling this function, the metric function can be called with the true labels and predicted probabilities.
This function is automatically called when using the Metric class as a context manager.
- Returns:
- Metric
- mark_outlier_sensitive(symbol)[source]#
Mark a symbol as outlier-sensitive.
This is used to indicate that the symbol is sensitive to outliers. When the metric is used as a loss function or criterion for training a model, RobustCSClassifier will impute outliers for this symbol’s value. This is ignored when not using a RobustCSClassifier model.
- Parameters:
- symbol: str | sympy.Symbol
The symbol to mark as outlier-sensitive.
- Returns:
- Metric
Examples
import numpy as np
import sympy as sp
from empulse.metrics import Metric, Cost
from empulse.models import CSLogitClassifier, RobustCSClassifier
from sklearn.datasets import make_classification

X, y = make_classification()
a, b = sp.symbols('a b')

cost_loss = Metric(Cost()).add_fp_cost(a).add_fn_cost(b).mark_outlier_sensitive(a).build()

fn_cost = np.random.rand(y.size)

model = RobustCSClassifier(CSLogitClassifier(loss=cost_loss))
model.fit(X, y, a=np.random.rand(y.size), b=5)
- optimal_rate(y_true, y_score, **parameters)[source]#
Compute the optimal predicted positive rate.
That is, the fraction of observations that should be classified as positive to optimize the metric.
- Parameters:
- y_true: array-like of shape (n_samples,)
The ground truth labels.
- y_score: array-like of shape (n_samples,)
The predicted labels, probabilities, or decision scores (based on the chosen metric).
- parameters: float or array-like of shape (n_samples,)
The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, pass values for its distribution parameters. You can set the parameter values using either the symbol names or their aliases.
If float, the same value is used for all samples (class-dependent).
If array-like, a separate value is used for each sample (instance-dependent).
- Returns:
- optimal_rate: float
The optimal predicted positive rate.
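For instance, with the empc_score metric built in the class Examples above (a sketch; the parameter values are illustrative):

y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]
# fraction of observations that should be classified as positive to maximize profit
rate = empc_score.optimal_rate(
    y_true, y_proba, clv=100, incentive_cost=10, contact_cost=1, alpha=6, beta=14
)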
- optimal_threshold(y_true, y_score, **parameters)[source]#
Compute the optimal classification threshold(s).
That is, the score threshold at which an observation should be classified as positive to optimize the metric. For instance-dependent costs and benefits, this returns an array of thresholds, one for each sample. For class-dependent costs and benefits, this returns a single threshold value.
- Parameters:
- y_true: array-like of shape (n_samples,)
The ground truth labels.
- y_score: array-like of shape (n_samples,)
The predicted labels, probabilities, or decision scores (based on the chosen metric).
- parameters: float or array-like of shape (n_samples,)
The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, pass values for its distribution parameters. You can set the parameter values using either the symbol names or their aliases.
If float, the same value is used for all samples (class-dependent).
If array-like, a separate value is used for each sample (instance-dependent).
- Returns:
- optimal_threshold: float | FloatNDArray
The optimal classification threshold(s).
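For instance, with the cost_loss metric built in the class Examples above (a sketch; the per-sample CLV values are illustrative):

import numpy as np

y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]

# class-dependent costs and benefits: a single threshold
threshold = cost_loss.optimal_threshold(
    y_true, y_proba, clv=100, incentive_fraction=0.05, contact_cost=1, accept_rate=0.3
)

# instance-dependent CLV: one threshold per sample
thresholds = cost_loss.optimal_threshold(
    y_true, y_proba,
    clv=np.array([50, 150, 80, 200, 120]),
    incentive_fraction=0.05, contact_cost=1, accept_rate=0.3,
)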
- set_default(**defaults)[source]#
Set default values for symbols or their aliases.
- Parameters:
- defaults: float
Default values for symbols or their aliases. These default values will be used if not provided in __call__.
- Returns:
- Metric
Examples
import sympy as sp
from empulse.metrics import Metric, Cost

clv, delta, f, gamma = sp.symbols('clv delta f gamma')

cost_loss = (
    Metric(Cost())
    .add_tp_benefit(gamma * (clv - delta * clv - f))  # when churner accepts offer
    .add_tp_benefit((1 - gamma) * -f)  # when churner does not accept offer
    .add_fp_cost(delta * clv + f)  # when you send an offer to a non-churner
    .alias({'incentive_fraction': 'delta', 'contact_cost': 'f', 'accept_rate': 'gamma'})
    .set_default(incentive_fraction=0.05, contact_cost=1, accept_rate=0.3)
    .build()
)

y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]
cost_loss(y_true, y_proba, clv=100, incentive_fraction=0.1)