Metric#

class empulse.metrics.Metric(kind)[source]#

Class to create a custom value/cost-sensitive metric.

The Metric class uses the Builder pattern to create a custom metric. You start by specifying the kind of metric you want to compute and then add the terms that make up the metric. These terms come from the cost-benefit matrix of the classification problem. After you have added all the terms, call the build method to create the metric function. You can then call the metric function with the true labels and predicted probabilities to compute the metric value.

The costs and benefits are specified using sympy symbols or expressions. Stochastic variables are supported and can be specified using sympy.stats random variables. Make sure to pass the parameters of the stochastic variables as keyword arguments when calling the metric function. Stochastic variables are assumed to be independent of each other.

Read more in the User Guide.

Parameters:
kind: {‘max profit’, ‘cost’, ‘savings’}

The kind of metric to compute.

  • If 'max profit', the metric computes the maximum profit that can be achieved by a classifier. The metric determines the optimal threshold that maximizes the profit. This metric supports the use of stochastic variables.

  • If 'cost', the metric computes the expected cost loss of a classifier. This metric supports passing instance-dependent costs in the form of array-likes. This metric does not support stochastic variables.

  • If 'savings', the metric computes the savings that can be achieved by a classifier over a naive classifier which always predicts 0 or 1 (whichever is better). This metric supports passing instance-dependent costs in the form of array-likes. This metric does not support stochastic variables.
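For intuition, the quantities that kind='cost' and kind='savings' compute can be written out directly. The sketch below is illustrative only: it does not use empulse, the cost values are hypothetical, and empulse may scale per-sample costs differently (the point is the structure of the computation, not the exact scaling).

```python
def expected_cost_loss(y_true, y_proba, tp_cost, fp_cost, fn_cost=0.0, tn_cost=0.0):
    """Average expected cost: each sample contributes the cost of each
    outcome, weighted by the predicted probability of that outcome."""
    total = 0.0
    for y, s in zip(y_true, y_proba):
        if y == 1:
            total += s * tp_cost + (1 - s) * fn_cost
        else:
            total += s * fp_cost + (1 - s) * tn_cost
    return total / len(y_true)


def expected_savings(y_true, y_proba, tp_cost, fp_cost, fn_cost=0.0, tn_cost=0.0):
    """Relative cost reduction over the best naive classifier, which always
    predicts 0 or always predicts 1 (whichever is cheaper)."""
    n = len(y_true)
    costs = (tp_cost, fp_cost, fn_cost, tn_cost)
    cost_model = expected_cost_loss(y_true, y_proba, *costs)
    cost_all_negative = expected_cost_loss(y_true, [0.0] * n, *costs)
    cost_all_positive = expected_cost_loss(y_true, [1.0] * n, *costs)
    return 1 - cost_model / min(cost_all_negative, cost_all_positive)


y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]
loss = expected_cost_loss(y_true, y_proba, tp_cost=2.0, fp_cost=5.0, fn_cost=10.0)
savings = expected_savings(y_true, y_proba, tp_cost=2.0, fp_cost=5.0, fn_cost=10.0)
```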

Attributes:
tp_benefit: sympy.Expr

The benefit of a true positive. See add_tp_benefit for more details.

tn_benefit: sympy.Expr

The benefit of a true negative. See add_tn_benefit for more details.

fp_benefit: sympy.Expr

The benefit of a false positive. See add_fp_benefit for more details.

fn_benefit: sympy.Expr

The benefit of a false negative. See add_fn_benefit for more details.

tp_cost: sympy.Expr

The cost of a true positive. See add_tp_cost for more details.

tn_cost: sympy.Expr

The cost of a true negative. See add_tn_cost for more details.

fp_cost: sympy.Expr

The cost of a false positive. See add_fp_cost for more details.

fn_cost: sympy.Expr

The cost of a false negative. See add_fn_cost for more details.

integration_method: {‘auto’, ‘quad’, ‘monte-carlo’, ‘quasi-monte-carlo’}

The integration method to use when the metric has stochastic variables. See set_integration_method for more details.

n_mc_samples: int

The number of samples to use when using Monte Carlo methods to estimate the metric value. See set_n_mc_samples_exp for more details.

Examples

Reimplementing empc_score using the Metric class.

import sympy as sp
import sympy.stats
from empulse.metrics import Metric

clv, d, f, alpha, beta = sp.symbols(
    'clv d f alpha beta'
)  # define deterministic variables
gamma = sp.stats.Beta('gamma', alpha, beta)  # define gamma to follow a Beta distribution

empc_score = (
    Metric(kind='max profit')
    .add_tp_benefit(gamma * (clv - d - f))  # when churner accepts offer
    .add_tp_benefit((1 - gamma) * -f)  # when churner does not accept offer
    .add_fp_cost(d + f)  # when you send an offer to a non-churner
    .alias({'incentive_cost': 'd', 'contact_cost': 'f'})
    .build()
)

y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]

empc_score(y_true, y_proba, clv=100, incentive_cost=10, contact_cost=1, alpha=6, beta=14)

Reimplementing expected_cost_loss_churn using the Metric class.

import sympy as sp
from empulse.metrics import Metric

clv, delta, f, gamma = sp.symbols('clv delta f gamma')

cost_loss = (
    Metric(kind='cost')
    .add_tp_benefit(gamma * (clv - delta * clv - f))  # when churner accepts offer
    .add_tp_benefit((1 - gamma) * -f)  # when churner does not accept offer
    .add_fp_cost(delta * clv + f)  # when you send an offer to a non-churner
    .alias({'incentive_fraction': 'delta', 'contact_cost': 'f', 'accept_rate': 'gamma'})
    .build()
)

y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]

cost_loss(
    y_true, y_proba, clv=100, incentive_fraction=0.05, contact_cost=1, accept_rate=0.3
)

Using the Metric class as a context manager (the metric is built automatically when the context exits). The example also sets default values for some of the parameters.

import sympy as sp
from empulse.metrics import Metric

clv, delta, f, gamma = sp.symbols('clv delta f gamma')

with Metric(kind='cost') as cost_loss:
    cost_loss.add_tp_benefit(gamma * (clv - delta * clv - f))
    cost_loss.add_tp_benefit((1 - gamma) * -f)
    cost_loss.add_fp_cost(delta * clv + f)
    cost_loss.alias('incentive_fraction', delta)
    cost_loss.alias('contact_cost', f)
    cost_loss.alias('accept_rate', gamma)
    cost_loss.set_default(incentive_fraction=0.05, contact_cost=1, accept_rate=0.3)

y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]

cost_loss(y_true, y_proba, clv=100)
__call__(y_true, y_score, **parameters)[source]#

Compute the metric score or loss.

The empulse.metrics.Metric.build method should be called before calling this method.

Parameters:
y_true: array-like of shape (n_samples,)

The ground truth labels.

y_score: array-like of shape (n_samples,)

The predicted probabilities or decision scores, depending on the chosen metric.

  • If kind='max profit', y_score should contain decision scores.

  • If kind='cost', y_score should contain (calibrated) probabilities.

  • If kind='savings', y_score should contain (calibrated) probabilities.

parameters: float or array-like of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
score: float

The computed metric score or loss.
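To illustrate the difference between class-dependent (scalar) and instance-dependent (array-like) parameters, here is a small standalone sketch of one false-negative cost term. This is not empulse code; the cost values (e.g. a per-customer CLV) are hypothetical.

```python
def mean_fn_cost(y_true, y_proba, fn_cost):
    """Average false-negative cost contribution; fn_cost may be a single
    float (class-dependent) or one value per sample (instance-dependent)."""
    n = len(y_true)
    if isinstance(fn_cost, (int, float)):
        fn_cost = [fn_cost] * n  # broadcast the scalar to every sample
    total = sum((1 - s) * c for y, s, c in zip(y_true, y_proba, fn_cost) if y == 1)
    return total / n


y_true = [1, 0, 1]
y_proba = [0.9, 0.1, 0.6]
class_dependent = mean_fn_cost(y_true, y_proba, 10.0)  # same cost for every sample
instance_dependent = mean_fn_cost(y_true, y_proba, [5.0, 0.0, 20.0])  # one cost per sample
```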

add_fn_benefit(term)[source]#

Add a term to the benefit of classifying a false negative.

Parameters:
term: sympy.Expr | str

The term to add to the benefit of classifying a false negative.

Returns:
Metric
add_fn_cost(term)[source]#

Add a term to the cost of classifying a false negative.

Parameters:
term: sympy.Expr | str

The term to add to the cost of classifying a false negative.

Returns:
Metric
add_fp_benefit(term)[source]#

Add a term to the benefit of classifying a false positive.

Parameters:
term: sympy.Expr | str

The term to add to the benefit of classifying a false positive.

Returns:
Metric
add_fp_cost(term)[source]#

Add a term to the cost of classifying a false positive.

Parameters:
term: sympy.Expr | str

The term to add to the cost of classifying a false positive.

Returns:
Metric
add_tn_benefit(term)[source]#

Add a term to the benefit of classifying a true negative.

Parameters:
term: sympy.Expr | str

The term to add to the benefit of classifying a true negative.

Returns:
Metric
add_tn_cost(term)[source]#

Add a term to the cost of classifying a true negative.

Parameters:
term: sympy.Expr | str

The term to add to the cost of classifying a true negative.

Returns:
Metric
add_tp_benefit(term)[source]#

Add a term to the benefit of classifying a true positive.

Parameters:
term: sympy.Expr | str

The term to add to the benefit of classifying a true positive.

Returns:
Metric
add_tp_cost(term)[source]#

Add a term to the cost of classifying a true positive.

Parameters:
term: sympy.Expr | str

The term to add to the cost of classifying a true positive.

Returns:
Metric
alias(alias, symbol=None)[source]#

Add an alias for a symbol.

Parameters:
alias: str | MutableMapping[str, sympy.Symbol | str]

The alias to add. If a MutableMapping (e.g., a dictionary) is passed, the keys are the aliases and the values are the symbols.

symbol: sympy.Symbol, optional

The symbol to alias to.

Returns:
Metric

Examples

import sympy as sp
from empulse.metrics import Metric

clv, delta, f, gamma = sp.symbols('clv delta f gamma')
cost_loss = (
    Metric(kind='cost')
    .add_tp_benefit(gamma * (clv - delta * clv - f))  # when churner accepts offer
    .add_tp_benefit((1 - gamma) * -f)  # when churner does not accept offer
    .add_fp_cost(delta * clv + f)  # when you send an offer to a non-churner
    .alias({'incentive_fraction': 'delta', 'contact_cost': 'f', 'accept_rate': 'gamma'})
    .build()
)

y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]
cost_loss(
    y_true, y_proba, clv=100, incentive_fraction=0.05, contact_cost=1, accept_rate=0.3
)
build()[source]#

Build the metric function.

This function should be called last after adding all the cost-benefit terms. After calling this function, the metric function can be called with the true labels and predicted probabilities.

This function is automatically called when using the Metric class as a context manager.

Returns:
Metric
gradient_boost_objective(y_true, y_score, **parameters)[source]#

Compute the gradient and hessian of the metric loss with respect to the gradient boosting weights.

Parameters:
y_true: NDArray of shape (n_samples,)

The ground truth labels.

y_score: NDArray of shape (n_samples,)

The predicted probabilities or decision scores.

parameters: float or NDArray of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
gradient: NDArray of shape (n_samples,)

The gradient of the metric loss with respect to the gradient boosting weights.

hessian: NDArray of shape (n_samples,)

The Hessian of the metric loss with respect to the gradient boosting weights.
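For a deterministic cost matrix the gradient and Hessian can be derived by hand: with s = sigmoid(z), the per-sample expected cost is c(s) = y*(s*c_tp + (1-s)*c_fn) + (1-y)*(s*c_fp + (1-s)*c_tn), so dc/dz = s(1-s)*delta and d2c/dz2 = s(1-s)(1-2s)*delta, where delta = c_tp - c_fn for positives and c_fp - c_tn for negatives. A standalone sketch of that calculation, with hypothetical costs (this is not empulse's implementation):

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost_grad_hess(y, z, tp_cost, fp_cost, fn_cost=0.0, tn_cost=0.0):
    """Gradient and Hessian of the per-sample expected cost w.r.t. the raw score z."""
    s = sigmoid(z)
    delta = (tp_cost - fn_cost) if y == 1 else (fp_cost - tn_cost)
    grad = s * (1 - s) * delta                # chain rule through the sigmoid
    hess = s * (1 - s) * (1 - 2 * s) * delta  # derivative of s(1-s) is s(1-s)(1-2s)
    return grad, hess


grad, hess = cost_grad_hess(y=1, z=0.3, tp_cost=2.0, fp_cost=5.0, fn_cost=10.0)
```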

logit_objective(features, weights, y_true, **parameters)[source]#

Compute the metric loss and its gradient with respect to the logistic regression weights.

Parameters:
features: NDArray of shape (n_samples, n_features)

The features of the samples.

weights: NDArray of shape (n_features,)

The weights of the logistic regression model.

y_true: NDArray of shape (n_samples,)

The ground truth labels.

parameters: float or NDArray of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
value: float

The metric loss to be minimized.

gradient: NDArray of shape (n_features,)

The gradient of the metric loss with respect to the logistic regression weights.
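The same expected-cost loss, viewed as a function of the logistic regression weights w (with s_i = sigmoid(x_i . w)), has gradient mean_i s_i(1-s_i)*delta_i*x_i. A standalone sketch under those assumptions, with hypothetical costs and data (this is not empulse's implementation):

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit_cost_objective(X, w, y_true, tp_cost, fp_cost, fn_cost=0.0, tn_cost=0.0):
    """Expected-cost loss and its gradient w.r.t. the logistic weights w."""
    n = len(X)
    loss = 0.0
    grad = [0.0] * len(w)
    for x, y in zip(X, y_true):
        s = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        if y == 1:
            loss += s * tp_cost + (1 - s) * fn_cost
            delta = tp_cost - fn_cost
        else:
            loss += s * fp_cost + (1 - s) * tn_cost
            delta = fp_cost - tn_cost
        for j, xj in enumerate(x):
            grad[j] += s * (1 - s) * delta * xj  # chain rule through the sigmoid
    return loss / n, [g / n for g in grad]


X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]]
w = [0.1, -0.3]
y_true = [1, 0, 1]
loss, grad = logit_cost_objective(X, w, y_true, tp_cost=2.0, fp_cost=5.0, fn_cost=10.0)
```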

set_default(**defaults)[source]#

Set default values for symbols or their aliases.

Parameters:
defaults: float

Default values for symbols or their aliases. These default values will be used if not provided in __call__.

Returns:
Metric

Examples

import sympy as sp
from empulse.metrics import Metric

clv, delta, f, gamma = sp.symbols('clv delta f gamma')
cost_loss = (
    Metric(kind='cost')
    .add_tp_benefit(gamma * (clv - delta * clv - f))  # when churner accepts offer
    .add_tp_benefit((1 - gamma) * -f)  # when churner does not accept offer
    .add_fp_cost(delta * clv + f)  # when you send an offer to a non-churner
    .alias({'incentive_fraction': 'delta', 'contact_cost': 'f', 'accept_rate': 'gamma'})
    .set_default(incentive_fraction=0.05, contact_cost=1, accept_rate=0.3)
    .build()
)

y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]
cost_loss(y_true, y_proba, clv=100, incentive_fraction=0.1)
set_integration_method(integration_method)[source]#

Set the integration method to use when the metric has stochastic variables.

Parameters:
integration_method: {‘auto’, ‘quad’, ‘monte-carlo’, ‘quasi-monte-carlo’}, default=’auto’

The integration method to use when the metric has stochastic variables.

  • If 'auto', the integration method is chosen automatically based on the number of stochastic variables, balancing accuracy with execution speed. For a single stochastic variable, piecewise integration is used; this is the most accurate method. For two stochastic variables, ‘quad’ is used, and for more than two stochastic variables, ‘quasi-monte-carlo’ is used if all distributions are supported. Otherwise, ‘monte-carlo’ is used.

  • If 'quad', the metric is integrated using the quad function from scipy. Be careful, as this can be slow for more than 2 stochastic variables.

  • If 'monte-carlo', the metric is integrated using a Monte Carlo simulation. Monte Carlo simulation is less accurate but faster than 'quad' when there are many stochastic variables.

  • If 'quasi-monte-carlo', the metric is integrated using a Quasi-Monte Carlo simulation. Quasi-Monte Carlo is more accurate than plain Monte Carlo but only supports the following distributions from sympy.stats:

    • sympy.stats.Arcsin

    • sympy.stats.Beta

    • sympy.stats.BetaPrime

    • sympy.stats.Chi

    • sympy.stats.ChiSquared

    • sympy.stats.Erlang

    • sympy.stats.Exponential

    • sympy.stats.ExGaussian

    • sympy.stats.F

    • sympy.stats.Gamma

    • sympy.stats.GammaInverse

    • sympy.stats.GaussianInverse

    • sympy.stats.Gompertz

    • sympy.stats.Laplace

    • sympy.stats.Levy

    • sympy.stats.Logistic

    • sympy.stats.LogNormal

    • sympy.stats.Lomax

    • sympy.stats.Normal

    • sympy.stats.Maxwell

    • sympy.stats.Moyal

    • sympy.stats.Nakagami

    • sympy.stats.Pareto

    • sympy.stats.PowerFunction

    • sympy.stats.StudentT

    • sympy.stats.Trapezoidal

    • sympy.stats.Triangular

    • sympy.stats.Uniform

    • sympy.stats.VonMises

Returns:
Metric

Examples

import sympy as sp
import sympy.stats
from empulse.metrics import Metric

clv, d, f, alpha, beta = sp.symbols('clv d f alpha beta')
gamma = sp.stats.Beta('gamma', alpha, beta)
empc_score = (
    Metric(kind='max profit')
    .add_tp_benefit(gamma * (clv - d - f))
    .add_tp_benefit((1 - gamma) * -f)
    .add_fp_cost(d + f)
    .set_integration_method('quad')
    .build()
)
y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]
empc_score(y_true, y_proba, clv=100, d=10, f=1, alpha=6, beta=14)
set_n_mc_samples_exp(n_mc_samples_exp)[source]#

Set the number of (Quasi-) Monte Carlo samples to use.

This setting is ignored when integration_method='quad'.

Parameters:
n_mc_samples_exp: int

2**n_mc_samples_exp is the number of (Quasi-) Monte Carlo samples to use when integration_method='monte-carlo' or integration_method='quasi-monte-carlo'. Increasing the number of samples improves the accuracy of the metric estimate but slows down computation. This argument is ignored when integration_method='quad'.

Returns:
Metric

Examples

import sympy as sp
import sympy.stats
from empulse.metrics import Metric

clv, d, f, alpha, beta = sp.symbols('clv d f alpha beta')
gamma = sp.stats.Beta('gamma', alpha, beta)
empc_score = (
    Metric(kind='max profit')
    .add_tp_benefit(gamma * (clv - d - f))
    .add_tp_benefit((1 - gamma) * -f)
    .add_fp_cost(d + f)
    .set_integration_method('monte-carlo')
    .set_n_mc_samples_exp(15)
    .build()
)
y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]
empc_score(y_true, y_proba, clv=100, d=10, f=1, alpha=6, beta=14)
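The effect of the sample count can be illustrated with plain Monte Carlo estimation of a Beta-distributed parameter, using only the standard library (illustrative only; this is not empulse's integration code):

```python
import random

random.seed(42)  # fixed seed for reproducibility, analogous to set_random_state
alpha, beta = 6, 14
n_mc_samples = 2 ** 15  # the sample count that set_n_mc_samples_exp(15) would imply

# Monte Carlo estimate of E[gamma] for gamma ~ Beta(6, 14);
# the exact mean is alpha / (alpha + beta) = 0.3.
samples = [random.betavariate(alpha, beta) for _ in range(n_mc_samples)]
estimate = sum(samples) / n_mc_samples
```

With 2**15 samples the estimate is typically within a few tenths of a percent of the exact mean; halving the exponent roughly doubles the standard error, which is the accuracy/speed trade-off set_n_mc_samples_exp controls.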
set_random_state(random_state)[source]#

Set the random state to use when using Monte Carlo methods to estimate the metric value.

Parameters:
random_state: int | np.random.RandomState

The random state to use when integration_method='monte-carlo' or integration_method='quasi-monte-carlo'. Determines the points sampled from the distribution of the stochastic variables. This argument is ignored when integration_method='quad'.

Returns:
Metric

Examples

import sympy as sp
import sympy.stats
from empulse.metrics import Metric

clv, d, f, alpha, beta = sp.symbols('clv d f alpha beta')
gamma = sp.stats.Beta('gamma', alpha, beta)
empc_score = (
    Metric(kind='max profit')
    .add_tp_benefit(gamma * (clv - d - f))
    .add_tp_benefit((1 - gamma) * -f)
    .add_fp_cost(d + f)
    .set_integration_method('quasi-monte-carlo')
    .set_random_state(42)
    .build()
)
y_true = [1, 0, 1, 0, 1]
y_proba = [0.9, 0.1, 0.8, 0.2, 0.7]
empc_score(y_true, y_proba, clv=100, d=10, f=1, alpha=6, beta=14)