MaxProfit#

class empulse.metrics.MaxProfit(integration_method='auto', n_mc_samples_exp=16, random_state=None)[source]#

Strategy for the Expected Maximum Profit (EMP) metric.

Parameters:
integration_method: {'auto', 'quad', 'monte-carlo', 'quasi-monte-carlo'}, default='auto'

The integration method to use when the metric has stochastic variables.

  • If 'auto', the integration method is chosen automatically based on the number of stochastic variables, balancing accuracy against execution speed. For a single stochastic variable, piecewise integration is used, which is the most accurate method. For two stochastic variables, 'quad' is used, and for more than two, 'quasi-monte-carlo' is used if all distributions are supported; otherwise, 'monte-carlo' is used.

  • If 'quad', the metric is integrated using scipy.integrate.quad. Be careful, as this can be slow for more than two stochastic variables.

  • If 'monte-carlo', the metric is integrated using Monte Carlo simulation. Monte Carlo simulation is less accurate than 'quad' but faster when there are many stochastic variables.

  • If 'quasi-monte-carlo', the metric is integrated using Quasi-Monte Carlo simulation. Quasi-Monte Carlo simulation is more accurate than 'monte-carlo' but supports only the following distributions from sympy.stats:

    • sympy.stats.Arcsin

    • sympy.stats.Beta

    • sympy.stats.BetaPrime

    • sympy.stats.Chi

    • sympy.stats.ChiSquared

    • sympy.stats.Erlang

    • sympy.stats.Exponential

    • sympy.stats.ExGaussian

    • sympy.stats.FDistribution

    • sympy.stats.Gamma

    • sympy.stats.GammaInverse

    • sympy.stats.GaussianInverse

    • sympy.stats.Laplace

    • sympy.stats.Logistic

    • sympy.stats.LogNormal

    • sympy.stats.Lomax

    • sympy.stats.Normal

    • sympy.stats.Maxwell

    • sympy.stats.Moyal

    • sympy.stats.Nakagami

    • sympy.stats.PowerFunction

    • sympy.stats.StudentT

    • sympy.stats.Trapezoidal

    • sympy.stats.Triangular

    • sympy.stats.Uniform

n_mc_samples_exp: int, default=16

2**n_mc_samples_exp is the number of (Quasi-) Monte Carlo samples to use when integration_method='monte-carlo' or integration_method='quasi-monte-carlo'. Increasing the number of samples improves the accuracy of the metric estimate but slows down computation. This argument is ignored when integration_method='quad'.

random_state: int | np.random.RandomState | None, default=None

The random state to use when integration_method='monte-carlo' or integration_method='quasi-monte-carlo'. Determines the points sampled from the distributions of the stochastic variables. This argument is ignored when integration_method='quad'.
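
Example (an illustrative sketch; the argument values are arbitrary choices, not recommendations):

>>> from empulse.metrics import MaxProfit
>>> strategy = MaxProfit()  # integration_method='auto'
>>> strategy = MaxProfit(
...     integration_method='quasi-monte-carlo',
...     n_mc_samples_exp=14,  # 2**14 = 16,384 samples
...     random_state=42,
... )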

build(tp_benefit, tn_benefit, fp_cost, fn_cost)[source]#

Build the metric strategy.
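
A minimal sketch of building a strategy, assuming the costs and benefits are sympy expressions (consistent with the sympy.stats support listed above); the symbols b and c and the Beta-distributed variable gamma are hypothetical placeholders:

>>> import sympy
>>> from sympy.stats import Beta
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')                # deterministic benefit and cost
>>> alpha, beta = sympy.symbols('alpha beta')  # distribution parameters
>>> gamma = Beta('gamma', alpha, beta)         # stochastic variable, supported by quasi-monte-carlo
>>> strategy = MaxProfit(integration_method='quasi-monte-carlo')
>>> strategy.build(tp_benefit=gamma * b, tn_benefit=0, fp_cost=c, fn_cost=0)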

build_logit_objective(features, y_true, C, l1_ratio, soft_threshold, fit_intercept, **loss_params)#

Build a logit objective function for optimization.

This method constructs a callable that computes the logistic loss and its gradient for a given dataset, taking the regularization settings and thresholding options into account.

Parameters:
features: FloatNDArray

Feature matrix with shape (n_samples, n_features).

y_true: FloatNDArray

Target values corresponding to the input samples, of shape (n_samples,).

C: float

Regularization strength parameter. Smaller values specify stronger regularization.

l1_ratio: float

The Elastic-Net mixing parameter, with range 0 <= l1_ratio <= 1. l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1 penalty.

soft_threshold: bool

Indicator of whether soft thresholding is applied during optimization.

fit_intercept: bool

Specifies if an intercept should be included in the model.

**loss_params: FloatNDArray or float

Additional parameters for customizing the loss function calculation, if needed.

Returns:
logit_objective

The callable logistic loss function with its gradient pre-configured for optimization.
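
A usage sketch, assuming a strategy built with hypothetical symbols b (true-positive benefit) and c (false-positive cost) and randomly generated data; the trailing keyword arguments supply values for those symbols, and treating the extra leading weight as the intercept is an assumption:

>>> import numpy as np, sympy
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> strategy = MaxProfit()
>>> strategy.build(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(100, 3))
>>> y = rng.integers(0, 2, size=100).astype(float)
>>> objective = strategy.build_logit_objective(
...     X, y, C=1.0, l1_ratio=0.0, soft_threshold=False, fit_intercept=True,
...     b=10.0, c=1.0)
>>> loss, gradient = objective(np.zeros(X.shape[1] + 1))  # weights, with a leading intercept term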

gradient_boost_objective(y_true, y_score, **parameters)#

Compute the gradient and hessian of the metric loss with respect to the gradient boosting instances.

Parameters:
y_true: array-like of shape (n_samples,)

The ground truth labels.

y_score: array-like of shape (n_samples,)

The predicted labels, probabilities, or decision scores (based on the chosen metric).

parameters: float or array-like of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
gradient: NDArray of shape (n_samples,)

The gradient of the metric loss with respect to the gradient boosting weights.

hessian: NDArray of shape (n_samples,)

The hessian of the metric loss with respect to the gradient boosting weights.
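
A sketch with hypothetical symbols b and c and toy data; the returned (gradient, hessian) pair has the shape custom objectives in gradient boosting libraries such as XGBoost or LightGBM typically expect:

>>> import numpy as np, sympy
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> strategy = MaxProfit()
>>> strategy.build(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)
>>> y_true = np.array([0, 1, 0, 1, 1])
>>> y_score = np.array([0.1, 0.8, 0.3, 0.6, 0.9])
>>> grad, hess = strategy.gradient_boost_objective(y_true, y_score, b=10.0, c=1.0)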

logit_objective(features, y_true, C, l1_ratio, soft_threshold, fit_intercept, **parameters)#

Build a function which computes the metric value and the gradient of the metric w.r.t. the logistic regression coefficients.

Parameters:
features: NDArray of shape (n_samples, n_features)

The features of the samples.

y_true: NDArray of shape (n_samples,)

The ground truth labels.

C: float

Regularization strength parameter. Smaller values specify stronger regularization.

l1_ratio: float

The Elastic-Net mixing parameter, with range 0 <= l1_ratio <= 1. l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1 penalty.

soft_threshold: bool

Indicator of whether soft thresholding is applied during optimization.

fit_intercept: bool

Specifies if an intercept should be included in the model.

parameters: float or NDArray of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
logistic_objective: Callable[[NDArray], tuple[float, NDArray]]

A function that takes logistic regression weights as input and returns the metric value and its gradient. The function signature is: logistic_objective(weights) -> (value, gradient)
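
Because the returned callable yields a (value, gradient) pair, it can be passed directly to scipy.optimize.minimize with jac=True. A sketch under the same assumptions as above (hypothetical symbols b and c, random data, a leading intercept weight):

>>> import numpy as np, sympy
>>> from scipy.optimize import minimize
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> strategy = MaxProfit()
>>> strategy.build(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(100, 3))
>>> y = rng.integers(0, 2, size=100).astype(float)
>>> objective = strategy.logit_objective(
...     X, y, C=1.0, l1_ratio=0.0, soft_threshold=False, fit_intercept=True,
...     b=10.0, c=1.0)
>>> result = minimize(objective, x0=np.zeros(X.shape[1] + 1), jac=True, method='L-BFGS-B')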

optimal_rate(y_true, y_score, **parameters)[source]#

Compute the predicted positive rate that optimizes the metric value.

Parameters:
y_true: array-like of shape (n_samples,)

The ground truth labels.

y_score: array-like of shape (n_samples,)

The predicted labels, probabilities, or decision scores (based on the chosen metric).

parameters: float or array-like of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
optimal_rate: float

The optimal predicted positive rate.
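
A sketch with hypothetical symbols b and c and toy data; the keyword arguments set the values of those symbols:

>>> import numpy as np, sympy
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> strategy = MaxProfit()
>>> strategy.build(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)
>>> y_true = np.array([0, 1, 0, 1, 1])
>>> y_score = np.array([0.1, 0.8, 0.3, 0.6, 0.9])
>>> rate = strategy.optimal_rate(y_true, y_score, b=10.0, c=1.0)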

optimal_threshold(y_true, y_score, **parameters)[source]#

Compute the classification threshold(s) that optimize the metric value.

That is, the score threshold at which an observation should be classified as positive to optimize the metric. For instance-dependent costs and benefits, this returns an array of thresholds, one for each sample. For class-dependent costs and benefits, this returns a single threshold value.

Parameters:
y_true: array-like of shape (n_samples,)

The ground truth labels.

y_score: array-like of shape (n_samples,)

The predicted labels, probabilities, or decision scores (based on the chosen metric).

parameters: float or array-like of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
optimal_threshold: float | FloatNDArray

The optimal classification threshold(s).
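
A sketch contrasting the two cases with hypothetical symbols b and c: passing floats yields a single threshold, passing an array yields one threshold per sample:

>>> import numpy as np, sympy
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> strategy = MaxProfit()
>>> strategy.build(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)
>>> y_true = np.array([0, 1, 0, 1, 1])
>>> y_score = np.array([0.1, 0.8, 0.3, 0.6, 0.9])
>>> threshold = strategy.optimal_threshold(y_true, y_score, b=10.0, c=1.0)  # class-dependent: float
>>> thresholds = strategy.optimal_threshold(
...     y_true, y_score, b=np.array([5.0, 10.0, 5.0, 10.0, 20.0]), c=1.0)   # instance-dependent: array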

prepare_boost_objective(y_true, **parameters)#

Compute the constant term of the metric's gradient with respect to gradient boosting.

Parameters:
y_true: NDArray of shape (n_samples,)

The ground truth labels.

parameters: float or NDArray of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
gradient_const: NDArray of shape (n_samples, n_features)

The constant term of the gradient.
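
A sketch with hypothetical symbols b and c:

>>> import numpy as np, sympy
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> strategy = MaxProfit()
>>> strategy.build(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)
>>> y_true = np.array([0, 1, 0, 1, 1])
>>> gradient_const = strategy.prepare_boost_objective(y_true, b=10.0, c=1.0)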

prepare_logit_objective(features, y_true, **parameters)#

Compute the constant terms of the loss and of the gradient of the metric with respect to the logistic regression coefficients.

Parameters:
features: NDArray of shape (n_samples, n_features)

The features of the samples.

y_true: NDArray of shape (n_samples,)

The ground truth labels.

parameters: float or NDArray of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
gradient_const: NDArray of shape (n_samples, n_features)

The constant term of the gradient.

loss_const1: NDArray of shape (n_features,)

The first constant term of the loss function.

loss_const2: NDArray of shape (n_features,)

The second constant term of the loss function.
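
A sketch with hypothetical symbols b and c and random data, unpacking the three returned constants:

>>> import numpy as np, sympy
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> strategy = MaxProfit()
>>> strategy.build(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(100, 3))
>>> y = rng.integers(0, 2, size=100).astype(float)
>>> gradient_const, loss_const1, loss_const2 = strategy.prepare_logit_objective(X, y, b=10.0, c=1.0)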

score(y_true, y_score, **parameters)[source]#

Compute the maximum profit score.

Parameters:
y_true: array-like of shape (n_samples,)

The ground truth labels.

y_score: array-like of shape (n_samples,)

The predicted labels, probabilities, or decision scores (based on the chosen metric).

parameters: float or array-like of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
score: float

The maximum profit score.
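
A sketch with hypothetical symbols b (true-positive benefit) and c (false-positive cost); the keyword arguments set the values of those symbols, as described above:

>>> import numpy as np, sympy
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> strategy = MaxProfit()
>>> strategy.build(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)
>>> y_true = np.array([0, 1, 0, 1, 1])
>>> y_score = np.array([0.1, 0.8, 0.3, 0.6, 0.9])
>>> profit = strategy.score(y_true, y_score, b=10.0, c=1.0)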

to_latex(tp_benefit, tn_benefit, fp_cost, fn_cost)[source]#

Return the LaTeX representation of the metric.
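
A sketch with hypothetical symbols b and c; the cost-benefit expressions are passed directly, without a prior build call:

>>> import sympy
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> latex = MaxProfit().to_latex(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)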