MaxProfit#

class empulse.metrics.MaxProfit(integration_method='auto', n_mc_samples_exp=16, random_state=None)[source]#

Strategy for the Expected Maximum Profit (EMP) metric.

Parameters:
integration_method: {‘auto’, ‘quad’, ‘monte-carlo’, ‘quasi-monte-carlo’}, default=’auto’

The integration method to use when the metric has stochastic variables.

  • If 'auto', the integration method is chosen automatically based on the number of stochastic variables, balancing accuracy against execution speed. For a single stochastic variable, piecewise integration is used; this is the most accurate method. For two stochastic variables, 'quad' is used. For more than two, 'quasi-monte-carlo' is used if all distributions are supported; otherwise, 'monte-carlo' is used.

  • If 'quad', the metric is integrated using the quad function from scipy. Be careful, as this can be slow for more than 2 stochastic variables.

  • If 'monte-carlo', the metric is integrated using Monte Carlo simulation. Monte Carlo simulation is less accurate than 'quad', but faster when there are many stochastic variables.

  • If 'quasi-monte-carlo', the metric is integrated using Quasi-Monte Carlo simulation. Quasi-Monte Carlo simulation is more accurate than 'monte-carlo', but only supports the following distributions from sympy.stats:

    • sympy.stats.Arcsin

    • sympy.stats.Beta

    • sympy.stats.BetaPrime

    • sympy.stats.Chi

    • sympy.stats.ChiSquared

    • sympy.stats.Erlang

    • sympy.stats.Exponential

    • sympy.stats.ExGaussian

    • sympy.stats.F

    • sympy.stats.Gamma

    • sympy.stats.GammaInverse

    • sympy.stats.GaussianInverse

    • sympy.stats.Laplace

    • sympy.stats.Logistic

    • sympy.stats.LogNormal

    • sympy.stats.Lomax

    • sympy.stats.Normal

    • sympy.stats.Maxwell

    • sympy.stats.Moyal

    • sympy.stats.Nakagami

    • sympy.stats.PowerFunction

    • sympy.stats.StudentT

    • sympy.stats.Trapezoidal

    • sympy.stats.Triangular

    • sympy.stats.Uniform

n_mc_samples_exp: int, default=16

2**n_mc_samples_exp is the number of (Quasi-) Monte Carlo samples to use when integration_method='monte-carlo' or integration_method='quasi-monte-carlo'. Increasing the number of samples improves the accuracy of the metric estimate but slows down computation. This argument is ignored when integration_method='quad'.

random_state: int | np.random.RandomState | None, default=None

The random state to use when integration_method='monte-carlo' or integration_method='quasi-monte-carlo'. Determines the points sampled from the distributions of the stochastic variables. This argument is ignored when integration_method='quad'.
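The difference between the two sampling strategies can be sketched with scipy's `stats.qmc` module. This is illustrative of the sampling behind 'monte-carlo' and 'quasi-monte-carlo', not empulse's internals: 2**n_mc_samples_exp points are drawn, and fixing the seed (the role of random_state) makes the draws reproducible.

```python
import numpy as np
from scipy import stats

# n_mc_samples_exp controls the sample count: 2**m points.
m = 12                                 # kept small here; the default is 16
dist = stats.norm(loc=5.0, scale=2.0)  # a QMC-supported distribution

# 'monte-carlo': independent pseudo-random draws, seeded for reproducibility.
rng = np.random.default_rng(0)
mc_estimate = dist.rvs(size=2**m, random_state=rng).mean()

# 'quasi-monte-carlo': low-discrepancy Sobol points mapped through the
# inverse CDF of the target distribution (random_base2 expects the exponent).
sobol = stats.qmc.Sobol(d=1, scramble=True, seed=0)
qmc_estimate = dist.ppf(sobol.random_base2(m)).mean()
```

Both estimates converge to the true mean (5.0); the Sobol estimate is typically closer at the same sample count, which is why 'quasi-monte-carlo' is preferred when all distributions support it.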

build(tp_benefit, tn_benefit, fp_cost, fn_cost)[source]#

Build the metric strategy.

gradient_boost_objective(y_true, y_score, **parameters)#

Compute the gradient of the metric with respect to gradient boosting instances.

Parameters:
y_true: array-like of shape (n_samples,)

The ground truth labels.

y_score: array-like of shape (n_samples,)

The predicted labels, probabilities, or decision scores (based on the chosen metric).

parameters: float or array-like of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
gradient: NDArray of shape (n_samples,)

The gradient of the metric loss with respect to the gradient boosting weights.

hessian: NDArray of shape (n_samples,)

The hessian of the metric loss with respect to the gradient boosting weights.
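The per-sample (gradient, hessian) pair is the contract that gradient-boosting libraries expect from a custom objective. A minimal sketch of that same contract, using plain log loss as a stand-in for the EMP-based objective (illustrative only, not empulse's implementation):

```python
import numpy as np

def logloss_objective(y_true, y_score):
    """Per-sample gradient and hessian of log loss w.r.t. raw scores.

    Gradient-boosting libraries expect exactly this shape contract:
    two arrays of shape (n_samples,), one gradient and one hessian.
    """
    y = np.asarray(y_true, dtype=float)
    p = 1.0 / (1.0 + np.exp(-np.asarray(y_score, dtype=float)))  # sigmoid
    gradient = p - y               # shape (n_samples,)
    hessian = p * (1.0 - p)        # shape (n_samples,)
    return gradient, hessian

grad, hess = logloss_objective([0, 1, 1], [0.0, 2.0, -1.0])
```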

logit_objective(features, weights, y_true, **parameters)#

Compute the metric value and the gradient of the metric with respect to logistic regression coefficients.

Parameters:
features: NDArray of shape (n_samples, n_features)

The features of the samples.

weights: NDArray of shape (n_features,)

The weights of the logistic regression model.

y_true: NDArray of shape (n_samples,)

The ground truth labels.

parameters: float or NDArray of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
value: float

The metric loss to be minimized.

gradient: NDArray of shape (n_features,)

The gradient of the metric loss with respect to the logistic regression weights.
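The (value, gradient) pair is what a gradient-based optimizer consumes when fitting the coefficients. A minimal sketch of the same contract, with plain log loss standing in for the EMP-based loss (illustrative only, not empulse's implementation):

```python
import numpy as np

def logit_logloss(features, weights, y_true):
    """Loss value and gradient w.r.t. the logistic regression coefficients.

    Returns a scalar loss and a gradient of shape (n_features,),
    the same contract as logit_objective.
    """
    X = np.asarray(features, dtype=float)
    w = np.asarray(weights, dtype=float)
    y = np.asarray(y_true, dtype=float)
    p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted probabilities
    eps = 1e-12                               # guard against log(0)
    value = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    gradient = X.T @ (p - y) / len(y)         # shape (n_features,)
    return value, gradient

X = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]])
value, gradient = logit_logloss(X, np.zeros(2), [0, 1, 1])
```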

optimal_rate(y_true, y_score, **parameters)[source]#

Compute the predicted positive rate to optimize the metric value.

Parameters:
y_true: array-like of shape (n_samples,)

The ground truth labels.

y_score: array-like of shape (n_samples,)

The predicted labels, probabilities, or decision scores (based on the chosen metric).

parameters: float or array-like of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
optimal_rate: float

The optimal predicted positive rate.

optimal_threshold(y_true, y_score, **parameters)[source]#

Compute the classification threshold(s) to optimize the metric value.

That is, the score threshold above which an observation is classified as positive to optimize the metric. For instance-dependent costs and benefits, an array of thresholds is returned, one per sample. For class-dependent costs and benefits, a single threshold value is returned.

Parameters:
y_true: array-like of shape (n_samples,)

The ground truth labels.

y_score: array-like of shape (n_samples,)

The predicted labels, probabilities, or decision scores (based on the chosen metric).

parameters: float or array-like of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
optimal_threshold: float | FloatNDArray

The optimal classification threshold(s).
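For deterministic, class-dependent costs and benefits, the optimal threshold on calibrated probabilities can be derived by equating the expected profit of the two decisions. The sketch below is the classic cost-sensitive decision rule under those assumptions (the function name is illustrative, not empulse API):

```python
def cost_sensitive_threshold(tp_benefit, tn_benefit, fp_cost, fn_cost):
    """Classify positive when p >= t*.

    Derived by equating the expected profit of predicting positive
    vs. negative for a sample with calibrated probability p:
        p*tp_benefit - (1-p)*fp_cost = (1-p)*tn_benefit - p*fn_cost
    """
    return (tn_benefit + fp_cost) / (tp_benefit + fn_cost + tn_benefit + fp_cost)

# Zero benefits, false negatives four times as costly as false positives:
# the threshold drops well below 0.5, flagging positives aggressively.
t = cost_sensitive_threshold(0.0, 0.0, 1.0, 4.0)
```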

score(y_true, y_score, **parameters)[source]#

Compute the maximum profit score.

Parameters:
y_true: array-like of shape (n_samples,)

The ground truth labels.

y_score: array-like of shape (n_samples,)

The predicted labels, probabilities, or decision scores (based on the chosen metric).

parameters: float or array-like of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
score: float

The maximum profit score.
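The deterministic special case of the score can be sketched directly: sweep every score threshold, compute the confusion-matrix profit at each, and keep the maximum. This is a simplified, non-stochastic analogue of the EMP computation, not empulse's implementation:

```python
import numpy as np

def max_profit(y_true, y_score, tp_benefit, tn_benefit, fp_cost, fn_cost):
    """Maximum average profit per sample over all score thresholds."""
    y = np.asarray(y_true)
    order = np.argsort(-np.asarray(y_score, dtype=float))
    y = y[order]                    # labels sorted by descending score
    n = len(y)
    # cum_pos[k] = positives among the k highest-scoring samples,
    # i.e. true positives when the top k are classified positive.
    cum_pos = np.concatenate(([0], np.cumsum(y)))
    ks = np.arange(n + 1)
    tp = cum_pos
    fp = ks - cum_pos
    fn = y.sum() - tp
    tn = (n - y.sum()) - fp
    profit = (tp * tp_benefit + tn * tn_benefit
              - fp * fp_cost - fn * fn_cost) / n
    return profit.max()

mp = max_profit([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1], 6.0, 0.0, 1.0, 0.0)
```

Here the best cut accepts the top three samples (2 true positives, 1 false positive), giving (2*6 - 1*1)/4 = 2.75.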

to_latex(tp_benefit, tn_benefit, fp_cost, fn_cost)[source]#

Return the LaTeX representation of the metric.