MaxProfit#

class empulse.metrics.MaxProfit(integration_method='auto', n_mc_samples_exp=16, random_state=None)[source]#

Strategy for the Expected Maximum Profit (EMP) metric.

Parameters:
integration_method: {'auto', 'quad', 'monte-carlo', 'quasi-monte-carlo'}, default='auto'

The integration method to use when the metric has stochastic variables.

  • If 'auto', the integration method is chosen automatically based on the number of stochastic variables, balancing accuracy against execution speed. For a single stochastic variable, piecewise integration is used, which is the most accurate method. For two stochastic variables, 'quad' is used, and for more than two, 'quasi-monte-carlo' is used if all distributions are supported; otherwise, 'monte-carlo' is used.

  • If 'quad', the metric is integrated using scipy.integrate.quad. Be careful, as this can be slow for more than two stochastic variables.

  • If 'monte-carlo', the metric is integrated using Monte Carlo simulation. Monte Carlo simulation is less accurate than 'quad' but faster when there are many stochastic variables.

  • If 'quasi-monte-carlo', the metric is integrated using Quasi-Monte Carlo simulation. Quasi-Monte Carlo simulation is more accurate than 'monte-carlo' but supports only the following distributions from sympy.stats:

    • sympy.stats.Arcsin

    • sympy.stats.Beta

    • sympy.stats.BetaPrime

    • sympy.stats.Chi

    • sympy.stats.ChiSquared

    • sympy.stats.Erlang

    • sympy.stats.Exponential

    • sympy.stats.ExGaussian

    • sympy.stats.FDistribution

    • sympy.stats.Gamma

    • sympy.stats.GammaInverse

    • sympy.stats.GaussianInverse

    • sympy.stats.Laplace

    • sympy.stats.Logistic

    • sympy.stats.LogNormal

    • sympy.stats.Lomax

    • sympy.stats.Normal

    • sympy.stats.Maxwell

    • sympy.stats.Moyal

    • sympy.stats.Nakagami

    • sympy.stats.PowerFunction

    • sympy.stats.StudentT

    • sympy.stats.Trapezoidal

    • sympy.stats.Triangular

    • sympy.stats.Uniform

n_mc_samples_exp: int, default=16

2**n_mc_samples_exp is the number of (Quasi-) Monte Carlo samples to use when integration_method='monte-carlo' or integration_method='quasi-monte-carlo'. Increasing the number of samples improves the accuracy of the metric estimate but slows down computation. This argument is ignored when integration_method='quad'.

random_state: int | np.random.RandomState | None, default=None

The random state to use when integration_method='monte-carlo' or integration_method='quasi-monte-carlo'. Determines the points sampled from the distributions of the stochastic variables. This argument is ignored when integration_method='quad'.
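
Example (an illustrative sketch; the argument values are arbitrary choices, not recommendations):

>>> from empulse.metrics import MaxProfit
>>> strategy = MaxProfit()  # integration_method='auto'
>>> strategy = MaxProfit(
...     integration_method='quasi-monte-carlo',
...     n_mc_samples_exp=14,  # 2**14 = 16,384 samples
...     random_state=42,
... )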

build(tp_benefit, tn_benefit, fp_cost, fn_cost)[source]#

Build the metric strategy.
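
A minimal sketch of building a strategy, assuming the costs and benefits are sympy expressions (consistent with the sympy.stats support listed above); the symbols b and c and the Beta-distributed variable gamma are hypothetical placeholders:

>>> import sympy
>>> from sympy.stats import Beta
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')                # deterministic benefit and cost
>>> alpha, beta = sympy.symbols('alpha beta')  # distribution parameters
>>> gamma = Beta('gamma', alpha, beta)         # stochastic variable, supported by quasi-monte-carlo
>>> strategy = MaxProfit(integration_method='quasi-monte-carlo')
>>> strategy.build(tp_benefit=gamma * b, tn_benefit=0, fp_cost=c, fn_cost=0)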

build_logit_objective(features, y_true, C, l1_ratio, soft_threshold, fit_intercept, **loss_params)#

Build a logit objective function for optimization.

This method constructs a callable that computes the logistic loss and its gradient for a given dataset, taking the regularization settings and thresholding options into account.

Parameters:
features: FloatNDArray

Feature matrix with shape (n_samples, n_features).

y_true: FloatNDArray

Target values corresponding to the input samples, of shape (n_samples,).

C: float

Regularization strength parameter. Smaller values specify stronger regularization.

l1_ratio: float

The Elastic-Net mixing parameter, with range 0 <= l1_ratio <= 1. l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1 penalty.

soft_threshold: bool

Indicator of whether soft thresholding is applied during optimization.

fit_intercept: bool

Specifies if an intercept should be included in the model.

**loss_params: FloatNDArray or float

Additional parameters for customizing the loss function calculation, if needed.

Returns:
logit_objective

The callable logistic loss function with its gradient pre-configured for optimization.
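
A usage sketch, assuming a strategy built with hypothetical symbols b (true-positive benefit) and c (false-positive cost) and randomly generated data; the trailing keyword arguments supply values for those symbols, and treating the extra leading weight as the intercept is an assumption:

>>> import numpy as np, sympy
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> strategy = MaxProfit()
>>> strategy.build(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(100, 3))
>>> y = rng.integers(0, 2, size=100).astype(float)
>>> objective = strategy.build_logit_objective(
...     X, y, C=1.0, l1_ratio=0.0, soft_threshold=False, fit_intercept=True,
...     b=10.0, c=1.0)
>>> loss, gradient = objective(np.zeros(X.shape[1] + 1))  # weights, with a leading intercept term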

gradient_boost_objective(y_true, y_score, **parameters)#

Compute the gradient and hessian of the metric loss with respect to the gradient boosting instances.

Parameters:
y_true: array-like of shape (n_samples,)

The ground truth labels.

y_score: array-like of shape (n_samples,)

The predicted labels, probabilities, or decision scores (based on the chosen metric).

parameters: float or array-like of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
gradient: NDArray of shape (n_samples,)

The gradient of the metric loss with respect to the gradient boosting weights.

hessian: NDArray of shape (n_samples,)

The hessian of the metric loss with respect to the gradient boosting weights.
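
A sketch with hypothetical symbols b and c and toy data; the returned (gradient, hessian) pair has the shape custom objectives in gradient boosting libraries such as XGBoost or LightGBM typically expect:

>>> import numpy as np, sympy
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> strategy = MaxProfit()
>>> strategy.build(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)
>>> y_true = np.array([0, 1, 0, 1, 1])
>>> y_score = np.array([0.1, 0.8, 0.3, 0.6, 0.9])
>>> grad, hess = strategy.gradient_boost_objective(y_true, y_score, b=10.0, c=1.0)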

logit_objective(features, y_true, C, l1_ratio, soft_threshold, fit_intercept, **parameters)#

Build a function which computes the metric value and the gradient of the metric w.r.t. the logistic regression coefficients.

Parameters:
features: NDArray of shape (n_samples, n_features)

The features of the samples.

y_true: NDArray of shape (n_samples,)

The ground truth labels.

C: float

Regularization strength parameter. Smaller values specify stronger regularization.

l1_ratio: float

The Elastic-Net mixing parameter, with range 0 <= l1_ratio <= 1. l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1 penalty.

soft_threshold: bool

Indicator of whether soft thresholding is applied during optimization.

fit_intercept: bool

Specifies if an intercept should be included in the model.

parameters: float or NDArray of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
logistic_objective: Callable[[NDArray], tuple[float, NDArray]]

A function that takes logistic regression weights as input and returns the metric value and its gradient. The function signature is: logistic_objective(weights) -> (value, gradient)
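
Because the returned callable yields a (value, gradient) pair, it can be passed directly to scipy.optimize.minimize with jac=True. A sketch under the same assumptions as above (hypothetical symbols b and c, random data, a leading intercept weight):

>>> import numpy as np, sympy
>>> from scipy.optimize import minimize
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> strategy = MaxProfit()
>>> strategy.build(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(100, 3))
>>> y = rng.integers(0, 2, size=100).astype(float)
>>> objective = strategy.logit_objective(
...     X, y, C=1.0, l1_ratio=0.0, soft_threshold=False, fit_intercept=True,
...     b=10.0, c=1.0)
>>> result = minimize(objective, x0=np.zeros(X.shape[1] + 1), jac=True, method='L-BFGS-B')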

optimal_rate(y_true, y_score, **parameters)[source]#

Compute the predicted positive rate that optimizes the metric value.

Parameters:
y_true: array-like of shape (n_samples,)

The ground truth labels.

y_score: array-like of shape (n_samples,)

The predicted labels, probabilities, or decision scores (based on the chosen metric).

parameters: float or array-like of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
optimal_rate: float

The optimal predicted positive rate.
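
A sketch with hypothetical symbols b and c and toy data; the keyword arguments set the values of those symbols:

>>> import numpy as np, sympy
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> strategy = MaxProfit()
>>> strategy.build(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)
>>> y_true = np.array([0, 1, 0, 1, 1])
>>> y_score = np.array([0.1, 0.8, 0.3, 0.6, 0.9])
>>> rate = strategy.optimal_rate(y_true, y_score, b=10.0, c=1.0)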

optimal_threshold(y_true, y_score, **parameters)[source]#

Compute the classification threshold(s) that optimize the metric value.

That is, the score threshold at which an observation should be classified as positive to optimize the metric. For instance-dependent costs and benefits, this returns an array of thresholds, one for each sample. For class-dependent costs and benefits, this returns a single threshold value.

Parameters:
y_true: array-like of shape (n_samples,)

The ground truth labels.

y_score: array-like of shape (n_samples,)

The predicted labels, probabilities, or decision scores (based on the chosen metric).

parameters: float or array-like of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
optimal_threshold: float | FloatNDArray

The optimal classification threshold(s).
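
A sketch contrasting the two cases with hypothetical symbols b and c: passing floats yields a single threshold, passing an array yields one threshold per sample:

>>> import numpy as np, sympy
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> strategy = MaxProfit()
>>> strategy.build(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)
>>> y_true = np.array([0, 1, 0, 1, 1])
>>> y_score = np.array([0.1, 0.8, 0.3, 0.6, 0.9])
>>> threshold = strategy.optimal_threshold(y_true, y_score, b=10.0, c=1.0)  # class-dependent: float
>>> thresholds = strategy.optimal_threshold(
...     y_true, y_score, b=np.array([5.0, 10.0, 5.0, 10.0, 20.0]), c=1.0)   # instance-dependent: array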

prepare_boost_objective(y_true, **parameters)#

Compute the constant term of the metric's gradient with respect to gradient boosting.

Parameters:
y_true: NDArray of shape (n_samples,)

The ground truth labels.

parameters: float or NDArray of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
gradient_const: NDArray of shape (n_samples, n_features)

The constant term of the gradient.
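
A sketch with hypothetical symbols b and c:

>>> import numpy as np, sympy
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> strategy = MaxProfit()
>>> strategy.build(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)
>>> y_true = np.array([0, 1, 0, 1, 1])
>>> gradient_const = strategy.prepare_boost_objective(y_true, b=10.0, c=1.0)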

prepare_logit_objective(features, y_true, **parameters)#

Compute the constant terms of the loss and of the gradient of the metric with respect to the logistic regression coefficients.

Parameters:
features: NDArray of shape (n_samples, n_features)

The features of the samples.

y_true: NDArray of shape (n_samples,)

The ground truth labels.

parameters: float or NDArray of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
gradient_const: NDArray of shape (n_samples, n_features)

The constant term of the gradient.

loss_const1: NDArray of shape (n_features,)

The first constant term of the loss function.

loss_const2: NDArray of shape (n_features,)

The second constant term of the loss function.
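
A sketch with hypothetical symbols b and c and random data, unpacking the three returned constants:

>>> import numpy as np, sympy
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> strategy = MaxProfit()
>>> strategy.build(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(100, 3))
>>> y = rng.integers(0, 2, size=100).astype(float)
>>> gradient_const, loss_const1, loss_const2 = strategy.prepare_logit_objective(X, y, b=10.0, c=1.0)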

score(y_true, y_score, **parameters)[source]#

Compute the maximum profit score.

Parameters:
y_true: array-like of shape (n_samples,)

The ground truth labels.

y_score: array-like of shape (n_samples,)

The predicted labels, probabilities, or decision scores (based on the chosen metric).

parameters: float or array-like of shape (n_samples,)

The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.

  • If float, the same value is used for all samples (class-dependent).

  • If array-like, the values are used for each sample (instance-dependent).

Returns:
score: float

The maximum profit score.
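
A sketch with hypothetical symbols b (true-positive benefit) and c (false-positive cost); the keyword arguments set the values of those symbols, as described above:

>>> import numpy as np, sympy
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> strategy = MaxProfit()
>>> strategy.build(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)
>>> y_true = np.array([0, 1, 0, 1, 1])
>>> y_score = np.array([0.1, 0.8, 0.3, 0.6, 0.9])
>>> profit = strategy.score(y_true, y_score, b=10.0, c=1.0)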

to_latex(tp_benefit, tn_benefit, fp_cost, fn_cost)[source]#

Return the LaTeX representation of the metric.
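
A sketch with hypothetical symbols b and c; the cost-benefit expressions are passed directly, without a prior build call:

>>> import sympy
>>> from empulse.metrics import MaxProfit
>>> b, c = sympy.symbols('b c')
>>> latex = MaxProfit().to_latex(tp_benefit=b, tn_benefit=0, fp_cost=c, fn_cost=0)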