MaxProfit#
- class empulse.metrics.MaxProfit(integration_method='auto', n_mc_samples_exp=16, random_state=None)[source]#
Strategy for the Expected Maximum Profit (EMP) metric.
- Parameters:
- integration_method: {'auto', 'quad', 'monte-carlo', 'quasi-monte-carlo'}, default='auto'
The integration method to use when the metric has stochastic variables (see the sketch after this parameter list).
If 'auto', the integration method is chosen automatically based on the number of stochastic variables, balancing accuracy against execution speed. For a single stochastic variable, piecewise integration is used; this is the most accurate method. For two stochastic variables, 'quad' is used. For more than two stochastic variables, 'quasi-monte-carlo' is used if all distributions are supported; otherwise, 'monte-carlo' is used.
If 'quad', the metric is integrated using scipy's quad function. Be careful, as this can be slow for more than two stochastic variables.
If 'monte-carlo', the metric is integrated using a Monte Carlo simulation. Monte Carlo simulation is less accurate but faster than 'quad' when there are many stochastic variables.
If 'quasi-monte-carlo', the metric is integrated using a Quasi-Monte Carlo simulation. Quasi-Monte Carlo simulation is more accurate than Monte Carlo but only supports the following distributions from sympy.stats:
sympy.stats.Arcsin
sympy.stats.Beta
sympy.stats.BetaPrime
sympy.stats.Chi
sympy.stats.ChiSquared
sympy.stats.Erlang
sympy.stats.Exponential
sympy.stats.ExGaussian
sympy.stats.F
sympy.stats.Gamma
sympy.stats.GammaInverse
sympy.stats.GaussianInverse
sympy.stats.Laplace
sympy.stats.Logistic
sympy.stats.LogNormal
sympy.stats.Lomax
sympy.stats.Normal
sympy.stats.Maxwell
sympy.stats.Moyal
sympy.stats.Nakagami
sympy.stats.PowerFunction
sympy.stats.StudentT
sympy.stats.Trapezoidal
sympy.stats.Triangular
sympy.stats.Uniform
- n_mc_samples_exp: int, default=16
2**n_mc_samples_exp is the number of (Quasi-) Monte Carlo samples to use when integration_method='monte-carlo' or integration_method='quasi-monte-carlo'. Increasing the number of samples improves the accuracy of the metric estimate but slows down computation. This argument is ignored when integration_method='quad'.
- random_state: int | np.random.RandomState | None, default=None
The random state to use when integration_method='monte-carlo' or integration_method='quasi-monte-carlo'. It determines the points sampled from the distributions of the stochastic variables. This argument is ignored when integration_method='quad'.
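A minimal sketch of configuring the strategy is shown below. The integration settings come straight from the parameters above; the Beta-distributed variable gamma is a hypothetical example of a stochastic cost-benefit parameter whose distribution is supported by the quasi-Monte Carlo integrator. How the cost-benefit terms are attached to the metric definition is outside the scope of this page and is omitted here.

```python
import sympy
import sympy.stats
from empulse.metrics import MaxProfit

# Explicit integration settings: 2**14 quasi-Monte Carlo samples, fixed seed.
strategy = MaxProfit(
    integration_method='quasi-monte-carlo',
    n_mc_samples_exp=14,
    random_state=42,
)

# Example of a stochastic variable with a supported distribution:
# a Beta-distributed acceptance rate (hypothetical name 'gamma').
alpha, beta = sympy.symbols('alpha beta', positive=True)
gamma = sympy.stats.Beta('gamma', alpha, beta)
```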
- gradient_boost_objective(y_true, y_score, **parameters)#
Compute the gradient and hessian of the metric loss with respect to the gradient boosting instances.
- Parameters:
- y_true: array-like of shape (n_samples,)
The ground truth labels.
- y_score: array-like of shape (n_samples,)
The predicted labels, probabilities, or decision scores (based on the chosen metric).
- parameters: float or array-like of shape (n_samples,)
The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for its distribution parameters. You can set the parameter values using either the symbol names or their aliases.
If float, the same value is used for all samples (class-dependent).
If array-like, a separate value is used for each sample (instance-dependent).
- Returns:
- gradient: NDArray of shape (n_samples,)
The gradient of the metric loss with respect to the gradient boosting weights.
- hessian: NDArray of shape (n_samples,)
The hessian of the metric loss with respect to the gradient boosting weights.
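Gradient boosting libraries such as XGBoost and LightGBM expect custom objectives that return a (gradient, hessian) pair per sample, which matches the shape of these outputs. The sketch below is illustrative only: it assumes `strategy` is a MaxProfit instance already wired to a metric definition, and the keyword arguments `clv` and `incentive` are hypothetical names standing in for whatever cost-benefit symbols or aliases that definition declares.

```python
import numpy as np
from empulse.metrics import MaxProfit

def emp_objective(strategy: MaxProfit, y_true: np.ndarray, y_score: np.ndarray):
    """Custom objective in the (gradient, hessian) form used by boosting libraries.

    'clv' and 'incentive' are hypothetical parameter names; replace them with
    the symbols/aliases declared in your metric definition.
    """
    gradient, hessian = strategy.gradient_boost_objective(
        y_true, y_score, clv=200.0, incentive=10.0
    )
    return gradient, hessian
```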
- logit_objective(features, weights, y_true, **parameters)#
Compute the metric value and the gradient of the metric with respect to logistic regression coefficients.
- Parameters:
- features: NDArray of shape (n_samples, n_features)
The features of the samples.
- weights: NDArray of shape (n_features,)
The weights of the logistic regression model.
- y_true: NDArray of shape (n_samples,)
The ground truth labels.
- parameters: float or NDArray of shape (n_samples,)
The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for its distribution parameters. You can set the parameter values using either the symbol names or their aliases.
If float, the same value is used for all samples (class-dependent).
If array-like, a separate value is used for each sample (instance-dependent).
- Returns:
- value: float
The metric loss to be minimized.
- gradient: NDArray of shape (n_features,)
The gradient of the metric loss with respect to the logistic regression weights.
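Because logit_objective returns the loss value together with its gradient, it plugs directly into gradient-based optimizers. The sketch below, assuming a configured `strategy` and the hypothetical parameter names `clv` and `incentive`, feeds the pair to scipy.optimize.minimize via jac=True.

```python
import numpy as np
from scipy.optimize import minimize
from empulse.metrics import MaxProfit

def fit_logit_weights(strategy: MaxProfit, features: np.ndarray, y_true: np.ndarray):
    """Fit logistic regression weights by minimizing the metric loss."""
    def loss_and_grad(weights):
        # Returns (value, gradient), exactly what jac=True expects.
        return strategy.logit_objective(
            features, weights, y_true, clv=200.0, incentive=10.0
        )

    w0 = np.zeros(features.shape[1])
    result = minimize(loss_and_grad, w0, jac=True, method='L-BFGS-B')
    return result.x
```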
- optimal_rate(y_true, y_score, **parameters)[source]#
Compute the predicted positive rate that optimizes the metric value.
- Parameters:
- y_true: array-like of shape (n_samples,)
The ground truth labels.
- y_score: array-like of shape (n_samples,)
The predicted labels, probabilities, or decision scores (based on the chosen metric).
- parameters: float or array-like of shape (n_samples,)
The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for its distribution parameters. You can set the parameter values using either the symbol names or their aliases.
If float, the same value is used for all samples (class-dependent).
If array-like, a separate value is used for each sample (instance-dependent).
- Returns:
- optimal_rate: float
The optimal predicted positive rate.
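The optimal rate is the fraction of observations to treat as positive, which can, for example, be translated into the number of cases to act on. A minimal sketch under the same assumptions as above (a configured `strategy`; `clv` and `incentive` are hypothetical parameter names):

```python
import numpy as np
from empulse.metrics import MaxProfit

def n_to_target(strategy: MaxProfit, y_true: np.ndarray, y_score: np.ndarray) -> int:
    """Translate the metric-optimal predicted positive rate into a count of
    observations to treat as positive."""
    rate = strategy.optimal_rate(y_true, y_score, clv=200.0, incentive=10.0)
    return int(round(rate * len(y_score)))
```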
- optimal_threshold(y_true, y_score, **parameters)[source]#
Compute the classification threshold(s) that optimize the metric value.
That is, the score threshold at which an observation should be classified as positive to optimize the metric. For instance-dependent costs and benefits, this returns an array of thresholds, one for each sample. For class-dependent costs and benefits, this returns a single threshold value.
- Parameters:
- y_true: array-like of shape (n_samples,)
The ground truth labels.
- y_score: array-like of shape (n_samples,)
The predicted labels, probabilities, or decision scores (based on the chosen metric).
- parameters: float or array-like of shape (n_samples,)
The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for its distribution parameters. You can set the parameter values using either the symbol names or their aliases.
If float, the same value is used for all samples (class-dependent).
If array-like, a separate value is used for each sample (instance-dependent).
- Returns:
- optimal_threshold: float | FloatNDArray
The optimal classification threshold(s).
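A minimal sketch of turning scores into decisions with the optimal threshold(s), under the same assumptions as the sketches above (a configured `strategy`; `clv` and `incentive` are hypothetical parameter names):

```python
import numpy as np
from empulse.metrics import MaxProfit

def classify(strategy: MaxProfit, y_true: np.ndarray, y_score: np.ndarray) -> np.ndarray:
    """Label observations as positive when their score reaches the optimal threshold."""
    threshold = strategy.optimal_threshold(y_true, y_score, clv=200.0, incentive=10.0)
    # threshold is a scalar for class-dependent parameters and an array with one
    # entry per sample for instance-dependent parameters; the comparison below
    # broadcasts in either case.
    return (np.asarray(y_score) >= threshold).astype(int)
```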
- score(y_true, y_score, **parameters)[source]#
Compute the maximum profit score.
- Parameters:
- y_true: array-like of shape (n_samples,)
The ground truth labels.
- y_score: array-like of shape (n_samples,)
The predicted labels, probabilities, or decision scores (based on the chosen metric).
- parameters: float or array-like of shape (n_samples,)
The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for its distribution parameters. You can set the parameter values using either the symbol names or their aliases.
If float, the same value is used for all samples (class-dependent).
If array-like, a separate value is used for each sample (instance-dependent).
- Returns:
- score: float
The maximum profit score.
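A minimal sketch of scoring a model, with the same assumptions as above (a configured `strategy`; `clv` and `incentive` are hypothetical names for the metric's cost-benefit parameters):

```python
import numpy as np
from empulse.metrics import MaxProfit

def evaluate(strategy: MaxProfit, y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Return the maximum profit score for class-dependent costs and benefits."""
    # Pass floats for class-dependent values; pass arrays of shape (n_samples,)
    # for instance-dependent values instead.
    return strategy.score(y_true, y_score, clv=200.0, incentive=10.0)
```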