MaxProfit#
- class empulse.metrics.MaxProfit(integration_method='auto', n_mc_samples_exp=16, random_state=None)[source]#
Strategy for the Expected Maximum Profit (EMP) metric.
- Parameters:
- integration_method: {‘auto’, ‘quad’, ‘monte-carlo’, ‘quasi-monte-carlo’}, default=’auto’
The integration method to use when the metric has stochastic variables.
- If 'auto', the integration method is chosen automatically based on the number of stochastic variables, balancing accuracy with execution speed. For a single stochastic variable, piecewise integration is used; this is the most accurate method. For two stochastic variables, 'quad' is used. For more than two stochastic variables, 'quasi-monte-carlo' is used if all distributions are supported; otherwise, 'monte-carlo' is used.
- If 'quad', the metric is integrated using the quad function from scipy. Be careful, as this can be slow for more than two stochastic variables.
- If 'monte-carlo', the metric is integrated using a Monte Carlo simulation. Monte Carlo is less accurate but faster than 'quad' for many stochastic variables.
- If 'quasi-monte-carlo', the metric is integrated using a quasi-Monte Carlo simulation. Quasi-Monte Carlo is more accurate than 'monte-carlo' but only supports the following distributions from sympy.stats: sympy.stats.Arcsin, sympy.stats.Beta, sympy.stats.BetaPrime, sympy.stats.Chi, sympy.stats.ChiSquared, sympy.stats.Erlang, sympy.stats.Exponential, sympy.stats.ExGaussian, sympy.stats.F, sympy.stats.Gamma, sympy.stats.GammaInverse, sympy.stats.GaussianInverse, sympy.stats.Laplace, sympy.stats.Logistic, sympy.stats.LogNormal, sympy.stats.Lomax, sympy.stats.Normal, sympy.stats.Maxwell, sympy.stats.Moyal, sympy.stats.Nakagami, sympy.stats.PowerFunction, sympy.stats.StudentT, sympy.stats.Trapezoidal, sympy.stats.Triangular, sympy.stats.Uniform.
- n_mc_samples_exp: int, default=16
2**n_mc_samples_exp is the number of (quasi-) Monte Carlo samples to use when integration_method='monte-carlo' or integration_method='quasi-monte-carlo'. Increasing the number of samples improves the accuracy of the metric estimate but slows down computation. This argument is ignored when integration_method='quad'.
- random_state: int | np.random.RandomState | None, default=None
The random state to use when integration_method='monte-carlo' or integration_method='quasi-monte-carlo'. It determines the points sampled from the distributions of the stochastic variables. This argument is ignored when integration_method='quad'.
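The difference between the 'monte-carlo' and 'quasi-monte-carlo' options can be illustrated outside the library. A minimal sketch (plain numpy/scipy, not empulse internals): both estimate the mean of a Beta distribution from 2**10 samples, and quasi-Monte Carlo replaces pseudo-random draws with low-discrepancy Sobol points pushed through the inverse CDF, which typically converges faster for smooth integrands.

```python
import numpy as np
from scipy.stats import beta, qmc

# Target: E[X] for X ~ Beta(2, 5); the exact value is 2 / (2 + 5).
exact = 2 / 7

n_exp = 10                       # analogous to n_mc_samples_exp: 2**10 samples
rng = np.random.default_rng(42)  # analogous to random_state

# Plain Monte Carlo: draw 2**n_exp pseudo-random samples.
mc_samples = beta.rvs(2, 5, size=2**n_exp, random_state=rng)
mc_estimate = mc_samples.mean()

# Quasi-Monte Carlo: push low-discrepancy Sobol points through the inverse CDF.
sobol = qmc.Sobol(d=1, scramble=True, seed=42)
u = sobol.random_base2(m=n_exp)  # 2**n_exp points in [0, 1)
qmc_estimate = beta.ppf(u, 2, 5).mean()

print(abs(mc_estimate - exact), abs(qmc_estimate - exact))
```

With the same sample budget, the quasi-Monte Carlo error is typically an order of magnitude smaller here, which is why it is preferred whenever all distributions are supported.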
- build_logit_objective(features, y_true, C, l1_ratio, soft_threshold, fit_intercept, **loss_params)#
Build a logit objective function for optimization.
This function constructs a callable that computes the logistic loss and its gradient for a given dataset, taking the regularization parameters and thresholding options into account.
- Parameters:
- features: FloatNDArray
Feature matrix with shape (n_samples, n_features).
- y_true: FloatNDArray
Target values corresponding to the input samples, of shape (n_samples,).
- C: float
Regularization strength parameter. Smaller values specify stronger regularization.
- l1_ratio: float
The Elastic-Net mixing parameter, with range 0 <= l1_ratio <= 1. l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1 penalty.
- soft_threshold: bool
Indicator of whether soft thresholding is applied during optimization.
- fit_intercept: bool
Specifies if an intercept should be included in the model.
- **loss_params: FloatNDArray or float
Additional parameters for customizing the loss function calculation, if needed.
- Returns:
- logit_objective
The callable logistic loss function with its gradient pre-configured for optimization.
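The docs do not spell out what the returned callable computes internally, but a standard penalized logistic objective has the same contract. A minimal sketch (hypothetical helper, not empulse's implementation) of building a function that returns (loss, gradient) for an elastic-net logistic loss:

```python
import numpy as np

def make_logit_objective(X, y, C=1.0, l1_ratio=0.0, fit_intercept=True):
    """Hypothetical sketch: build f(w) -> (loss, grad) for penalized logistic loss.

    Not empulse's implementation; a plain elastic-net logistic objective used
    only to illustrate the callable that build_logit_objective returns.
    """
    if fit_intercept:
        # Prepend a column of ones so w[0] acts as the intercept.
        X = np.hstack([np.ones((X.shape[0], 1)), X])

    def objective(w):
        z = X @ w
        p = 1.0 / (1.0 + np.exp(-z))  # sigmoid
        eps = 1e-12                   # guard against log(0)
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        grad = X.T @ (p - y) / len(y)
        # Elastic-net penalty on the non-intercept weights; 1/C scales strength.
        w_pen = w[1:] if fit_intercept else w
        loss += (l1_ratio * np.abs(w_pen).sum()
                 + 0.5 * (1 - l1_ratio) * (w_pen @ w_pen)) / C
        pen_grad = (l1_ratio * np.sign(w_pen) + (1 - l1_ratio) * w_pen) / C
        if fit_intercept:
            grad[1:] += pen_grad
        else:
            grad = grad + pen_grad
        return loss, grad

    return objective
```

The returned closure can be handed to any gradient-based optimizer expecting a fused value-and-gradient function.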
- gradient_boost_objective(y_true, y_score, **parameters)#
Compute the gradient of the metric with respect to gradient boosting instances.
- Parameters:
- y_true: array-like of shape (n_samples,)
The ground truth labels.
- y_score: array-like of shape (n_samples,)
The predicted labels, probabilities, or decision scores (based on the chosen metric).
- parameters: float or array-like of shape (n_samples,)
The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.
- If float, the same value is used for all samples (class-dependent).
- If array-like, the values are used for each sample (instance-dependent).
- Returns:
- gradient: NDArray of shape (n_samples,)
The gradient of the metric loss with respect to the gradient boosting weights.
- hessian: NDArray of shape (n_samples,)
The Hessian of the metric loss with respect to the gradient boosting weights.
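To make the gradient/Hessian contract concrete, here is a hedged sketch of a cost-sensitive objective differentiated with respect to raw boosting scores. The cost_fp/cost_fn parameters and the sign convention are illustrative assumptions, not empulse's actual metric terms:

```python
import numpy as np

def boost_grad_hess(y_true, y_score, cost_fp, cost_fn):
    """Hypothetical sketch of a cost-sensitive boosting objective.

    Per-sample expected cost as a function of the raw boosting score s:
        L(s) = p(s) * cost_if_predicted_positive + (1 - p(s)) * cost_if_predicted_negative
    with p(s) = sigmoid(s). Costs may be scalars (class-dependent) or
    per-sample arrays (instance-dependent); NumPy broadcasting handles both.
    """
    p = 1.0 / (1.0 + np.exp(-y_score))
    # dL/dp: positives lose cost_fn when predicted negative, negatives lose
    # cost_fp when predicted positive (zero cost for correct predictions).
    delta = np.where(y_true == 1, -cost_fn, cost_fp)
    grad = p * (1 - p) * delta              # chain rule through the sigmoid
    hess = p * (1 - p) * (1 - 2 * p) * delta  # second derivative w.r.t. s
    return grad, hess
```

Both arrays have shape (n_samples,), matching what gradient boosting frameworks such as XGBoost or LightGBM expect from a custom objective.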
- logit_objective(features, y_true, C, l1_ratio, soft_threshold, fit_intercept, **parameters)#
Build a function which computes the metric value and the gradient of the metric with respect to the logistic regression coefficients.
- Parameters:
- features: NDArray of shape (n_samples, n_features)
The features of the samples.
- y_true: NDArray of shape (n_samples,)
The ground truth labels.
- C: float
Regularization strength parameter. Smaller values specify stronger regularization.
- l1_ratio: float
The Elastic-Net mixing parameter, with range 0 <= l1_ratio <= 1. l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1 penalty.
- soft_threshold: bool
Indicator of whether soft thresholding is applied during optimization.
- fit_intercept: bool
Specifies if an intercept should be included in the model.
- parameters: float or NDArray of shape (n_samples,)
The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.
- If float, the same value is used for all samples (class-dependent).
- If array-like, the values are used for each sample (instance-dependent).
- Returns:
- logistic_objective: Callable[[NDArray], tuple[float, NDArray]]
A function that takes logistic regression weights as input and returns the metric value and its gradient. The function signature is:
logistic_objective(weights) -> (value, gradient)
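A callable with this signature plugs directly into optimizers that accept a fused value-and-gradient function. A sketch using a stand-in quadratic objective (same (value, gradient) contract, not the metric itself) with scipy.optimize.minimize:

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in objective with the same contract: weights -> (value, gradient).
# A simple quadratic, purely to show how such a callable is consumed.
def objective(w):
    value = 0.5 * w @ w - w.sum()
    gradient = w - 1.0
    return value, gradient

# jac=True tells scipy the callable returns (value, gradient) as a pair,
# so the gradient is reused instead of being re-evaluated numerically.
result = minimize(objective, x0=np.zeros(3), jac=True, method="L-BFGS-B")
print(result.x)  # converges to approximately [1, 1, 1], the minimizer
```

The same pattern would apply to the objective returned by logit_objective: pass it as the function with jac=True and the initial coefficient vector as x0.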
- optimal_rate(y_true, y_score, **parameters)[source]#
Compute the predicted positive rate that optimizes the metric value.
- Parameters:
- y_true: array-like of shape (n_samples,)
The ground truth labels.
- y_score: array-like of shape (n_samples,)
The predicted labels, probabilities, or decision scores (based on the chosen metric).
- parameters: float or array-like of shape (n_samples,)
The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.
- If float, the same value is used for all samples (class-dependent).
- If array-like, the values are used for each sample (instance-dependent).
- Returns:
- optimal_rate: float
The optimal predicted positive rate.
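The predicted positive rate is simply the fraction of samples whose score clears the decision threshold. A small generic illustration (plain numpy, not the EMP-specific derivation):

```python
import numpy as np

# Fraction of samples classified positive at a given threshold.
y_score = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
threshold = 0.5
rate = np.mean(y_score >= threshold)
print(rate)  # 0.5: three of the six samples score at or above the threshold
```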
- optimal_threshold(y_true, y_score, **parameters)[source]#
Compute the classification threshold(s) that optimize the metric value.
That is, the score threshold at which an observation should be classified as positive to optimize the metric. For instance-dependent costs and benefits, this returns an array of thresholds, one for each sample. For class-dependent costs and benefits, this returns a single threshold value.
- Parameters:
- y_true: array-like of shape (n_samples,)
The ground truth labels.
- y_score: array-like of shape (n_samples,)
The predicted labels, probabilities, or decision scores (based on the chosen metric).
- parameters: float or array-like of shape (n_samples,)
The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.
- If float, the same value is used for all samples (class-dependent).
- If array-like, the values are used for each sample (instance-dependent).
- Returns:
- optimal_threshold: float | FloatNDArray
The optimal classification threshold(s).
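For intuition, the classic cost-sensitive threshold (assuming zero cost for correct predictions) is c_fp / (c_fp + c_fn): passing floats yields one class-dependent threshold, while arrays yield per-sample, instance-dependent thresholds. This formula is illustrative only; empulse derives the threshold from the metric definition:

```python
import numpy as np

def cost_threshold(cost_fp, cost_fn):
    """Classic cost-sensitive decision threshold (illustrative, not empulse's):
    predict positive when the estimated probability exceeds c_fp / (c_fp + c_fn).
    """
    cost_fp = np.asarray(cost_fp, dtype=float)
    cost_fn = np.asarray(cost_fn, dtype=float)
    return cost_fp / (cost_fp + cost_fn)

# Class-dependent costs (floats) give a single threshold for every sample ...
print(cost_threshold(1.0, 4.0))  # 0.2
# ... instance-dependent costs (arrays) give one threshold per sample.
print(cost_threshold(np.array([1.0, 2.0]), np.array([4.0, 2.0])))  # [0.2 0.5]
```

The higher the cost of a false negative relative to a false positive, the lower the threshold: expensive misses push the classifier toward predicting positive.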
- prepare_boost_objective(y_true, **parameters)#
Compute the constant term of the metric's gradient with respect to the gradient boosting weights.
- Parameters:
- y_true: NDArray of shape (n_samples,)
The ground truth labels.
- parameters: float or NDArray of shape (n_samples,)
The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.
- If float, the same value is used for all samples (class-dependent).
- If array-like, the values are used for each sample (instance-dependent).
- Returns:
- gradient_const: NDArray of shape (n_samples, n_features)
The constant term of the gradient.
- prepare_logit_objective(features, y_true, **parameters)#
Compute the constant terms of the loss and gradient of the metric with respect to the logistic regression coefficients.
- Parameters:
- features: NDArray of shape (n_samples, n_features)
The features of the samples.
- y_true: NDArray of shape (n_samples,)
The ground truth labels.
- parameters: float or NDArray of shape (n_samples,)
The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.
- If float, the same value is used for all samples (class-dependent).
- If array-like, the values are used for each sample (instance-dependent).
- Returns:
- gradient_const: NDArray of shape (n_samples, n_features)
The constant term of the gradient.
- loss_const1: NDArray of shape (n_features,)
The first constant term of the loss function.
- loss_const2: NDArray of shape (n_features,)
The second constant term of the loss function.
- score(y_true, y_score, **parameters)[source]#
Compute the maximum profit score.
- Parameters:
- y_true: array-like of shape (n_samples,)
The ground truth labels.
- y_score: array-like of shape (n_samples,)
The predicted labels, probabilities, or decision scores (based on the chosen metric).
- parameters: float or array-like of shape (n_samples,)
The parameter values for the costs and benefits defined in the metric. If any parameter is a stochastic variable, you should pass values for their distribution parameters. You can set the parameter values for either the symbol names or their aliases.
- If float, the same value is used for all samples (class-dependent).
- If array-like, the values are used for each sample (instance-dependent).
- Returns:
- score: float
The maximum profit score.
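The deterministic core of a maximum-profit score can be sketched by sweeping all candidate thresholds and keeping the best average profit; the expected maximum profit additionally integrates this over the distributions of the stochastic cost/benefit parameters. The benefit_tp/cost_fp terms below are illustrative assumptions, not the metric's actual parameters:

```python
import numpy as np

def empirical_max_profit(y_true, y_score, benefit_tp, cost_fp):
    """Hypothetical sketch: sweep every candidate threshold and keep the best
    average profit. A deterministic analogue of a maximum-profit score.
    """
    order = np.argsort(-y_score)  # sort samples by descending score
    y_sorted = y_true[order]
    # Marginal profit of classifying each next sample as positive:
    # a true positive earns benefit_tp, a false positive costs cost_fp.
    gains = np.where(y_sorted == 1, benefit_tp, -cost_fp)
    # Cumulative profit when the top-k scored samples are classified positive
    # (k = 0 .. n); the maximum over k is the best achievable average profit.
    profits = np.concatenate([[0.0], np.cumsum(gains)]) / len(y_true)
    return profits.max()

y_true = np.array([1, 0, 1, 1, 0, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
print(empirical_max_profit(y_true, y_score, benefit_tp=5.0, cost_fp=1.0))
```

Because the best cutoff is chosen after seeing the scores, the score rewards rankings that concentrate positives at the top rather than any particular calibration of the probabilities.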