CSBoostClassifier
- class empulse.models.CSBoostClassifier(estimator=None, *, tp_cost=0.0, tn_cost=0.0, fn_cost=0.0, fp_cost=0.0, loss=None)
Cost-sensitive gradient boosting classifier.
CSBoostClassifier supports xgboost.XGBClassifier, lightgbm.LGBMClassifier and catboost.CatBoostClassifier as base estimators. By default, it uses an XGBoost classifier with default hyperparameters.
Read more in the User Guide.
See also
- CSLogitClassifier: Cost-sensitive logistic regression classifier.
- CSTreeClassifier: Cost-sensitive decision tree classifier.
- CSForestClassifier: Cost-sensitive random forest classifier.
- Parameters:
- estimator : xgboost.XGBClassifier, lightgbm.LGBMClassifier or catboost.CatBoostClassifier, optional
XGBoost, LightGBM or CatBoost classifier to be fit with the desired hyperparameters. If not provided, an XGBoost classifier with default hyperparameters is used.
- tp_cost : float or array-like, shape=(n_samples,), default=0.0
Cost of true positives. If float, then all true positives have the same cost. If array-like, then it is the cost of each true positive classification. This value is overridden if another tp_cost is passed to the fit method.
Note
It is not recommended to pass instance-dependent costs to the __init__ method. Instead, pass them to the fit method.
- fp_cost : float or array-like, shape=(n_samples,), default=0.0
Cost of false positives. If float, then all false positives have the same cost. If array-like, then it is the cost of each false positive classification. This value is overridden if another fp_cost is passed to the fit method.
Note
It is not recommended to pass instance-dependent costs to the __init__ method. Instead, pass them to the fit method.
- tn_cost : float or array-like, shape=(n_samples,), default=0.0
Cost of true negatives. If float, then all true negatives have the same cost. If array-like, then it is the cost of each true negative classification. This value is overridden if another tn_cost is passed to the fit method.
Note
It is not recommended to pass instance-dependent costs to the __init__ method. Instead, pass them to the fit method.
- fn_cost : float or array-like, shape=(n_samples,), default=0.0
Cost of false negatives. If float, then all false negatives have the same cost. If array-like, then it is the cost of each false negative classification. This value is overridden if another fn_cost is passed to the fit method.
Note
It is not recommended to pass instance-dependent costs to the __init__ method. Instead, pass them to the fit method.
- loss : empulse.metrics.Metric, default=None
Loss function to optimize. Metric parameters are passed as loss_params to the fit method.
- Attributes:
- classes_ : numpy.ndarray, shape=(n_classes,)
Unique classes in the target.
- estimator_ : xgboost.XGBClassifier
Fitted XGBoost classifier.
References
[1] Höppner, S., Baesens, B., Verbeke, W., & Verdonck, T. (2022). Instance-dependent cost-sensitive learning for detecting transfer fraud. European Journal of Operational Research, 297(1), 291-300.
Examples
    import numpy as np
    from empulse.models import CSBoostClassifier
    from sklearn.datasets import make_classification

    X, y = make_classification()
    fn_cost = np.random.rand(y.size)  # instance-dependent cost
    fp_cost = 5  # constant cost

    model = CSBoostClassifier()
    model.fit(X, y, fn_cost=fn_cost, fp_cost=fp_cost)
    y_proba = model.predict_proba(X)
Example with passing instance-dependent costs through cross-validation:
    import numpy as np
    from empulse.models import CSBoostClassifier
    from sklearn import set_config
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Metadata routing must be enabled so the costs reach the model's fit method.
    set_config(enable_metadata_routing=True)

    X, y = make_classification()
    fn_cost = np.random.rand(y.size)  # instance-dependent cost
    fp_cost = 5  # constant cost

    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('model', CSBoostClassifier().set_fit_request(fn_cost=True, fp_cost=True))
    ])

    cross_val_score(pipeline, X, y, params={'fn_cost': fn_cost, 'fp_cost': fp_cost})
Example with passing instance-dependent costs through a grid search:
    import numpy as np
    from empulse.metrics import expected_cost_loss
    from empulse.models import CSBoostClassifier
    from sklearn import set_config
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import make_scorer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from xgboost import XGBClassifier

    # Metadata routing must be enabled so the costs reach both fit and the scorer.
    set_config(enable_metadata_routing=True)

    X, y = make_classification(n_samples=50)
    fn_cost = np.random.rand(y.size)  # instance-dependent cost
    fp_cost = 5  # constant cost

    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('model', CSBoostClassifier(
            XGBClassifier(n_jobs=2, n_estimators=10)
        ).set_fit_request(fn_cost=True, fp_cost=True))
    ])
    param_grid = {
        'model__estimator__learning_rate': np.logspace(-5, 0, 5),
    }
    scorer = make_scorer(
        expected_cost_loss,
        response_method='predict_proba',
        greater_is_better=False,
        normalize=True
    )
    scorer = scorer.set_score_request(fn_cost=True, fp_cost=True)

    grid_search = GridSearchCV(pipeline, param_grid=param_grid, scoring=scorer)
    grid_search.fit(X, y, fn_cost=fn_cost, fp_cost=fp_cost)
- fit(X, y, *, tp_cost=Parameter.UNCHANGED, tn_cost=Parameter.UNCHANGED, fn_cost=Parameter.UNCHANGED, fp_cost=Parameter.UNCHANGED, fit_params=None, **loss_params)
Fit the model.
- Parameters:
- X : array-like of shape (n_samples, n_features)
- y : array-like of shape (n_samples,)
- tp_cost : float or array-like, shape=(n_samples,), default=$UNCHANGED$
Cost of true positives. If float, then all true positives have the same cost. If array-like, then it is the cost of each true positive classification.
- fp_cost : float or array-like, shape=(n_samples,), default=$UNCHANGED$
Cost of false positives. If float, then all false positives have the same cost. If array-like, then it is the cost of each false positive classification.
- tn_cost : float or array-like, shape=(n_samples,), default=$UNCHANGED$
Cost of true negatives. If float, then all true negatives have the same cost. If array-like, then it is the cost of each true negative classification.
- fn_cost : float or array-like, shape=(n_samples,), default=$UNCHANGED$
Cost of false negatives. If float, then all false negatives have the same cost. If array-like, then it is the cost of each false negative classification.
- fit_params : dict
Additional keyword arguments to pass to the estimator’s fit method.
- loss_params : dict
Additional keyword arguments to pass to the loss function if using a custom loss function.
- Returns:
- self : CSBoostClassifier
Fitted CSBoost model.
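One way to read the scalar-or-array cost arguments of fit is through the expected-cost formulation that is common in instance-dependent cost-sensitive learning. The sketch below is a plain NumPy illustration under that assumption. `average_expected_cost` is a hypothetical helper written for this page, not an empulse function, and the objective the library actually optimizes may differ.

```python
import numpy as np


def average_expected_cost(y_true, y_proba, *, tp_cost=0.0, tn_cost=0.0,
                          fn_cost=0.0, fp_cost=0.0):
    """Average expected cost of probabilistic predictions (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_proba = np.asarray(y_proba, dtype=float)
    # Scalars broadcast to every sample; arrays are applied element-wise.
    cost_if_positive = y_proba * tp_cost + (1.0 - y_proba) * fn_cost
    cost_if_negative = y_proba * fp_cost + (1.0 - y_proba) * tn_cost
    per_sample = y_true * cost_if_positive + (1.0 - y_true) * cost_if_negative
    return float(np.mean(per_sample))


y = np.array([1, 0, 1, 0])
p = np.array([0.9, 0.2, 0.4, 0.7])  # predicted probability of the positive class
fn_cost = np.array([10.0, 10.0, 50.0, 50.0])  # instance-dependent cost
fp_cost = 5.0  # constant cost

cost = average_expected_cost(y, p, fn_cost=fn_cost, fp_cost=fp_cost)
print(cost)  # approximately 8.875
```

The third sample dominates the result: it is a positive with a high fn_cost and a low predicted probability, which is the kind of instance a cost-sensitive model is meant to prioritize.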
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- predict(X)
Predict class labels for samples in X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Features.
- Returns:
- y_pred : ndarray of shape (n_samples,)
Predicted labels for each sample.
- predict_proba(X)
Predict class probabilities for X.
- Parameters:
- X : 2D numpy.ndarray, shape=(n_samples, n_features)
- Returns:
- y_pred : 2D numpy.ndarray, shape=(n_samples, n_classes)
Predicted class probabilities.
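The relationship between the (n_samples, n_classes) probability array and class labels can be sketched with NumPy alone. This assumes the usual scikit-learn convention that column j of the probability array corresponds to classes_[j]. The arrays are made-up illustrative values, and this page does not state whether CSBoostClassifier.predict uses this argmax rule or some other decision rule.

```python
import numpy as np

# Illustrative stand-ins for a fitted model's classes_ and predict_proba output.
classes_ = np.array([0, 1])
y_proba = np.array([
    [0.8, 0.2],
    [0.3, 0.7],
    [0.5, 0.5],
])

# Each row is a distribution over classes, so it sums to 1.
row_sums = y_proba.sum(axis=1)

# Pick the most probable class per row; ties resolve to the first column.
y_pred = classes_[np.argmax(y_proba, axis=1)]
print(y_pred)  # [0 1 0]
```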
- score(X, y, sample_weight=None)
Return accuracy on provided data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
Mean accuracy of self.predict(X) w.r.t. y.
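As a rough illustration of what "mean accuracy" with optional sample weights means, the sketch below computes it directly with NumPy. `weighted_accuracy` is a hypothetical helper for this page and the label arrays are invented; it mirrors the definition above rather than the library's internal code.

```python
import numpy as np


def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of correct predictions, optionally weighted per sample."""
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    # With weights=None this reduces to the plain mean.
    return float(np.average(correct, weights=sample_weight))


y_true = np.array([1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1])  # third sample is misclassified

plain = weighted_accuracy(y_true, y_pred)
weighted = weighted_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1])
print(plain, weighted)  # 0.75 0.6
```

Giving the misclassified sample twice the weight lowers the score, which is why accuracy alone says little about misclassification cost.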
- set_fit_request(*, fit_params='$UNCHANGED$', fn_cost='$UNCHANGED$', fp_cost='$UNCHANGED$', tn_cost='$UNCHANGED$', tp_cost='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- fit_params : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for fit_params parameter in fit.
- fn_cost : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for fn_cost parameter in fit.
- fp_cost : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for fp_cost parameter in fit.
- tn_cost : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for tn_cost parameter in fit.
- tp_cost : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for tp_cost parameter in fit.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
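The <component>__<parameter> naming is the same scheme behind keys such as model__estimator__learning_rate in the grid search example. A minimal sketch with a scikit-learn Pipeline follows; LogisticRegression stands in for CSBoostClassifier here so that the snippet depends only on scikit-learn.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),
])

# Update parameters of nested components through the double-underscore path.
pipeline.set_params(model__C=0.1, scaler__with_mean=False)

# deep=True also lists the parameters of contained estimators.
deep_params = pipeline.get_params(deep=True)
# deep=False lists only the pipeline's own constructor parameters.
shallow_params = pipeline.get_params(deep=False)

print(deep_params["model__C"])         # 0.1
print("model__C" in shallow_params)    # False
```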
- set_score_request(*, sample_weight='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
- Returns:
- self : object
The updated object.